[GitHub] [beam] pawelpasterz commented on a change in pull request #10882: Implement java precommit dataflow examples tests to run on java 11

2020-03-02 Thread GitBox
pawelpasterz commented on a change in pull request #10882: Implement java 
precommit dataflow examples tests to run on java 11
URL: https://github.com/apache/beam/pull/10882#discussion_r386849053
 
 

 ##
 File path: 
.test-infra/jenkins/job_PreCommit_Java_Examples_Dataflow_Java11.groovy
 ##
 @@ -0,0 +1,72 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+import PrecommitJobBuilder
+import CommonJobProperties as commonJobProperties
+
+PrecommitJobBuilder builder = new PrecommitJobBuilder(
+scope: this,
+nameBase: 'Java_Examples_Dataflow_Java11',
+gradleTask: ':clean', // Do nothing here
+gradleSwitches: ['-PdisableSpotlessCheck=true'], // spotless checked 
in separate pre-commit
+triggerPathPatterns: [
+'^model/.*$',
+'^sdks/java/.*$',
+'^runners/google-cloud-dataflow-java/.*$',
+'^examples/java/.*$',
+'^examples/kotlin/.*$',
+'^release/.*$',
+],
+timeoutMins: 30,
+)
+builder.build {
+publishers {
+archiveJunit('**/build/test-results/**/*.xml')
+}
+
+steps {
+gradle {
+rootBuildScriptDir(commonJobProperties.checkoutDir)
+tasks 'javaExamplesDataflowPrecommit'
+switches 
"-Dorg.gradle.java.home=${commonJobProperties.JAVA_8_HOME}"
+switches '-PdisableSpotlessCheck=true'
+switches '-x runLegacyWorkerPreCommitTest'
+switches '-x runApplianceTest'
+switches '-x runWindmillTest'
+commonJobProperties.setGradleSwitches(delegate)
+}
+
+gradle {
+rootBuildScriptDir(commonJobProperties.checkoutDir)
+tasks 
':runners:google-cloud-dataflow-java:examples:runLegacyWorkerPreCommitTest'
+tasks 
':runners:google-cloud-dataflow-java:examples-streaming:runApplianceTest'
+tasks 
':runners:google-cloud-dataflow-java:examples-streaming:runWindmillTest'
+switches '-x shadowJar'
 
 Review comment:
   @kennknowles thanks for all comments, first I thought I will answer to all 
your comments but then I decided to gather it all in just one. I can see there 
is lot of knowledge that is missing to me right now, can we contact outside 
this pr scope? (ex email?) I would very appreciate if you will share some 
knowledge with me. What do you think? 


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [beam] chadrik commented on issue #11022: [BEAM-7746] Resolve typing issues in filesystem

2020-03-02 Thread GitBox
chadrik commented on issue #11022: [BEAM-7746] Resolve typing issues in 
filesystem
URL: https://github.com/apache/beam/pull/11022#issuecomment-593803091
 
 
   R: @robertwb 
   R: @udim 
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [beam] chadrik commented on a change in pull request #11022: [BEAM-7746] Resolve typing issues in filesystem

2020-03-02 Thread GitBox
chadrik commented on a change in pull request #11022: [BEAM-7746] Resolve 
typing issues in filesystem
URL: https://github.com/apache/beam/pull/11022#discussion_r386809980
 
 

 ##
 File path: sdks/python/apache_beam/io/localfilesystem.py
 ##
 @@ -136,10 +137,17 @@ def _path_open(
   mode,
   mime_type='application/octet-stream',
   compression_type=CompressionTypes.AUTO):
+# type: (...) -> BinaryIO
+
 """Helper functions to open a file in the provided mode.
 """
 compression_type = FileSystem._get_compression_type(path, compression_type)
-raw_file = open(path, mode)
+if mode == 'r':
+  mode = 'rb'
+elif mode == 'w':
+  mode = 'wb'
 
 Review comment:
   I saw this pattern in other filesystems so I thought I'd add it here to make 
it completely obvious that the subsequent cast to `BinaryIO` (from `IO[Any]`) 
is safe and warranted. 


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [beam] stale[bot] commented on issue #9899: [BEAM-8511] [WIP] KinesisIO.Read enhanced fanout

2020-03-02 Thread GitBox
stale[bot] commented on issue #9899: [BEAM-8511] [WIP] KinesisIO.Read enhanced 
fanout
URL: https://github.com/apache/beam/pull/9899#issuecomment-593796235
 
 
   This pull request has been marked as stale due to 60 days of inactivity. It 
will be closed in 1 week if no further activity occurs. If you think that’s 
incorrect or this pull request requires a review, please simply write any 
comment. If closed, you can revive the PR at any time and @mention a reviewer 
or discuss it on the d...@beam.apache.org list. Thank you for your 
contributions.
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [beam] Hannah-Jiang commented on issue #11023: [BEAM-9413]Fix post commit failure caused by docker migration

2020-03-02 Thread GitBox
Hannah-Jiang commented on issue #11023: [BEAM-9413]Fix post commit failure 
caused by docker migration
URL: https://github.com/apache/beam/pull/11023#issuecomment-593789141
 
 
   R: @tvalentyn 
   Cc: @amaliujia , is it possible to cherrypick this PR to 2.20?


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [beam] Hannah-Jiang commented on issue #11023: [BEAM-9413]Fix post commit failure caused by docker migration

2020-03-02 Thread GitBox
Hannah-Jiang commented on issue #11023: [BEAM-9413]Fix post commit failure 
caused by docker migration
URL: https://github.com/apache/beam/pull/11023#issuecomment-593788560
 
 
   Run Python Dataflow ValidatesContainer


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [beam] Hannah-Jiang opened a new pull request #11023: [BEAM-9413]Fix post commit failure caused by docker migration

2020-03-02 Thread GitBox
Hannah-Jiang opened a new pull request #11023: [BEAM-9413]Fix post commit 
failure caused by docker migration
URL: https://github.com/apache/beam/pull/11023
 
 
   **Please** add a meaningful description for your change here
   
   
   
   Thank you for your contribution! Follow this checklist to help us 
incorporate your contribution quickly and easily:
   
- [ ] [**Choose 
reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and 
mention them in a comment (`R: @username`).
- [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in 
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA 
issue, if applicable. This will automatically link the pull request to the 
issue.
- [ ] Update `CHANGES.md` with noteworthy changes.
- [ ] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   See the [Contributor Guide](https://beam.apache.org/contribute) for more 
tips on [how to make review process 
smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier).
   
   Post-Commit Tests Status (on master branch)
   

   
   Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark
   --- | --- | --- | --- | --- | --- | --- | ---
   Go | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/)
 | --- | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/)
 | --- | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/)
   Java | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/)
   Python | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Python2/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python2/lastCompletedBuild/)[![Build
 

[GitHub] [beam] chadrik commented on a change in pull request #11022: [BEAM-7746] Resolve typing issues in filesystem

2020-03-02 Thread GitBox
chadrik commented on a change in pull request #11022: [BEAM-7746] Resolve 
typing issues in filesystem
URL: https://github.com/apache/beam/pull/11022#discussion_r386809980
 
 

 ##
 File path: sdks/python/apache_beam/io/localfilesystem.py
 ##
 @@ -136,10 +137,17 @@ def _path_open(
   mode,
   mime_type='application/octet-stream',
   compression_type=CompressionTypes.AUTO):
+# type: (...) -> BinaryIO
+
 """Helper functions to open a file in the provided mode.
 """
 compression_type = FileSystem._get_compression_type(path, compression_type)
-raw_file = open(path, mode)
+if mode == 'r':
+  mode = 'rb'
+elif mode == 'w':
+  mode = 'wb'
 
 Review comment:
   I saw this pattern another filesystem so I thought I'd add it here to make 
it completely obvious that the subsequent cast to `BinaryIO` (from I`O[Any]`) 
is safe and warranted. 


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [beam] chadrik commented on a change in pull request #11022: [BEAM-7746] Resolve typing issues in filesystem

2020-03-02 Thread GitBox
chadrik commented on a change in pull request #11022: [BEAM-7746] Resolve 
typing issues in filesystem
URL: https://github.com/apache/beam/pull/11022#discussion_r386809980
 
 

 ##
 File path: sdks/python/apache_beam/io/localfilesystem.py
 ##
 @@ -136,10 +137,17 @@ def _path_open(
   mode,
   mime_type='application/octet-stream',
   compression_type=CompressionTypes.AUTO):
+# type: (...) -> BinaryIO
+
 """Helper functions to open a file in the provided mode.
 """
 compression_type = FileSystem._get_compression_type(path, compression_type)
-raw_file = open(path, mode)
+if mode == 'r':
+  mode = 'rb'
+elif mode == 'w':
+  mode = 'wb'
 
 Review comment:
   I saw this pattern in another filesystem so I thought I'd add it here to 
make it completely obvious that the subsequent cast to `BinaryIO` (from 
`IO[Any]`) is safe and warranted. 


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [beam] chadrik commented on a change in pull request #11022: [BEAM-7746] Resolve typing issues in filesystem

2020-03-02 Thread GitBox
chadrik commented on a change in pull request #11022: [BEAM-7746] Resolve 
typing issues in filesystem
URL: https://github.com/apache/beam/pull/11022#discussion_r386809980
 
 

 ##
 File path: sdks/python/apache_beam/io/localfilesystem.py
 ##
 @@ -136,10 +137,17 @@ def _path_open(
   mode,
   mime_type='application/octet-stream',
   compression_type=CompressionTypes.AUTO):
+# type: (...) -> BinaryIO
+
 """Helper functions to open a file in the provided mode.
 """
 compression_type = FileSystem._get_compression_type(path, compression_type)
-raw_file = open(path, mode)
+if mode == 'r':
+  mode = 'rb'
+elif mode == 'w':
+  mode = 'wb'
 
 Review comment:
   I saw this pattern in another filesystem so I thought I'd add it here to 
make it completely obvious that the subsequent cast to `BinaryIO` (from 
I`O[Any]`) is safe and warranted. 


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [beam] chadrik opened a new pull request #11022: [BEAM-7746] Resolve typing issues in filesystem

2020-03-02 Thread GitBox
chadrik opened a new pull request #11022: [BEAM-7746] Resolve typing issues in 
filesystem
URL: https://github.com/apache/beam/pull/11022
 
 
   In order to properly type some of the complexities around the `filesystem` 
API I had to address 2 main problems:
   
   - properly type `CompressedFile`
 - add `Compressor` and `Decompressor` protocols
 - make it a `BinaryIO` subclass.  this is necessary to guarantee that all 
the file-like things that are floating around these modules actually present 
the same interface.  unfortunately, this is not a protocol, so it must be 
subclassed.  this required a few breaking changes:
   - `writeable` renamed to `writable`
   - `closed` method changed to a property
   - `seekable` property changed a method
   - `read` arg changed name
   - `write` arg changed name
   -  add  a new `create_buffered()` classmethod to `DownloaderStream` and 
`UploaderStream`
  - simplifies the creation of buffered wrappers to the classes
  - this was motivated primarily by the desire to consolidate this pattern 
into one place so that I could handle the unfortunate need to `cast` these 
types to `BinaryIO`.   see notes on `DownloaderStream.create_buffered()` for 
why this was necessary.
   
   --- 
   
   This PR makes one significant runtime change that must be vetted.
   
   `s3io.S3IO.open()` originally created a buffered `DownloaderStream` like 
this:
   
   ```python
 return io.BufferedReader(
 DownloaderStream(downloader, mode=mode), 
 buffer_size=read_buffer_size)
   ```
   
   All other IO clients pass `read_buffer_size` to `DownloaderStream`, like 
this example from `gcsio`:
   
   ```python
 return io.BufferedReader(
 DownloaderStream(downloader, read_buffer_size=read_buffer_size, 
mode=mode),
 buffer_size=read_buffer_size)
   ```
   
   I was hoping to standardize on that behavior in order to simplify 
`create_buffered`, but I don't know whether this difference was intentional or 
not, or what the impact of changing it would be. 
   
   
   
   Thank you for your contribution! Follow this checklist to help us 
incorporate your contribution quickly and easily:
   
- [ ] [**Choose 
reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and 
mention them in a comment (`R: @username`).
- [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in 
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA 
issue, if applicable. This will automatically link the pull request to the 
issue.
- [ ] Update `CHANGES.md` with noteworthy changes.
- [ ] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   See the [Contributor Guide](https://beam.apache.org/contribute) for more 
tips on [how to make review process 
smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier).
   
   Post-Commit Tests Status (on master branch)
   

   
   Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark
   --- | --- | --- | --- | --- | --- | --- | ---
   Go | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/)
 | --- | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/)
 | --- | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/)
   Java | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build
 

[GitHub] [beam] robertwb commented on issue #11021: Remove excessive logging.

2020-03-02 Thread GitBox
robertwb commented on issue #11021: Remove excessive logging.
URL: https://github.com/apache/beam/pull/11021#issuecomment-593764230
 
 
   Run Python PreCommit


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [beam] robertwb commented on issue #11021: Remove excessive logging.

2020-03-02 Thread GitBox
robertwb commented on issue #11021: Remove excessive logging.
URL: https://github.com/apache/beam/pull/11021#issuecomment-593764204
 
 
   Run PythonLint PreCommit


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [beam] robertwb commented on a change in pull request #10915: [BEAM-8335] Add PCollection to DataFrame logic for InteractiveRunner.

2020-03-02 Thread GitBox
robertwb commented on a change in pull request #10915: [BEAM-8335] Add 
PCollection to DataFrame logic for InteractiveRunner.
URL: https://github.com/apache/beam/pull/10915#discussion_r386794290
 
 

 ##
 File path: sdks/python/apache_beam/runners/interactive/utils.py
 ##
 @@ -0,0 +1,113 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#  http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+"""Utilities to be used in  Interactive Beam.
 
 Review comment:
   Sorry, I missed this before merging. I think the extra space was 
"in[space][space]Interactive", not the whitespace that yapf inserted. 


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [beam] chadrik commented on issue #11007: [BEAM-7746] Support Timestamp/Duration equality testing with arbitrary objects

2020-03-02 Thread GitBox
chadrik commented on issue #11007: [BEAM-7746] Support Timestamp/Duration 
equality testing with arbitrary objects
URL: https://github.com/apache/beam/pull/11007#issuecomment-593760530
 
 
   R: @robertwb 
   R: @udim 
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [beam] acrites commented on issue #10988: [BEAM-9382] Clean up of TestStreamTranscriptTests.

2020-03-02 Thread GitBox
acrites commented on issue #10988: [BEAM-9382] Clean up of 
TestStreamTranscriptTests.
URL: https://github.com/apache/beam/pull/10988#issuecomment-593753721
 
 
   Along the lines of what Kenn is saying, I had originally thought that these 
tests weren't really testing whether or not PaneInfo.final gets set correctly 
in the various triggering strategies, but more so that it was outputting the 
correct elements. Maybe we want to test all aspects in a single test though.
   
   I'm fine with sickbaying the tests for Python direct runner if you're ok 
with having a large number of tests inactive until we getting around to fixing 
direct runner. Maybe that'll motivate us to get it fixed faster...


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [beam] lukecwik commented on issue #10897: [BEAM-2939] Java UnboundedSource SDF wrapper

2020-03-02 Thread GitBox
lukecwik commented on issue #10897: [BEAM-2939] Java UnboundedSource SDF wrapper
URL: https://github.com/apache/beam/pull/10897#issuecomment-593745452
 
 
   Tested internally within Google for UW and it passed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [beam] lukecwik merged pull request #10897: [BEAM-2939] Java UnboundedSource SDF wrapper

2020-03-02 Thread GitBox
lukecwik merged pull request #10897: [BEAM-2939] Java UnboundedSource SDF 
wrapper
URL: https://github.com/apache/beam/pull/10897
 
 
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[beam] branch master updated (a29fdff -> bd7755f)

2020-03-02 Thread lcwik
This is an automated email from the ASF dual-hosted git repository.

lcwik pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/beam.git.


from a29fdff  [BEAM-8335] Add PCollection to DataFrame logic for 
InteractiveRunner. (#10915)
 add bd7755f  [BEAM-2939] Java UnboundedSource SDF wrapper (#10897)

No new revisions were added by this update.

Summary of changes:
 .../src/main/java/org/apache/beam/sdk/io/Read.java | 416 +
 .../apache/beam/fn/harness/FnApiDoFnRunner.java|  45 ++-
 .../beam/fn/harness/FnApiDoFnRunnerTest.java   |  25 +-
 3 files changed, 471 insertions(+), 15 deletions(-)



[GitHub] [beam] kennknowles commented on issue #10988: [BEAM-9382] Clean up of TestStreamTranscriptTests.

2020-03-02 Thread GitBox
kennknowles commented on issue #10988: [BEAM-9382] Clean up of 
TestStreamTranscriptTests.
URL: https://github.com/apache/beam/pull/10988#issuecomment-593740435
 
 
   Since we are doing drive by comments, perhaps a clear comment describing 
exactly what the test is trying to verify would allow us to be sure to have all 
the necessary verifications but no extraneous info.
   
   Also I assume you mean the Python direct runner (each language has a direct 
runner for that language) since I think the Java direct runner does support 
pane info.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [beam] stale[bot] commented on issue #9942: [BEAM-3288] Do not drop data after trigger finishes

2020-03-02 Thread GitBox
stale[bot] commented on issue #9942: [BEAM-3288] Do not drop data after trigger 
finishes
URL: https://github.com/apache/beam/pull/9942#issuecomment-593739632
 
 
   This pull request has been marked as stale due to 60 days of inactivity. It 
will be closed in 1 week if no further activity occurs. If you think that’s 
incorrect or this pull request requires a review, please simply write any 
comment. If closed, you can revive the PR at any time and @mention a reviewer 
or discuss it on the d...@beam.apache.org list. Thank you for your 
contributions.
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [beam] chamikaramj commented on issue #10951: [BEAM-8575] Modified the test to work for different runners.

2020-03-02 Thread GitBox
chamikaramj commented on issue #10951: [BEAM-8575] Modified the test to work 
for different runners.
URL: https://github.com/apache/beam/pull/10951#issuecomment-593731682
 
 
   Retest this please


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [beam] udim commented on issue #10915: [BEAM-8335] Add PCollection to DataFrame logic for InteractiveRunner.

2020-03-02 Thread GitBox
udim commented on issue #10915: [BEAM-8335] Add PCollection to DataFrame logic 
for InteractiveRunner.
URL: https://github.com/apache/beam/pull/10915#issuecomment-593731439
 
 
   I believe this PR broke precommits. Please take care to run tests on Jenkins 
before merging.
   
   ```
   > Task :sdks:python:test-suites:tox:pycommon:docs
   ...
   
/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Cron/src/sdks/python/test-suites/tox/pycommon/build/srcs/sdks/python/apache_beam/runners/interactive/utils.py:docstring
 of apache_beam.runners.interactive.utils.elements_to_df:3: WARNING: Inline 
interpreted text or phrase reference start-string without end-string.
   ```
   https://builds.apache.org/job/beam_PreCommit_Python_Cron/2463/
   
   Please fix or revert ASAP.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [beam] chunyang commented on issue #10979: [BEAM-8841] Support writing data to BigQuery via Avro in Python SDK

2020-03-02 Thread GitBox
chunyang commented on issue #10979: [BEAM-8841] Support writing data to 
BigQuery via Avro in Python SDK
URL: https://github.com/apache/beam/pull/10979#issuecomment-593729608
 
 
   I am able to run the integration test 
`apache_beam.io.gcp.bigquery_file_loads_test:BigQueryFileLoadsIT` but for some 
reason if I use the same procedure to run tests from 
`apache_beam.io.gcp.bigquery_write_it_test.py:BigQueryWriteIntegrationTests`, I 
get the following error:
   
   ```
   [chuck.yang ~/src/beam/sdks/python cyang/avro-bigqueryio+]
   % ./scripts/run_integration_test.sh --test_opts 
"--tests=apache_beam.io.gcp.bigquery_write_it_test.py:BigQueryWriteIntegrationTests.test_big_query_write_without_schema
 --nocapture" --project ... --gcs_location gs://... --kms_key_name "" 
--streaming false
   >>> RUNNING integration tests with pipeline options: 
--runner=TestDataflowRunner --project=... --staging_location=gs://... 
--temp_location=gs://... --output=gs://... 
--sdk_location=build/apache-beam.tar.gz 
--requirements_file=postcommit_requirements.txt --num_workers=1 --sleep_secs=20
   >>>   test options: 
--tests=apache_beam.io.gcp.bigquery_write_it_test.py:BigQueryWriteIntegrationTests.test_big_query_write_without_schema
 --nocapture
   
/home/chuck.yang/src/beam/sdks/python/test-suites/tox/py37/build/srcs/sdks/python/target/.tox-py37-gcp-pytest/py37-gcp-pytest/lib/python3.7/site-packages/setuptools/dist.py:476:
 UserWarning: Normalizing '2.21.0.dev' to '2.21.0.dev0'
 normalized_version,
   running nosetests
   running egg_info
   INFO:gen_protos:Skipping proto regeneration: all files up to date
   writing apache_beam.egg-info/PKG-INFO
   writing dependency_links to apache_beam.egg-info/dependency_links.txt
   writing entry points to apache_beam.egg-info/entry_points.txt
   writing requirements to apache_beam.egg-info/requires.txt
   writing top-level names to apache_beam.egg-info/top_level.txt
   reading manifest file 'apache_beam.egg-info/SOURCES.txt'
   reading manifest template 'MANIFEST.in'
   warning: no files found matching 'README.md'
   warning: no files found matching 'NOTICE'
   warning: no files found matching 'LICENSE'
   writing manifest file 'apache_beam.egg-info/SOURCES.txt'
   Failure: ImportError (No module named 'apache_beam') ... ERROR
   
   ==
   ERROR: Failure: ImportError (No module named 'apache_beam')
   --
   Traceback (most recent call last):
 File 
"/home/chuck.yang/src/beam/sdks/python/test-suites/tox/py37/build/srcs/sdks/python/target/.tox-py37-gcp-pytest/py37-gcp-pytest/lib/python3.7/site-packages/nose/failure.py",
 line 39, in runTest
   raise self.exc_val.with_traceback(self.tb)
 File 
"/home/chuck.yang/src/beam/sdks/python/test-suites/tox/py37/build/srcs/sdks/python/target/.tox-py37-gcp-pytest/py37-gcp-pytest/lib/python3.7/site-packages/nose/loader.py",
 line 418, in loadTestsFromName
   addr.filename, addr.module)
 File 
"/home/chuck.yang/src/beam/sdks/python/test-suites/tox/py37/build/srcs/sdks/python/target/.tox-py37-gcp-pytest/py37-gcp-pytest/lib/python3.7/site-packages/nose/importer.py",
 line 47, in importFromPath
   return self.importFromDir(dir_path, fqname)
 File 
"/home/chuck.yang/src/beam/sdks/python/test-suites/tox/py37/build/srcs/sdks/python/target/.tox-py37-gcp-pytest/py37-gcp-pytest/lib/python3.7/site-packages/nose/importer.py",
 line 79, in importFromDir
   fh, filename, desc = find_module(part, path)
 File "/usr/lib/python3.7/imp.py", line 296, in find_module
   raise ImportError(_ERR_MSG.format(name), name=name)
   ImportError: No module named 'apache_beam'
   
   --
   XML: nosetests-.xml
   --
   XML: /home/chuck.yang/src/beam/sdks/python/nosetests.xml
   --
   Ran 1 test in 0.002s
   
   FAILED (errors=1)
   
   ```


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [beam] bumblebee-coming commented on issue #10951: [BEAM-8575] Modified the test to work for different runners.

2020-03-02 Thread GitBox
bumblebee-coming commented on issue #10951: [BEAM-8575] Modified the test to 
work for different runners.
URL: https://github.com/apache/beam/pull/10951#issuecomment-593729091
 
 
   I first added a new matcher to test there are any number of 15s. Later I 
realized I can partition the PCollection into three (<15, ==15, >15) and use 
the existing match is_empty and is_not_empty to verify that PCollections (==15, 
>15).


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [beam] chunyang commented on issue #10979: [BEAM-8841] Support writing data to BigQuery via Avro in Python SDK

2020-03-02 Thread GitBox
chunyang commented on issue #10979: [BEAM-8841] Support writing data to 
BigQuery via Avro in Python SDK
URL: https://github.com/apache/beam/pull/10979#issuecomment-593722179
 
 
   I think I need to fix a few of the integration tests that don't provide a 
schema or use `SCHEMA_AUTODETECT`.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [beam] chadrik commented on issue #10810: [BEAM-9274] Support running yapf in a git pre-commit hook

2020-03-02 Thread GitBox
chadrik commented on issue #10810: [BEAM-9274] Support running yapf in a git 
pre-commit hook
URL: https://github.com/apache/beam/pull/10810#issuecomment-593720763
 
 
   Run Python PreCommit


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [beam] chamikaramj commented on issue #11018: [BEAM-9415] fix postcommit xvr tests

2020-03-02 Thread GitBox
chamikaramj commented on issue #11018: [BEAM-9415] fix postcommit xvr tests
URL: https://github.com/apache/beam/pull/11018#issuecomment-593720543
 
 
   Run Python PreCommit


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [beam] udim commented on issue #11016: Reduce warnings in pytest runs.

2020-03-02 Thread GitBox
udim commented on issue #11016: Reduce warnings in pytest runs.
URL: https://github.com/apache/beam/pull/11016#issuecomment-593719479
 
 
   Run Python PreCommit


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [beam] kennknowles commented on a change in pull request #10949: [BEAM-9371] Add SideInputLoadTest to Java SDK

2020-03-02 Thread GitBox
kennknowles commented on a change in pull request #10949: [BEAM-9371] Add 
SideInputLoadTest to Java SDK
URL: https://github.com/apache/beam/pull/10949#discussion_r386745270
 
 

 ##
 File path: 
sdks/java/testing/load-tests/src/main/java/org/apache/beam/sdk/loadtests/SideInputLoadTest.java
 ##
 @@ -0,0 +1,243 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.loadtests;
+
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.List;
+import java.util.Map;
+import java.util.Optional;
+import java.util.Random;
+import org.apache.beam.sdk.io.synthetic.SyntheticStep;
+import org.apache.beam.sdk.options.Default;
+import org.apache.beam.sdk.options.Description;
+import org.apache.beam.sdk.options.Validation;
+import org.apache.beam.sdk.testutils.metrics.ByteMonitor;
+import org.apache.beam.sdk.transforms.DoFn;
+import org.apache.beam.sdk.transforms.ParDo;
+import org.apache.beam.sdk.transforms.View;
+import org.apache.beam.sdk.transforms.windowing.FixedWindows;
+import org.apache.beam.sdk.transforms.windowing.Window;
+import org.apache.beam.sdk.values.KV;
+import org.apache.beam.sdk.values.PCollection;
+import org.apache.beam.sdk.values.PCollectionView;
+import org.joda.time.Duration;
+import org.joda.time.Instant;
+
+/**
+ * Load test for operations involving side inputs.
+ *
+ * The purpose of this test is to measure cost of materialization or lookup 
of side inputs. It
+ * uses synthetic sources and {@link SyntheticStep} which can be parametrized 
to generate records
+ * with various sizes of keys and values, impose delays in the pipeline and 
simulate other
+ * performance challenges.
+ *
+ * To run the test manually, use the following command:
+ *
+ * 
+ *   ./gradlew :sdks:java:testing:load-tests:run -PloadTest.args='
+ *--sourceOptions={"numRecords":2000, ...}
+ *--sideInputType=ITERABLE
+ *--accessPercentage=1
+ *--windowCount=200
+ * 
+ */
+public class SideInputLoadTest extends LoadTest {
+
+  private static final String METRICS_NAMESPACE = "sideinput";
+  private static final Instant TIME = new Instant();
+
+  public SideInputLoadTest(String[] args) throws IOException {
+super(args, Options.class, METRICS_NAMESPACE);
+  }
+
+  @Override
+  void loadTest() throws IOException {
+Optional syntheticStep = 
createStep(options.getStepOptions());
+PCollection> input =
+pipeline
+.apply(readFromSource(sourceOptions))
+.apply(ParDo.of(new AddTimestamps()))
+.apply("Collect start time metrics", ParDo.of(runtimeMonitor))
+.apply(ParDo.of(new ByteMonitor(METRICS_NAMESPACE, 
"totalBytes.count")));
+
+performTestWithSideInput(
+input, 
SideInputMaterializationType.valueOf(options.getSideInputType()), 
syntheticStep);
+  }
+
+  private void performTestWithSideInput(
+  PCollection> input,
+  SideInputMaterializationType sideInputType,
+  Optional syntheticStep) {
+switch (sideInputType) {
+  case ITERABLE:
+performTestWithIterable(input, syntheticStep);
+break;
+  case HASHMAP:
+performTestWithHashMap(input, syntheticStep);
+break;
+  case LIST:
+performTestWithList(input, syntheticStep);
+break;
+}
+  }
+
+  private void performTestWithList(
+  PCollection> input, Optional 
syntheticStep) {
+applyStepIfPresent(input, "Synthetic step", syntheticStep);
+PCollectionView>> sideInput = 
input.apply(View.asList());
+input
+.apply(ParDo.of(new 
SideInputTestWithList(sideInput)).withSideInputs(sideInput))
+.apply("Collect end time metrics", ParDo.of(runtimeMonitor));
+  }
+
+  private void performTestWithHashMap(
+  PCollection> input, Optional 
syntheticStep) {
+applyStepIfPresent(input, "Synthetic step", syntheticStep);
+PCollectionView> sideInput = input.apply(View.asMap());
+input
+.apply(ParDo.of(new 
SideInputTestWithHashMap(sideInput)).withSideInputs(sideInput))
+.apply("Collect end time metrics", ParDo.of(runtimeMonitor));
+  }
+
+  private void performTestWithIterable(
+  PCollection> input, Optional 
syntheticStep) {
+applyStepIfPresent(input, 

[GitHub] [beam] kennknowles commented on a change in pull request #10949: [BEAM-9371] Add SideInputLoadTest to Java SDK

2020-03-02 Thread GitBox
kennknowles commented on a change in pull request #10949: [BEAM-9371] Add 
SideInputLoadTest to Java SDK
URL: https://github.com/apache/beam/pull/10949#discussion_r386744848
 
 

 ##
 File path: 
sdks/java/testing/load-tests/src/main/java/org/apache/beam/sdk/loadtests/SideInputLoadTest.java
 ##
 @@ -0,0 +1,243 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.loadtests;
+
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.List;
+import java.util.Map;
+import java.util.Optional;
+import java.util.Random;
+import org.apache.beam.sdk.io.synthetic.SyntheticStep;
+import org.apache.beam.sdk.options.Default;
+import org.apache.beam.sdk.options.Description;
+import org.apache.beam.sdk.options.Validation;
+import org.apache.beam.sdk.testutils.metrics.ByteMonitor;
+import org.apache.beam.sdk.transforms.DoFn;
+import org.apache.beam.sdk.transforms.ParDo;
+import org.apache.beam.sdk.transforms.View;
+import org.apache.beam.sdk.transforms.windowing.FixedWindows;
+import org.apache.beam.sdk.transforms.windowing.Window;
+import org.apache.beam.sdk.values.KV;
+import org.apache.beam.sdk.values.PCollection;
+import org.apache.beam.sdk.values.PCollectionView;
+import org.joda.time.Duration;
+import org.joda.time.Instant;
+
+/**
+ * Load test for operations involving side inputs.
+ *
+ * The purpose of this test is to measure cost of materialization or lookup 
of side inputs. It
+ * uses synthetic sources and {@link SyntheticStep} which can be parametrized 
to generate records
+ * with various sizes of keys and values, impose delays in the pipeline and 
simulate other
+ * performance challenges.
+ *
+ * To run the test manually, use the following command:
+ *
+ * 
+ *   ./gradlew :sdks:java:testing:load-tests:run -PloadTest.args='
+ *--sourceOptions={"numRecords":2000, ...}
+ *--sideInputType=ITERABLE
+ *--accessPercentage=1
+ *--windowCount=200
+ * 
+ */
+public class SideInputLoadTest extends LoadTest {
+
+  private static final String METRICS_NAMESPACE = "sideinput";
+  private static final Instant TIME = new Instant();
+
+  public SideInputLoadTest(String[] args) throws IOException {
+super(args, Options.class, METRICS_NAMESPACE);
+  }
+
+  @Override
+  void loadTest() throws IOException {
+Optional syntheticStep = 
createStep(options.getStepOptions());
+PCollection> input =
+pipeline
+.apply(readFromSource(sourceOptions))
+.apply(ParDo.of(new AddTimestamps()))
+.apply("Collect start time metrics", ParDo.of(runtimeMonitor))
+.apply(ParDo.of(new ByteMonitor(METRICS_NAMESPACE, 
"totalBytes.count")));
+
+performTestWithSideInput(
+input, 
SideInputMaterializationType.valueOf(options.getSideInputType()), 
syntheticStep);
+  }
+
+  private void performTestWithSideInput(
+  PCollection> input,
+  SideInputMaterializationType sideInputType,
+  Optional syntheticStep) {
+switch (sideInputType) {
+  case ITERABLE:
+performTestWithIterable(input, syntheticStep);
+break;
+  case HASHMAP:
+performTestWithHashMap(input, syntheticStep);
+break;
+  case LIST:
+performTestWithList(input, syntheticStep);
+break;
+}
+  }
+
+  private void performTestWithList(
+  PCollection> input, Optional 
syntheticStep) {
+applyStepIfPresent(input, "Synthetic step", syntheticStep);
+PCollectionView>> sideInput = 
input.apply(View.asList());
+input
+.apply(ParDo.of(new 
SideInputTestWithList(sideInput)).withSideInputs(sideInput))
+.apply("Collect end time metrics", ParDo.of(runtimeMonitor));
+  }
+
+  private void performTestWithHashMap(
+  PCollection> input, Optional 
syntheticStep) {
+applyStepIfPresent(input, "Synthetic step", syntheticStep);
+PCollectionView> sideInput = input.apply(View.asMap());
+input
+.apply(ParDo.of(new 
SideInputTestWithHashMap(sideInput)).withSideInputs(sideInput))
+.apply("Collect end time metrics", ParDo.of(runtimeMonitor));
+  }
+
+  private void performTestWithIterable(
+  PCollection> input, Optional 
syntheticStep) {
+applyStepIfPresent(input, 

[GitHub] [beam] kennknowles commented on a change in pull request #10949: [BEAM-9371] Add SideInputLoadTest to Java SDK

2020-03-02 Thread GitBox
kennknowles commented on a change in pull request #10949: [BEAM-9371] Add 
SideInputLoadTest to Java SDK
URL: https://github.com/apache/beam/pull/10949#discussion_r386743300
 
 

 ##
 File path: 
sdks/java/testing/load-tests/src/main/java/org/apache/beam/sdk/loadtests/SideInputLoadTest.java
 ##
 @@ -0,0 +1,243 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.loadtests;
+
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.List;
+import java.util.Map;
+import java.util.Optional;
+import java.util.Random;
+import org.apache.beam.sdk.io.synthetic.SyntheticStep;
+import org.apache.beam.sdk.options.Default;
+import org.apache.beam.sdk.options.Description;
+import org.apache.beam.sdk.options.Validation;
+import org.apache.beam.sdk.testutils.metrics.ByteMonitor;
+import org.apache.beam.sdk.transforms.DoFn;
+import org.apache.beam.sdk.transforms.ParDo;
+import org.apache.beam.sdk.transforms.View;
+import org.apache.beam.sdk.transforms.windowing.FixedWindows;
+import org.apache.beam.sdk.transforms.windowing.Window;
+import org.apache.beam.sdk.values.KV;
+import org.apache.beam.sdk.values.PCollection;
+import org.apache.beam.sdk.values.PCollectionView;
+import org.joda.time.Duration;
+import org.joda.time.Instant;
+
+/**
+ * Load test for operations involving side inputs.
+ *
+ * The purpose of this test is to measure cost of materialization or lookup 
of side inputs. It
+ * uses synthetic sources and {@link SyntheticStep} which can be parametrized 
to generate records
+ * with various sizes of keys and values, impose delays in the pipeline and 
simulate other
+ * performance challenges.
+ *
+ * To run the test manually, use the following command:
+ *
+ * 
+ *   ./gradlew :sdks:java:testing:load-tests:run -PloadTest.args='
+ *--sourceOptions={"numRecords":2000, ...}
+ *--sideInputType=ITERABLE
+ *--accessPercentage=1
+ *--windowCount=200
+ * 
+ */
+public class SideInputLoadTest extends LoadTest {
+
+  private static final String METRICS_NAMESPACE = "sideinput";
+  private static final Instant TIME = new Instant();
+
+  public SideInputLoadTest(String[] args) throws IOException {
+super(args, Options.class, METRICS_NAMESPACE);
+  }
+
+  @Override
+  void loadTest() throws IOException {
+Optional syntheticStep = 
createStep(options.getStepOptions());
+PCollection> input =
+pipeline
+.apply(readFromSource(sourceOptions))
+.apply(ParDo.of(new AddTimestamps()))
+.apply("Collect start time metrics", ParDo.of(runtimeMonitor))
+.apply(ParDo.of(new ByteMonitor(METRICS_NAMESPACE, 
"totalBytes.count")));
+
+performTestWithSideInput(
+input, 
SideInputMaterializationType.valueOf(options.getSideInputType()), 
syntheticStep);
+  }
+
+  private void performTestWithSideInput(
+  PCollection> input,
+  SideInputMaterializationType sideInputType,
+  Optional syntheticStep) {
+switch (sideInputType) {
+  case ITERABLE:
+performTestWithIterable(input, syntheticStep);
+break;
+  case HASHMAP:
+performTestWithHashMap(input, syntheticStep);
+break;
+  case LIST:
+performTestWithList(input, syntheticStep);
+break;
+}
+  }
+
+  private void performTestWithList(
+  PCollection> input, Optional 
syntheticStep) {
+applyStepIfPresent(input, "Synthetic step", syntheticStep);
+PCollectionView>> sideInput = 
input.apply(View.asList());
+input
+.apply(ParDo.of(new 
SideInputTestWithList(sideInput)).withSideInputs(sideInput))
+.apply("Collect end time metrics", ParDo.of(runtimeMonitor));
+  }
+
+  private void performTestWithHashMap(
+  PCollection> input, Optional 
syntheticStep) {
+applyStepIfPresent(input, "Synthetic step", syntheticStep);
+PCollectionView> sideInput = input.apply(View.asMap());
+input
+.apply(ParDo.of(new 
SideInputTestWithHashMap(sideInput)).withSideInputs(sideInput))
+.apply("Collect end time metrics", ParDo.of(runtimeMonitor));
+  }
+
+  private void performTestWithIterable(
+  PCollection> input, Optional 
syntheticStep) {
+applyStepIfPresent(input, 

[GitHub] [beam] kennknowles commented on a change in pull request #10949: [BEAM-9371] Add SideInputLoadTest to Java SDK

2020-03-02 Thread GitBox
kennknowles commented on a change in pull request #10949: [BEAM-9371] Add 
SideInputLoadTest to Java SDK
URL: https://github.com/apache/beam/pull/10949#discussion_r386745742
 
 

 ##
 File path: 
sdks/java/testing/load-tests/src/main/java/org/apache/beam/sdk/loadtests/SideInputLoadTest.java
 ##
 @@ -0,0 +1,243 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.loadtests;
+
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.List;
+import java.util.Map;
+import java.util.Optional;
+import java.util.Random;
+import org.apache.beam.sdk.io.synthetic.SyntheticStep;
+import org.apache.beam.sdk.options.Default;
+import org.apache.beam.sdk.options.Description;
+import org.apache.beam.sdk.options.Validation;
+import org.apache.beam.sdk.testutils.metrics.ByteMonitor;
+import org.apache.beam.sdk.transforms.DoFn;
+import org.apache.beam.sdk.transforms.ParDo;
+import org.apache.beam.sdk.transforms.View;
+import org.apache.beam.sdk.transforms.windowing.FixedWindows;
+import org.apache.beam.sdk.transforms.windowing.Window;
+import org.apache.beam.sdk.values.KV;
+import org.apache.beam.sdk.values.PCollection;
+import org.apache.beam.sdk.values.PCollectionView;
+import org.joda.time.Duration;
+import org.joda.time.Instant;
+
+/**
+ * Load test for operations involving side inputs.
+ *
+ * The purpose of this test is to measure cost of materialization or lookup 
of side inputs. It
+ * uses synthetic sources and {@link SyntheticStep} which can be parametrized 
to generate records
+ * with various sizes of keys and values, impose delays in the pipeline and 
simulate other
+ * performance challenges.
+ *
+ * To run the test manually, use the following command:
+ *
+ * 
+ *   ./gradlew :sdks:java:testing:load-tests:run -PloadTest.args='
+ *--sourceOptions={"numRecords":2000, ...}
+ *--sideInputType=ITERABLE
+ *--accessPercentage=1
+ *--windowCount=200
+ * 
+ */
+public class SideInputLoadTest extends LoadTest {
+
+  private static final String METRICS_NAMESPACE = "sideinput";
+  private static final Instant TIME = new Instant();
+
+  public SideInputLoadTest(String[] args) throws IOException {
+super(args, Options.class, METRICS_NAMESPACE);
+  }
+
+  @Override
+  void loadTest() throws IOException {
+Optional syntheticStep = 
createStep(options.getStepOptions());
+PCollection> input =
+pipeline
+.apply(readFromSource(sourceOptions))
+.apply(ParDo.of(new AddTimestamps()))
+.apply("Collect start time metrics", ParDo.of(runtimeMonitor))
+.apply(ParDo.of(new ByteMonitor(METRICS_NAMESPACE, 
"totalBytes.count")));
+
+performTestWithSideInput(
+input, 
SideInputMaterializationType.valueOf(options.getSideInputType()), 
syntheticStep);
+  }
+
+  private void performTestWithSideInput(
+  PCollection> input,
+  SideInputMaterializationType sideInputType,
+  Optional syntheticStep) {
+switch (sideInputType) {
+  case ITERABLE:
+performTestWithIterable(input, syntheticStep);
+break;
+  case HASHMAP:
+performTestWithHashMap(input, syntheticStep);
+break;
+  case LIST:
+performTestWithList(input, syntheticStep);
+break;
+}
+  }
+
+  private void performTestWithList(
+  PCollection> input, Optional 
syntheticStep) {
+applyStepIfPresent(input, "Synthetic step", syntheticStep);
+PCollectionView>> sideInput = 
input.apply(View.asList());
+input
+.apply(ParDo.of(new 
SideInputTestWithList(sideInput)).withSideInputs(sideInput))
+.apply("Collect end time metrics", ParDo.of(runtimeMonitor));
+  }
+
+  private void performTestWithHashMap(
+  PCollection> input, Optional 
syntheticStep) {
+applyStepIfPresent(input, "Synthetic step", syntheticStep);
+PCollectionView> sideInput = input.apply(View.asMap());
+input
+.apply(ParDo.of(new 
SideInputTestWithHashMap(sideInput)).withSideInputs(sideInput))
+.apply("Collect end time metrics", ParDo.of(runtimeMonitor));
+  }
+
+  private void performTestWithIterable(
+  PCollection> input, Optional 
syntheticStep) {
+applyStepIfPresent(input, 

[GitHub] [beam] kennknowles commented on a change in pull request #10949: [BEAM-9371] Add SideInputLoadTest to Java SDK

2020-03-02 Thread GitBox
kennknowles commented on a change in pull request #10949: [BEAM-9371] Add 
SideInputLoadTest to Java SDK
URL: https://github.com/apache/beam/pull/10949#discussion_r386740644
 
 

 ##
 File path: 
sdks/java/testing/load-tests/src/main/java/org/apache/beam/sdk/loadtests/SideInputLoadTest.java
 ##
 @@ -0,0 +1,243 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.loadtests;
+
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.List;
+import java.util.Map;
+import java.util.Optional;
+import java.util.Random;
+import org.apache.beam.sdk.io.synthetic.SyntheticStep;
+import org.apache.beam.sdk.options.Default;
+import org.apache.beam.sdk.options.Description;
+import org.apache.beam.sdk.options.Validation;
+import org.apache.beam.sdk.testutils.metrics.ByteMonitor;
+import org.apache.beam.sdk.transforms.DoFn;
+import org.apache.beam.sdk.transforms.ParDo;
+import org.apache.beam.sdk.transforms.View;
+import org.apache.beam.sdk.transforms.windowing.FixedWindows;
+import org.apache.beam.sdk.transforms.windowing.Window;
+import org.apache.beam.sdk.values.KV;
+import org.apache.beam.sdk.values.PCollection;
+import org.apache.beam.sdk.values.PCollectionView;
+import org.joda.time.Duration;
+import org.joda.time.Instant;
+
+/**
+ * Load test for operations involving side inputs.
+ *
+ * The purpose of this test is to measure cost of materialization or lookup 
of side inputs. It
+ * uses synthetic sources and {@link SyntheticStep} which can be parametrized 
to generate records
+ * with various sizes of keys and values, impose delays in the pipeline and 
simulate other
+ * performance challenges.
+ *
+ * To run the test manually, use the following command:
+ *
+ * 
+ *   ./gradlew :sdks:java:testing:load-tests:run -PloadTest.args='
+ *--sourceOptions={"numRecords":2000, ...}
+ *--sideInputType=ITERABLE
+ *--accessPercentage=1
+ *--windowCount=200
+ * 
+ */
+public class SideInputLoadTest extends LoadTest {
+
+  private static final String METRICS_NAMESPACE = "sideinput";
 
 Review comment:
   What are the metrics? Should they be described in the Javadoc? I read the 
superclass and did not see much explanation there either.
   
   The code here makes OK sense but is not enough information to use it. So I 
am just wondering what the experience is for someone who wants to use the class 
to build a new load test.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [beam] kennknowles commented on a change in pull request #10949: [BEAM-9371] Add SideInputLoadTest to Java SDK

2020-03-02 Thread GitBox
kennknowles commented on a change in pull request #10949: [BEAM-9371] Add 
SideInputLoadTest to Java SDK
URL: https://github.com/apache/beam/pull/10949#discussion_r386743669
 
 

 ##
 File path: 
sdks/java/testing/load-tests/src/main/java/org/apache/beam/sdk/loadtests/SideInputLoadTest.java
 ##
 @@ -0,0 +1,243 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.loadtests;
+
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.List;
+import java.util.Map;
+import java.util.Optional;
+import java.util.Random;
+import org.apache.beam.sdk.io.synthetic.SyntheticStep;
+import org.apache.beam.sdk.options.Default;
+import org.apache.beam.sdk.options.Description;
+import org.apache.beam.sdk.options.Validation;
+import org.apache.beam.sdk.testutils.metrics.ByteMonitor;
+import org.apache.beam.sdk.transforms.DoFn;
+import org.apache.beam.sdk.transforms.ParDo;
+import org.apache.beam.sdk.transforms.View;
+import org.apache.beam.sdk.transforms.windowing.FixedWindows;
+import org.apache.beam.sdk.transforms.windowing.Window;
+import org.apache.beam.sdk.values.KV;
+import org.apache.beam.sdk.values.PCollection;
+import org.apache.beam.sdk.values.PCollectionView;
+import org.joda.time.Duration;
+import org.joda.time.Instant;
+
+/**
+ * Load test for operations involving side inputs.
+ *
+ * The purpose of this test is to measure cost of materialization or lookup 
of side inputs. It
+ * uses synthetic sources and {@link SyntheticStep} which can be parametrized 
to generate records
+ * with various sizes of keys and values, impose delays in the pipeline and 
simulate other
+ * performance challenges.
+ *
+ * To run the test manually, use the following command:
+ *
+ * 
+ *   ./gradlew :sdks:java:testing:load-tests:run -PloadTest.args='
+ *--sourceOptions={"numRecords":2000, ...}
+ *--sideInputType=ITERABLE
+ *--accessPercentage=1
+ *--windowCount=200
+ * 
+ */
+public class SideInputLoadTest extends LoadTest {
+
+  private static final String METRICS_NAMESPACE = "sideinput";
+  private static final Instant TIME = new Instant();
+
+  public SideInputLoadTest(String[] args) throws IOException {
+super(args, Options.class, METRICS_NAMESPACE);
+  }
+
+  @Override
+  void loadTest() throws IOException {
+Optional syntheticStep = 
createStep(options.getStepOptions());
+PCollection> input =
+pipeline
+.apply(readFromSource(sourceOptions))
+.apply(ParDo.of(new AddTimestamps()))
+.apply("Collect start time metrics", ParDo.of(runtimeMonitor))
+.apply(ParDo.of(new ByteMonitor(METRICS_NAMESPACE, 
"totalBytes.count")));
+
+performTestWithSideInput(
+input, 
SideInputMaterializationType.valueOf(options.getSideInputType()), 
syntheticStep);
+  }
+
+  private void performTestWithSideInput(
+  PCollection> input,
+  SideInputMaterializationType sideInputType,
+  Optional syntheticStep) {
+switch (sideInputType) {
+  case ITERABLE:
+performTestWithIterable(input, syntheticStep);
+break;
+  case HASHMAP:
+performTestWithHashMap(input, syntheticStep);
+break;
+  case LIST:
+performTestWithList(input, syntheticStep);
+break;
+}
+  }
+
+  private void performTestWithList(
+  PCollection> input, Optional 
syntheticStep) {
+applyStepIfPresent(input, "Synthetic step", syntheticStep);
+PCollectionView>> sideInput = 
input.apply(View.asList());
+input
+.apply(ParDo.of(new 
SideInputTestWithList(sideInput)).withSideInputs(sideInput))
+.apply("Collect end time metrics", ParDo.of(runtimeMonitor));
+  }
+
+  private void performTestWithHashMap(
+  PCollection> input, Optional 
syntheticStep) {
+applyStepIfPresent(input, "Synthetic step", syntheticStep);
+PCollectionView> sideInput = input.apply(View.asMap());
+input
+.apply(ParDo.of(new 
SideInputTestWithHashMap(sideInput)).withSideInputs(sideInput))
+.apply("Collect end time metrics", ParDo.of(runtimeMonitor));
+  }
+
+  private void performTestWithIterable(
+  PCollection> input, Optional 
syntheticStep) {
+applyStepIfPresent(input, 

[GitHub] [beam] robertwb commented on issue #11021: Remove excessive logging.

2020-03-02 Thread GitBox
robertwb commented on issue #11021: Remove excessive logging.
URL: https://github.com/apache/beam/pull/11021#issuecomment-593713484
 
 
   R: @ruwang may be a candidate for a cherry-pick


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [beam] robertwb opened a new pull request #11021: Remove excessive logging.

2020-03-02 Thread GitBox
robertwb opened a new pull request #11021: Remove excessive logging.
URL: https://github.com/apache/beam/pull/11021
 
 
   
   
   
   Thank you for your contribution! Follow this checklist to help us 
incorporate your contribution quickly and easily:
   
- [ ] [**Choose 
reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and 
mention them in a comment (`R: @username`).
- [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in 
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA 
issue, if applicable. This will automatically link the pull request to the 
issue.
- [ ] Update `CHANGES.md` with noteworthy changes.
- [ ] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   See the [Contributor Guide](https://beam.apache.org/contribute) for more 
tips on [how to make review process 
smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier).
   
   Post-Commit Tests Status (on master branch)
   

   
   Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark
   --- | --- | --- | --- | --- | --- | --- | ---
   Go | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/)
 | --- | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/)
 | --- | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/)
   Java | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/)
   Python | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Python2/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python2/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Python35/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python35/lastCompletedBuild/)[![Build
 

[GitHub] [beam] rohdesamuel commented on issue #11005: [BEAM-8335] Modify the StreamingCache to subclass the CacheManager

2020-03-02 Thread GitBox
rohdesamuel commented on issue #11005: [BEAM-8335] Modify the StreamingCache to 
subclass the CacheManager 
URL: https://github.com/apache/beam/pull/11005#issuecomment-593703837
 
 
   R: @robertwb 


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [beam] pabloem commented on issue #10979: [BEAM-8841] Support writing data to BigQuery via Avro in Python SDK

2020-03-02 Thread GitBox
pabloem commented on issue #10979: [BEAM-8841] Support writing data to BigQuery 
via Avro in Python SDK
URL: https://github.com/apache/beam/pull/10979#issuecomment-593703663
 
 
   Run Python 3.7 PostCommit


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [beam] KevinGG commented on issue #11020: [BEAM-7926] Update Data Visualization

2020-03-02 Thread GitBox
KevinGG commented on issue #11020: [BEAM-7926] Update Data Visualization
URL: https://github.com/apache/beam/pull/11020#issuecomment-593701514
 
 
   R: @rohdesamuel 
   R: @pabloem 
   
   PTAL, thanks!


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [beam] pabloem commented on issue #10979: [BEAM-8841] Support writing data to BigQuery via Avro in Python SDK

2020-03-02 Thread GitBox
pabloem commented on issue #10979: [BEAM-8841] Support writing data to BigQuery 
via Avro in Python SDK
URL: https://github.com/apache/beam/pull/10979#issuecomment-593700905
 
 
   Run Python 3.7 PostCommit


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [beam] rohdesamuel commented on a change in pull request #10497: [BEAM-8335] Add the ReverseTestStream

2020-03-02 Thread GitBox
rohdesamuel commented on a change in pull request #10497: [BEAM-8335] Add the 
ReverseTestStream
URL: https://github.com/apache/beam/pull/10497#discussion_r386726726
 
 

 ##
 File path: sdks/python/apache_beam/runners/direct/transform_evaluator.py
 ##
 @@ -471,7 +517,14 @@ def process_element(self, element):
 # We can either have the _TestStream or the _WatermarkController to emit
 # the elements. We chose to emit in the _WatermarkController so that the
 # element is emitted at the correct watermark value.
-for event in self.test_stream.events(self.current_index):
+events = []
+if self.watermark == MIN_TIMESTAMP:
 
 Review comment:
   I added a comment that explains this loop and its idempotency.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [beam] rohdesamuel commented on a change in pull request #10497: [BEAM-8335] Add the ReverseTestStream

2020-03-02 Thread GitBox
rohdesamuel commented on a change in pull request #10497: [BEAM-8335] Add the 
ReverseTestStream
URL: https://github.com/apache/beam/pull/10497#discussion_r386726510
 
 

 ##
 File path: sdks/python/apache_beam/runners/direct/transform_evaluator.py
 ##
 @@ -471,7 +517,14 @@ def process_element(self, element):
 # We can either have the _TestStream or the _WatermarkController to emit
 # the elements. We chose to emit in the _WatermarkController so that the
 # element is emitted at the correct watermark value.
-for event in self.test_stream.events(self.current_index):
+events = []
+if self.watermark == MIN_TIMESTAMP:
 
 Review comment:
   Yep, this can be called more than once, and I verified that it's okay for it 
to be called multiple times.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [beam] pabloem commented on issue #11019: Reducing the number of API calls to BQ table.get

2020-03-02 Thread GitBox
pabloem commented on issue #11019: Reducing the number of API calls to BQ 
table.get
URL: https://github.com/apache/beam/pull/11019#issuecomment-593693425
 
 
   Run Python 3.7 PostCommit


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [beam] KevinGG opened a new pull request #11020: [BEAM-7926] Update Data Visualization

2020-03-02 Thread GitBox
KevinGG opened a new pull request #11020: [BEAM-7926] Update Data Visualization
URL: https://github.com/apache/beam/pull/11020
 
 
   1. Added include_window_info and visualize_data as **kwargs passed into
  `show`.
   2. Updated javascripts to make the data visualization smooth and
  resilient to DOM changes. Now datatable is loaded dynamically without
  flickering nor changing of user's page/search state; javascripts also
  work when refreshing the browser.
   3. Resolved the jQuery+Datatable loading issue by forcing chained
  loading. Any customized javascripts relying on jQuery should only use
  `window.jquery341`. Always carry out a check for `window.jquery341`.
  Run javascripts in the last onload of the jQuery loading chain if
  `window.jquery341` is not available.
   4. All HTML imports are chained at onload of webcomponents (if HTML import
  is not supported) or plainly imported (if HTML import supported) in a
  single place in document.head. This makes HTML import resilient to
  DOM changes caused by normal notebook usages.
   5. Updated some logging statements.
   6. Added `show_graph` API to render DAG of a pipeline. `pipeline.run`
  does not render DAG now.
   
   Change-Id: Id2ca548860fb2d30e1557a35e7b14d2e61b5f1a4
   
   **Please** add a meaningful description for your change here
   
   
   
   Thank you for your contribution! Follow this checklist to help us 
incorporate your contribution quickly and easily:
   
- [ ] [**Choose 
reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and 
mention them in a comment (`R: @username`).
- [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in 
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA 
issue, if applicable. This will automatically link the pull request to the 
issue.
- [ ] Update `CHANGES.md` with noteworthy changes.
- [ ] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   See the [Contributor Guide](https://beam.apache.org/contribute) for more 
tips on [how to make review process 
smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier).
   
   Post-Commit Tests Status (on master branch)
   

   
   Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark
   --- | --- | --- | --- | --- | --- | --- | ---
   Go | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/)
 | --- | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/)
 | --- | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/)
   Java | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/)
 | [![Build 

[GitHub] [beam] chadrik commented on issue #10810: [BEAM-9274] Support running yapf in a git pre-commit hook

2020-03-02 Thread GitBox
chadrik commented on issue #10810: [BEAM-9274] Support running yapf in a git 
pre-commit hook
URL: https://github.com/apache/beam/pull/10810#issuecomment-593691076
 
 
   Run Python PreCommit


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [beam] robertwb merged pull request #10915: [BEAM-8335] Add PCollection to DataFrame logic for InteractiveRunner.

2020-03-02 Thread GitBox
robertwb merged pull request #10915: [BEAM-8335] Add PCollection to DataFrame 
logic for InteractiveRunner.
URL: https://github.com/apache/beam/pull/10915
 
 
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[beam] branch master updated (a167255 -> a29fdff)

2020-03-02 Thread robertwb
This is an automated email from the ASF dual-hosted git repository.

robertwb pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/beam.git.


from a167255  Merge pull request #10989 from lukecwik/beam9397
 add a29fdff  [BEAM-8335] Add PCollection to DataFrame logic for 
InteractiveRunner. (#10915)

No new revisions were added by this update.

Summary of changes:
 .../apache_beam/runners/interactive/utils.py   | 52 ++
 .../apache_beam/runners/interactive/utils_test.py  | 84 ++
 2 files changed, 136 insertions(+)
 create mode 100644 sdks/python/apache_beam/runners/interactive/utils.py
 create mode 100644 sdks/python/apache_beam/runners/interactive/utils_test.py



[GitHub] [beam] rohdesamuel commented on a change in pull request #10497: [BEAM-8335] Add the ReverseTestStream

2020-03-02 Thread GitBox
rohdesamuel commented on a change in pull request #10497: [BEAM-8335] Add the 
ReverseTestStream
URL: https://github.com/apache/beam/pull/10497#discussion_r386723318
 
 

 ##
 File path: sdks/python/apache_beam/testing/test_stream.py
 ##
 @@ -314,3 +355,239 @@ def from_runner_api_parameter(ptransform, payload, 
context):
 coder=coder,
 events=[Event.from_runner_api(e, coder) for e in payload.events],
 output_tags=output_tags)
+
+
+class TimingInfo(object):
+  def __init__(self, processing_time, watermark):
+self._processing_time = timestamp.Timestamp.of(processing_time)
+self._watermark = timestamp.Timestamp.of(watermark)
+
+  @property
+  def processing_time(self):
+return self._processing_time
+
+  @property
+  def watermark(self):
+return self._watermark
+
+  def __repr__(self):
+return '({}, {}, {})'.format(
+self.event_timestamp, self.processing_time, self.watermark)
+
+
+class PairWithTiming(PTransform):
+  """Pairs the input element with timing information.
+
+  Input: element; output: KV(element, timing information)
+  Where timing information := (processing time, watermark)
+
+  This is used in the ReverseTestStream implementation to replay watermark
+  advancements.
+  """
+
+  URN = "beam:transform:pair_with_timing:v1"
+
+  def expand(self, pcoll):
+return pvalue.PCollection.from_(pcoll)
+
+
+class ReverseTestStream(PTransform):
+  """A Transform that can create TestStream events from a stream of elements.
+
+  This currently assumes that this the pipeline being run on a single machine
+  and elements come in order and are outputted in the same order that they came
+  in.
+  """
+  class Format(Enum):
+TEST_STREAM_EVENTS = 1
+TEST_STREAM_FILE_RECORDS = 2
+SERIALIZED_TEST_STREAM_FILE_RECORDS = 3
+
+  def __init__(
+  self, sample_resolution_sec, output_tag, coder=None, output_format=None):
+self._sample_resolution_sec = sample_resolution_sec
+self._output_tag = output_tag
+self._output_format = output_format if output_format \
+  else ReverseTestStream.Format.TEST_STREAM_EVENTS
+self._coder = coder if coder else beam.coders.FastPrimitivesCoder()
+
+  def expand(self, pcoll):
+generator = (
+_TestStreamFileRecordGenerator(coder=self._coder) if (
+self._output_format in (
+self.Format.TEST_STREAM_FILE_RECORDS,
+self.Format.SERIALIZED_TEST_STREAM_FILE_RECORDS)) else
+_TestStreamEventGenerator())
+
+ret = (
+pcoll
+| beam.WindowInto(beam.window.GlobalWindows())
+
+# First get the initial timing information. This will be used to start
+# the periodic timers which will generate processing time and watermark
+# advancements every `sample_resolution_sec`.
+| 'initial timing' >> PairWithTiming()
+
+# Next, map every element to the same key so that only a single timer 
is
+# started for this given ReverseTestStream.
+| beam.Map(lambda x: (0, x))
+
+# Next, pass-through each element which will be paired with its timing
+# info in the next step. Also, start the periodic timers. We use timers
+# in this situation to capture watermark advancements that occur when
+# there are no elements being produced upstream.
+| beam.ParDo(
+_WatermarkEventGenerator(
+output_tag=self._output_tag,
+sample_resolution_sec=self._sample_resolution_sec))
+
+# Next, retrieve the timing information for watermark events that were
+# generated in the previous step. This is because elements generated
+# through the timers don't have their timing information yet.
+| 'timing info for watermarks' >> PairWithTiming()
+
+# Format the events properly.
+| beam.ParDo(generator))
+
+if self._output_format == self.Format.SERIALIZED_TEST_STREAM_FILE_RECORDS:
+
+  def serializer(e):
+return e.SerializeToString()
+
+  ret = ret | 'serializer' >> beam.Map(serializer)
+
+return ret
+
+
+class _WatermarkEventGenerator(beam.DoFn):
+  # Used to return the initial timing information.
+  EXECUTE_ONCE_STATE = beam.transforms.userstate.BagStateSpec(
+  name='execute_once_state', coder=beam.coders.FastPrimitivesCoder())
+  WATERMARK_TRACKER = TimerSpec('watermark_tracker', TimeDomain.REAL_TIME)
+
+  def __init__(self, output_tag, sample_resolution_sec=0.1):
+self._output_tag = output_tag
+self._sample_resolution_sec = sample_resolution_sec
+
+  @on_timer(WATERMARK_TRACKER)
+  def on_watermark_tracker(
+  self,
+  timestamp=beam.DoFn.TimestampParam,
+  window=beam.DoFn.WindowParam,
+  watermark_tracker=beam.DoFn.TimerParam(WATERMARK_TRACKER)):
+next_sample_time = (timestamp.micros * 1e-6) + self._sample_resolution_sec
+watermark_tracker.set(next_sample_time)
 
 Review comment:
   Added state in the 

[GitHub] [beam] rohdesamuel commented on a change in pull request #10497: [BEAM-8335] Add the ReverseTestStream

2020-03-02 Thread GitBox
rohdesamuel commented on a change in pull request #10497: [BEAM-8335] Add the 
ReverseTestStream
URL: https://github.com/apache/beam/pull/10497#discussion_r386723371
 
 

 ##
 File path: sdks/python/apache_beam/testing/test_stream.py
 ##
 @@ -314,3 +355,239 @@ def from_runner_api_parameter(ptransform, payload, 
context):
 coder=coder,
 events=[Event.from_runner_api(e, coder) for e in payload.events],
 output_tags=output_tags)
+
+
+class TimingInfo(object):
+  def __init__(self, processing_time, watermark):
+self._processing_time = timestamp.Timestamp.of(processing_time)
+self._watermark = timestamp.Timestamp.of(watermark)
+
+  @property
+  def processing_time(self):
+return self._processing_time
+
+  @property
+  def watermark(self):
+return self._watermark
+
+  def __repr__(self):
+return '({}, {}, {})'.format(
+self.event_timestamp, self.processing_time, self.watermark)
+
+
+class PairWithTiming(PTransform):
+  """Pairs the input element with timing information.
+
+  Input: element; output: KV(element, timing information)
+  Where timing information := (processing time, watermark)
+
+  This is used in the ReverseTestStream implementation to replay watermark
+  advancements.
+  """
+
+  URN = "beam:transform:pair_with_timing:v1"
+
+  def expand(self, pcoll):
+return pvalue.PCollection.from_(pcoll)
+
+
+class ReverseTestStream(PTransform):
+  """A Transform that can create TestStream events from a stream of elements.
+
+  This currently assumes that this the pipeline being run on a single machine
+  and elements come in order and are outputted in the same order that they came
+  in.
+  """
+  class Format(Enum):
+TEST_STREAM_EVENTS = 1
+TEST_STREAM_FILE_RECORDS = 2
+SERIALIZED_TEST_STREAM_FILE_RECORDS = 3
+
+  def __init__(
+  self, sample_resolution_sec, output_tag, coder=None, output_format=None):
+self._sample_resolution_sec = sample_resolution_sec
+self._output_tag = output_tag
+self._output_format = output_format if output_format \
+  else ReverseTestStream.Format.TEST_STREAM_EVENTS
+self._coder = coder if coder else beam.coders.FastPrimitivesCoder()
+
+  def expand(self, pcoll):
+generator = (
+_TestStreamFileRecordGenerator(coder=self._coder) if (
+self._output_format in (
+self.Format.TEST_STREAM_FILE_RECORDS,
+self.Format.SERIALIZED_TEST_STREAM_FILE_RECORDS)) else
+_TestStreamEventGenerator())
+
+ret = (
+pcoll
+| beam.WindowInto(beam.window.GlobalWindows())
+
+# First get the initial timing information. This will be used to start
+# the periodic timers which will generate processing time and watermark
+# advancements every `sample_resolution_sec`.
+| 'initial timing' >> PairWithTiming()
+
+# Next, map every element to the same key so that only a single timer 
is
+# started for this given ReverseTestStream.
+| beam.Map(lambda x: (0, x))
+
+# Next, pass-through each element which will be paired with its timing
+# info in the next step. Also, start the periodic timers. We use timers
+# in this situation to capture watermark advancements that occur when
+# there are no elements being produced upstream.
+| beam.ParDo(
+_WatermarkEventGenerator(
+output_tag=self._output_tag,
+sample_resolution_sec=self._sample_resolution_sec))
+
+# Next, retrieve the timing information for watermark events that were
+# generated in the previous step. This is because elements generated
+# through the timers don't have their timing information yet.
+| 'timing info for watermarks' >> PairWithTiming()
+
+# Format the events properly.
+| beam.ParDo(generator))
+
+if self._output_format == self.Format.SERIALIZED_TEST_STREAM_FILE_RECORDS:
+
+  def serializer(e):
+return e.SerializeToString()
+
+  ret = ret | 'serializer' >> beam.Map(serializer)
+
+return ret
+
+
+class _WatermarkEventGenerator(beam.DoFn):
+  # Used to return the initial timing information.
+  EXECUTE_ONCE_STATE = beam.transforms.userstate.BagStateSpec(
+  name='execute_once_state', coder=beam.coders.FastPrimitivesCoder())
+  WATERMARK_TRACKER = TimerSpec('watermark_tracker', TimeDomain.REAL_TIME)
+
+  def __init__(self, output_tag, sample_resolution_sec=0.1):
+self._output_tag = output_tag
+self._sample_resolution_sec = sample_resolution_sec
+
+  @on_timer(WATERMARK_TRACKER)
+  def on_watermark_tracker(
+  self,
+  timestamp=beam.DoFn.TimestampParam,
+  window=beam.DoFn.WindowParam,
+  watermark_tracker=beam.DoFn.TimerParam(WATERMARK_TRACKER)):
+next_sample_time = (timestamp.micros * 1e-6) + self._sample_resolution_sec
+watermark_tracker.set(next_sample_time)
+
+# Generate two events, the delta since 

[GitHub] [beam] pabloem commented on issue #10979: [BEAM-8841] Support writing data to BigQuery via Avro in Python SDK

2020-03-02 Thread GitBox
pabloem commented on issue #10979: [BEAM-8841] Support writing data to BigQuery 
via Avro in Python SDK
URL: https://github.com/apache/beam/pull/10979#issuecomment-593679995
 
 
   retest this please


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [beam] pabloem commented on issue #11019: Reducing the number of API calls to BQ table.get

2020-03-02 Thread GitBox
pabloem commented on issue #11019: Reducing the number of API calls to BQ 
table.get
URL: https://github.com/apache/beam/pull/11019#issuecomment-593679725
 
 
   Run Python 3.7 PostCommit


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [beam] pabloem commented on a change in pull request #10497: [BEAM-8335] Add the ReverseTestStream

2020-03-02 Thread GitBox
pabloem commented on a change in pull request #10497: [BEAM-8335] Add the 
ReverseTestStream
URL: https://github.com/apache/beam/pull/10497#discussion_r386714777
 
 

 ##
 File path: sdks/python/apache_beam/runners/direct/transform_evaluator.py
 ##
 @@ -471,7 +517,14 @@ def process_element(self, element):
 # We can either have the _TestStream or the _WatermarkController to emit
 # the elements. We chose to emit in the _WatermarkController so that the
 # element is emitted at the correct watermark value.
-for event in self.test_stream.events(self.current_index):
+events = []
+if self.watermark == MIN_TIMESTAMP:
 
 Review comment:
   I wonder if we intend this to only be called once - and if it's possible for 
it to be called more than once.
   In our discussion, it seems that:
   1) It is possible for this to be called more than once - right?
   2) It may be okay for it to be called more htan once.
   
   Let's confirm which of those are true


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [beam] rohdesamuel commented on a change in pull request #10497: [BEAM-8335] Add the ReverseTestStream

2020-03-02 Thread GitBox
rohdesamuel commented on a change in pull request #10497: [BEAM-8335] Add the 
ReverseTestStream
URL: https://github.com/apache/beam/pull/10497#discussion_r386711043
 
 

 ##
 File path: sdks/python/apache_beam/testing/test_stream.py
 ##
 @@ -314,3 +355,239 @@ def from_runner_api_parameter(ptransform, payload, 
context):
 coder=coder,
 events=[Event.from_runner_api(e, coder) for e in payload.events],
 output_tags=output_tags)
+
+
+class TimingInfo(object):
+  def __init__(self, processing_time, watermark):
+self._processing_time = timestamp.Timestamp.of(processing_time)
+self._watermark = timestamp.Timestamp.of(watermark)
+
+  @property
+  def processing_time(self):
+return self._processing_time
+
+  @property
+  def watermark(self):
+return self._watermark
+
+  def __repr__(self):
+return '({}, {}, {})'.format(
+self.event_timestamp, self.processing_time, self.watermark)
+
+
+class PairWithTiming(PTransform):
+  """Pairs the input element with timing information.
+
+  Input: element; output: KV(element, timing information)
+  Where timing information := (processing time, watermark)
+
+  This is used in the ReverseTestStream implementation to replay watermark
+  advancements.
+  """
+
+  URN = "beam:transform:pair_with_timing:v1"
+
+  def expand(self, pcoll):
+return pvalue.PCollection.from_(pcoll)
+
+
+class ReverseTestStream(PTransform):
+  """A Transform that can create TestStream events from a stream of elements.
+
+  This currently assumes that this the pipeline being run on a single machine
+  and elements come in order and are outputted in the same order that they came
+  in.
+  """
+  class Format(Enum):
+TEST_STREAM_EVENTS = 1
+TEST_STREAM_FILE_RECORDS = 2
+SERIALIZED_TEST_STREAM_FILE_RECORDS = 3
+
+  def __init__(
+  self, sample_resolution_sec, output_tag, coder=None, output_format=None):
+self._sample_resolution_sec = sample_resolution_sec
+self._output_tag = output_tag
+self._output_format = output_format if output_format \
+  else ReverseTestStream.Format.TEST_STREAM_EVENTS
+self._coder = coder if coder else beam.coders.FastPrimitivesCoder()
+
+  def expand(self, pcoll):
+generator = (
+_TestStreamFileRecordGenerator(coder=self._coder) if (
+self._output_format in (
+self.Format.TEST_STREAM_FILE_RECORDS,
+self.Format.SERIALIZED_TEST_STREAM_FILE_RECORDS)) else
+_TestStreamEventGenerator())
+
+ret = (
+pcoll
+| beam.WindowInto(beam.window.GlobalWindows())
+
+# First get the initial timing information. This will be used to start
+# the periodic timers which will generate processing time and watermark
+# advancements every `sample_resolution_sec`.
+| 'initial timing' >> PairWithTiming()
+
+# Next, map every element to the same key so that only a single timer 
is
+# started for this given ReverseTestStream.
+| beam.Map(lambda x: (0, x))
+
+# Next, pass-through each element which will be paired with its timing
+# info in the next step. Also, start the periodic timers. We use timers
+# in this situation to capture watermark advancements that occur when
+# there are no elements being produced upstream.
+| beam.ParDo(
+_WatermarkEventGenerator(
+output_tag=self._output_tag,
+sample_resolution_sec=self._sample_resolution_sec))
+
+# Next, retrieve the timing information for watermark events that were
+# generated in the previous step. This is because elements generated
+# through the timers don't have their timing information yet.
+| 'timing info for watermarks' >> PairWithTiming()
+
+# Format the events properly.
+| beam.ParDo(generator))
+
+if self._output_format == self.Format.SERIALIZED_TEST_STREAM_FILE_RECORDS:
+
+  def serializer(e):
+return e.SerializeToString()
+
+  ret = ret | 'serializer' >> beam.Map(serializer)
+
+return ret
+
+
+class _WatermarkEventGenerator(beam.DoFn):
+  # Used to return the initial timing information.
+  EXECUTE_ONCE_STATE = beam.transforms.userstate.BagStateSpec(
+  name='execute_once_state', coder=beam.coders.FastPrimitivesCoder())
+  WATERMARK_TRACKER = TimerSpec('watermark_tracker', TimeDomain.REAL_TIME)
+
+  def __init__(self, output_tag, sample_resolution_sec=0.1):
+self._output_tag = output_tag
+self._sample_resolution_sec = sample_resolution_sec
+
+  @on_timer(WATERMARK_TRACKER)
+  def on_watermark_tracker(
+  self,
+  timestamp=beam.DoFn.TimestampParam,
+  window=beam.DoFn.WindowParam,
+  watermark_tracker=beam.DoFn.TimerParam(WATERMARK_TRACKER)):
+next_sample_time = (timestamp.micros * 1e-6) + self._sample_resolution_sec
+watermark_tracker.set(next_sample_time)
+
+# Generate two events, the delta since 

[GitHub] [beam] chadrik commented on issue #10810: [BEAM-9274] Support running yapf in a git pre-commit hook

2020-03-02 Thread GitBox
chadrik commented on issue #10810: [BEAM-9274] Support running yapf in a git 
pre-commit hook
URL: https://github.com/apache/beam/pull/10810#issuecomment-593672795
 
 
   Run Python2_PVR_Flink PreCommit
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [beam] lukecwik commented on issue #10988: [BEAM-9382] Clean up of TestStreamTranscriptTests.

2020-03-02 Thread GitBox
lukecwik commented on issue #10988: [BEAM-9382] Clean up of 
TestStreamTranscriptTests.
URL: https://github.com/apache/beam/pull/10988#issuecomment-593667609
 
 
   Drive by comment:
   Shouldn't we disable the test for the direct runner if it isn't supported 
and file a bug instead of editing what the test does?
   Alternatively, could we fix the direct runner?


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [beam] chamikaramj commented on issue #10477: [BEAM-8932][Cleanup] Cleanup pubsubio by removing optionality and adding defaults to builders.

2020-03-02 Thread GitBox
chamikaramj commented on issue #10477: [BEAM-8932][Cleanup] Cleanup pubsubio by 
removing optionality and adding defaults to builders.
URL: https://github.com/apache/beam/pull/10477#issuecomment-593662732
 
 
   Run Beam PostCommit


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [beam] chamikaramj commented on issue #10477: [BEAM-8932][Cleanup] Cleanup pubsubio by removing optionality and adding defaults to builders.

2020-03-02 Thread GitBox
chamikaramj commented on issue #10477: [BEAM-8932][Cleanup] Cleanup pubsubio by 
removing optionality and adding defaults to builders.
URL: https://github.com/apache/beam/pull/10477#issuecomment-593662585
 
 
   LGTM. Thanks.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [beam] chamikaramj commented on issue #10477: [BEAM-8932][Cleanup] Cleanup pubsubio by removing optionality and adding defaults to builders.

2020-03-02 Thread GitBox
chamikaramj commented on issue #10477: [BEAM-8932][Cleanup] Cleanup pubsubio by 
removing optionality and adding defaults to builders.
URL: https://github.com/apache/beam/pull/10477#issuecomment-593662682
 
 
   Run Dataflow ValidatesRunner


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [beam] chamikaramj commented on issue #10477: [BEAM-8932][Cleanup] Cleanup pubsubio by removing optionality and adding defaults to builders.

2020-03-02 Thread GitBox
chamikaramj commented on issue #10477: [BEAM-8932][Cleanup] Cleanup pubsubio by 
removing optionality and adding defaults to builders.
URL: https://github.com/apache/beam/pull/10477#issuecomment-593662632
 
 
   Retest this please


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [beam] rohdesamuel commented on a change in pull request #10497: [BEAM-8335] Add the ReverseTestStream

2020-03-02 Thread GitBox
rohdesamuel commented on a change in pull request #10497: [BEAM-8335] Add the 
ReverseTestStream
URL: https://github.com/apache/beam/pull/10497#discussion_r386694935
 
 

 ##
 File path: sdks/python/apache_beam/runners/direct/transform_evaluator.py
 ##
 @@ -421,8 +424,12 @@ def process_element(self, element):
   main_output = list(self._outputs)[0]
   bundle = self._evaluation_context.create_bundle(main_output)
   for tv in event.timestamped_values:
-bundle.output(
-GlobalWindows.windowed_value(tv.value, timestamp=tv.timestamp))
+# Unreify the value into the correct window.
+try:
+  bundle.output(WindowedValue(**tv.value))
+except TypeError:
+  bundle.output(
+  GlobalWindows.windowed_value(tv.value, timestamp=tv.timestamp))
 
 Review comment:
   Added a test for this


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [beam] rohdesamuel commented on a change in pull request #10497: [BEAM-8335] Add the ReverseTestStream

2020-03-02 Thread GitBox
rohdesamuel commented on a change in pull request #10497: [BEAM-8335] Add the 
ReverseTestStream
URL: https://github.com/apache/beam/pull/10497#discussion_r386694347
 
 

 ##
 File path: sdks/python/apache_beam/runners/direct/transform_evaluator.py
 ##
 @@ -432,6 +439,45 @@ def finish_bundle(self):
 self, self.bundles, [], None, {None: self._watermark})
 
 
+class PairWithTimingEvaluator(_TransformEvaluator):
+  """TransformEvaluator for the PairWithTiming transform.
+
+  This transform takes an element as an input and outputs
+  KV(element, `TimingInfo`). Where the `TimingInfo` contains both the
+  processing time timestamp and watermark.
+  """
+  def __init__(
+  self,
+  evaluation_context,
+  applied_ptransform,
+  input_committed_bundle,
+  side_inputs):
+assert not side_inputs
+super(PairWithTimingEvaluator, self).__init__(
+evaluation_context,
+applied_ptransform,
+input_committed_bundle,
+side_inputs)
+
+  def start_bundle(self):
+main_output = list(self._outputs)[0]
+self.bundle = self._evaluation_context.create_bundle(main_output)
+
+  def process_element(self, element):
+watermark_manager = self._evaluation_context._watermark_manager
+watermarks = watermark_manager.get_watermarks(self._applied_ptransform)
 
 Review comment:
   Done, moved to start_bundle


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [beam] ihji commented on a change in pull request #11018: [BEAM-9415] fix postcommit xvr tests

2020-03-02 Thread GitBox
ihji commented on a change in pull request #11018: [BEAM-9415] fix postcommit 
xvr tests
URL: https://github.com/apache/beam/pull/11018#discussion_r386693638
 
 

 ##
 File path: sdks/java/testing/expansion-service/build.gradle
 ##
 @@ -28,7 +28,7 @@ dependencies {
   compile project(path: ":runners:core-construction-java")
   compile project(path: ":sdks:java:io:parquet")
   compile project(path: ":sdks:java:core", configuration: "shadow")
-  provided library.java.hadoop_client
+  testRuntime library.java.hadoop_client
 
 Review comment:
   Yes. test expansion service jar is also used for SDK container dependency. 
hadoop client library should be included in the shadow jar.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [beam] rohdesamuel commented on a change in pull request #10497: [BEAM-8335] Add the ReverseTestStream

2020-03-02 Thread GitBox
rohdesamuel commented on a change in pull request #10497: [BEAM-8335] Add the 
ReverseTestStream
URL: https://github.com/apache/beam/pull/10497#discussion_r386693162
 
 

 ##
 File path: sdks/python/apache_beam/testing/test_stream.py
 ##
 @@ -314,3 +355,239 @@ def from_runner_api_parameter(ptransform, payload, 
context):
 coder=coder,
 events=[Event.from_runner_api(e, coder) for e in payload.events],
 output_tags=output_tags)
+
+
+class TimingInfo(object):
+  def __init__(self, processing_time, watermark):
+self._processing_time = timestamp.Timestamp.of(processing_time)
+self._watermark = timestamp.Timestamp.of(watermark)
+
+  @property
+  def processing_time(self):
+return self._processing_time
+
+  @property
+  def watermark(self):
+return self._watermark
+
+  def __repr__(self):
+return '({}, {}, {})'.format(
+self.event_timestamp, self.processing_time, self.watermark)
+
+
+class PairWithTiming(PTransform):
+  """Pairs the input element with timing information.
+
+  Input: element; output: KV(element, timing information)
+  Where timing information := (processing time, watermark)
+
+  This is used in the ReverseTestStream implementation to replay watermark
+  advancements.
+  """
+
+  URN = "beam:transform:pair_with_timing:v1"
+
+  def expand(self, pcoll):
+return pvalue.PCollection.from_(pcoll)
+
+
+class ReverseTestStream(PTransform):
+  """A Transform that can create TestStream events from a stream of elements.
+
+  This currently assumes that this the pipeline being run on a single machine
+  and elements come in order and are outputted in the same order that they came
+  in.
+  """
+  class Format(Enum):
+TEST_STREAM_EVENTS = 1
+TEST_STREAM_FILE_RECORDS = 2
+SERIALIZED_TEST_STREAM_FILE_RECORDS = 3
+
+  def __init__(
+  self, sample_resolution_sec, output_tag, coder=None, output_format=None):
+self._sample_resolution_sec = sample_resolution_sec
+self._output_tag = output_tag
+self._output_format = output_format if output_format \
+  else ReverseTestStream.Format.TEST_STREAM_EVENTS
+self._coder = coder if coder else beam.coders.FastPrimitivesCoder()
+
+  def expand(self, pcoll):
+generator = (
+_TestStreamFileRecordGenerator(coder=self._coder) if (
+self._output_format in (
+self.Format.TEST_STREAM_FILE_RECORDS,
+self.Format.SERIALIZED_TEST_STREAM_FILE_RECORDS)) else
+_TestStreamEventGenerator())
+
+ret = (
+pcoll
+| beam.WindowInto(beam.window.GlobalWindows())
+
+# First get the initial timing information. This will be used to start
+# the periodic timers which will generate processing time and watermark
+# advancements every `sample_resolution_sec`.
+| 'initial timing' >> PairWithTiming()
+
+# Next, map every element to the same key so that only a single timer 
is
+# started for this given ReverseTestStream.
+| beam.Map(lambda x: (0, x))
+
+# Next, pass-through each element which will be paired with its timing
+# info in the next step. Also, start the periodic timers. We use timers
+# in this situation to capture watermark advancements that occur when
+# there are no elements being produced upstream.
+| beam.ParDo(
+_WatermarkEventGenerator(
+output_tag=self._output_tag,
+sample_resolution_sec=self._sample_resolution_sec))
+
+# Next, retrieve the timing information for watermark events that were
+# generated in the previous step. This is because elements generated
+# through the timers don't have their timing information yet.
+| 'timing info for watermarks' >> PairWithTiming()
+
+# Format the events properly.
+| beam.ParDo(generator))
+
+if self._output_format == self.Format.SERIALIZED_TEST_STREAM_FILE_RECORDS:
+
+  def serializer(e):
+return e.SerializeToString()
+
+  ret = ret | 'serializer' >> beam.Map(serializer)
+
+return ret
+
+
+class _WatermarkEventGenerator(beam.DoFn):
+  # Used to return the initial timing information.
+  EXECUTE_ONCE_STATE = beam.transforms.userstate.BagStateSpec(
+  name='execute_once_state', coder=beam.coders.FastPrimitivesCoder())
+  WATERMARK_TRACKER = TimerSpec('watermark_tracker', TimeDomain.REAL_TIME)
+
+  def __init__(self, output_tag, sample_resolution_sec=0.1):
+self._output_tag = output_tag
+self._sample_resolution_sec = sample_resolution_sec
+
+  @on_timer(WATERMARK_TRACKER)
+  def on_watermark_tracker(
+  self,
+  timestamp=beam.DoFn.TimestampParam,
+  window=beam.DoFn.WindowParam,
+  watermark_tracker=beam.DoFn.TimerParam(WATERMARK_TRACKER)):
+next_sample_time = (timestamp.micros * 1e-6) + self._sample_resolution_sec
+watermark_tracker.set(next_sample_time)
+
+# Generate two events, the delta since 

[GitHub] [beam] chamikaramj commented on a change in pull request #11018: [BEAM-9415] fix postcommit xvr tests

2020-03-02 Thread GitBox
chamikaramj commented on a change in pull request #11018: [BEAM-9415] fix 
postcommit xvr tests
URL: https://github.com/apache/beam/pull/11018#discussion_r386692400
 
 

 ##
 File path: sdks/java/testing/expansion-service/build.gradle
 ##
 @@ -28,7 +28,7 @@ dependencies {
   compile project(path: ":runners:core-construction-java")
   compile project(path: ":sdks:java:io:parquet")
   compile project(path: ":sdks:java:core", configuration: "shadow")
-  provided library.java.hadoop_client
+  testRuntime library.java.hadoop_client
 
 Review comment:
   Is this change related ?


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [beam] pabloem commented on issue #11019: Reducing the number of API calls to BQ table.get

2020-03-02 Thread GitBox
pabloem commented on issue #11019: Reducing the number of API calls to BQ 
table.get
URL: https://github.com/apache/beam/pull/11019#issuecomment-593657783
 
 
   Run Python 3.7 PostCommit


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [beam] pabloem opened a new pull request #11019: Reducing the number of API calls to BQ table.get

2020-03-02 Thread GitBox
pabloem opened a new pull request #11019: Reducing the number of API calls to 
BQ table.get
URL: https://github.com/apache/beam/pull/11019
 
 
   **Please** add a meaningful description for your change here
   
   
   
   Thank you for your contribution! Follow this checklist to help us 
incorporate your contribution quickly and easily:
   
- [ ] [**Choose 
reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and 
mention them in a comment (`R: @username`).
- [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in 
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA 
issue, if applicable. This will automatically link the pull request to the 
issue.
- [ ] Update `CHANGES.md` with noteworthy changes.
- [ ] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   See the [Contributor Guide](https://beam.apache.org/contribute) for more 
tips on [how to make review process 
smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier).
   
   Post-Commit Tests Status (on master branch)
   

   
   Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark
   --- | --- | --- | --- | --- | --- | --- | ---
   Go | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/)
 | --- | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/)
 | --- | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/)
   Java | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/)
   Python | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Python2/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python2/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Python35/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python35/lastCompletedBuild/)[![Build
 

[GitHub] [beam] lukecwik merged pull request #10989: [BEAM-9397] Pass all supported StartBundleContext/FinishBundleContext except output receiver parameters to start bundle/finish bundle methods.

2020-03-02 Thread GitBox
lukecwik merged pull request #10989: [BEAM-9397] Pass all supported 
StartBundleContext/FinishBundleContext except output receiver parameters to 
start bundle/finish bundle methods.
URL: https://github.com/apache/beam/pull/10989
 
 
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[beam] branch master updated: [BEAM-9397] Pass all but output receiver parameters to start bundle/finish bundle methods.

2020-03-02 Thread lcwik
This is an automated email from the ASF dual-hosted git repository.

lcwik pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/beam.git


The following commit(s) were added to refs/heads/master by this push:
 new 360c4fe  [BEAM-9397] Pass all but output receiver parameters to start 
bundle/finish bundle methods.
 new a167255  Merge pull request #10989 from lukecwik/beam9397
360c4fe is described below

commit 360c4fe092413bbe9fb16ebfbe2d2e39fe31cb73
Author: Luke Cwik 
AuthorDate: Thu Feb 27 10:05:00 2020 -0800

[BEAM-9397] Pass all but output receiver parameters to start bundle/finish 
bundle methods.

The remaining work is covered by BEAM-1287.
---
 .../construction/SplittableParDoNaiveBounded.java  |  14 +-
 .../apache/beam/runners/core/SimpleDoFnRunner.java | 308 +++--
 .../core/SplittableParDoViaKeyedWorkItems.java |  12 +-
 .../java/org/apache/beam/sdk/transforms/DoFn.java  |   6 +
 .../org/apache/beam/sdk/transforms/DoFnTester.java |  10 +
 .../sdk/transforms/reflect/DoFnSignatures.java |   8 +-
 .../sdk/transforms/reflect/DoFnSignaturesTest.java |  17 +-
 .../apache/beam/fn/harness/FnApiDoFnRunner.java|  10 +
 8 files changed, 113 insertions(+), 272 deletions(-)

diff --git 
a/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/SplittableParDoNaiveBounded.java
 
b/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/SplittableParDoNaiveBounded.java
index 30a44da..92a443a 100644
--- 
a/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/SplittableParDoNaiveBounded.java
+++ 
b/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/SplittableParDoNaiveBounded.java
@@ -146,6 +146,11 @@ public class SplittableParDoNaiveBounded {
 }
 
 @Override
+public PipelineOptions pipelineOptions() {
+  return c.getPipelineOptions();
+}
+
+@Override
 public String getErrorContext() {
   return "SplittableParDoNaiveBounded/StartBundle";
 }
@@ -200,19 +205,24 @@ public class SplittableParDoNaiveBounded {
 public void output(
 @Nullable OutputT output, Instant timestamp, BoundedWindow 
window) {
   throw new UnsupportedOperationException(
-  "Output from FinishBundle for SDF is not supported");
+  "Output from FinishBundle for SDF is not supported in 
naive implementation");
 }
 
 @Override
 public  void output(
 TupleTag tag, T output, Instant timestamp, 
BoundedWindow window) {
   throw new UnsupportedOperationException(
-  "Output from FinishBundle for SDF is not supported");
+  "Output from FinishBundle for SDF is not supported in 
naive implementation");
 }
   };
 }
 
 @Override
+public PipelineOptions pipelineOptions() {
+  return c.getPipelineOptions();
+}
+
+@Override
 public String getErrorContext() {
   return "SplittableParDoNaiveBounded/StartBundle";
 }
diff --git 
a/runners/core-java/src/main/java/org/apache/beam/runners/core/SimpleDoFnRunner.java
 
b/runners/core-java/src/main/java/org/apache/beam/runners/core/SimpleDoFnRunner.java
index 71efa12..a37644a 100644
--- 
a/runners/core-java/src/main/java/org/apache/beam/runners/core/SimpleDoFnRunner.java
+++ 
b/runners/core-java/src/main/java/org/apache/beam/runners/core/SimpleDoFnRunner.java
@@ -168,7 +168,7 @@ public class SimpleDoFnRunner implements 
DoFnRunner implements 
DoFnRunner 
implements DoFnRunner.StartBundleContext
-  implements DoFnInvoker.ArgumentProvider {
-private DoFnStartBundleContext() {
-  fn.super();
-}
+  /** An {@link DoFnInvoker.ArgumentProvider} for {@link DoFn.StartBundle 
@StartBundle}. */
+  private class DoFnStartBundleArgumentProvider
+  extends DoFnInvoker.BaseArgumentProvider {
+/** A concrete implementation of {@link DoFn.StartBundleContext}. */
+private class Context extends DoFn.StartBundleContext {
+  private Context() {
+fn.super();
+  }
 
-@Override
-public PipelineOptions getPipelineOptions() {
-  return options;
+  @Override
+  public PipelineOptions getPipelineOptions() {
+return options;
+  }
 }
 
-@Override
-public BoundedWindow window() {
-  throw new UnsupportedOperationException(
-  "Cannot access window outside of @ProcessElement and @OnTimer 
methods.");
-}
-
-@Override
-public PaneInfo paneInfo(DoFn doFn) {
-  throw new UnsupportedOperationException(
-  "Cannot access paneInfo outside of @ProcessElement methods.");
-}
+

[GitHub] [beam] chadrik commented on issue #10810: [BEAM-9274] Support running yapf in a git pre-commit hook

2020-03-02 Thread GitBox
chadrik commented on issue #10810: [BEAM-9274] Support running yapf in a git 
pre-commit hook
URL: https://github.com/apache/beam/pull/10810#issuecomment-593654495
 
 
   >> If it's opt-out, what happens when the user doesn't set up their local 
repo (per the instructions above)?
   
   > Nothing. Same as happens now. If there are errors they will discover them 
in Jenkins.
   
   sorry, misread the question as "if it's opt-in".  
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [beam] chadrik commented on a change in pull request #10810: [BEAM-9274] Support running yapf in a git pre-commit hook

2020-03-02 Thread GitBox
chadrik commented on a change in pull request #10810: [BEAM-9274] Support 
running yapf in a git pre-commit hook
URL: https://github.com/apache/beam/pull/10810#discussion_r386686435
 
 

 ##
 File path: .pre-commit-config.yaml
 ##
 @@ -0,0 +1,32 @@
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+repos:
+  - repo: https://github.com/pre-commit/mirrors-yapf
+# this rev is a release tag in the repo above and corresponds with a yapf
+# version. make sure this matches the version of yapf in tox.ini.
+rev: v0.29.0
 
 Review comment:
   done!


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [beam] ihji commented on issue #11018: [BEAM-9415] fix postcommit xvr tests

2020-03-02 Thread GitBox
ihji commented on issue #11018: [BEAM-9415] fix postcommit xvr tests
URL: https://github.com/apache/beam/pull/11018#issuecomment-593646636
 
 
   Updated.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [beam] pabloem commented on issue #10979: [BEAM-8841] Support writing data to BigQuery via Avro in Python SDK

2020-03-02 Thread GitBox
pabloem commented on issue #10979: [BEAM-8841] Support writing data to BigQuery 
via Avro in Python SDK
URL: https://github.com/apache/beam/pull/10979#issuecomment-593643579
 
 
   restest this please


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[beam] branch master updated (06b3111 -> 6b4b99c)

2020-03-02 Thread pabloem
This is an automated email from the ASF dual-hosted git repository.

pabloem pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/beam.git.


from 06b3111  Merge pull request #11014: [BEAM-8925] Tika version update to 
1.23
 add 6b4b99c  Merge pull request #10968 from [BEAM-9381] Adding display 
data to BoundedSource SDF

No new revisions were added by this update.

Summary of changes:
 sdks/python/apache_beam/io/iobase.py | 3 +++
 1 file changed, 3 insertions(+)



[GitHub] [beam] pabloem merged pull request #10968: [BEAM-9381] Adding display data to BoundedSource SDF

2020-03-02 Thread GitBox
pabloem merged pull request #10968: [BEAM-9381] Adding display data to 
BoundedSource SDF
URL: https://github.com/apache/beam/pull/10968
 
 
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [beam] chadrik commented on issue #10734: [BEAM-8979] reintroduce mypy-protobuf stub generation

2020-03-02 Thread GitBox
chadrik commented on issue #10734: [BEAM-8979] reintroduce mypy-protobuf stub 
generation
URL: https://github.com/apache/beam/pull/10734#issuecomment-593633730
 
 
   Oh, You have to create your own venv and install build-requirements.txt
   into it.  Yeah, pyproject.toml would solve this.
   
   
   On Mon, Mar 2, 2020 at 1:29 PM Udi Meiri  wrote:
   
   > It works through the Gradle tasks' virtualenvs, so Jenkins is fine. See
   > bug for workaround/solution. Perhaps using pyproject.toml is the way to go.
   >
   > —
   > You are receiving this because you were mentioned.
   > Reply to this email directly, view it on GitHub
   > 
,
   > or unsubscribe
   > 

   > .
   >
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [beam] udim commented on issue #10734: [BEAM-8979] reintroduce mypy-protobuf stub generation

2020-03-02 Thread GitBox
udim commented on issue #10734: [BEAM-8979] reintroduce mypy-protobuf stub 
generation
URL: https://github.com/apache/beam/pull/10734#issuecomment-593632171
 
 
   It works through the Gradle tasks' virtualenvs, so Jenkins is fine. See bug 
for workaround/solution. Perhaps using pyproject.toml is the way to go.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [beam] chadrik commented on issue #10734: [BEAM-8979] reintroduce mypy-protobuf stub generation

2020-03-02 Thread GitBox
chadrik commented on issue #10734: [BEAM-8979] reintroduce mypy-protobuf stub 
generation
URL: https://github.com/apache/beam/pull/10734#issuecomment-593625714
 
 
   Arrg!  This was working!  Right?  Do you have any idea if something changed? 
 
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [beam] chadrik commented on issue #10810: [BEAM-9274] Support running yapf in a git pre-commit hook

2020-03-02 Thread GitBox
chadrik commented on issue #10810: [BEAM-9274] Support running yapf in a git 
pre-commit hook
URL: https://github.com/apache/beam/pull/10810#issuecomment-593623997
 
 
   > Is this opt-in? 
   
   completely opt-in. 
   
   >If it's opt-out, what happens when the user doesn't set up their local repo 
(per the instructions above)?
   
   Nothing.  Same as happens now.  If there are errors they will discover them 
in Jenkins.
   
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [beam] chamikaramj commented on issue #11018: [BEAM-9415] fix postcommit xvr tests

2020-03-02 Thread GitBox
chamikaramj commented on issue #11018: [BEAM-9415] fix postcommit xvr tests
URL: https://github.com/apache/beam/pull/11018#issuecomment-593622896
 
 
   Could you add more details about the breakage and the fix to the PR 
description and JIRA ? (since we are updating External.java here).


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [beam] rohdesamuel commented on a change in pull request #10915: [BEAM-8335] Add PCollection to DataFrame logic for InteractiveRunner.

2020-03-02 Thread GitBox
rohdesamuel commented on a change in pull request #10915: [BEAM-8335] Add 
PCollection to DataFrame logic for InteractiveRunner.
URL: https://github.com/apache/beam/pull/10915#discussion_r386646890
 
 

 ##
 File path: sdks/python/apache_beam/runners/interactive/utils_test.py
 ##
 @@ -0,0 +1,182 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#  http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+from __future__ import absolute_import
+
+import unittest
+
+import numpy as np
+import pandas as pd
+
+from apache_beam.runners.interactive import utils
+from apache_beam.utils.windowed_value import WindowedValue
+
+
+class ParseToDataframeTest(unittest.TestCase):
 
 Review comment:
   Ack, kept tests that test the WindowValue cases.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [beam] rohdesamuel commented on a change in pull request #10915: [BEAM-8335] Add PCollection to DataFrame logic for InteractiveRunner.

2020-03-02 Thread GitBox
rohdesamuel commented on a change in pull request #10915: [BEAM-8335] Add 
PCollection to DataFrame logic for InteractiveRunner.
URL: https://github.com/apache/beam/pull/10915#discussion_r386646690
 
 

 ##
 File path: sdks/python/apache_beam/runners/interactive/utils.py
 ##
 @@ -0,0 +1,55 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#  http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+"""Utilities to be used in  Interactive Beam.
+"""
+
+from __future__ import absolute_import
+
+import pandas as pd
+
+from apache_beam.utils.windowed_value import WindowedValue
+
+
+def elements_to_df(elements, include_window_info=False):
+  """Parses the given elements into a Dataframe.
+
+  If the elements are a list of `WindowedValue`s, then it will break out the
+  elements into their own DataFrame and return it. If include_window_info is
+  True, then it will concatenate the windowing information onto the elements
+  DataFrame.
+  """
+
+  rows = []
+  windowed_values = []
+  for e in elements:
+if isinstance(e, WindowedValue):
+  rows.append(e.value)
+else:
+  rows.append(e)
 
 Review comment:
   Done


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [beam] rohdesamuel commented on a change in pull request #10915: [BEAM-8335] Add PCollection to DataFrame logic for InteractiveRunner.

2020-03-02 Thread GitBox
rohdesamuel commented on a change in pull request #10915: [BEAM-8335] Add 
PCollection to DataFrame logic for InteractiveRunner.
URL: https://github.com/apache/beam/pull/10915#discussion_r386646757
 
 

 ##
 File path: sdks/python/apache_beam/runners/interactive/utils.py
 ##
 @@ -0,0 +1,55 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#  http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+"""Utilities to be used in  Interactive Beam.
+"""
+
+from __future__ import absolute_import
+
+import pandas as pd
+
+from apache_beam.utils.windowed_value import WindowedValue
+
+
+def elements_to_df(elements, include_window_info=False):
+  """Parses the given elements into a Dataframe.
+
+  If the elements are a list of `WindowedValue`s, then it will break out the
+  elements into their own DataFrame and return it. If include_window_info is
+  True, then it will concatenate the windowing information onto the elements
+  DataFrame.
+  """
+
+  rows = []
+  windowed_values = []
 
 Review comment:
   Changed to "windowing_info"


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [beam] ihji commented on issue #10621: [BEAM-9056] Staging artifacts from environment

2020-03-02 Thread GitBox
ihji commented on issue #10621: [BEAM-9056] Staging artifacts from environment
URL: https://github.com/apache/beam/pull/10621#issuecomment-593616556
 
 
   @chamikaramj @robertwb Comments addressed. PTAL. Thanks!


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [beam] ihji commented on a change in pull request #10621: [BEAM-9056] Staging artifacts from environment

2020-03-02 Thread GitBox
ihji commented on a change in pull request #10621: [BEAM-9056] Staging 
artifacts from environment
URL: https://github.com/apache/beam/pull/10621#discussion_r386643699
 
 

 ##
 File path: 
runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/SdkComponents.java
 ##
 @@ -261,14 +263,20 @@ public String registerCoder(Coder coder) throws 
IOException {
* return the same unique ID.
*/
   public String registerEnvironment(Environment env) {
+String environmentId;
 String existing = environmentIds.get(env);
 if (existing != null) {
-  return existing;
+  environmentId = existing;
+} else {
+  String name = uniqify(env.getUrn(), environmentIds.values());
+  environmentIds.put(env, name);
+  componentsBuilder.putEnvironments(name, env);
+  environmentId = name;
 }
-String name = uniqify(env.getUrn(), environmentIds.values());
-environmentIds.put(env, name);
-componentsBuilder.putEnvironments(name, env);
-return name;
+if (defaultEnvironmentId == null) {
 
 Review comment:
   Created a separate ticket: https://issues.apache.org/jira/browse/BEAM-9425


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [beam] ihji commented on issue #11018: [BEAM-9415] fix postcommit xvr tests

2020-03-02 Thread GitBox
ihji commented on issue #11018: [BEAM-9415] fix postcommit xvr tests
URL: https://github.com/apache/beam/pull/11018#issuecomment-593614770
 
 
   R: @chamikaramj 


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [beam] ihji opened a new pull request #11018: [BEAM-9415] fix postcommit xvr tests

2020-03-02 Thread GitBox
ihji opened a new pull request #11018: [BEAM-9415] fix postcommit xvr tests
URL: https://github.com/apache/beam/pull/11018
 
 
   
   
   
   Thank you for your contribution! Follow this checklist to help us 
incorporate your contribution quickly and easily:
   
- [ ] [**Choose 
reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and 
mention them in a comment (`R: @username`).
- [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in 
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA 
issue, if applicable. This will automatically link the pull request to the 
issue.
- [ ] Update `CHANGES.md` with noteworthy changes.
- [ ] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   See the [Contributor Guide](https://beam.apache.org/contribute) for more 
tips on [how to make review process 
smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier).
   
   Post-Commit Tests Status (on master branch)
   

   
   Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark
   --- | --- | --- | --- | --- | --- | --- | ---
   Go | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/)
 | --- | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/)
 | --- | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/)
   Java | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/)
   Python | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Python2/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python2/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Python35/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python35/lastCompletedBuild/)[![Build
 

[GitHub] [beam] Hk-tang opened a new pull request #11017: Changed to using StandardCharsets.UTF_8

2020-03-02 Thread GitBox
Hk-tang opened a new pull request #11017: Changed to using 
StandardCharsets.UTF_8
URL: https://github.com/apache/beam/pull/11017
 
 
   Using new String(bytes, StandardCharsets.UTF_8) avoids having to catch the 
UnsupportedEncodingException
   
   
   
   Thank you for your contribution! Follow this checklist to help us 
incorporate your contribution quickly and easily:
   
- [ ] [**Choose 
reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and 
mention them in a comment (`R: @username`).
- [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in 
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA 
issue, if applicable. This will automatically link the pull request to the 
issue.
- [ ] Update `CHANGES.md` with noteworthy changes.
- [ ] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   See the [Contributor Guide](https://beam.apache.org/contribute) for more 
tips on [how to make review process 
smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier).
   
   Post-Commit Tests Status (on master branch)
   

   
   Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark
   --- | --- | --- | --- | --- | --- | --- | ---
   Go | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/)
 | --- | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/)
 | --- | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/)
   Java | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/)
   Python | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Python2/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python2/lastCompletedBuild/)[![Build
 

[GitHub] [beam] pabloem commented on issue #10979: [BEAM-8841] Support writing data to BigQuery via Avro in Python SDK

2020-03-02 Thread GitBox
pabloem commented on issue #10979: [BEAM-8841] Support writing data to BigQuery 
via Avro in Python SDK
URL: https://github.com/apache/beam/pull/10979#issuecomment-593609967
 
 
   jenkins is the worst : )


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [beam] pabloem commented on issue #10979: [BEAM-8841] Support writing data to BigQuery via Avro in Python SDK

2020-03-02 Thread GitBox
pabloem commented on issue #10979: [BEAM-8841] Support writing data to BigQuery 
via Avro in Python SDK
URL: https://github.com/apache/beam/pull/10979#issuecomment-593609918
 
 
   restest this please


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [beam] pabloem commented on issue #10979: [BEAM-8841] Support writing data to BigQuery via Avro in Python SDK

2020-03-02 Thread GitBox
pabloem commented on issue #10979: [BEAM-8841] Support writing data to BigQuery 
via Avro in Python SDK
URL: https://github.com/apache/beam/pull/10979#issuecomment-593606703
 
 
   Retest this please


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [beam] pabloem commented on issue #10968: [BEAM-9381] Adding display data to BoundedSource SDF

2020-03-02 Thread GitBox
pabloem commented on issue #10968: [BEAM-9381] Adding display data to 
BoundedSource SDF
URL: https://github.com/apache/beam/pull/10968#issuecomment-593607043
 
 
   Run Python PreCommit


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [beam] chunyang commented on issue #10979: [BEAM-8841] Support writing data to BigQuery via Avro in Python SDK

2020-03-02 Thread GitBox
chunyang commented on issue #10979: [BEAM-8841] Support writing data to 
BigQuery via Avro in Python SDK
URL: https://github.com/apache/beam/pull/10979#issuecomment-593605041
 
 
   retest this please


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [beam] alexvanboxel commented on issue #10413: [BEAM-9035] Typed options for Row Schema and Field

2020-03-02 Thread GitBox
alexvanboxel commented on issue #10413: [BEAM-9035] Typed options for Row 
Schema and Field
URL: https://github.com/apache/beam/pull/10413#issuecomment-593591725
 
 
   Verified that the failed Python 2 build isn't due to the PR. This PR is 
ready for review. I have 2 PR's based on this one ready: one for AVRO and other 
for Protobuf.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [beam] ibzib commented on a change in pull request #11011: [website] Update link to environment_type (SDK harness configuration)

2020-03-02 Thread GitBox
ibzib commented on a change in pull request #11011: [website] Update link to 
environment_type (SDK harness configuration)
URL: https://github.com/apache/beam/pull/11011#discussion_r386614175
 
 

 ##
 File path: website/src/roadmap/portability.md
 ##
 @@ -165,7 +165,3 @@ Please see the [Spark Runner page]({{ site.baseurl 
}}/documentation/runners/spar
 how to run portable pipelines on top of Spark.
 
 Python streaming mode is not yet supported on Spark.
-
-## SDK Harness Configuration {#sdk-harness-config}
 
 Review comment:
   Can we leave this?
   a) Links to this header could be left floating around out there in my emails 
etc.
   b) I also think it's useful to link these two pages.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [beam] udim opened a new pull request #11016: Reduce warnings in pytest runs.

2020-03-02 Thread GitBox
udim opened a new pull request #11016: Reduce warnings in pytest runs.
URL: https://github.com/apache/beam/pull/11016
 
 
   From 2233 down to 412.
   
   
   
   Thank you for your contribution! Follow this checklist to help us 
incorporate your contribution quickly and easily:
   
- [ ] [**Choose 
reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and 
mention them in a comment (`R: @username`).
- [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in 
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA 
issue, if applicable. This will automatically link the pull request to the 
issue.
- [ ] Update `CHANGES.md` with noteworthy changes.
- [ ] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   See the [Contributor Guide](https://beam.apache.org/contribute) for more 
tips on [how to make review process 
smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier).
   
   Post-Commit Tests Status (on master branch)
   

   
   Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark
   --- | --- | --- | --- | --- | --- | --- | ---
   Go | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/)
 | --- | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/)
 | --- | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/)
   Java | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/)
   Python | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Python2/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python2/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Python35/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python35/lastCompletedBuild/)[![Build
 

[GitHub] [beam] udim commented on issue #10822: [BEAM-7746] Minor typing updates / fixes

2020-03-02 Thread GitBox
udim commented on issue #10822: [BEAM-7746] Minor typing updates / fixes
URL: https://github.com/apache/beam/pull/10822#issuecomment-593584144
 
 
   I want to leave some commits and squash others. There is no way to do that 
in the GH UI.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [beam] robertwb commented on a change in pull request #10915: [BEAM-8335] Add PCollection to DataFrame logic for InteractiveRunner.

2020-03-02 Thread GitBox
robertwb commented on a change in pull request #10915: [BEAM-8335] Add 
PCollection to DataFrame logic for InteractiveRunner.
URL: https://github.com/apache/beam/pull/10915#discussion_r386591818
 
 

 ##
 File path: sdks/python/apache_beam/runners/interactive/utils.py
 ##
 @@ -0,0 +1,55 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#  http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+"""Utilities to be used in  Interactive Beam.
+"""
+
+from __future__ import absolute_import
+
+import pandas as pd
+
+from apache_beam.utils.windowed_value import WindowedValue
+
+
+def elements_to_df(elements, include_window_info=False):
+  """Parses the given elements into a Dataframe.
+
+  If the elements are a list of `WindowedValue`s, then it will break out the
+  elements into their own DataFrame and return it. If include_window_info is
+  True, then it will concatenate the windowing information onto the elements
+  DataFrame.
+  """
+
+  rows = []
+  windowed_values = []
+  for e in elements:
+if isinstance(e, WindowedValue):
+  rows.append(e.value)
+else:
+  rows.append(e)
 
 Review comment:
   Remove this case. 


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [beam] robertwb commented on a change in pull request #10915: [BEAM-8335] Add PCollection to DataFrame logic for InteractiveRunner.

2020-03-02 Thread GitBox
robertwb commented on a change in pull request #10915: [BEAM-8335] Add 
PCollection to DataFrame logic for InteractiveRunner.
URL: https://github.com/apache/beam/pull/10915#discussion_r386592005
 
 

 ##
 File path: sdks/python/apache_beam/runners/interactive/utils.py
 ##
 @@ -0,0 +1,55 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#  http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+"""Utilities to be used in  Interactive Beam.
+"""
+
+from __future__ import absolute_import
+
+import pandas as pd
+
+from apache_beam.utils.windowed_value import WindowedValue
+
+
+def elements_to_df(elements, include_window_info=False):
+  """Parses the given elements into a Dataframe.
+
+  If the elements are a list of `WindowedValue`s, then it will break out the
+  elements into their own DataFrame and return it. If include_window_info is
+  True, then it will concatenate the windowing information onto the elements
+  DataFrame.
+  """
+
+  rows = []
+  windowed_values = []
 
 Review comment:
   windowing_info? (windowed_values sounds like it has the values themselves). 


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [beam] robertwb commented on a change in pull request #10915: [BEAM-8335] Add PCollection to DataFrame logic for InteractiveRunner.

2020-03-02 Thread GitBox
robertwb commented on a change in pull request #10915: [BEAM-8335] Add 
PCollection to DataFrame logic for InteractiveRunner.
URL: https://github.com/apache/beam/pull/10915#discussion_r386592264
 
 

 ##
 File path: sdks/python/apache_beam/runners/interactive/utils_test.py
 ##
 @@ -0,0 +1,182 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#  http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+from __future__ import absolute_import
+
+import unittest
+
+import numpy as np
+import pandas as pd
+
+from apache_beam.runners.interactive import utils
+from apache_beam.utils.windowed_value import WindowedValue
+
+
+class ParseToDataframeTest(unittest.TestCase):
 
 Review comment:
   Remove tests that don't test our code. 


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


  1   2   >