[GitHub] [beam] dennisylyung removed a comment on pull request #12583: [BEAM-10706] Fix duplicate key error in DynamoDBIO.Write

2020-11-16 Thread GitBox


dennisylyung removed a comment on pull request #12583:
URL: https://github.com/apache/beam/pull/12583#issuecomment-725986759


   @iemejia I made some changes to use a hashmap for deduplication, and to 
improve the test. 
   I separated the test for duplicate key, and used mockito to check whether 
the call to the aws client contains duplicate key. 
   However, as you have worried, each element is put into a single bundle. 
Hence each bundle consist of one element only, defeating the duplication test.
   I looked at the documentation of `DoFnTester` but it is deprecated. The docs 
suggests using `TestPipeline` instead, but I couldn't find anyway to control 
the bundling of elements. 
   Any idea?



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] boyuanzz commented on pull request #13361: Fix NPE in CountingSource

2020-11-16 Thread GitBox


boyuanzz commented on pull request #13361:
URL: https://github.com/apache/beam/pull/13361#issuecomment-728682733


   Run Java PreCommit



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] boyuanzz opened a new pull request #13361: Fix NPE in CountingSource

2020-11-16 Thread GitBox


boyuanzz opened a new pull request #13361:
URL: https://github.com/apache/beam/pull/13361


   
   r: @y1chi 
   
   
   
   
   Thank you for your contribution! Follow this checklist to help us 
incorporate your contribution quickly and easily:
   
- [ ] [**Choose 
reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and 
mention them in a comment (`R: @username`).
- [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in 
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA 
issue, if applicable. This will automatically link the pull request to the 
issue.
- [ ] Update `CHANGES.md` with noteworthy changes.
- [ ] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   See the [Contributor Guide](https://beam.apache.org/contribute) for more 
tips on [how to make review process 
smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier).
   
   Post-Commit Tests Status (on master branch)
   

   
   Lang | SDK | Dataflow | Flink | Samza | Spark | Twister2
   --- | --- | --- | --- | --- | --- | ---
   Go | [![Build 
Status](https://ci-beam.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/)
 | --- | [![Build 
Status](https://ci-beam.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/)
 | --- | [![Build 
Status](https://ci-beam.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/)
 | ---
   Java | [![Build 
Status](https://ci-beam.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/)
 | [![Build 
Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/)[![Build
 
Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/)
 | [![Build 
Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build
 
Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Java11/lastCompletedBuild/badge/i
 
con)](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Java11/lastCompletedBuild/)[![Build
 
Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)[![Build
 
Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/)
 | [![Build 
Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/)
 | [![Build 
Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/)[![Build
 Status](htt
 
ps://ci-beam.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/)[![Build
 
Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/)
 | [![Build 
Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Twister2/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Twister2/lastCompletedBuild/)
   Python | [![Build 
Status](https://ci-beam.apache.org/job/beam_PostCommit_Python36/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Python36/lastCompletedBuild/)[![Build
 
Status](https://ci-beam.apache.org/job/beam_PostCommit_Python37/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Python37/lastCompletedBuild/)[![Build
 

[GitHub] [beam] udim merged pull request #13358: Update Beam Dataflow container versions for Python

2020-11-16 Thread GitBox


udim merged pull request #13358:
URL: https://github.com/apache/beam/pull/13358


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] ihji commented on pull request #13360: add validate runner dataflow v2 java badge

2020-11-16 Thread GitBox


ihji commented on pull request #13360:
URL: https://github.com/apache/beam/pull/13360#issuecomment-728673644


   R: @aaltay 



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] ihji opened a new pull request #13360: add validate runner dataflow v2 java badge

2020-11-16 Thread GitBox


ihji opened a new pull request #13360:
URL: https://github.com/apache/beam/pull/13360


   **Please** add a meaningful description for your change here
   
   
   
   Thank you for your contribution! Follow this checklist to help us 
incorporate your contribution quickly and easily:
   
- [ ] [**Choose 
reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and 
mention them in a comment (`R: @username`).
- [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in 
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA 
issue, if applicable. This will automatically link the pull request to the 
issue.
- [ ] Update `CHANGES.md` with noteworthy changes.
- [ ] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   See the [Contributor Guide](https://beam.apache.org/contribute) for more 
tips on [how to make review process 
smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier).
   
   Post-Commit Tests Status (on master branch)
   

   
   Lang | SDK | Dataflow | Flink | Samza | Spark | Twister2
   --- | --- | --- | --- | --- | --- | ---
   Go | [![Build 
Status](https://ci-beam.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/)
 | --- | [![Build 
Status](https://ci-beam.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/)
 | --- | [![Build 
Status](https://ci-beam.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/)
 | ---
   Java | [![Build 
Status](https://ci-beam.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/)
 | [![Build 
Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/)[![Build
 
Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/)
 | [![Build 
Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build
 
Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Java11/lastCompletedBuild/badge/i
 
con)](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Java11/lastCompletedBuild/)[![Build
 
Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)[![Build
 
Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/)
 | [![Build 
Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/)
 | [![Build 
Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/)[![Build
 Status](htt
 
ps://ci-beam.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/)[![Build
 
Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/)
 | [![Build 
Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Twister2/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Twister2/lastCompletedBuild/)
   Python | [![Build 
Status](https://ci-beam.apache.org/job/beam_PostCommit_Python36/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Python36/lastCompletedBuild/)[![Build
 
Status](https://ci-beam.apache.org/job/beam_PostCommit_Python37/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Python37/lastCompletedBuild/)[![Build
 

[GitHub] [beam] boyuanzz merged pull request #13356: [BEAM-11270] Dataflow Java on runner v2 tests are failing because sdk…

2020-11-16 Thread GitBox


boyuanzz merged pull request #13356:
URL: https://github.com/apache/beam/pull/13356


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] udim commented on pull request #13358: Update Beam Dataflow container versions for Python

2020-11-16 Thread GitBox


udim commented on pull request #13358:
URL: https://github.com/apache/beam/pull/13358#issuecomment-728661421


   Run Python_PVR_Flink PreCommit



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] udim commented on pull request #13358: Update Beam Dataflow container versions for Python

2020-11-16 Thread GitBox


udim commented on pull request #13358:
URL: https://github.com/apache/beam/pull/13358#issuecomment-728660986


   Run Python_PVR_Flink PreCommit
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] rezarokni commented on pull request #13112: [BEAM-11065] Apache Beam Template to ingest from Apache Kafka to Google Pub/Sub

2020-11-16 Thread GitBox


rezarokni commented on pull request #13112:
URL: https://github.com/apache/beam/pull/13112#issuecomment-728659985


   @kennknowles I wont be able to look at this until the end of the week at the 
earliest, is there someone else who can pick this up?



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] ihji commented on pull request #13356: [BEAM-11270] Dataflow Java on runner v2 tests are failing because sdk…

2020-11-16 Thread GitBox


ihji commented on pull request #13356:
URL: https://github.com/apache/beam/pull/13356#issuecomment-728653835


   R: @boyuanzz 



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] tvalentyn commented on pull request #13359: [BEAM-11196] Cherry-pick #13303 to the 2.26.0 release branch.

2020-11-16 Thread GitBox


tvalentyn commented on pull request #13359:
URL: https://github.com/apache/beam/pull/13359#issuecomment-728578563


   LGTM



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] nehsyc commented on pull request #13292: [BEAM-10475]Add WithShardedKey variation of GroupIntoBatches transform in Python SDK.

2020-11-16 Thread GitBox


nehsyc commented on pull request #13292:
URL: https://github.com/apache/beam/pull/13292#issuecomment-728571398


   R: @boyuanzz 



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] TheNeuralBit commented on pull request #13211: [BEAM-8106] Separate Java8/11 container image build tasks

2020-11-16 Thread GitBox


TheNeuralBit commented on pull request #13211:
URL: https://github.com/apache/beam/pull/13211#issuecomment-728489376


   Run Java Dataflow V2 ValidatesRunner



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] TheNeuralBit commented on pull request #13211: [BEAM-8106] Separate Java8/11 container image build tasks

2020-11-16 Thread GitBox


TheNeuralBit commented on pull request #13211:
URL: https://github.com/apache/beam/pull/13211#issuecomment-728488362


   Run Java PreCommit



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] TheNeuralBit merged pull request #13340: [BEAM-11262] Remove numSleeps assertion in SpannerIOWriteTest

2020-11-16 Thread GitBox


TheNeuralBit merged pull request #13340:
URL: https://github.com/apache/beam/pull/13340


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] y1chi commented on a change in pull request #13350: [BEAM-11266] Python IO MongoDB: add bucket_auto aggregation option for bundling in Atlas.

2020-11-16 Thread GitBox


y1chi commented on a change in pull request #13350:
URL: https://github.com/apache/beam/pull/13350#discussion_r524486089



##
File path: sdks/python/apache_beam/io/mongodbio.py
##
@@ -241,6 +275,27 @@ def _get_split_keys(self, desired_chunk_size_in_mb, 
start_pos, end_pos):
   max={'_id': end_pos},
   maxChunkSize=desired_chunk_size_in_mb)['splitKeys'])
 
+  def _get_buckets(self, desired_chunk_size, start_pos, end_pos):
+if start_pos >= end_pos:
+  # single document not splittable
+  return []
+size = self.estimate_size()
+bucket_count = size // desired_chunk_size

Review comment:
   The split function will likely be called recursively for dynamic 
rebalancing, so for a range with start_pos and end_pos, it can be further split 
upon backend request, so it might not be reasonable to always use the total 
collection size divided by desired_chunk_size to calculate the bucket count. Is 
it possible to only get the buckets within the give _id range? and we can 
probably use an average document size times the number of documents to 
calculate the size of the range being split.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] y1chi commented on a change in pull request #13350: [BEAM-11266] Python IO MongoDB: add bucket_auto aggregation option for bundling in Atlas.

2020-11-16 Thread GitBox


y1chi commented on a change in pull request #13350:
URL: https://github.com/apache/beam/pull/13350#discussion_r524486089



##
File path: sdks/python/apache_beam/io/mongodbio.py
##
@@ -241,6 +275,27 @@ def _get_split_keys(self, desired_chunk_size_in_mb, 
start_pos, end_pos):
   max={'_id': end_pos},
   maxChunkSize=desired_chunk_size_in_mb)['splitKeys'])
 
+  def _get_buckets(self, desired_chunk_size, start_pos, end_pos):
+if start_pos >= end_pos:
+  # single document not splittable
+  return []
+size = self.estimate_size()
+bucket_count = size // desired_chunk_size

Review comment:
   The split function will likely be called recursively for dynamic 
rebalancing, so for a range with start_pos and end_pos, it can be further split 
upon backend request, so it might not be reasonable to always use the total 
collection size divide by desired_chunk_size to calculate the bucket count. Is 
it possible to only get the buckets within the give _id range? and we can 
probably use an average document size times the number of documents to 
calculate the size of the range being split.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] kileys closed pull request #13289: [BEAM-9444] Add GCP BOM to all java projects

2020-11-16 Thread GitBox


kileys closed pull request #13289:
URL: https://github.com/apache/beam/pull/13289


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] ihji removed a comment on pull request #13356: [BEAM-11270] Dataflow Java on runner v2 tests are failing because sdk…

2020-11-16 Thread GitBox


ihji removed a comment on pull request #13356:
URL: https://github.com/apache/beam/pull/13356#issuecomment-728433566


   Run Java PreCommit



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] ihji commented on pull request #13356: [BEAM-11270] Dataflow Java on runner v2 tests are failing because sdk…

2020-11-16 Thread GitBox


ihji commented on pull request #13356:
URL: https://github.com/apache/beam/pull/13356#issuecomment-728433566


   Run Java PreCommit



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] yifanmai commented on pull request #13359: [BEAM-11196] Ensure parent of fused stages is not one of its transforms

2020-11-16 Thread GitBox


yifanmai commented on pull request #13359:
URL: https://github.com/apache/beam/pull/13359#issuecomment-728433371


   R: @tvalentyn 



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] yifanmai opened a new pull request #13359: [BEAM-11196] Ensure parent of fused stages is not one of its transforms

2020-11-16 Thread GitBox


yifanmai opened a new pull request #13359:
URL: https://github.com/apache/beam/pull/13359


   #13202 introduces a bug where the algorithm that determines the parents of 
fused stage can create loops, because the lowest common ancestor algorithm 
considers a transform to be its own parent. Fusing a single transform A will 
result cause the fused stage's parent to be set to A. Fusing a transform A with 
a descendant B will also cause fused stage's parent to be set to A. To fix 
this, if the parent of the fused stage would be one of transforms to be fused, 
set the parent to the parent of that transform instead.
   
   This is a cherry pick of #13303.
   
   
   
   Thank you for your contribution! Follow this checklist to help us 
incorporate your contribution quickly and easily:
   
- [ ] [**Choose 
reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and 
mention them in a comment (`R: @username`).
- [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in 
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA 
issue, if applicable. This will automatically link the pull request to the 
issue.
- [ ] Update `CHANGES.md` with noteworthy changes.
- [ ] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   See the [Contributor Guide](https://beam.apache.org/contribute) for more 
tips on [how to make review process 
smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier).
   
   Post-Commit Tests Status (on master branch)
   

   
   Lang | SDK | Dataflow | Flink | Samza | Spark | Twister2
   --- | --- | --- | --- | --- | --- | ---
   Go | [![Build 
Status](https://ci-beam.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/)
 | --- | [![Build 
Status](https://ci-beam.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/)
 | --- | [![Build 
Status](https://ci-beam.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/)
 | ---
   Java | [![Build 
Status](https://ci-beam.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/)
 | [![Build 
Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/)[![Build
 
Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/)
 | [![Build 
Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build
 
Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Java11/lastCompletedBuild/badge/i
 
con)](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Java11/lastCompletedBuild/)[![Build
 
Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)[![Build
 
Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/)
 | [![Build 
Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/)
 | [![Build 
Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/)[![Build
 Status](htt
 
ps://ci-beam.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/)[![Build
 
Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/)
 | [![Build 

[GitHub] [beam] TheNeuralBit commented on pull request #13211: [BEAM-8106] Separate Java8/11 container image build tasks

2020-11-16 Thread GitBox


TheNeuralBit commented on pull request #13211:
URL: https://github.com/apache/beam/pull/13211#issuecomment-728424003


   Run Java_Examples_Dataflow PreCommit



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] udim commented on pull request #13358: Update Beam Dataflow container versions for Python

2020-11-16 Thread GitBox


udim commented on pull request #13358:
URL: https://github.com/apache/beam/pull/13358#issuecomment-728411154


   R: @TheNeuralBit @robertwb 



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] udim opened a new pull request #13358: Update Beam Dataflow container versions for Python

2020-11-16 Thread GitBox


udim opened a new pull request #13358:
URL: https://github.com/apache/beam/pull/13358


   Continuation of https://github.com/apache/beam/pull/13323
   
   
   
   
   Thank you for your contribution! Follow this checklist to help us 
incorporate your contribution quickly and easily:
   
- [ ] [**Choose 
reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and 
mention them in a comment (`R: @username`).
- [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in 
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA 
issue, if applicable. This will automatically link the pull request to the 
issue.
- [ ] Update `CHANGES.md` with noteworthy changes.
- [ ] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   See the [Contributor Guide](https://beam.apache.org/contribute) for more 
tips on [how to make review process 
smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier).
   
   Post-Commit Tests Status (on master branch)
   

   
   Lang | SDK | Dataflow | Flink | Samza | Spark | Twister2
   --- | --- | --- | --- | --- | --- | ---
   Go | [![Build 
Status](https://ci-beam.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/)
 | --- | [![Build 
Status](https://ci-beam.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/)
 | --- | [![Build 
Status](https://ci-beam.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/)
 | ---
   Java | [![Build 
Status](https://ci-beam.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/)
 | [![Build 
Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/)[![Build
 
Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/)
 | [![Build 
Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build
 
Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Java11/lastCompletedBuild/badge/i
 
con)](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Java11/lastCompletedBuild/)[![Build
 
Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)[![Build
 
Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/)
 | [![Build 
Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/)
 | [![Build 
Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/)[![Build
 Status](htt
 
ps://ci-beam.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/)[![Build
 
Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/)
 | [![Build 
Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Twister2/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Twister2/lastCompletedBuild/)
   Python | [![Build 
Status](https://ci-beam.apache.org/job/beam_PostCommit_Python36/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Python36/lastCompletedBuild/)[![Build
 
Status](https://ci-beam.apache.org/job/beam_PostCommit_Python37/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Python37/lastCompletedBuild/)[![Build
 

[GitHub] [beam] veblush opened a new pull request #13357: [BEAM-8889] Upgrade GCSIO to 2.1.6 (Backport of #13311)

2020-11-16 Thread GitBox


veblush opened a new pull request #13357:
URL: https://github.com/apache/beam/pull/13357


   Backport of  #13311
   
   R:@kennknowles
   
   
   Thank you for your contribution! Follow this checklist to help us 
incorporate your contribution quickly and easily:
   
- [ ] [**Choose 
reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and 
mention them in a comment (`R: @username`).
- [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in 
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA 
issue, if applicable. This will automatically link the pull request to the 
issue.
- [ ] Update `CHANGES.md` with noteworthy changes.
- [ ] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   See the [Contributor Guide](https://beam.apache.org/contribute) for more 
tips on [how to make review process 
smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier).
   
   Post-Commit Tests Status (on master branch)
   

   
   Lang | SDK | Dataflow | Flink | Samza | Spark | Twister2
   --- | --- | --- | --- | --- | --- | ---
   Go | [![Build 
Status](https://ci-beam.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/)
 | --- | [![Build 
Status](https://ci-beam.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/)
 | --- | [![Build 
Status](https://ci-beam.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/)
 | ---
   Java | [![Build 
Status](https://ci-beam.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/)
 | [![Build 
Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/)[![Build
 
Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/)
 | [![Build 
Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build
 
Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Java11/lastCompletedBuild/badge/i
 
con)](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Java11/lastCompletedBuild/)[![Build
 
Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)[![Build
 
Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/)
 | [![Build 
Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/)
 | [![Build 
Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/)[![Build
 Status](htt
 
ps://ci-beam.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/)[![Build
 
Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/)
 | [![Build 
Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Twister2/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Twister2/lastCompletedBuild/)
   Python | [![Build 
Status](https://ci-beam.apache.org/job/beam_PostCommit_Python36/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Python36/lastCompletedBuild/)[![Build
 
Status](https://ci-beam.apache.org/job/beam_PostCommit_Python37/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Python37/lastCompletedBuild/)[![Build
 

[GitHub] [beam] TheNeuralBit commented on pull request #13347: [release-2.26.0][BEAM-11264] Add Reshuffle in pd.read_*

2020-11-16 Thread GitBox


TheNeuralBit commented on pull request #13347:
URL: https://github.com/apache/beam/pull/13347#issuecomment-728384639


   Thanks!



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] lostluck commented on pull request #13347: [release-2.26.0][BEAM-11264] Add Reshuffle in pd.read_*

2020-11-16 Thread GitBox


lostluck commented on pull request #13347:
URL: https://github.com/apache/beam/pull/13347#issuecomment-728382855


   Right, I knew this. Python and Java have dataflow in the precommits, which 
isn't how the Go SDK organises it's tests, which is why it's a surprise. 
   
   SGTM. Merging.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] lostluck merged pull request #13347: [release-2.26.0][BEAM-11264] Add Reshuffle in pd.read_*

2020-11-16 Thread GitBox


lostluck merged pull request #13347:
URL: https://github.com/apache/beam/pull/13347


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] TheNeuralBit commented on pull request #13340: [BEAM-11262] Remove numSleeps assertion in SpannerIOWriteTest

2020-11-16 Thread GitBox


TheNeuralBit commented on pull request #13340:
URL: https://github.com/apache/beam/pull/13340#issuecomment-728382082


   Run Java PreCommit



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] kennknowles commented on pull request #13289: [BEAM-9444] Add GCP BOM to all java projects

2020-11-16 Thread GitBox


kennknowles commented on pull request #13289:
URL: https://github.com/apache/beam/pull/13289#issuecomment-728374842


   We chatted about that. I think the risk of forgetting to add the BOM to a 
module is less and the risk of messing up the deps of a module is higher. 
Eventually we will want dependency convergence but without some big picture 
testing and analysis strategy I would say let's skip this change.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] lostluck merged pull request #13272: [BEAM-11207] Metric Extraction via proto RPC API

2020-11-16 Thread GitBox


lostluck merged pull request #13272:
URL: https://github.com/apache/beam/pull/13272


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] lostluck merged pull request #13348: Go redundant type cleanup.

2020-11-16 Thread GitBox


lostluck merged pull request #13348:
URL: https://github.com/apache/beam/pull/13348


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] TheNeuralBit commented on pull request #13347: [release-2.26.0][BEAM-11264] Add Reshuffle in pd.read_*

2020-11-16 Thread GitBox


TheNeuralBit commented on pull request #13347:
URL: https://github.com/apache/beam/pull/13347#issuecomment-728365075


   Well I can't find any documentation stating that Python tests will fail, 
just that Dataflow tests will fail. But looking at the last [Python PreCommit 
failure](https://ci-beam.apache.org/job/beam_PreCommit_Python_Phrase/2316) it 
looks like the issue was hanging Dataflow tests:
   
   ```
   12:24:47 > Task :sdks:python:test-suites:dataflow:py36:preCommitIT_batch_V2
   12:24:47 
INFO:apache_beam.runners.dataflow.dataflow_runner:2020-11-16T20:24:29.045Z: 
JOB_MESSAGE_BASIC: Your project already contains 100 Dataflow-created metric 
descriptors, so new user metrics of the form custom.googleapis.com/* will not 
be created. However, all user metrics are also available in the metric 
dataflow.googleapis.com/job/user_counter. If you rely on the custom metrics, 
you can delete old / unused metric descriptors. See 
https://developers.google.com/apis-explorer/#p/monitoring/v3/monitoring.projects.metricDescriptors.list
 and 
https://developers.google.com/apis-explorer/#p/monitoring/v3/monitoring.projects.metricDescriptors.delete
   12:24:49 
   12:24:49 > Task :sdks:python:test-suites:dataflow:py37:preCommitIT_batch_V2
   12:24:49 
INFO:apache_beam.runners.dataflow.dataflow_runner:2020-11-16T20:24:46.063Z: 
JOB_MESSAGE_DETAILED: Autoscaling: Raised the number of workers to 1 based on 
the rate of progress in the currently running stage(s).
   12:25:17 
   12:25:17 > Task :sdks:python:test-suites:dataflow:py36:preCommitIT_batch_V2
   12:25:17 
INFO:apache_beam.runners.dataflow.dataflow_runner:2020-11-16T20:24:47.927Z: 
JOB_MESSAGE_DETAILED: Autoscaling: Raised the number of workers to 1 based on 
the rate of progress in the currently running stage(s).
   12:25:19 
   12:25:19 > Task :sdks:python:test-suites:dataflow:py37:preCommitIT_batch_V2
   12:25:19 
INFO:apache_beam.runners.dataflow.dataflow_runner:2020-11-16T20:25:16.445Z: 
JOB_MESSAGE_DETAILED: Workers have started successfully.
   12:25:19 
INFO:apache_beam.runners.dataflow.dataflow_runner:2020-11-16T20:25:16.480Z: 
JOB_MESSAGE_DETAILED: Workers have started successfully.
   12:25:47 
   12:25:47 > Task :sdks:python:test-suites:dataflow:py36:preCommitIT_batch_V2
   12:25:47 
INFO:apache_beam.runners.dataflow.dataflow_runner:2020-11-16T20:25:23.244Z: 
JOB_MESSAGE_DETAILED: Workers have started successfully.
   12:25:47 
INFO:apache_beam.runners.dataflow.dataflow_runner:2020-11-16T20:25:23.307Z: 
JOB_MESSAGE_DETAILED: Workers have started successfully.
   12:48:35 Build was aborted
   ```
   
   It looks like the same suites didn't fail on your verification branch 
though.. maybe something has changed?



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] ihji commented on pull request #13356: [BEAM-11270] Dataflow Java on runner v2 tests are failing because sdk…

2020-11-16 Thread GitBox


ihji commented on pull request #13356:
URL: https://github.com/apache/beam/pull/13356#issuecomment-728364299


   Run Java Dataflow V2 ValidatesRunner



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] ihji commented on pull request #13356: [BEAM-11270] Dataflow Java on runner v2 tests are failing because sdk…

2020-11-16 Thread GitBox


ihji commented on pull request #13356:
URL: https://github.com/apache/beam/pull/13356#issuecomment-728363769


   run xvr_dataflow postcommit



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] ihji opened a new pull request #13356: [BEAM-11270] Dataflow Java on runner v2 tests are failing because sdk…

2020-11-16 Thread GitBox


ihji opened a new pull request #13356:
URL: https://github.com/apache/beam/pull/13356


   … docker container is cleaned up incorrectly
   
   **Please** add a meaningful description for your change here
   
   
   
   Thank you for your contribution! Follow this checklist to help us 
incorporate your contribution quickly and easily:
   
- [ ] [**Choose 
reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and 
mention them in a comment (`R: @username`).
- [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in 
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA 
issue, if applicable. This will automatically link the pull request to the 
issue.
- [ ] Update `CHANGES.md` with noteworthy changes.
- [ ] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   See the [Contributor Guide](https://beam.apache.org/contribute) for more 
tips on [how to make review process 
smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier).
   
   Post-Commit Tests Status (on master branch)
   

   
   Lang | SDK | Dataflow | Flink | Samza | Spark | Twister2
   --- | --- | --- | --- | --- | --- | ---
   Go | [![Build 
Status](https://ci-beam.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/)
 | --- | [![Build 
Status](https://ci-beam.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/)
 | --- | [![Build 
Status](https://ci-beam.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/)
 | ---
   Java | [![Build 
Status](https://ci-beam.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/)
 | [![Build 
Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/)[![Build
 
Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/)
 | [![Build 
Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build
 
Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Java11/lastCompletedBuild/badge/i
 
con)](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Java11/lastCompletedBuild/)[![Build
 
Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)[![Build
 
Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/)
 | [![Build 
Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/)
 | [![Build 
Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/)[![Build
 Status](htt
 
ps://ci-beam.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/)[![Build
 
Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/)
 | [![Build 
Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Twister2/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Twister2/lastCompletedBuild/)
   Python | [![Build 
Status](https://ci-beam.apache.org/job/beam_PostCommit_Python36/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Python36/lastCompletedBuild/)[![Build
 

[GitHub] [beam] TheNeuralBit commented on pull request #13347: [release-2.26.0][BEAM-11264] Add Reshuffle in pd.read_*

2020-11-16 Thread GitBox


TheNeuralBit commented on pull request #13347:
URL: https://github.com/apache/beam/pull/13347#issuecomment-728361839


   I think it's expected that Python test suites fail on the release branch 
until a Dataflow container release. Let me verify that



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] kennknowles commented on pull request #13128: [BEAM-11265] Update quickstart-java.md

2020-11-16 Thread GitBox


kennknowles commented on pull request #13128:
URL: https://github.com/apache/beam/pull/13128#issuecomment-728361655


   Great. That's perfect.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] TheNeuralBit commented on pull request #13128: [BEAM-11265] Update quickstart-java.md

2020-11-16 Thread GitBox


TheNeuralBit commented on pull request #13128:
URL: https://github.com/apache/beam/pull/13128#issuecomment-728359904


   > There are no accounts or anything needed for this, right?
   
   See my comment above, I think it's because we didn't want the quickstart 
project to require the GCP extension:
   
   > I don't think authentication is an issue since apache-beam-samples is open 
publicly, but I think the reason we don't use gs:// in all the examples is it 
requires extensions:google-cloud-platform. This isn't a problem for 
DataflowRunner though.
   
   Ultimately we adjusted this PR, now it just:
   - Updates the quickstart archetype by adding a sample.txt file with some 
shakespeare
   - Changes the file paths on the website to `/path/to/input`
   
   Once 2.27.0 is out we can update the website again to point users to the new 
sample.txt file that they'll have locally. This will probably just be for the 
DirectRunner since other runners most likely won't have access to the local 
filesystem.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] rgruener removed a comment on pull request #13355: [BEAM-11272] Fix combiner label constructor arg

2020-11-16 Thread GitBox


rgruener removed a comment on pull request #13355:
URL: https://github.com/apache/beam/pull/13355#issuecomment-728354671


   retest this please



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] rgruener commented on pull request #13355: [BEAM-11272] Fix combiner label constructor arg

2020-11-16 Thread GitBox


rgruener commented on pull request #13355:
URL: https://github.com/apache/beam/pull/13355#issuecomment-728354671


   retest this please



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] rgruener removed a comment on pull request #13355: [BEAM-11272] Fix combiner label constructor arg

2020-11-16 Thread GitBox


rgruener removed a comment on pull request #13355:
URL: https://github.com/apache/beam/pull/13355#issuecomment-728337397







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] mxm commented on a change in pull request #13353: [BEAM-11267] Remove unnecessary reshuffle for stateful ParDo after key…

2020-11-16 Thread GitBox


mxm commented on a change in pull request #13353:
URL: https://github.com/apache/beam/pull/13353#discussion_r524634584



##
File path: 
runners/flink/src/main/java/org/apache/beam/runners/flink/FlinkStreamingTransformTranslators.java
##
@@ -971,7 +987,9 @@ public void translateNode(
   .transform(fullName, outputTypeInfo, (OneInputStreamOperator) 
doFnOperator)
   .uid(fullName);
 
-  context.setOutputDataStream(context.getOutput(transform), outDataStream);
+  final PCollection>> output = 
context.getOutput(transform);
+  context.setOutputDataStream(output, outDataStream);
+  context.setProducer(output, transform);

Review comment:
   It would be nice if these explicit calls wouldn't be required. I believe 
`context.setOutputDataStream` internally has the current transform available. 
So we could perform it internally in the context.

##
File path: 
runners/flink/src/main/java/org/apache/beam/runners/flink/FlinkStreamingTransformTranslators.java
##
@@ -1127,7 +1148,9 @@ public void translateNode(
 
 
keyedWorkItemStream.getExecutionEnvironment().addOperator(rawFlinkTransform);
 
-context.setOutputDataStream(context.getOutput(transform), 
outDataStream);
+final PCollection> output = 
context.getOutput(transform);
+context.setOutputDataStream(output, outDataStream);
+context.setProducer(output, transform);

Review comment:
   Same here.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] kennknowles commented on pull request #13128: [BEAM-11265] Update quickstart-java.md

2020-11-16 Thread GitBox


kennknowles commented on pull request #13128:
URL: https://github.com/apache/beam/pull/13128#issuecomment-728348545


   A couple years ago I had this same thought ("why are we using pom.xml") and 
ended up finding an answer to my satisfaction and not changing it... I don't 
remember why, but please do watch for any issues that might come up. There are 
no accounts or anything needed for this, right?



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] kennknowles merged pull request #13311: [BEAM-8889] Upgrade GCSIO to 2.1.6

2020-11-16 Thread GitBox


kennknowles merged pull request #13311:
URL: https://github.com/apache/beam/pull/13311


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] kennknowles commented on pull request #13311: [BEAM-8889] Upgrade GCSIO to 2.1.6

2020-11-16 Thread GitBox


kennknowles commented on pull request #13311:
URL: https://github.com/apache/beam/pull/13311#issuecomment-728345583


   OK then I am happy to merge. It is an experiment and there are no linkage 
errors and existing tests pass.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] rgruener commented on pull request #13355: [BEAM-11272] Fix combiner label constructor arg

2020-11-16 Thread GitBox


rgruener commented on pull request #13355:
URL: https://github.com/apache/beam/pull/13355#issuecomment-728339372


   retest this please



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] rgruener commented on pull request #13355: [BEAM-11272] Fix combiner label constructor arg

2020-11-16 Thread GitBox


rgruener commented on pull request #13355:
URL: https://github.com/apache/beam/pull/13355#issuecomment-728337397


retest this please



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] je-ik commented on pull request #13353: [BEAM-11267] Remove unnecessary reshuffle for stateful ParDo after key…

2020-11-16 Thread GitBox


je-ik commented on pull request #13353:
URL: https://github.com/apache/beam/pull/13353#issuecomment-728337299


   Do we have a test (in flink runner) for the GBK -> stateful pardo pair? Not 
sure if there is one in ValidatesRunner suite.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] kennknowles commented on a change in pull request #13342: [BEAM-11260][BEAM-11261] Remove inappropriate assumptions about repo from linkage check script

2020-11-16 Thread GitBox


kennknowles commented on a change in pull request #13342:
URL: https://github.com/apache/beam/pull/13342#discussion_r524593181



##
File path: sdks/java/build-tools/beam-linkage-check.sh
##
@@ -36,46 +36,60 @@ set -o pipefail
 set -e
 
 # These default artifacts are common causes of linkage errors.
-ARTIFACTS="beam-sdks-java-core
-  beam-sdks-java-io-google-cloud-platform
-  beam-runners-google-cloud-dataflow-java
-  beam-sdks-java-io-hadoop-format"
-
-if [ ! -z "$1" ]; then
-  ARTIFACTS=$1
+DEFAULT_ARTIFACT_LISTS=" \
+  beam-sdks-java-core \
+  beam-sdks-java-io-google-cloud-platform \
+  beam-runners-google-cloud-dataflow-java \
+  beam-sdks-java-io-hadoop-format \
+"
+
+BASELINE_REF=$1
+PROPOSED_REF=$2
+ARTIFACT_LISTS=$3
+
+if [ -z "$ARTIFACT_LISTS" ]; then
+  ARTIFACT_LISTS=$DEFAULT_ARTIFACT_LISTS
 fi
 
-BRANCH_NAME=$(git symbolic-ref --short HEAD)
+if [ -z "$BASELINE_REF" ] || [ -z "$PROPOSED_REF" ] || [ -z "$ARTIFACT_LISTS" 
] ; then
+  echo "Usage: $0   [artifact lists]"
+  exit 1
+fi
 
 if [ ! -d buildSrc ]; then
-  echo "Please run this script in the Beam project root:"
-  echo "  /bin/bash sdks/java/build-tools/beam-linkage-check.sh"
-  exit 1
+  echo "Directory 'buildSrc' not found. Please run this script from the root 
directory of a clone of the Beam git repo."
 fi
 
-if [ "$BRANCH_NAME" = "master" ]; then
-  echo "Please run this script on a branch other than master"
+if [ "$BASELINE_REF" = "$PROPOSED_REF" ]; then
+  echo "Baseline and proposed refs are identical; cannot compare their linkage 
errors!"
   exit 1
 fi
 
-OUTPUT_DIR=build/linkagecheck
-mkdir -p $OUTPUT_DIR
-
 if [ ! -z "$(git diff)" ]; then
   echo "Uncommited change detected. Please commit changes and ensure 'git 
diff' is empty."
   exit 1
 fi
 
+STARTING_REF=$(git rev-parse --abbrev-ref HEAD)
+function cleanup() {
+  git checkout $STARTING_REF
+}
+trap cleanup EXIT
+
+echo "Comparing linkage of artifact lists $ARTIFACT_LISTS using baseline 
$BASELINE_REF and proposal $PROPOSED_REF"
+
+OUTPUT_DIR=build/linkagecheck
+mkdir -p $OUTPUT_DIR
+
 ACCUMULATED_RESULT=0
 
 function runLinkageCheck () {
-  COMMIT=$1
-  BRANCH=$2
-  MODE=$3 # "baseline" or "validate"
-  for ARTIFACT in $ARTIFACTS; do
-echo "`date`:" "Running linkage check (${MODE}) for ${ARTIFACT} in 
${BRANCH}"
+  MODE=$1 # "baseline" or "validate"
 
-BASELINE_FILE=${OUTPUT_DIR}/baseline-${ARTIFACT}.xml
+  for ARTIFACT_LIST in $ARTIFACT_LISTS; do
+echo "`date`:" "Running linkage check (${MODE}) for ${ARTIFACT_LISTS}"
+
+BASELINE_FILE=${OUTPUT_DIR}/baseline-${ARTIFACT_LIST}.xml

Review comment:
   It is a stylistic "nit" really. My expectation of gradle commands is 
that they operate on the current source tree. There are exceptions, sure. But 
my thinking was:
   
- `:checkJavaLinkage` always writes to `build/linkagecheck` and the report 
is the linkage report for the current state of the worktree.
- `beam-linkage-check.sh` operates at level above that. It produces the 
above file for two separate states of the source tree and compares them.
   
   It is not that important. Just recording my thoughts so we can discuss the 
script a bit.

##
File path: sdks/java/build-tools/beam-linkage-check.sh
##
@@ -36,46 +36,60 @@ set -o pipefail
 set -e
 
 # These default artifacts are common causes of linkage errors.
-ARTIFACTS="beam-sdks-java-core
-  beam-sdks-java-io-google-cloud-platform
-  beam-runners-google-cloud-dataflow-java
-  beam-sdks-java-io-hadoop-format"
-
-if [ ! -z "$1" ]; then
-  ARTIFACTS=$1
+DEFAULT_ARTIFACT_LISTS=" \

Review comment:
   Yes, I understand what the comma-separated list and space-separated list 
do. I just don't understand why we use both. I trust that there is a good 
reason, but I do not know it. Like if you had two subsets of artifacts that are 
not expected to work together?
   
   I would expect that we do want to run all the GCP modules together with the 
core SDK and, separately, run all the hadoop modules together with the core 
SDK, etc. Or maybe we just run them all together.
   
   So my proposal would be to drop the space-separated and only keep the 
comma-separated logic. But like I said, I trust there is a reason. So I am just 
making this proposal so you can tell me why it does not work.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] je-ik commented on a change in pull request #13353: [BEAM-11267] Remove unnecessary reshuffle for stateful ParDo after key…

2020-11-16 Thread GitBox


je-ik commented on a change in pull request #13353:
URL: https://github.com/apache/beam/pull/13353#discussion_r524597532



##
File path: 
runners/flink/src/main/java/org/apache/beam/runners/flink/translation/wrappers/streaming/WorkItemKeySelector.java
##
@@ -49,6 +52,6 @@ public ByteBuffer 
getKey(WindowedValue> value) thro
 
   @Override
   public TypeInformation getProducedType() {
-return new GenericTypeInfo<>(ByteBuffer.class);
+return new CoderTypeInformation<>(FlinkKeyUtils.ByteBufferCoder.of(), 
pipelineOptions.get());

Review comment:
   I would think that the partitioning is ensured by the 
`reinterpretAsKeyedStream` and will therefore be preserved from the previous 
shuffle phase. Is this not enough?





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] kennknowles commented on pull request #13306: [BEAM-10925] Create interface for SQL Java aggregate function.

2020-11-16 Thread GitBox


kennknowles commented on pull request #13306:
URL: https://github.com/apache/beam/pull/13306#issuecomment-728330950


   CombineFn is stable, but UDAF is not. One example is that a UDAF has to have 
a SQL type. Right now this is not represented on the UDAF object but is 
implied. That might change. Personally, I would make the type part of the UDAF 
object. If you look at any other SQL product's definition of UDAFs you will see 
lots of extra metadata 
([postgresql](https://www.postgresql.org/docs/9.5/sql-createaggregate.html), 
[mssql](https://docs.microsoft.com/en-us/sql/t-sql/statements/create-aggregate-transact-sql?view=sql-server-ver15)).
   
   A decoupled interface and trivial wrapper have obvious benefits and no real 
downside.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] iemejia commented on pull request #12963: [BEAM-10983] Add getting started from Spark page

2020-11-16 Thread GitBox


iemejia commented on pull request #12963:
URL: https://github.com/apache/beam/pull/12963#issuecomment-728329415


   Is there something important still missing to get this one merged? Maybe we 
can merge and ask/do minor fixes after?



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] Aliraza-N edited a comment on pull request #13137: [BEAM-11073] Dicom IO Connector for Java

2020-11-16 Thread GitBox


Aliraza-N edited a comment on pull request #13137:
URL: https://github.com/apache/beam/pull/13137#issuecomment-727005597


   All done! @pabloem 



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] rgruener opened a new pull request #13355: [BEAM-11272] Fix combiner label constructor arg

2020-11-16 Thread GitBox


rgruener opened a new pull request #13355:
URL: https://github.com/apache/beam/pull/13355


   Combiners have a label constructor argument which is not currently used 
correctly.
   
   
   
   Thank you for your contribution! Follow this checklist to help us 
incorporate your contribution quickly and easily:
   
- [ ] [**Choose 
reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and 
mention them in a comment (`R: @username`).
- [x] Format the pull request title like `[BEAM-XXX] Fixes bug in 
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA 
issue, if applicable. This will automatically link the pull request to the 
issue.
   
   See the [Contributor Guide](https://beam.apache.org/contribute) for more 
tips on [how to make review process 
smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier).
   
   Post-Commit Tests Status (on master branch)
   

   
   Lang | SDK | Dataflow | Flink | Samza | Spark | Twister2
   --- | --- | --- | --- | --- | --- | ---
   Go | [![Build 
Status](https://ci-beam.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/)
 | --- | [![Build 
Status](https://ci-beam.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/)
 | --- | [![Build 
Status](https://ci-beam.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/)
 | ---
   Java | [![Build 
Status](https://ci-beam.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/)
 | [![Build 
Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/)[![Build
 
Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/)
 | [![Build 
Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build
 
Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Java11/lastCompletedBuild/badge/i
 
con)](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Java11/lastCompletedBuild/)[![Build
 
Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)[![Build
 
Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/)
 | [![Build 
Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/)
 | [![Build 
Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/)[![Build
 Status](htt
 
ps://ci-beam.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/)[![Build
 
Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/)
 | [![Build 
Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Twister2/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Twister2/lastCompletedBuild/)
   Python | [![Build 
Status](https://ci-beam.apache.org/job/beam_PostCommit_Python36/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Python36/lastCompletedBuild/)[![Build
 
Status](https://ci-beam.apache.org/job/beam_PostCommit_Python37/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Python37/lastCompletedBuild/)[![Build
 
Status](https://ci-beam.apache.org/job/beam_PostCommit_Python38/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Python38/lastCompletedBuild/)
 | [![Build 

[GitHub] [beam] lostluck commented on pull request #13347: [release-2.26.0][BEAM-11264] Add Reshuffle in pd.read_*

2020-11-16 Thread GitBox


lostluck commented on pull request #13347:
URL: https://github.com/apache/beam/pull/13347#issuecomment-728320640


   I'm not worried about the windows failures, but the precommit flake needed a 
quick re-run. LGTM and merging.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] lostluck edited a comment on pull request #13347: [release-2.26.0][BEAM-11264] Add Reshuffle in pd.read_*

2020-11-16 Thread GitBox


lostluck edited a comment on pull request #13347:
URL: https://github.com/apache/beam/pull/13347#issuecomment-728320640


   I'm not worried about the windows failures, but the precommit flake needed a 
quick re-run, which appears to be stalled?



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] lostluck commented on pull request #13347: [release-2.26.0][BEAM-11264] Add Reshuffle in pd.read_*

2020-11-16 Thread GitBox


lostluck commented on pull request #13347:
URL: https://github.com/apache/beam/pull/13347#issuecomment-728320898


   Run Python PreCommit



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] robertwb merged pull request #13333: Further dataframe batch consolidation.

2020-11-16 Thread GitBox


robertwb merged pull request #1:
URL: https://github.com/apache/beam/pull/1


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] boyuanzz merged pull request #13344: [BEAM-11263] Java cleanUpDockerImages now force removes container images.

2020-11-16 Thread GitBox


boyuanzz merged pull request #13344:
URL: https://github.com/apache/beam/pull/13344


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] dmvk commented on a change in pull request #13353: [BEAM-11267] Remove unecessary reshuffle for stateful ParDo after key…

2020-11-16 Thread GitBox


dmvk commented on a change in pull request #13353:
URL: https://github.com/apache/beam/pull/13353#discussion_r524545989



##
File path: 
runners/flink/src/main/java/org/apache/beam/runners/flink/FlinkStreamingTranslationContext.java
##
@@ -84,6 +85,17 @@ public void setOutputDataStream(PValue value, DataStream 
set) {
 }
   }
 
+   void setProducer(T value, PTransform producer) {
+if (!producers.containsKey(value)) {

Review comment:
    makes sense





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] dmvk commented on a change in pull request #13353: [BEAM-11267] Remove unecessary reshuffle for stateful ParDo after key…

2020-11-16 Thread GitBox


dmvk commented on a change in pull request #13353:
URL: https://github.com/apache/beam/pull/13353#discussion_r524545316



##
File path: 
runners/flink/src/main/java/org/apache/beam/runners/flink/translation/wrappers/streaming/WorkItemKeySelector.java
##
@@ -49,6 +52,6 @@ public ByteBuffer 
getKey(WindowedValue> value) thro
 
   @Override
   public TypeInformation getProducedType() {
-return new GenericTypeInfo<>(ByteBuffer.class);
+return new CoderTypeInformation<>(FlinkKeyUtils.ByteBufferCoder.of(), 
pipelineOptions.get());

Review comment:
   Not sure if this is necessary. I wanted to ensure that the new 
"reinterpreted partitioning" is compatible with the one used by GBK / Combine.
   
   The idea was if partitioning is not compatible, it may result in some state 
partitioning related glitches (eg. you wouldn't have local state for a 
key-group you need).
   
   Second thoughts, flink selects target partition (key group) based on "pojo 
hash code" (not based on binary representation), so the previous version was 
probably compatible enough 樂 
   
   @mxm WDYT?





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] dmvk commented on a change in pull request #13353: [BEAM-11267] Remove unecessary reshuffle for stateful ParDo after key…

2020-11-16 Thread GitBox


dmvk commented on a change in pull request #13353:
URL: https://github.com/apache/beam/pull/13353#discussion_r524545316



##
File path: 
runners/flink/src/main/java/org/apache/beam/runners/flink/translation/wrappers/streaming/WorkItemKeySelector.java
##
@@ -49,6 +52,6 @@ public ByteBuffer 
getKey(WindowedValue> value) thro
 
   @Override
   public TypeInformation getProducedType() {
-return new GenericTypeInfo<>(ByteBuffer.class);
+return new CoderTypeInformation<>(FlinkKeyUtils.ByteBufferCoder.of(), 
pipelineOptions.get());

Review comment:
   Not sure if this is necessary. I wanted to ensure that the new 
"reinterpreted partitioning" is compatible with the one used by GBK / Combine.
   
   The idea was if partitioning is not compatible, it may result in some state 
partitioning related glitches (eg. you wouldn't have local state for a 
key-group you need).
   
   Second thoughts, flink selects target partition based on "pojo hash code" 
(not based on binary representation), so the previous version was probably 
compatible enough 樂 
   
   @mxm WDYT?





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] boyuanzz commented on pull request #13338: [BEAM-11070] Use self-checkpoint to enfore finalization happens.

2020-11-16 Thread GitBox


boyuanzz commented on pull request #13338:
URL: https://github.com/apache/beam/pull/13338#issuecomment-728292692


   Run PythonDocker PreCommit



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] TheNeuralBit merged pull request #13128: [BEAM-11265] Update quickstart-java.md

2020-11-16 Thread GitBox


TheNeuralBit merged pull request #13128:
URL: https://github.com/apache/beam/pull/13128


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] kennknowles merged pull request #13259: Fix Java ValidatesRunner V2 task dependency.

2020-11-16 Thread GitBox


kennknowles merged pull request #13259:
URL: https://github.com/apache/beam/pull/13259


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] kennknowles commented on pull request #13259: Fix Java ValidatesRunner V2 task dependency.

2020-11-16 Thread GitBox


kennknowles commented on pull request #13259:
URL: https://github.com/apache/beam/pull/13259#issuecomment-728284902


   Makes sense to me.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] TheNeuralBit commented on a change in pull request #13211: [BEAM-8106] Separate Java8/11 container image build tasks

2020-11-16 Thread GitBox


TheNeuralBit commented on a change in pull request #13211:
URL: https://github.com/apache/beam/pull/13211#discussion_r524529168



##
File path: sdks/java/container/common.gradle
##
@@ -0,0 +1,104 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * License); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an AS IS BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+/**
+ * Build script containign common build tasks for Java SDK Docker images.

Review comment:
   typo nit:
   
   ```suggestion
* Build script containing common build tasks for Java SDK Docker images.
   ```





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] TheNeuralBit commented on a change in pull request #13211: [BEAM-8106] Separate Java8/11 container image build tasks

2020-11-16 Thread GitBox


TheNeuralBit commented on a change in pull request #13211:
URL: https://github.com/apache/beam/pull/13211#discussion_r524528475



##
File path: sdks/java/container/build.gradle
##
@@ -86,51 +76,21 @@ licenseReport {
   renderers = [new JsonReportRenderer()]
 }
 
-def imageJavaVersion = project.hasProperty('imageJavaVersion') ? 
project.findProperty('imageJavaVersion') : '8'
-docker {

Review comment:
   Hm ok, I'm not sure if we have any precedent to follow for this. I don't 
think it's worth going to a lot of trouble for. Instead you could just send an 
FYI email to dev@ once this is merged letting people know they may need to 
change :sdks:java:container:docker -> :sdks:java:container:java8:docker.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] boyuanzz commented on a change in pull request #13283: [BEAM-11142] Enable CrossLanguageValidateRunner test for dataflow run…

2020-11-16 Thread GitBox


boyuanzz commented on a change in pull request #13283:
URL: https://github.com/apache/beam/pull/13283#discussion_r524523450



##
File path: runners/google-cloud-dataflow-java/build.gradle
##
@@ -312,6 +313,36 @@ task validatesRunnerStreaming {
   ))
 }
 
+createCrossLanguageValidatesRunnerTask(

Review comment:
   This task makes all Dataflow runner v2 tests fail because the sdk 
container is cleaned up as soon as it is built. The cause is that in the common 
configuration, you have
   `config.startJobServer.finalizedBy config.cleanupJobServer`
   This will change the task dependency incorrectly.
   
   Filed jira here: https://issues.apache.org/jira/browse/BEAM-11270
   
   cc: @tysonjh





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] boyuanzz commented on a change in pull request #13283: [BEAM-11142] Enable CrossLanguageValidateRunner test for dataflow run…

2020-11-16 Thread GitBox


boyuanzz commented on a change in pull request #13283:
URL: https://github.com/apache/beam/pull/13283#discussion_r524523450



##
File path: runners/google-cloud-dataflow-java/build.gradle
##
@@ -312,6 +313,36 @@ task validatesRunnerStreaming {
   ))
 }
 
+createCrossLanguageValidatesRunnerTask(

Review comment:
   This task makes all Dataflow runner v2 tests fail because the sdk 
container is cleaned up as soon as it is built. The cause is that in the common 
configuration, you have
   config.startJobServer.finalizedBy config.cleanupJobServer
   This will change the task dependency incorrectly.
   
   Filed jira here: https://issues.apache.org/jira/browse/BEAM-11270
   
   cc: @tysonjh





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] chamikaramj commented on pull request #13235: Changing BigQuery insertAll error from INFO level logging to WARNING

2020-11-16 Thread GitBox


chamikaramj commented on pull request #13235:
URL: https://github.com/apache/beam/pull/13235#issuecomment-728274702


   I think the next step is to try out Beam 2.24.0 or later to see if this is 
really needed.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] boyuanzz commented on a change in pull request #13026: [WIP] [BEAM-7003 BEAM-8639 BEAM-8774] Update Kafka dependencies, enable IT test in Postcommit

2020-11-16 Thread GitBox


boyuanzz commented on a change in pull request #13026:
URL: https://github.com/apache/beam/pull/13026#discussion_r524517625



##
File path: sdks/java/io/kafka/build.gradle
##
@@ -65,26 +76,68 @@ dependencies {
   testCompile library.java.junit
   testCompile library.java.powermock
   testCompile library.java.powermock_mockito
+  testCompile library.java.testcontainers_kafka
   testRuntimeOnly library.java.slf4j_jdk14
   testRuntimeOnly project(path: ":runners:direct-java", configuration: 
"shadow")
-  kafkaVersion210 "org.apache.kafka:kafka-clients:2.1.0"
+  kafkaVersions.each {"kafkaVersion$it.key" 
"org.apache.kafka:kafka-clients:$it.value"}
 }
 
-configurations.kafkaVersion210 {
-  resolutionStrategy {
-force "org.apache.kafka:kafka-clients:2.1.0"
+kafkaVersions.each { kv ->
+  configurations."kafkaVersion$kv.key" {
+resolutionStrategy {
+  force "org.apache.kafka:kafka-clients:$kv.value"
+}
   }
 }
 
-task kafkaVersion210Test(type: Test) {

Review comment:
   So the problem is introduced by 
https://github.com/apache/beam/pull/13283/files





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] KevinGG commented on a change in pull request #13335: [BEAM-10921]: Fix BEAM-10921 and underlying issues

2020-11-16 Thread GitBox


KevinGG commented on a change in pull request #13335:
URL: https://github.com/apache/beam/pull/13335#discussion_r524516876



##
File path: 
sdks/python/apache_beam/runners/interactive/interactive_environment.py
##
@@ -163,7 +165,8 @@ def __init__(self):
 # the gRPC server serves.
 self._test_stream_service_controllers = {}
 self._cached_source_signature = {}
-self._tracked_user_pipelines = set()

Review comment:
   LGTM, only one question: 
   Does the `UserPipelineTracker` clean up the user pipeline and their derived 
pipelines when a user pipeline is out of scope (e.g., deleted or garbage 
collected)? Or are the pipelines tracked never get garbage collected at all?
   
   Is there any side effect when the user uses an outdated pipeline ref or a 
new pipeline ref (from re-executions) that results in the same `__hash__` or 
`__eq__`/`in` to be `True`? Will that give back a wrong user_pipeline when the 
tracker thinks the pipeline is tracked while it's not?





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] KevinGG commented on a change in pull request #13335: [BEAM-10921]: Fix BEAM-10921 and underlying issues

2020-11-16 Thread GitBox


KevinGG commented on a change in pull request #13335:
URL: https://github.com/apache/beam/pull/13335#discussion_r524516876



##
File path: 
sdks/python/apache_beam/runners/interactive/interactive_environment.py
##
@@ -163,7 +165,8 @@ def __init__(self):
 # the gRPC server serves.
 self._test_stream_service_controllers = {}
 self._cached_source_signature = {}
-self._tracked_user_pipelines = set()

Review comment:
   LGTM, only one question: 
   Does the `UserPipelineTracker` clean up the user pipeline and their derived 
pipelines when a user pipeline is out of scope (e.g., deleted or garbage 
collected). 
   
   If not, is there any side effect when the user uses an outdated pipeline ref 
or a new pipeline ref (from re-executions) that results in the same `__hash__` 
or `__eq__`/`in` to be `True`? Will that give back a wrong user_pipeline when 
the tracker thinks the pipeline is tracked while it's not?





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] robertwb commented on a change in pull request #13333: Further dataframe batch consolidation.

2020-11-16 Thread GitBox


robertwb commented on a change in pull request #1:
URL: https://github.com/apache/beam/pull/1#discussion_r524508497



##
File path: sdks/python/apache_beam/dataframe/transforms.py
##
@@ -410,6 +426,35 @@ def _total_memory_usage(frame):
 float('inf')
 
 
+class _PreBatch(beam.DoFn):
+  def __init__(self, target_size=TARGET_PARTITION_SIZE):
+self._target_size = target_size
+
+  def start_bundle(self):
+self._parts = collections.defaultdict(list)
+self._running_size = 0
+
+  def process(
+  self,
+  part,
+  window=beam.DoFn.WindowParam,
+  timestamp=beam.DoFn.TimestampParam):
+part_size = _total_memory_usage(part)
+if part_size >= self._target_size:
+  yield part
+else:
+  self._running_size += part_size
+  self._parts[window, timestamp].append(part)
+  if self._running_size >= self._target_size:
+yield from self.finish_bundle()
+
+  def finish_bundle(self):
+for (window, timestamp), parts in self._parts.items():
+  yield windowed_value.WindowedValue(
+  pd.concat(parts), timestamp, (window, ))
+self.start_bundle()

Review comment:
   Yeah, I was thinking exactly the same... 





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] codecov[bot] edited a comment on pull request #12779: [BEAM-10856] Support for NestedValueProvider for Python SDK

2020-11-16 Thread GitBox


codecov[bot] edited a comment on pull request #12779:
URL: https://github.com/apache/beam/pull/12779#issuecomment-692856347


   # [Codecov](https://codecov.io/gh/apache/beam/pull/12779?src=pr=h1) Report
   > Merging 
[#12779](https://codecov.io/gh/apache/beam/pull/12779?src=pr=desc) (af2c14c) 
into 
[master](https://codecov.io/gh/apache/beam/commit/3d6cc0ed9ed537229b27b5dbe73288f21b0e351c?el=desc)
 (3d6cc0e) will **decrease** coverage by `0.20%`.
   > The diff coverage is `93.33%`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/beam/pull/12779/graphs/tree.svg?width=650=150=pr=qcbbAh8Fj1)](https://codecov.io/gh/apache/beam/pull/12779?src=pr=tree)
   
   ```diff
   @@Coverage Diff @@
   ##   master   #12779  +/-   ##
   ==
   - Coverage   82.48%   82.28%   -0.21% 
   ==
 Files 455  451   -4 
 Lines   5487653738-1138 
   ==
   - Hits4526644217-1049 
   + Misses   9610 9521  -89 
   ```
   
   
   | [Impacted 
Files](https://codecov.io/gh/apache/beam/pull/12779?src=pr=tree) | Coverage 
Δ | |
   |---|---|---|
   | 
[sdks/python/apache\_beam/options/value\_provider.py](https://codecov.io/gh/apache/beam/pull/12779/diff?src=pr=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vb3B0aW9ucy92YWx1ZV9wcm92aWRlci5weQ==)
 | `91.76% <93.33%> (+0.21%)` | :arrow_up: |
   | 
[sdks/python/apache\_beam/utils/profiler.py](https://codecov.io/gh/apache/beam/pull/12779/diff?src=pr=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vdXRpbHMvcHJvZmlsZXIucHk=)
 | `32.11% <0.00%> (-54.35%)` | :arrow_down: |
   | 
[sdks/python/apache\_beam/utils/interactive\_utils.py](https://codecov.io/gh/apache/beam/pull/12779/diff?src=pr=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vdXRpbHMvaW50ZXJhY3RpdmVfdXRpbHMucHk=)
 | `83.33% <0.00%> (-9.53%)` | :arrow_down: |
   | 
[...eam/runners/interactive/options/capture\_control.py](https://codecov.io/gh/apache/beam/pull/12779/diff?src=pr=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vcnVubmVycy9pbnRlcmFjdGl2ZS9vcHRpb25zL2NhcHR1cmVfY29udHJvbC5weQ==)
 | `92.00% <0.00%> (-8.00%)` | :arrow_down: |
   | 
[conftest.py](https://codecov.io/gh/apache/beam/pull/12779/diff?src=pr=tree#diff-Y29uZnRlc3QucHk=)
 | `77.77% <0.00%> (-7.94%)` | :arrow_down: |
   | 
[...s/snippets/transforms/aggregation/combinevalues.py](https://codecov.io/gh/apache/beam/pull/12779/diff?src=pr=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vZXhhbXBsZXMvc25pcHBldHMvdHJhbnNmb3Jtcy9hZ2dyZWdhdGlvbi9jb21iaW5ldmFsdWVzLnB5)
 | `87.36% <0.00%> (-7.37%)` | :arrow_down: |
   | 
[sdks/python/apache\_beam/\_\_init\_\_.py](https://codecov.io/gh/apache/beam/pull/12779/diff?src=pr=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vX19pbml0X18ucHk=)
 | `80.00% <0.00%> (-5.72%)` | :arrow_down: |
   | 
[sdks/python/apache\_beam/dataframe/partitionings.py](https://codecov.io/gh/apache/beam/pull/12779/diff?src=pr=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vZGF0YWZyYW1lL3BhcnRpdGlvbmluZ3MucHk=)
 | `83.60% <0.00%> (-5.44%)` | :arrow_down: |
   | 
[.../python/apache\_beam/io/gcp/bigquery\_io\_metadata.py](https://codecov.io/gh/apache/beam/pull/12779/diff?src=pr=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vaW8vZ2NwL2JpZ3F1ZXJ5X2lvX21ldGFkYXRhLnB5)
 | `86.95% <0.00%> (-3.67%)` | :arrow_down: |
   | 
[...pache\_beam/runners/interactive/interactive\_beam.py](https://codecov.io/gh/apache/beam/pull/12779/diff?src=pr=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vcnVubmVycy9pbnRlcmFjdGl2ZS9pbnRlcmFjdGl2ZV9iZWFtLnB5)
 | `76.02% <0.00%> (-3.51%)` | :arrow_down: |
   | ... and [87 
more](https://codecov.io/gh/apache/beam/pull/12779/diff?src=pr=tree-more) | |
   
   --
   
   [Continue to review full report at 
Codecov](https://codecov.io/gh/apache/beam/pull/12779?src=pr=continue).
   > **Legend** - [Click here to learn 
more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute  (impact)`, `ø = not affected`, `? = missing data`
   > Powered by 
[Codecov](https://codecov.io/gh/apache/beam/pull/12779?src=pr=footer). Last 
update 
[eba648c...0f7961d](https://codecov.io/gh/apache/beam/pull/12779?src=pr=lastupdated).
 Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] lostluck commented on a change in pull request #13272: [BEAM-11207] Metric Extraction via proto RPC API

2020-11-16 Thread GitBox


lostluck commented on a change in pull request #13272:
URL: https://github.com/apache/beam/pull/13272#discussion_r523124215



##
File path: sdks/go/pkg/beam/core/runtime/metricsx/metricsx_test.go
##
@@ -0,0 +1,166 @@
+// Licensed to the Apache Software Foundation (ASF) under one or more
+// contributor license agreements.  See the NOTICE file distributed with
+// this work for additional information regarding copyright ownership.
+// The ASF licenses this file to You under the Apache License, Version 2.0
+// (the "License"); you may not use this file except in compliance with
+// the License.  You may obtain a copy of the License at
+//
+//http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
+
+package metricsx
+
+import (
+   "testing"
+   "time"
+
+   "github.com/apache/beam/sdks/go/pkg/beam/core/metrics"
+   pipepb "github.com/apache/beam/sdks/go/pkg/beam/model/pipeline_v1"
+   "github.com/google/go-cmp/cmp"
+)
+
+func TestFromMonitoringInfos_Counters(t *testing.T) {
+   var value int64 = 15
+   want := metrics.CounterResult{
+   Attempted: 15,
+   Committed: -1,
+   Key: metrics.StepKey{
+   Step:  "main.customDoFn",
+   Name:  "customCounter",
+   Namespace: "customDoFn",
+   }}
+
+   payload, err := Int64Counter(value)
+   if err != nil {
+   t.Fatalf("Failed to encode Int64Counter: %v", err)
+   }
+
+   labels := map[string]string{
+   "PTRANSFORM": "main.customDoFn",
+   "NAMESPACE":  "customDoFn",
+   "NAME":   "customCounter",
+   }
+
+   mInfo := {
+   Urn: UrnToString(UrnUserSumInt64),
+   Type:UrnToType(UrnUserSumInt64),
+   Labels:  labels,
+   Payload: payload,
+   }
+
+   attempted := []*pipepb.MonitoringInfo{mInfo}
+   committed := []*pipepb.MonitoringInfo{}
+
+   got := FromMonitoringInfos(attempted, committed).AllMetrics().Counters()
+   size := len(got)
+   if size < 1 {
+   t.Fatalf("Invalid array's size: got: %v, want: %v", size, 1)
+   }
+   if d := cmp.Diff(got[0], want); d != "" {

Review comment:
   You'll want to have want before got when using cmp.Diff. Otherwise your 
diff guidance is wrong.
   
   cmp.Diff(want, got[0]) means the diff will be (-want,+got) that is saying 
what is missing from want, and extra in got, which is a useful way to read 
them. cmp.Diff(got[0],want) means the diff will be (+want,-got), which will be 
confusing.
   
   
   





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] lostluck commented on pull request #13347: [release-2.26.0][BEAM-11264] Add Reshuffle in pd.read_*

2020-11-16 Thread GitBox


lostluck commented on pull request #13347:
URL: https://github.com/apache/beam/pull/13347#issuecomment-728262843


   Run Python 3.8 PostCommit



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] lostluck commented on pull request #13347: [release-2.26.0][BEAM-11264] Add Reshuffle in pd.read_*

2020-11-16 Thread GitBox


lostluck commented on pull request #13347:
URL: https://github.com/apache/beam/pull/13347#issuecomment-728262952


   Run Python 3.7 PostCommit



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] lostluck commented on pull request #13348: Go redundant type cleanup.

2020-11-16 Thread GitBox


lostluck commented on pull request #13348:
URL: https://github.com/apache/beam/pull/13348#issuecomment-728262368


   Run Go PostCommit



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] lostluck commented on pull request #13347: [release-2.26.0][BEAM-11264] Add Reshuffle in pd.read_*

2020-11-16 Thread GitBox


lostluck commented on pull request #13347:
URL: https://github.com/apache/beam/pull/13347#issuecomment-728262680


   Run Python PreCommit



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] lostluck commented on pull request #13275: [DO NOT MERGE] Run all PostCommit and PreCommit Tests against Release Branch

2020-11-16 Thread GitBox


lostluck commented on pull request #13275:
URL: https://github.com/apache/beam/pull/13275#issuecomment-728261076







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] TheNeuralBit commented on a change in pull request #13333: Further dataframe batch consolidation.

2020-11-16 Thread GitBox


TheNeuralBit commented on a change in pull request #1:
URL: https://github.com/apache/beam/pull/1#discussion_r524496438



##
File path: sdks/python/apache_beam/dataframe/transforms.py
##
@@ -223,8 +235,12 @@ def expand(self, pcolls):
 
 # Actually evaluate the expressions.
 def evaluate(partition, stage=self.stage, **side_inputs):
+  def lookup(expr):
+return expr.proxy(
+).iloc[:0] if partition[expr._id] is None else partition[expr._id]

Review comment:
   nit: maybe add a comment like this to clarify the intention
   
   ```suggestion
   # Use proxy if there's no data in this partition
   return expr.proxy(
   ).iloc[:0] if partition[expr._id] is None else 
partition[expr._id]
   ```

##
File path: sdks/python/apache_beam/dataframe/transforms.py
##
@@ -410,6 +426,35 @@ def _total_memory_usage(frame):
 float('inf')
 
 
+class _PreBatch(beam.DoFn):
+  def __init__(self, target_size=TARGET_PARTITION_SIZE):
+self._target_size = target_size
+
+  def start_bundle(self):
+self._parts = collections.defaultdict(list)
+self._running_size = 0
+
+  def process(
+  self,
+  part,
+  window=beam.DoFn.WindowParam,
+  timestamp=beam.DoFn.TimestampParam):
+part_size = _total_memory_usage(part)
+if part_size >= self._target_size:
+  yield part
+else:
+  self._running_size += part_size
+  self._parts[window, timestamp].append(part)
+  if self._running_size >= self._target_size:
+yield from self.finish_bundle()
+
+  def finish_bundle(self):
+for (window, timestamp), parts in self._parts.items():
+  yield windowed_value.WindowedValue(
+  pd.concat(parts), timestamp, (window, ))
+self.start_bundle()

Review comment:
   This is frustratingly similar to _ReBatch... but I don't see a 
reasonable way to combine them.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] lostluck commented on pull request #13275: [DO NOT MERGE] Run all PostCommit and PreCommit Tests against Release Branch

2020-11-16 Thread GitBox


lostluck commented on pull request #13275:
URL: https://github.com/apache/beam/pull/13275#issuecomment-728260568







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] lostluck commented on pull request #13275: [DO NOT MERGE] Run all PostCommit and PreCommit Tests against Release Branch

2020-11-16 Thread GitBox


lostluck commented on pull request #13275:
URL: https://github.com/apache/beam/pull/13275#issuecomment-728260456


   Run Release Gradle Build



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] lostluck commented on pull request #13275: [DO NOT MERGE] Run all PostCommit and PreCommit Tests against Release Branch

2020-11-16 Thread GitBox


lostluck commented on pull request #13275:
URL: https://github.com/apache/beam/pull/13275#issuecomment-728260032


   Hmmm it seems I was mistaken and this did not automatically run all the post 
commits. Doing so now.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] robertwb merged pull request #13341: Avoid unnecessary shuffling for single-input elementwise operations.

2020-11-16 Thread GitBox


robertwb merged pull request #13341:
URL: https://github.com/apache/beam/pull/13341


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] robertwb commented on a change in pull request #13341: Avoid unnecessary shuffling for single-input elementwise operations.

2020-11-16 Thread GitBox


robertwb commented on a change in pull request #13341:
URL: https://github.com/apache/beam/pull/13341#discussion_r524496642



##
File path: sdks/python/apache_beam/dataframe/frames.py
##
@@ -1917,9 +1918,12 @@ def func(df, *args, **kwargs):
   '__i%s__' % base,
   frame_base._elementwise_method('__i%s__' % base, inplace=True))
 
-for name in ['__lt__', '__le__', '__gt__', '__ge__', '__eq__', '__ne__']:
-  setattr(DeferredSeries, name, frame_base._elementwise_method(name))
-  setattr(DeferredDataFrame, name, frame_base._elementwise_method(name))
+for name in ['lt', 'le', 'gt', 'ge', 'eq', 'ne']:
+  for p in '%s', '__%s__':
+# Note that non-underscore name is used for both as the __xxx__ methods are
+# order-sensitive.

Review comment:
   It appears just to be the comparison operators that suffer from this 
defect. 
   
   For PCollections, the order is unspecified, so in some sense one can't say 
that comparison of "differently-ordered" dataframes should be rejected (as the 
notion of "differently-ordered" is not well defined).





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] robertwb commented on a change in pull request #13215: [BEAM-11151] Adds the ToStringFnRunner to Java

2020-11-16 Thread GitBox


robertwb commented on a change in pull request #13215:
URL: https://github.com/apache/beam/pull/13215#discussion_r524491784



##
File path: 
sdks/java/harness/src/main/java/org/apache/beam/fn/harness/ToStringFnRunner.java
##
@@ -0,0 +1,76 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.fn.harness;
+
+import com.google.auto.service.AutoService;
+import java.util.Map;
+import org.apache.beam.model.pipeline.v1.RunnerApi.PTransform;
+import org.apache.beam.runners.core.construction.PTransformTranslation;
+import org.apache.beam.sdk.function.ThrowingFunction;
+import org.apache.beam.sdk.values.KV;
+import 
org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableMap;
+
+/**
+ * Translates from elements to human-readable string.
+ *
+ * Translation function:
+ *
+ * 
+ *   Input: {@code KV}
+ *   Output: {@code KV}
+ * 
+ *
+ * For each element, the human-readable string is returned. The nonce is 
used by a runner to
+ * associate each input with its output. The nonce is represented as an opaque 
set of bytes.
+ */
+@SuppressWarnings({

Review comment:
   Can you scope this more narrowly, or use  instead?

##
File path: 
sdks/java/harness/src/main/java/org/apache/beam/fn/harness/ToStringFnRunner.java
##
@@ -0,0 +1,76 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.fn.harness;
+
+import com.google.auto.service.AutoService;
+import java.util.Map;
+import org.apache.beam.model.pipeline.v1.RunnerApi.PTransform;
+import org.apache.beam.runners.core.construction.PTransformTranslation;
+import org.apache.beam.sdk.function.ThrowingFunction;
+import org.apache.beam.sdk.values.KV;
+import 
org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableMap;
+
+/**
+ * Translates from elements to human-readable string.
+ *
+ * Translation function:
+ *
+ * 
+ *   Input: {@code KV}
+ *   Output: {@code KV}
+ * 
+ *
+ * For each element, the human-readable string is returned. The nonce is 
used by a runner to
+ * associate each input with its output. The nonce is represented as an opaque 
set of bytes.
+ */
+@SuppressWarnings({
+  "rawtypes" // TODO(https://issues.apache.org/jira/browse/BEAM-10556)
+})
+public class ToStringFnRunner {
+  static final String URN = PTransformTranslation.TO_STRING_TRANSFORM_URN;
+
+  /**
+   * A registrar which provides a factory to handle translating elements to a 
human readable string.
+   */
+  @AutoService(PTransformRunnerFactory.Registrar.class)
+  public static class Registrar implements PTransformRunnerFactory.Registrar {
+
+@Override
+public Map getPTransformRunnerFactories() 
{
+  return ImmutableMap.of(
+  URN,
+  
MapFnRunners.forValueMapFnFactory(ToStringFnRunner::createToStringFunctionForPTransform));
+}
+  }
+
+  static  ThrowingFunction, KV> 
createToStringFunctionForPTransform(
+  String ptransformId, PTransform pTransform) {
+return (KV input) -> {
+  String val = "null";

Review comment:
   Rather than assigning a value and then overwriting it, do
   
   if (v == null) {
 val = "null";
   } else {
  val = v.toString();
   }
   
   You could also just return "" + val or, even better, just use 
Objects.toString(input.getValue()).





This is an automated message from the Apache Git Service.
To respond to the message, 

[GitHub] [beam] TheNeuralBit merged pull request #13331: [BEAM-11219][Website revamp] Development of All about Apache Beam component

2020-11-16 Thread GitBox


TheNeuralBit merged pull request #13331:
URL: https://github.com/apache/beam/pull/13331


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] TheNeuralBit commented on a change in pull request #13331: [BEAM-11219][Website revamp] Development of All about Apache Beam component

2020-11-16 Thread GitBox


TheNeuralBit commented on a change in pull request #13331:
URL: https://github.com/apache/beam/pull/13331#discussion_r524493084



##
File path: website/www/site/assets/icons/extensive-icon.svg
##
@@ -0,0 +1,7 @@
+http://www.w3.org/2000/svg; width="112" height="112" fill="none" 
viewBox="0 0 112 112">
+
+
+
+
+
+

Review comment:
   Ah neat, thanks for the explanation





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] boyuanzz commented on a change in pull request #13026: [WIP] [BEAM-7003 BEAM-8639 BEAM-8774] Update Kafka dependencies, enable IT test in Postcommit

2020-11-16 Thread GitBox


boyuanzz commented on a change in pull request #13026:
URL: https://github.com/apache/beam/pull/13026#discussion_r524491777



##
File path: sdks/java/io/kafka/build.gradle
##
@@ -65,26 +76,68 @@ dependencies {
   testCompile library.java.junit
   testCompile library.java.powermock
   testCompile library.java.powermock_mockito
+  testCompile library.java.testcontainers_kafka
   testRuntimeOnly library.java.slf4j_jdk14
   testRuntimeOnly project(path: ":runners:direct-java", configuration: 
"shadow")
-  kafkaVersion210 "org.apache.kafka:kafka-clients:2.1.0"
+  kafkaVersions.each {"kafkaVersion$it.key" 
"org.apache.kafka:kafka-clients:$it.value"}
 }
 
-configurations.kafkaVersion210 {
-  resolutionStrategy {
-force "org.apache.kafka:kafka-clients:2.1.0"
+kafkaVersions.each { kv ->
+  configurations."kafkaVersion$kv.key" {
+resolutionStrategy {
+  force "org.apache.kafka:kafka-clients:$kv.value"
+}
   }
 }
 
-task kafkaVersion210Test(type: Test) {

Review comment:
   It seems like the sdk docker image is cleaned up incorrectly before test 
finishes. I can have a quick fix for that.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] codecov[bot] edited a comment on pull request #13026: [WIP] [BEAM-7003 BEAM-8639 BEAM-8774] Update Kafka dependencies, enable IT test in Postcommit

2020-11-16 Thread GitBox


codecov[bot] edited a comment on pull request #13026:
URL: https://github.com/apache/beam/pull/13026#issuecomment-704839820


   # [Codecov](https://codecov.io/gh/apache/beam/pull/13026?src=pr=h1) Report
   > Merging 
[#13026](https://codecov.io/gh/apache/beam/pull/13026?src=pr=desc) (000ac07) 
into 
[master](https://codecov.io/gh/apache/beam/commit/3d6cc0ed9ed537229b27b5dbe73288f21b0e351c?el=desc)
 (3d6cc0e) will **increase** coverage by `0.07%`.
   > The diff coverage is `n/a`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/beam/pull/13026/graphs/tree.svg?width=650=150=pr=qcbbAh8Fj1)](https://codecov.io/gh/apache/beam/pull/13026?src=pr=tree)
   
   ```diff
   @@Coverage Diff @@
   ##   master   #13026  +/-   ##
   ==
   + Coverage   82.48%   82.55%   +0.07% 
   ==
 Files 455  455  
 Lines   5487655143 +267 
   ==
   + Hits4526645526 +260 
   - Misses   9610 9617   +7 
   ```
   
   
   | [Impacted 
Files](https://codecov.io/gh/apache/beam/pull/13026?src=pr=tree) | Coverage 
Δ | |
   |---|---|---|
   | 
[...ks/python/apache\_beam/runners/worker/data\_plane.py](https://codecov.io/gh/apache/beam/pull/13026/diff?src=pr=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vcnVubmVycy93b3JrZXIvZGF0YV9wbGFuZS5weQ==)
 | `88.32% <0.00%> (-1.20%)` | :arrow_down: |
   | 
[...eam/runners/interactive/interactive\_environment.py](https://codecov.io/gh/apache/beam/pull/13026/diff?src=pr=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vcnVubmVycy9pbnRlcmFjdGl2ZS9pbnRlcmFjdGl2ZV9lbnZpcm9ubWVudC5weQ==)
 | `89.45% <0.00%> (-0.36%)` | :arrow_down: |
   | 
[...nners/portability/fn\_api\_runner/worker\_handlers.py](https://codecov.io/gh/apache/beam/pull/13026/diff?src=pr=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vcnVubmVycy9wb3J0YWJpbGl0eS9mbl9hcGlfcnVubmVyL3dvcmtlcl9oYW5kbGVycy5weQ==)
 | `80.57% <0.00%> (-0.18%)` | :arrow_down: |
   | 
[...runners/interactive/display/pcoll\_visualization.py](https://codecov.io/gh/apache/beam/pull/13026/diff?src=pr=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vcnVubmVycy9pbnRlcmFjdGl2ZS9kaXNwbGF5L3Bjb2xsX3Zpc3VhbGl6YXRpb24ucHk=)
 | `85.26% <0.00%> (-0.08%)` | :arrow_down: |
   | 
[...hon/apache\_beam/runners/worker/bundle\_processor.py](https://codecov.io/gh/apache/beam/pull/13026/diff?src=pr=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vcnVubmVycy93b3JrZXIvYnVuZGxlX3Byb2Nlc3Nvci5weQ==)
 | `94.34% <0.00%> (ø)` | |
   | 
[...beam/runners/portability/local\_job\_service\_main.py](https://codecov.io/gh/apache/beam/pull/13026/diff?src=pr=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vcnVubmVycy9wb3J0YWJpbGl0eS9sb2NhbF9qb2Jfc2VydmljZV9tYWluLnB5)
 | `0.00% <0.00%> (ø)` | |
   | 
[.../python/apache\_beam/transforms/periodicsequence.py](https://codecov.io/gh/apache/beam/pull/13026/diff?src=pr=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vdHJhbnNmb3Jtcy9wZXJpb2RpY3NlcXVlbmNlLnB5)
 | `98.24% <0.00%> (+1.75%)` | :arrow_up: |
   | 
[sdks/python/apache\_beam/io/gcp/bigquery\_tools.py](https://codecov.io/gh/apache/beam/pull/13026/diff?src=pr=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vaW8vZ2NwL2JpZ3F1ZXJ5X3Rvb2xzLnB5)
 | `90.69% <0.00%> (+2.90%)` | :arrow_up: |
   
   --
   
   [Continue to review full report at 
Codecov](https://codecov.io/gh/apache/beam/pull/13026?src=pr=continue).
   > **Legend** - [Click here to learn 
more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute  (impact)`, `ø = not affected`, `? = missing data`
   > Powered by 
[Codecov](https://codecov.io/gh/apache/beam/pull/13026?src=pr=footer). Last 
update 
[2f2ffda...f7d06ba](https://codecov.io/gh/apache/beam/pull/13026?src=pr=lastupdated).
 Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] piotr-szuberski commented on pull request #13026: [WIP] [BEAM-7003 BEAM-8639 BEAM-8774] Update Kafka dependencies, enable IT test in Postcommit

2020-11-16 Thread GitBox


piotr-szuberski commented on pull request #13026:
URL: https://github.com/apache/beam/pull/13026#issuecomment-728249571


   Run Java PostCommit



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] y1chi commented on a change in pull request #13350: [BEAM-11266] Python IO MongoDB: add bucket_auto aggregation option for bundling in Atlas.

2020-11-16 Thread GitBox


y1chi commented on a change in pull request #13350:
URL: https://github.com/apache/beam/pull/13350#discussion_r524486089



##
File path: sdks/python/apache_beam/io/mongodbio.py
##
@@ -241,6 +275,27 @@ def _get_split_keys(self, desired_chunk_size_in_mb, 
start_pos, end_pos):
   max={'_id': end_pos},
   maxChunkSize=desired_chunk_size_in_mb)['splitKeys'])
 
+  def _get_buckets(self, desired_chunk_size, start_pos, end_pos):
+if start_pos >= end_pos:
+  # single document not splittable
+  return []
+size = self.estimate_size()
+bucket_count = size // desired_chunk_size

Review comment:
   The split function will likely be called recursively for dynamic 
rebalancing, so for a range with start_pos and end_pos, it can be further split 
upon backend request, so it might not reasonable to always use the total 
collection size divide by desired_chunk_size to calculate the bucket count. Is 
it possible to only get the buckets within the give _id range? and we can 
probably use an average document size times the number of documents to 
calculate the size of the range being split.

##
File path: sdks/python/apache_beam/io/mongodbio.py
##
@@ -241,6 +275,27 @@ def _get_split_keys(self, desired_chunk_size_in_mb, 
start_pos, end_pos):
   max={'_id': end_pos},
   maxChunkSize=desired_chunk_size_in_mb)['splitKeys'])
 
+  def _get_buckets(self, desired_chunk_size, start_pos, end_pos):
+if start_pos >= end_pos:
+  # single document not splittable
+  return []
+size = self.estimate_size()
+bucket_count = size // desired_chunk_size
+if size % desired_chunk_size != 0:
+  bucket_count += 1
+with beam.io.mongodbio.MongoClient(self.uri, **self.spec) as client:
+  buckets = list(

Review comment:
   the return buckets should guarantee the _id range is start_pos and 
end_pos otherwise same document could be read multiple times.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] piotr-szuberski commented on pull request #12611: [BEAM-10139][BEAM-10140] Add cross-language support for Java SpannerIO with python wrapper

2020-11-16 Thread GitBox


piotr-szuberski commented on pull request #12611:
URL: https://github.com/apache/beam/pull/12611#issuecomment-728248437


   > Looks good, merging now. Thanks for all your work on this @piotr-szuberski 
:)
   
   Thank you too for your reviews! :)



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] piotr-szuberski opened a new pull request #13354: [BEAM-8569] Add changes note about Hadoop 3 support

2020-11-16 Thread GitBox


piotr-szuberski opened a new pull request #13354:
URL: https://github.com/apache/beam/pull/13354


   R: @iemejia 
   
   
   
   Thank you for your contribution! Follow this checklist to help us 
incorporate your contribution quickly and easily:
   
- [ ] [**Choose 
reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and 
mention them in a comment (`R: @username`).
- [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in 
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA 
issue, if applicable. This will automatically link the pull request to the 
issue.
- [ ] Update `CHANGES.md` with noteworthy changes.
- [ ] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   See the [Contributor Guide](https://beam.apache.org/contribute) for more 
tips on [how to make review process 
smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier).
   
   Post-Commit Tests Status (on master branch)
   

   
   Lang | SDK | Dataflow | Flink | Samza | Spark | Twister2
   --- | --- | --- | --- | --- | --- | ---
   Go | [![Build 
Status](https://ci-beam.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/)
 | --- | [![Build 
Status](https://ci-beam.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/)
 | --- | [![Build 
Status](https://ci-beam.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/)
 | ---
   Java | [![Build 
Status](https://ci-beam.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/)
 | [![Build 
Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/)[![Build
 
Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/)
 | [![Build 
Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build
 
Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Java11/lastCompletedBuild/badge/i
 
con)](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Java11/lastCompletedBuild/)[![Build
 
Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)[![Build
 
Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/)
 | [![Build 
Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/)
 | [![Build 
Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/)[![Build
 Status](htt
 
ps://ci-beam.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/)[![Build
 
Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/)
 | [![Build 
Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Twister2/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Twister2/lastCompletedBuild/)
   Python | [![Build 
Status](https://ci-beam.apache.org/job/beam_PostCommit_Python36/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Python36/lastCompletedBuild/)[![Build
 
Status](https://ci-beam.apache.org/job/beam_PostCommit_Python37/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Python37/lastCompletedBuild/)[![Build
 

  1   2   >