[GitHub] [beam] ihji commented on pull request #11607: [BEAM-9430] Makes sure the watermarks returned by estimators are within bounds

2020-05-04 Thread GitBox


ihji commented on pull request #11607:
URL: https://github.com/apache/beam/pull/11607#issuecomment-623860331


   Shouldn't we also disable `IllegalArgumentException`s in the constructors?



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] ihji commented on pull request #11574: [BEAM-9449] Pass PipelineOptions through expansion service

2020-05-04 Thread GitBox


ihji commented on pull request #11574:
URL: https://github.com/apache/beam/pull/11574#issuecomment-623816157


   Thanks. Looks good to me overall.
   
   I think we should also consider adding optional `pipeline_options` argument 
to `ExternalTransform` given that each different expansion service needs 
different pipeline options.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] ihji commented on a change in pull request #11039: [BEAM-9383] Staging Dataflow artifacts from environment

2020-05-04 Thread GitBox


ihji commented on a change in pull request #11039:
URL: https://github.com/apache/beam/pull/11039#discussion_r419841344



##
File path: sdks/python/apache_beam/runners/dataflow/dataflow_runner.py
##
@@ -462,7 +462,8 @@ def run_pipeline(self, pipeline, options):
 use_fnapi = apiclient._use_fnapi(options)
 from apache_beam.transforms import environments
 default_environment = environments.DockerEnvironment.from_container_image(
-apiclient.get_container_image_from_options(options))
+apiclient.get_container_image_from_options(options),
+artifacts=environments.python_sdk_dependencies(options))

Review comment:
   We need pipeline option to populate artifacts. I think we could either 
use `from_options` instead and override container image or just leave as is.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] ihji commented on a change in pull request #11039: [BEAM-9383] Staging Dataflow artifacts from environment

2020-05-04 Thread GitBox


ihji commented on a change in pull request #11039:
URL: https://github.com/apache/beam/pull/11039#discussion_r419839927



##
File path: sdks/python/apache_beam/runners/dataflow/dataflow_runner_test.py
##
@@ -102,7 +102,8 @@ def setUp(self):
 '--staging_location=ignored',
 '--temp_location=/dev/null',
 '--no_auth',
-'--dry_run=True'
+'--dry_run=True',
+'--sdk_location=container'

Review comment:
   The test tries to download a dev version of apache-beam dependency 
(which indeed does not exist in pypi) when it constructs the environment.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] henryken commented on pull request #11564: [Beam-9679] Add Core Transforms section / Map lesson to the Go SDK katas

2020-05-04 Thread GitBox


henryken commented on pull request #11564:
URL: https://github.com/apache/beam/pull/11564#issuecomment-623808517


   Awesome! This PR can be merged now. Thanks @damondouglas!



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] ihji commented on a change in pull request #11039: [BEAM-9383] Staging Dataflow artifacts from environment

2020-05-04 Thread GitBox


ihji commented on a change in pull request #11039:
URL: https://github.com/apache/beam/pull/11039#discussion_r419838456



##
File path: 
runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/util/PackageUtil.java
##
@@ -336,25 +323,26 @@ public DataflowPackage stageToFile(
 final AtomicInteger numCached = new AtomicInteger(0);
 List> destinationPackages = new 
ArrayList<>();
 
-for (String classpathElement : classpathElements) {
-  DataflowPackage sourcePackage = new DataflowPackage();
-  if (classpathElement.contains("=")) {
-String[] components = classpathElement.split("=", 2);
-sourcePackage.setName(components[0]);
-sourcePackage.setLocation(components[1]);
-  } else {
-sourcePackage.setName(null);
-sourcePackage.setLocation(classpathElement);
+for (StagedFile classpathElement : classpathElements) {
+  DataflowPackage targetPackage = classpathElement.getStagedPackage();
+  String source = classpathElement.getSource();
+  if (source.contains("=")) {

Review comment:
   removed.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] ihji commented on a change in pull request #11039: [BEAM-9383] Staging Dataflow artifacts from environment

2020-05-04 Thread GitBox


ihji commented on a change in pull request #11039:
URL: https://github.com/apache/beam/pull/11039#discussion_r419838363



##
File path: model/pipeline/src/main/proto/beam_runner_api.proto
##
@@ -1271,6 +1271,11 @@ message DeferredArtifactPayload {
 message ArtifactStagingToRolePayload {
   // A generated staged name (relative path under staging directory).
   string staged_name = 1;
+
+  // (Optional) An artifact name when a runner supports it.
+  // For example, DataflowRunner requires predefined names for some artifacts
+  // such as "dataflow-worker.jar", "windmill_main".
+  string alias_name = 2;

Review comment:
   This is Dataflow specific requirement. `DataflowPackage` model has two 
separate fields for `location` and `name`. `staged_name` and `alias_name` 
correspond to `location` and `name` respectively.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] ihji commented on a change in pull request #11039: [BEAM-9383] Staging Dataflow artifacts from environment

2020-05-04 Thread GitBox


ihji commented on a change in pull request #11039:
URL: https://github.com/apache/beam/pull/11039#discussion_r419832339



##
File path: 
runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/DataflowRunner.java
##
@@ -772,6 +783,88 @@ private Debuggee registerDebuggee(CloudDebugger 
debuggerClient, String uniquifie
 }
   }
 
+  private List stageArtifacts(RunnerApi.Pipeline pipeline) {
+ImmutableList.Builder filesToStageBuilder = 
ImmutableList.builder();
+for (Map.Entry entry :
+pipeline.getComponents().getEnvironmentsMap().entrySet()) {
+  for (RunnerApi.ArtifactInformation info : 
entry.getValue().getDependenciesList()) {
+if 
(!BeamUrns.getUrn(RunnerApi.StandardArtifacts.Types.FILE).equals(info.getTypeUrn()))
 {
+  throw new RuntimeException(
+  String.format("unsupported artifact type %s", 
info.getTypeUrn()));
+}
+RunnerApi.ArtifactFilePayload filePayload;
+try {
+  filePayload = 
RunnerApi.ArtifactFilePayload.parseFrom(info.getTypePayload());
+} catch (InvalidProtocolBufferException e) {
+  throw new RuntimeException("Error parsing artifact file payload.", 
e);
+}
+if (!BeamUrns.getUrn(RunnerApi.StandardArtifacts.Roles.STAGING_TO)
+.equals(info.getRoleUrn())) {
+  throw new RuntimeException(
+  String.format("unsupported artifact role %s", 
info.getRoleUrn()));
+}
+RunnerApi.ArtifactStagingToRolePayload stagingPayload;
+try {
+  stagingPayload = 
RunnerApi.ArtifactStagingToRolePayload.parseFrom(info.getRolePayload());
+} catch (InvalidProtocolBufferException e) {
+  throw new RuntimeException("Error parsing artifact staging_to role 
payload.", e);
+}
+DataflowPackage target = new DataflowPackage();
+target.setLocation(stagingPayload.getStagedName());
+if (!Strings.isNullOrEmpty(stagingPayload.getAliasName())) {
+  target.setName(stagingPayload.getAliasName());
+}
+filesToStageBuilder.add(StagedFile.of(filePayload.getPath(), target));
+  }
+}
+return options.getStager().stageFiles(filesToStageBuilder.build());
+  }
+
+  private List getDefaultArtifacts() {
+ImmutableList.Builder pathsToStageBuilder = 
ImmutableList.builder();
+ImmutableMap.Builder aliasMapBuilder = 
ImmutableMap.builder();
+String windmillBinary =
+
options.as(DataflowPipelineDebugOptions.class).getOverrideWindmillBinary();
+String dataflowWorkerJar = options.getDataflowWorkerJar();
+if (dataflowWorkerJar != null && !dataflowWorkerJar.isEmpty()) {
+  // Put the user specified worker jar at the start of the classpath, to 
be consistent with the
+  // built in worker order.
+  pathsToStageBuilder.add(dataflowWorkerJar);
+  aliasMapBuilder.put(dataflowWorkerJar, "dataflow-worker.jar");
+}
+for (String path : options.getFilesToStage()) {
+  if (path.contains("=")) {

Review comment:
   Yes. This syntax is only supported in Dataflow runner. `DataflowPackage` 
has a separate field `name` in addition to `location` and "=" separator allows 
to prefix `name` to the location of the source e.g. 
"dataflow.jar=/tmp/foo.jar". I could remove this special syntax but I decided 
to keep it since it's already exposed to users via `--filesToStage` option so 
removing it may cause backward compatibility issue.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] aaltay commented on pull request #11210: [BEAM-8949] SpannerIO integration tests

2020-05-04 Thread GitBox


aaltay commented on pull request #11210:
URL: https://github.com/apache/beam/pull/11210#issuecomment-623801499


   @mszb - Do you know why the test is failing?



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] pabloem commented on pull request #11086: [BEAM-8910] Make custom BQ source read from Avro

2020-05-04 Thread GitBox


pabloem commented on pull request #11086:
URL: https://github.com/apache/beam/pull/11086#issuecomment-623800613







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] ihji commented on a change in pull request #11039: [BEAM-9383] Staging Dataflow artifacts from environment

2020-05-04 Thread GitBox


ihji commented on a change in pull request #11039:
URL: https://github.com/apache/beam/pull/11039#discussion_r419829971



##
File path: 
runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/DataflowRunner.java
##
@@ -784,7 +877,25 @@ public DataflowPipelineJob run(Pipeline pipeline) {
 "Executing pipeline on the Dataflow Service, which will have billing 
implications "
 + "related to Google Compute Engine usage and other Google Cloud 
Services.");
 
-List packages = options.getStager().stageDefaultFiles();
+// Capture the sdkComponents for look up during step translations
+SdkComponents sdkComponents = SdkComponents.create();
+
+DataflowPipelineOptions dataflowOptions = 
options.as(DataflowPipelineOptions.class);
+String workerHarnessContainerImageURL = 
DataflowRunner.getContainerImageForJob(dataflowOptions);
+RunnerApi.Environment defaultEnvironmentForDataflow =
+Environments.createDockerEnvironment(workerHarnessContainerImageURL);
+
+sdkComponents.registerEnvironment(
+defaultEnvironmentForDataflow
+.toBuilder()
+.addAllDependencies(getDefaultArtifacts())
+.build());
+
+RunnerApi.Pipeline pipelineProto = PipelineTranslation.toProto(pipeline, 
sdkComponents, true);
+
+LOG.debug("Portable pipeline proto:\n{}", 
TextFormat.printToString(pipelineProto));

Review comment:
   This debug log is not new. It's just relocated. Do you think it would be 
better to remove this?





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] ihji commented on a change in pull request #11039: [BEAM-9383] Staging Dataflow artifacts from environment

2020-05-04 Thread GitBox


ihji commented on a change in pull request #11039:
URL: https://github.com/apache/beam/pull/11039#discussion_r419829659



##
File path: 
runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/Environments.java
##
@@ -210,56 +209,55 @@ public static Environment createProcessEnvironment(
 }
   }
 
-  private static List getArtifacts(List 
stagingFiles) {
-Set pathsToStage = Sets.newHashSet(stagingFiles);
+  public static List getArtifacts(
+  List stagingFiles, StagingFileNameGenerator generator) {
 ImmutableList.Builder artifactsBuilder = 
ImmutableList.builder();
-for (String path : pathsToStage) {
+for (String path : ImmutableSet.copyOf(stagingFiles)) {

Review comment:
   `ImmutableSet` preserves the order but I think we don't need to make a 
copy here. Will use `LinkedHashSet` instead.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] damondouglas commented on pull request #11564: [Beam-9679] Add Core Transforms section / Map lesson to the Go SDK katas

2020-05-04 Thread GitBox


damondouglas commented on pull request #11564:
URL: https://github.com/apache/beam/pull/11564#issuecomment-623799315


   @henryken and @lostluck I updated [the Stepik 
course](https://stepik.org/course/70387) and commited the `*-remote-info.yaml` 
files.  Thank you both for your help.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] rahul8383 commented on a change in pull request #11569: [BEAM-9840] Support for Parameterized Types when converting from HCat…

2020-05-04 Thread GitBox


rahul8383 commented on a change in pull request #11569:
URL: https://github.com/apache/beam/pull/11569#discussion_r419827790



##
File path: 
sdks/java/io/hcatalog/src/test/java/org/apache/beam/sdk/io/hcatalog/SchemaUtilsTest.java
##
@@ -0,0 +1,45 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.io.hcatalog;
+
+import java.util.ArrayList;
+import java.util.List;
+import org.apache.beam.sdk.schemas.Schema;
+import org.apache.hadoop.hive.metastore.api.FieldSchema;
+import org.junit.Assert;
+import org.junit.Test;
+
+public class SchemaUtilsTest {
+  @Test
+  public void testParameterizedTypesToBeamTypes() {
+List listOfFieldSchema = new ArrayList<>();
+listOfFieldSchema.add(new FieldSchema("parameterizedChar", "char(10)", 
null));
+listOfFieldSchema.add(new FieldSchema("parameterizedVarchar", 
"varchar(100)", null));
+listOfFieldSchema.add(new FieldSchema("parameterizedDecimal", 
"decimal(30,16)", null));
+
+Schema expectedSchema =
+Schema.builder()
+.addNullableField("parameterizedChar", Schema.FieldType.STRING)
+.addNullableField("parameterizedVarchar", Schema.FieldType.STRING)
+.addNullableField("parameterizedDecimal", Schema.FieldType.DECIMAL)

Review comment:
   Thanks @TheNeuralBit for the review.
   
   I am adding logical types in `schemas.logicaltypes` called 
`VariableLengthBytes`, `FixedLengthString`, `VariableLengthString`, 
`LogicalDecimal` as part of #11581 .
   
   I will take up the task of mapping these to logical types once my other PR 
gets merged. I also hope that #11272 get merged by then, so that I can use 
`SqlTypes.DATE` logical type. Can you please create a JIRA ticket and assign it 
to me.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] aaltay commented on a change in pull request #11554: [BEAM-9876] Migrate the Beam website from Jekyll to Hugo to enable localization of the site content

2020-05-04 Thread GitBox


aaltay commented on a change in pull request #11554:
URL: https://github.com/apache/beam/pull/11554#discussion_r419826968



##
File path: website/www/site/content/en/community/contact-us.md
##
@@ -0,0 +1,47 @@
+---
+title: "Contact Us"
+aliases:
+  - /community/
+  - /use/issue-tracking/
+  - /use/mailing-lists/
+  - /get-started/support/
+---
+
+
+# Contact Us
+
+There are many ways to reach the Beam user and developer communities - use
+whichever one seems best.
+
+

Review comment:
   What is the problem here? What is markdownify and what is superscripts ?

##
File path: 
website/www/site/content/en/documentation/transforms/java/elementwise/regex.md
##
@@ -0,0 +1,34 @@
+---
+title: "Regex"
+---
+
+# Regex
+
+https://beam.apache.org/releases/javadoc/current/index.html?org/apache/beam/sdk/transforms/Regex.html";>
+  https://beam.apache.org/images/logos/sdks/java.png"; 
width="20px" height="20px"
+   alt="Javadoc" />
+ Javadoc
+
+
+

Review comment:
   Why do we multiple breaks here?





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] robertwb commented on a change in pull request #11554: [BEAM-9876] Migrate the Beam website from Jekyll to Hugo to enable localization of the site content

2020-05-04 Thread GitBox


robertwb commented on a change in pull request #11554:
URL: https://github.com/apache/beam/pull/11554#discussion_r419822575



##
File path: 
website/www/site/content/en/documentation/transforms/python/elementwise/pardo.md
##
@@ -0,0 +1,142 @@
+---
+title: "Partition"

Review comment:
   Another copy of "Partiton." (There may be others, we should verify we 
haven't lost content in the move.)

##
File path: 
website/www/site/content/en/documentation/transforms/python/elementwise/map.md
##
@@ -1,8 +1,5 @@
 ---
-layout: section
-title: "Map"
-permalink: /documentation/transforms/python/elementwise/map/
-section_menu: section-menu/documentation.html
+title: "Partition"

Review comment:
   Is this Partition or Map?

##
File path: 
website/www/site/content/en/documentation/transforms/python/elementwise/kvswap.md
##
@@ -0,0 +1,142 @@
+---
+title: "Partition"

Review comment:
   This seems to be the wrong page. 





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] jaketf commented on pull request #11596: [BEAM-9856] [*WIP DO NOT MERGE*] Optimization/hl7v2 io list messages

2020-05-04 Thread GitBox


jaketf commented on pull request #11596:
URL: https://github.com/apache/beam/pull/11596#issuecomment-623793025


   @chamikaramj @lukecwik 
   Apologies, I believe the the NPE was a user error on my part. 
   I've been able to revert my changes to to OffsetRangeTracker without 
reintroducing the NPE.
   
   To help future users as foolish as me,  to get a less confusing NPE, I 
suggest we do something like I put in 8891292. But this is definitely not a 
core issue in OffsetRangeTracker.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] robertwb opened a new pull request #11608: Migration of Jekylle to Hugo

2020-05-04 Thread GitBox


robertwb opened a new pull request #11608:
URL: https://github.com/apache/beam/pull/11608


   This should be the same as #11554, but with the massive commit split up into 
   
   (1) Infrastructure changes
   (2) Automated refactoring.
   (3) Manual refactoring.
   
   
   
   Thank you for your contribution! Follow this checklist to help us 
incorporate your contribution quickly and easily:
   
- [ ] [**Choose 
reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and 
mention them in a comment (`R: @username`).
- [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in 
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA 
issue, if applicable. This will automatically link the pull request to the 
issue.
- [ ] Update `CHANGES.md` with noteworthy changes.
- [ ] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   See the [Contributor Guide](https://beam.apache.org/contribute) for more 
tips on [how to make review process 
smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier).
   
   Post-Commit Tests Status (on master branch)
   

   
   Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark
   --- | --- | --- | --- | --- | --- | --- | ---
   Go | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/)
 | --- | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/)
 | --- | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/)
   Java | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Java11/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Java11/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/)
   Python | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Python2/lastComp

[GitHub] [beam] henryken commented on pull request #11564: [Beam-9679] Add Core Transforms section / Map lesson to the Go SDK katas

2020-05-04 Thread GitBox


henryken commented on pull request #11564:
URL: https://github.com/apache/beam/pull/11564#issuecomment-623783144


   Please wait for the course uploading before merging this PR.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] jaketf edited a comment on pull request #11596: [BEAM-9856] [*WIP DO NOT MERGE*] Optimization/hl7v2 io list messages

2020-05-04 Thread GitBox


jaketf edited a comment on pull request #11596:
URL: https://github.com/apache/beam/pull/11596#issuecomment-623766796


   @chamikaramj thanks for the suggestion. I will look into using BoundedSource 
API in a separate PR and we can compare.
   
   Unfortunately, regular DoFns don't cut it because a single elements outputs 
are committed atomically (see this 
[conversation](https://github.com/apache/beam/pull/11538#discussion_r416927740)).
   Basically we have one input element (HL7v2 store) exploding to many, many 
output elements (all the messages in that store) in a single ProcessElement 
call. I'm trying to explore strategies for splitting up this listing (because 
due to pagination nature of messages.list it is single threaded).
   
   I originally chose splittable DoFn over BoundedSource based off the 
sentiment of this statement:
   > **Coding against the Source API involves a lot of boilerplate and is 
error-prone**, and it does not compose well with the rest of the Beam model 
because a Source can appear only at the root of a pipeline. - 
https://beam.apache.org/blog/2017/08/16/splittable-do-fn.html
   
   Using splittable DoFn also made reading from multiple HL7v2 stores come for 
free (e.g. if you had several regional HL7v2 stores and your use case was to 
read from them all and write to a single multi-regional store). This admittedly 
a rather contrived use case. 
   
   The blog also mentions 
   - A Source can not emit an additional output (for example, records that 
failed to parse).
   - Healthcare customers feeding requirements for this plugin want DLQ on 
all sinks and sources. To be consistent with the streaming API provided in 
`HL7v2IO.Read` I wanted to provide DLQ in `HLv2IO.ListMessages`. However, I 
believe this is more of a nice to have for batch use cases (because there's no 
room for passing ListMessages bad messages IDs like there is in HL7v2IO.Read).



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] aaltay commented on a change in pull request #11582: [BEAM-9650] Add ReadAllFromBigQuery PTransform

2020-05-04 Thread GitBox


aaltay commented on a change in pull request #11582:
URL: https://github.com/apache/beam/pull/11582#discussion_r419808099



##
File path: sdks/python/apache_beam/io/gcp/bigquery.py
##
@@ -283,6 +284,8 @@ def compute_table_name(row):
 'BigQuerySink',
 'WriteToBigQuery',
 'ReadFromBigQuery',
+'ReadAllFromBigQueryRequest',

Review comment:
   Does the java api use a similar concept?





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] henryken commented on a change in pull request #11564: [Beam-9679] Add Core Transforms section / Map lesson to the Go SDK katas

2020-05-04 Thread GitBox


henryken commented on a change in pull request #11564:
URL: https://github.com/apache/beam/pull/11564#discussion_r419807983



##
File path: learning/katas/go/Core Transforms/Map/ParDo struct/pkg/task/task.go
##
@@ -18,10 +18,7 @@ package task
 import "github.com/apache/beam/sdks/go/pkg/beam"
 
 func ApplyTransform(s beam.Scope, input beam.PCollection) beam.PCollection {
-   processFn := &multiplyByFn{
-   Factor: 5,
-   }
-   return beam.ParDo(s, processFn, input)
+   return beam.ParDo(s, &multiplyByFn{Factor: 5}, input)

Review comment:
   This looks good now! Thanks @damondouglas!





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] henryken commented on a change in pull request #11564: [Beam-9679] Add Core Transforms section / Map lesson to the Go SDK katas

2020-05-04 Thread GitBox


henryken commented on a change in pull request #11564:
URL: https://github.com/apache/beam/pull/11564#discussion_r419807137



##
File path: learning/katas/go/Core Transforms/Map/ParDo/task.md
##
@@ -0,0 +1,39 @@
+
+
+# ParDo
+
+ParDo is a Beam transform for generic parallel processing. The ParDo 
processing paradigm is similar to the “Map” 
+phase of a Map/Shuffle/Reduce-style algorithm: a ParDo transform considers 
each element in the input PCollection,
+performs some processing function (your user code) on that element, and emits 
zero, one, or multiple elements to an 
+output PCollection.
+
+**Kata:** Please write a simple ParDo that maps the input element by 
multiplying it by 10.

Review comment:
   @lostluck, the space will not create any problem. It is friendlier for 
the learner to see the lesson and task names in the natural way versus using 
underscore name.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] pabloem commented on a change in pull request #11560: Auto-inferring project for ReadFromBigQuery

2020-05-04 Thread GitBox


pabloem commented on a change in pull request #11560:
URL: https://github.com/apache/beam/pull/11560#discussion_r419805319



##
File path: sdks/python/apache_beam/io/gcp/bigquery.py
##
@@ -526,7 +526,7 @@ def reader(self, test_bigquery_client=None):
 
 
 def _to_bool(value):
-  return value == 'true'
+  return value

Review comment:
   ahhh what a great catch. thanks Yichi!





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] pabloem commented on pull request #11560: Auto-inferring project for ReadFromBigQuery

2020-05-04 Thread GitBox


pabloem commented on pull request #11560:
URL: https://github.com/apache/beam/pull/11560#issuecomment-623779892


   Run Python 3.7 PostCommit



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] y1chi commented on a change in pull request #11560: Auto-inferring project for ReadFromBigQuery

2020-05-04 Thread GitBox


y1chi commented on a change in pull request #11560:
URL: https://github.com/apache/beam/pull/11560#discussion_r419804731



##
File path: sdks/python/apache_beam/io/gcp/bigquery.py
##
@@ -526,7 +526,7 @@ def reader(self, test_bigquery_client=None):
 
 
 def _to_bool(value):
-  return value == 'true'
+  return value

Review comment:
   Do we need the _to_bool function then, would the bool casting suffice?





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] ibzib commented on pull request #11603: [BEAM-9875] Use FlinkRunner instead of PortableRunner in cross-langua…

2020-05-04 Thread GitBox


ibzib commented on pull request #11603:
URL: https://github.com/apache/beam/pull/11603#issuecomment-623772817


   R: @ihji 



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] ibzib commented on pull request #11603: [BEAM-9875] Use FlinkRunner instead of PortableRunner in cross-langua…

2020-05-04 Thread GitBox


ibzib commented on pull request #11603:
URL: https://github.com/apache/beam/pull/11603#issuecomment-623772242


   Python precommit is failing due to BEAM-9767



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] jaketf edited a comment on pull request #11596: [BEAM-9856] [*WIP DO NOT MERGE*] Optimization/hl7v2 io list messages

2020-05-04 Thread GitBox


jaketf edited a comment on pull request #11596:
URL: https://github.com/apache/beam/pull/11596#issuecomment-623766796


   @chamikaramj thanks for the suggestion. I will look into using BoundedSource 
API in a separate PR and we can compare.
   
   Unfortunately, regular DoFns don't cut it because a single elements outputs 
are committed atomically (see this 
[conversation](https://github.com/apache/beam/pull/11538#discussion_r416927740)).
   Basically we have one input element (HL7v2 store) exploding to many, many 
output elements (all the messages in that store) in a single ProcessElement 
call. I'm trying to explore strategies for splitting up this listing.
   
   I originally chose splittable DoFn over BoundedSource based off the 
sentiment of this statement:
   > **Coding against the Source API involves a lot of boilerplate and is 
error-prone**, and it does not compose well with the rest of the Beam model 
because a Source can appear only at the root of a pipeline. - 
https://beam.apache.org/blog/2017/08/16/splittable-do-fn.html
   
   Using splittable DoFn also made reading from multiple HL7v2 stores come for 
free (e.g. if you had several regional HL7v2 stores and your use case was to 
read from them all and write to a single multi-regional store). This admittedly 
a rather contrived use case. 
   
   The blog also mentions 
   - A Source can not emit an additional output (for example, records that 
failed to parse).
   - Healthcare customers feeding requirements for this plugin want DLQ on 
all sinks and sources. To be consistent with the streaming API provided in 
`HL7v2IO.Read` I wanted to provide DLQ in `HLv2IO.ListMessages`. However, I 
believe this is more of a nice to have for batch use cases (because there's no 
room for passing ListMessages bad messages IDs like there is in HL7v2IO.Read).



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] jaketf edited a comment on pull request #11596: [BEAM-9856] [*WIP DO NOT MERGE*] Optimization/hl7v2 io list messages

2020-05-04 Thread GitBox


jaketf edited a comment on pull request #11596:
URL: https://github.com/apache/beam/pull/11596#issuecomment-623766796


   @chamikaramj thanks for the suggestion. I will look into using BoundedSource 
API in a separate PR and we can compare.
   
   Unfortunately, regular DoFns don't cut it because a single elements outputs 
are committed atomically (see this 
[conversation](https://github.com/apache/beam/pull/11538#discussion_r416927740)).
   Basically we have one input element (HL7v2 store) exploding to many, many 
output elements (all the messages in that store) in a single ProcessElement 
call. I'm trying to explore strategies for splitting up this listing.
   
   I originally chose splittable DoFn over BoundedSource based off the 
sentiment of this statement:
   > **Coding against the Source API involves a lot of boilerplate and is 
error-prone**, and it does not compose well with the rest of the Beam model 
because a Source can appear only at the root of a pipeline. - 
https://beam.apache.org/blog/2017/08/16/splittable-do-fn.html
   
   The blog also mentions 
   - A Source can not emit an additional output (for example, records that 
failed to parse).
   - Healthcare customers feeding requirements for this plugin want DLQ on 
all sinks and sources. To be consistent with the streaming API provided in 
`HL7v2IO.Read` I wanted to provide DLQ in `HLv2IO.ListMessages`. However, I 
believe this is more of a nice to have for batch use cases (because there's no 
room for passing ListMessages bad messages IDs like there is in HL7v2IO.Read).



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] jaketf edited a comment on pull request #11596: [BEAM-9856] [*WIP DO NOT MERGE*] Optimization/hl7v2 io list messages

2020-05-04 Thread GitBox


jaketf edited a comment on pull request #11596:
URL: https://github.com/apache/beam/pull/11596#issuecomment-623766796


   @chamikaramj thanks for the suggestion. I will look into using BoundedSource 
API.
   
   Unfortunately, regular DoFns don't cut it because a single elements outputs 
are committed atomically (see this 
[conversation](https://github.com/apache/beam/pull/11538#discussion_r416927740)).
   Basically we have one input element (HL7v2 store) exploding to many, many 
output elements (all the messages in that store) in a single ProcessElement 
call. I'm trying to explore strategies for splitting up this listing.
   
   I originally chose splittable DoFn over BoundedSource based off the 
sentiment of this statement:
   > **Coding against the Source API involves a lot of boilerplate and is 
error-prone**, and it does not compose well with the rest of the Beam model 
because a Source can appear only at the root of a pipeline. - 
https://beam.apache.org/blog/2017/08/16/splittable-do-fn.html
   
   The blog also mentions 
   - A Source can not emit an additional output (for example, records that 
failed to parse).
   - Healthcare customers feeding requirements for this plugin want DLQ on 
all sinks and sources. To be consistent with the streaming API provided in 
`HL7v2IO.Read` I wanted to provide DLQ in `HLv2IO.ListMessages`. However, I 
believe this is more of a nice to have for batch use cases (because there's no 
room for passing ListMessages bad messages IDs like there is in HL7v2IO.Read).



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] ibzib commented on pull request #11403: [DO NOT MERGE] Run all PostCommit and PreCommit Tests against Release Branch

2020-05-04 Thread GitBox


ibzib commented on pull request #11403:
URL: https://github.com/apache/beam/pull/11403#issuecomment-623767557


   Run Java PreCommit



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] ibzib commented on pull request #11403: [DO NOT MERGE] Run all PostCommit and PreCommit Tests against Release Branch

2020-05-04 Thread GitBox


ibzib commented on pull request #11403:
URL: https://github.com/apache/beam/pull/11403#issuecomment-623766941


   Run Python 2 PostCommit



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] jaketf commented on pull request #11596: [BEAM-9856] [*WIP DO NOT MERGE*] Optimization/hl7v2 io list messages

2020-05-04 Thread GitBox


jaketf commented on pull request #11596:
URL: https://github.com/apache/beam/pull/11596#issuecomment-623766796


   @chamikaramj thanks for the suggestion. I will look into using BoundedSource 
API.
   
   Unfortunately, regular DoFns don't cut it because a single elements outputs 
are committed atomically (see this 
[conversation](https://github.com/apache/beam/pull/11538#discussion_r416927740)).
   Basically we have one input element (HL7v2 store) exploding to many, many 
output elements (all the messages in that store) in a single ProcessElement 
call. I'm trying to explore strategies for splitting up this listing.
   
   I originally chose splittable DoFn over BoundedSource based off the 
sentiment of this statement:
   > **Coding against the Source API involves a lot of boilerplate and is 
error-prone**, and it does not compose well with the rest of the Beam model 
because a Source can appear only at the root of a pipeline. - 
https://beam.apache.org/blog/2017/08/16/splittable-do-fn.html
   
   The blog also mentions 
   - A Source can not emit an additional output (for example, records that 
failed to parse).
   - Healthcare customers feeding requirements for this plugin want DLQ on 
all sinks and sources. To be consistent with the streaming API provided in 
`HL7v2IO.Read` I wanted to provide DLQ in `HLv2IO.ListMessages`. However, I 
believe this is more of a nice to have for batch use cases.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] ibzib commented on pull request #11603: [BEAM-9875] Use FlinkRunner instead of PortableRunner in cross-langua…

2020-05-04 Thread GitBox


ibzib commented on pull request #11603:
URL: https://github.com/apache/beam/pull/11603#issuecomment-623752060


   Run Python 2 PostCommit



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] chamikaramj opened a new pull request #11607: [BEAM-9430] Updates the watermark estimators to make sure the returned watermark …

2020-05-04 Thread GitBox


chamikaramj opened a new pull request #11607:
URL: https://github.com/apache/beam/pull/11607


   …is within bounds
   
   **Please** add a meaningful description for your change here
   
   
   
   Thank you for your contribution! Follow this checklist to help us 
incorporate your contribution quickly and easily:
   
- [ ] [**Choose 
reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and 
mention them in a comment (`R: @username`).
- [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in 
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA 
issue, if applicable. This will automatically link the pull request to the 
issue.
- [ ] Update `CHANGES.md` with noteworthy changes.
- [ ] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   See the [Contributor Guide](https://beam.apache.org/contribute) for more 
tips on [how to make review process 
smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier).
   
   Post-Commit Tests Status (on master branch)
   

   
   Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark
   --- | --- | --- | --- | --- | --- | --- | ---
   Go | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/)
 | --- | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/)
 | --- | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/)
   Java | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Java11/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Java11/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/)
   Python | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Python2/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python2/la

[GitHub] [beam] chamikaramj edited a comment on pull request #11596: [BEAM-9856] [*WIP DO NOT MERGE*] Optimization/hl7v2 io list messages

2020-05-04 Thread GitBox


chamikaramj edited a comment on pull request #11596:
URL: https://github.com/apache/beam/pull/11596#issuecomment-623741333


   Is using SplittableDoFn API here intentional ?
   
   I think this API is being updated. cc: @lukecwik 
   
   If you need dynamic work rebalancing, consider using the BoundedSource 
interface. Otherwise we can just implement the source using regular DoFns and 
wait for SplittableDoFn API to stabilize before adding support for dynamic work 
rebalancing.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] chamikaramj commented on pull request #11596: [BEAM-9856] [*WIP DO NOT MERGE*] Optimization/hl7v2 io list messages

2020-05-04 Thread GitBox


chamikaramj commented on pull request #11596:
URL: https://github.com/apache/beam/pull/11596#issuecomment-623741333


   Is using SplittableDoFn API here intentional ?
   
   I think this API is being updated. cc: @lukecwik 
   
   If you need dynamic work rebalancing, consider using the BoundedSource 
interface. Otherwise we can just implement the source as a regular and wait for 
SplittableDoFn API to stabilize before adding support for dynamic work 
rebalancing.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] pabloem commented on a change in pull request #11560: Auto-inferring project for ReadFromBigQuery

2020-05-04 Thread GitBox


pabloem commented on a change in pull request #11560:
URL: https://github.com/apache/beam/pull/11560#discussion_r419765325



##
File path: sdks/python/apache_beam/io/gcp/bigquery.py
##
@@ -526,7 +526,7 @@ def reader(self, test_bigquery_client=None):
 
 
 def _to_bool(value):
-  return value == 'true'
+  return value

Review comment:
   the coder incorrectly expected boolean types to be encoded as strings. 
This is incorrect, as JSON supports boolean 'natively', like this: 
https://json-schema.org/understanding-json-schema/reference/boolean.html





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] jaketf commented on pull request #11596: [BEAM-9856] [*WIP DO NOT MERGE*] Optimization/hl7v2 io list messages

2020-05-04 Thread GitBox


jaketf commented on pull request #11596:
URL: https://github.com/apache/beam/pull/11596#issuecomment-623737955


   > The NPE is caused by `OffsetRangeTracker.lastAttemptedOffset` being 
unboxed in `OffsetRangeTracker::checkDone` 
[here](https://github.com/apache/beam/blob/0291976d6029b4cf4d72caa03bd3836900fb6328/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/splittabledofn/OffsetRangeTracker.java#L98).
   > 
   > 
[all](https://github.com/apache/beam/blob/0291976d6029b4cf4d72caa03bd3836900fb6328/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/splittabledofn/OffsetRangeTracker.java#L53)
 
[other](https://github.com/apache/beam/blob/0291976d6029b4cf4d72caa03bd3836900fb6328/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/splittabledofn/OffsetRangeTracker.java#L77)
 
[uses](https://github.com/apache/beam/blob/0291976d6029b4cf4d72caa03bd3836900fb6328/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/splittabledofn/OffsetRangeTracker.java#L119)
 check that `lastAttemptedOffset` is not null.
   > 
   > I'm not sure if this was intentional in the implementation of 
OffsetRangeTracker.
   
   @chamikaramj pablo said you might know about this.
   
   should check done have some conditional on `lastAttemptedOffset != null`
   e.g.
   ```java
 @Override
 public void checkDone() throws IllegalStateException {
   if (range.getFrom() == range.getTo()) {
 return;
   }
   
   if (lastAttemptedOffset != null) {
 checkState(
 lastAttemptedOffset >= range.getTo() - 1,
 "Last attempted offset was %s in range %s, claiming work in [%s, 
%s) was not attempted",
 lastAttemptedOffset,
 range,
 lastAttemptedOffset + 1,
 range.getTo());
   }
 }
   ```
   
   I'm not really familiar with what checkDone should do in the case that 
lastAttemptedOffset was null.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] ibzib commented on pull request #11603: [BEAM-9875] Use FlinkRunner instead of PortableRunner in cross-langua…

2020-05-04 Thread GitBox


ibzib commented on pull request #11603:
URL: https://github.com/apache/beam/pull/11603#issuecomment-623736044


   Run Python 2 PostCommit



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] pabloem commented on pull request #11596: [BEAM-9856] [*WIP DO NOT MERGE*] Optimization/hl7v2 io list messages

2020-05-04 Thread GitBox


pabloem commented on pull request #11596:
URL: https://github.com/apache/beam/pull/11596#issuecomment-623733179


   @chamikaramj can you take a look at this?



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] pabloem commented on pull request #11600: Allow accessing window parameters in finishBundle

2020-05-04 Thread GitBox


pabloem commented on pull request #11600:
URL: https://github.com/apache/beam/pull/11600#issuecomment-623732876


   can you please create a JIRA?



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] y1chi commented on a change in pull request #11560: Auto-inferring project for ReadFromBigQuery

2020-05-04 Thread GitBox


y1chi commented on a change in pull request #11560:
URL: https://github.com/apache/beam/pull/11560#discussion_r419751064



##
File path: sdks/python/apache_beam/io/gcp/bigquery.py
##
@@ -526,7 +526,7 @@ def reader(self, test_bigquery_client=None):
 
 
 def _to_bool(value):
-  return value == 'true'
+  return value

Review comment:
   could you explain why this change is needed?





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] Hannah-Jiang opened a new pull request #11606: [BEAM-9880] fix cannot touch issue

2020-05-04 Thread GitBox


Hannah-Jiang opened a new pull request #11606:
URL: https://github.com/apache/beam/pull/11606


   R: @ibzib 
   
   
   
   Thank you for your contribution! Follow this checklist to help us 
incorporate your contribution quickly and easily:
   
- [ ] [**Choose 
reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and 
mention them in a comment (`R: @username`).
- [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in 
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA 
issue, if applicable. This will automatically link the pull request to the 
issue.
- [ ] Update `CHANGES.md` with noteworthy changes.
- [ ] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   See the [Contributor Guide](https://beam.apache.org/contribute) for more 
tips on [how to make review process 
smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier).
   
   Post-Commit Tests Status (on master branch)
   

   
   Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark
   --- | --- | --- | --- | --- | --- | --- | ---
   Go | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/)
 | --- | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/)
 | --- | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/)
   Java | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Java11/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Java11/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/)
   Python | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Python2/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python2/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_Po

[GitHub] [beam] youngoli commented on a change in pull request #11553: [BEAM-9799] Automated validation for RTrackers.

2020-05-04 Thread GitBox


youngoli commented on a change in pull request #11553:
URL: https://github.com/apache/beam/pull/11553#discussion_r419743023



##
File path: sdks/go/pkg/beam/core/runtime/exec/pardo.go
##
@@ -162,6 +168,14 @@ func (n *ParDo) processMainInput(mainIn *MainInput) error {
return nil
 }
 
+func rtErrHelper(err error) error {
+   if err != nil {
+   return err
+   } else {
+   return errors.New("RTracker IsDone failed for unspecified 
reason")
+   }

Review comment:
   Yeah good point. I'll rephrase it.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] jaketf commented on pull request #11596: [BEAM-9856] [*WIP DO NOT MERGE*] Optimization/hl7v2 io list messages

2020-05-04 Thread GitBox


jaketf commented on pull request #11596:
URL: https://github.com/apache/beam/pull/11596#issuecomment-623718338


   The NPE is caused by `OffsetRangeTracker.lastAttemptedOffset` being unboxed 
in `OffsetRangeTracker::checkDone` 
[here](https://github.com/apache/beam/blob/0291976d6029b4cf4d72caa03bd3836900fb6328/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/splittabledofn/OffsetRangeTracker.java#L98).
 
   
   
[all](https://github.com/apache/beam/blob/0291976d6029b4cf4d72caa03bd3836900fb6328/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/splittabledofn/OffsetRangeTracker.java#L53)
 
[other](https://github.com/apache/beam/blob/0291976d6029b4cf4d72caa03bd3836900fb6328/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/splittabledofn/OffsetRangeTracker.java#L77)
 
[uses](https://github.com/apache/beam/blob/0291976d6029b4cf4d72caa03bd3836900fb6328/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/splittabledofn/OffsetRangeTracker.java#L119)
 check that `lastAttemptedOffset`  is not null.
   
   I'm not sure if this was intentional in the implementation of 
OffsetRangeTracker. 



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] ibzib commented on pull request #11403: [DO NOT MERGE] Run all PostCommit and PreCommit Tests against Release Branch

2020-05-04 Thread GitBox


ibzib commented on pull request #11403:
URL: https://github.com/apache/beam/pull/11403#issuecomment-623717447


   Run Java PreCommit



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] ibzib commented on pull request #11403: [DO NOT MERGE] Run all PostCommit and PreCommit Tests against Release Branch

2020-05-04 Thread GitBox


ibzib commented on pull request #11403:
URL: https://github.com/apache/beam/pull/11403#issuecomment-623716216


   Run Java PreCommit



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] pabloem commented on pull request #10055: [BEAM-8603] Add Python SqlTransform

2020-05-04 Thread GitBox


pabloem commented on pull request #10055:
URL: https://github.com/apache/beam/pull/10055#issuecomment-623714745


   yoohooo



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] TheNeuralBit commented on pull request #11574: [BEAM-9449] Pass PipelineOptions through expansion service

2020-05-04 Thread GitBox


TheNeuralBit commented on pull request #11574:
URL: https://github.com/apache/beam/pull/11574#issuecomment-623713457


   Synced this up now that #11571 is merged. This is ready for review now.
   
   @ihji do you have time to review?



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] youngoli commented on pull request #11605: [BEAM-9883] Refactor SDF test restrictions.

2020-05-04 Thread GitBox


youngoli commented on pull request #11605:
URL: https://github.com/apache/beam/pull/11605#issuecomment-623713567


   R: @lostluck 



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] youngoli opened a new pull request #11605: [BEAM-9883] Refactor SDF test restrictions.

2020-05-04 Thread GitBox


youngoli opened a new pull request #11605:
URL: https://github.com/apache/beam/pull/11605


   Refactoring the restriction used for testing SDFs. Instead of having
   some obtuse behavior that we can validate, it now just contains a bunch
   of flags we can flip to track that it was used in each method.
   
   
   
   Thank you for your contribution! Follow this checklist to help us 
incorporate your contribution quickly and easily:
   
- [x] [**Choose 
reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and 
mention them in a comment (`R: @username`).
- [x] Format the pull request title like `[BEAM-XXX] Fixes bug in 
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA 
issue, if applicable. This will automatically link the pull request to the 
issue.
- [ ] Update `CHANGES.md` with noteworthy changes.
- [x] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   See the [Contributor Guide](https://beam.apache.org/contribute) for more 
tips on [how to make review process 
smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier).
   
   Post-Commit Tests Status (on master branch)
   

   
   Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark
   --- | --- | --- | --- | --- | --- | --- | ---
   Go | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/)
 | --- | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/)
 | --- | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/)
   Java | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Java11/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Java11/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/)
   Python | [![Build 
Status](https://builds.apache

[GitHub] [beam] robertwb commented on a change in pull request #11554: [BEAM-9876] Migrate the Beam website from Jekyll to Hugo to enable localization of the site content

2020-05-04 Thread GitBox


robertwb commented on a change in pull request #11554:
URL: https://github.com/apache/beam/pull/11554#discussion_r419717683



##
File path: website/www/site/content/en/blog/splittable-do-fn.md
##
@@ -475,8 +471,7 @@ IO connectors. However, a large amount of work is in 
progress or planned.
 
 As of August 2017, SDF is available for use in the Beam Java Direct runner and
 Dataflow Streaming runner, and implementation is in progress in the Flink and
-Apex runners; see [capability matrix]({{ site.baseurl
-}}/documentation/runners/capability-matrix/) for the current status. Support
+Apex runners; see [capability 
matrix]({/documentation/runners/capability-matrix/) for the current status. 
Support

Review comment:
   Stray {?





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] ibzib commented on pull request #11403: [DO NOT MERGE] Run all PostCommit and PreCommit Tests against Release Branch

2020-05-04 Thread GitBox


ibzib commented on pull request #11403:
URL: https://github.com/apache/beam/pull/11403#issuecomment-623692174







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] ibzib commented on pull request #11601: [BEAM-9877] [cherry-pick] Make BatchGroupAlsoByWindowViaIteratorsFn extend the Elem…

2020-05-04 Thread GitBox


ibzib commented on pull request #11601:
URL: https://github.com/apache/beam/pull/11601#issuecomment-623684159


   Java test failing due to BEAM-9164 (known flake).



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] apilloud commented on pull request #11604: [BEAM-9659][BEAM-9657][BEAM-9664] Reject unsupported unnest joins

2020-05-04 Thread GitBox


apilloud commented on pull request #11604:
URL: https://github.com/apache/beam/pull/11604#issuecomment-623679534


   R: @robinyqiu 



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] apilloud opened a new pull request #11604: [BEAM-9659][BEAM-9657][BEAM-9664] Reject unsupported unnest joins

2020-05-04 Thread GitBox


apilloud opened a new pull request #11604:
URL: https://github.com/apache/beam/pull/11604


   There is no trivial path to fixing these, so just ensure they return a 
sensible error for now.
   
   
   
   Thank you for your contribution! Follow this checklist to help us 
incorporate your contribution quickly and easily:
   
- [ ] [**Choose 
reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and 
mention them in a comment (`R: @username`).
- [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in 
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA 
issue, if applicable. This will automatically link the pull request to the 
issue.
- [ ] Update `CHANGES.md` with noteworthy changes.
- [ ] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   See the [Contributor Guide](https://beam.apache.org/contribute) for more 
tips on [how to make review process 
smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier).
   
   Post-Commit Tests Status (on master branch)
   

   
   Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark
   --- | --- | --- | --- | --- | --- | --- | ---
   Go | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/)
 | --- | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/)
 | --- | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/)
   Java | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Java11/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Java11/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/)
   Python | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Python2/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Pyt

[GitHub] [beam] ibzib commented on pull request #11603: [BEAM-9875] Use FlinkRunner instead of PortableRunner in cross-langua…

2020-05-04 Thread GitBox


ibzib commented on pull request #11603:
URL: https://github.com/apache/beam/pull/11603#issuecomment-623678292


   Run Python 2 PostCommit



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] ibzib opened a new pull request #11603: [BEAM-9875] Use FlinkRunner instead of PortableRunner in cross-langua…

2020-05-04 Thread GitBox


ibzib opened a new pull request #11603:
URL: https://github.com/apache/beam/pull/11603


   …ge tests.
   
   
   
   Thank you for your contribution! Follow this checklist to help us 
incorporate your contribution quickly and easily:
   
- [ ] [**Choose 
reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and 
mention them in a comment (`R: @username`).
- [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in 
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA 
issue, if applicable. This will automatically link the pull request to the 
issue.
- [ ] Update `CHANGES.md` with noteworthy changes.
- [ ] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   See the [Contributor Guide](https://beam.apache.org/contribute) for more 
tips on [how to make review process 
smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier).
   
   Post-Commit Tests Status (on master branch)
   

   
   Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark
   --- | --- | --- | --- | --- | --- | --- | ---
   Go | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/)
 | --- | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/)
 | --- | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/)
   Java | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Java11/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Java11/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/)
   Python | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Python2/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python2/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommi

[GitHub] [beam] pabloem commented on pull request #11560: Auto-inferring project for ReadFromBigQuery

2020-05-04 Thread GitBox


pabloem commented on pull request #11560:
URL: https://github.com/apache/beam/pull/11560#issuecomment-623677009


   @kamilwu @y1chi  PTAL



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] tudorm commented on pull request #11601: [BEAM-9877] Make BatchGroupAlsoByWindowViaIteratorsFn extend the Elem…

2020-05-04 Thread GitBox


tudorm commented on pull request #11601:
URL: https://github.com/apache/beam/pull/11601#issuecomment-623674559


   LGTM



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] tudorm commented on pull request #11598: [BEAM-9877] Estimate sizes of group-by-key values behind a key lazily only.

2020-05-04 Thread GitBox


tudorm commented on pull request #11598:
URL: https://github.com/apache/beam/pull/11598#issuecomment-623673128


   LGTM



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] pabloem commented on pull request #11560: Auto-inferring project for ReadFromBigQuery

2020-05-04 Thread GitBox


pabloem commented on pull request #11560:
URL: https://github.com/apache/beam/pull/11560#issuecomment-623669571


   Run Python2_PVR_Flink PreCommit



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] apilloud commented on pull request #11602: [BEAM-9661] Fix ORDER BY with LIMIT

2020-05-04 Thread GitBox


apilloud commented on pull request #11602:
URL: https://github.com/apache/beam/pull/11602#issuecomment-623663334


   R: @robinyqiu 



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] apilloud opened a new pull request #11602: [BEAM-9661] Fix ORDER BY with LIMIT

2020-05-04 Thread GitBox


apilloud opened a new pull request #11602:
URL: https://github.com/apache/beam/pull/11602


   It turns out ZetaSQL column references don't always start at 0, an example 
of this is on union operators. We probably have a number of bugs in this regard.
   
   
   
   Thank you for your contribution! Follow this checklist to help us 
incorporate your contribution quickly and easily:
   
- [ ] [**Choose 
reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and 
mention them in a comment (`R: @username`).
- [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in 
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA 
issue, if applicable. This will automatically link the pull request to the 
issue.
- [ ] Update `CHANGES.md` with noteworthy changes.
- [ ] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   See the [Contributor Guide](https://beam.apache.org/contribute) for more 
tips on [how to make review process 
smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier).
   
   Post-Commit Tests Status (on master branch)
   

   
   Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark
   --- | --- | --- | --- | --- | --- | --- | ---
   Go | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/)
 | --- | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/)
 | --- | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/)
   Java | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Java11/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Java11/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/)
   Python | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Python2/lastCompletedBuild/b

[GitHub] [beam] ibzib opened a new pull request #11601: [BEAM-9877] Make BatchGroupAlsoByWindowViaIteratorsFn extend the Elem…

2020-05-04 Thread GitBox


ibzib opened a new pull request #11601:
URL: https://github.com/apache/beam/pull/11601


   …entByteSizeObservableIterable so that size estimation is lazy
   
   Cherry-pick of #11598. (I removed the commit history because I forgot to 
squash before merging the original PR, and one atomic commit is a lot easier to 
deal with.) R: @tudorm 
   
   
   
   Thank you for your contribution! Follow this checklist to help us 
incorporate your contribution quickly and easily:
   
- [ ] [**Choose 
reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and 
mention them in a comment (`R: @username`).
- [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in 
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA 
issue, if applicable. This will automatically link the pull request to the 
issue.
- [ ] Update `CHANGES.md` with noteworthy changes.
- [ ] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   See the [Contributor Guide](https://beam.apache.org/contribute) for more 
tips on [how to make review process 
smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier).
   
   Post-Commit Tests Status (on master branch)
   

   
   Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark
   --- | --- | --- | --- | --- | --- | --- | ---
   Go | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/)
 | --- | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/)
 | --- | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/)
   Java | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Java11/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Java11/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/)
   Python | [![

[GitHub] [beam] TheNeuralBit commented on a change in pull request #11569: [BEAM-9840] Support for Parameterized Types when converting from HCat…

2020-05-04 Thread GitBox


TheNeuralBit commented on a change in pull request #11569:
URL: https://github.com/apache/beam/pull/11569#discussion_r419653421



##
File path: 
sdks/java/io/hcatalog/src/test/java/org/apache/beam/sdk/io/hcatalog/SchemaUtilsTest.java
##
@@ -0,0 +1,45 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.io.hcatalog;
+
+import java.util.ArrayList;
+import java.util.List;
+import org.apache.beam.sdk.schemas.Schema;
+import org.apache.hadoop.hive.metastore.api.FieldSchema;
+import org.junit.Assert;
+import org.junit.Test;
+
+public class SchemaUtilsTest {
+  @Test
+  public void testParameterizedTypesToBeamTypes() {
+List listOfFieldSchema = new ArrayList<>();
+listOfFieldSchema.add(new FieldSchema("parameterizedChar", "char(10)", 
null));
+listOfFieldSchema.add(new FieldSchema("parameterizedVarchar", 
"varchar(100)", null));
+listOfFieldSchema.add(new FieldSchema("parameterizedDecimal", 
"decimal(30,16)", null));
+
+Schema expectedSchema =
+Schema.builder()
+.addNullableField("parameterizedChar", Schema.FieldType.STRING)
+.addNullableField("parameterizedVarchar", Schema.FieldType.STRING)
+.addNullableField("parameterizedDecimal", Schema.FieldType.DECIMAL)

Review comment:
   I think these should map to logical types instead of to primitives so we 
don't lose the information from the parameter. Unfortunately we don't (yet) 
have good logical types in `schemas.logicaltypes` to map them to, but maybe we 
will after your other PR, https://github.com/apache/beam/pull/11581 (or you 
could just add the relevant ones here).
   
   `char(10)` looks like it could map to a `FixedLengthString` logical type, 
`varchar(100)` probably deserves its own type, maybe just called `Varchar`? and 
I've been meaning to add a logical type for DECIMAL parameterized by precision 
and scale as part of 
[BEAM-7554](https://issues.apache.org/jira/browse/BEAM-7554) (and deprecate the 
primitive one).





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] robertwb commented on a change in pull request #11554: [BEAM-9876] Migrate the Beam website from Jekyll to Hugo to enable localization of the site content

2020-05-04 Thread GitBox


robertwb commented on a change in pull request #11554:
URL: https://github.com/apache/beam/pull/11554#discussion_r419661024



##
File path: website/www/site/content/en/contribute/release-guide.md
##
@@ -218,7 +214,7 @@ docker login docker.io
 After successful login, authorization info will be stored at 
~/.docker/config.json file. For example,
 ```
 "https://index.docker.io/v1/": {
-   "auth": "xx"
+   "auth": "aGFubmFoamlhbmc6cmtkdGpmZ2hrMTIxMw=="

Review comment:
   Probably don't want to check this in.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] ibzib commented on a change in pull request #11597: [BEAM-9874] Support clearing timers in portable batch mode (Spark/Flink)

2020-05-04 Thread GitBox


ibzib commented on a change in pull request #11597:
URL: https://github.com/apache/beam/pull/11597#discussion_r419658069



##
File path: 
runners/core-java/src/main/java/org/apache/beam/runners/core/InMemoryTimerInternals.java
##
@@ -167,20 +165,17 @@ public void deleteTimer(StateNamespace namespace, String 
timerId, TimeDomain tim
   @Deprecated
   @Override
   public void deleteTimer(StateNamespace namespace, String timerId, String 
timerFamilyId) {
-TimerData existing = existingTimers.get(namespace, timerId + '+' + 
timerFamilyId);
-if (existing != null) {
-  deleteTimer(existing);
+TimerData removedTimer = existingTimers.remove(namespace, timerId + '+' + 
timerFamilyId);
+if (removedTimer != null) {
+  timersForDomain(removedTimer.getDomain()).remove(removedTimer);
 }
   }
 
   /** @deprecated use {@link #deleteTimer(StateNamespace, String, 
TimeDomain)}. */

Review comment:
   Maybe worth a JIRA/refactor? I'll leave it up to you since I am not as 
familiar with this code.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] ibzib commented on a change in pull request #11598: [BEAM-9877] Estimate sizes of group-by-key values behind a key lazily only.

2020-05-04 Thread GitBox


ibzib commented on a change in pull request #11598:
URL: https://github.com/apache/beam/pull/11598#discussion_r419649174



##
File path: 
runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/util/BatchGroupAlsoByWindowViaIteratorsFn.java
##
@@ -165,12 +168,17 @@ public WindowReiterable(
 }
 
 @Override
-public Reiterator iterator() {
+public WindowReiterator iterator() {

Review comment:
   It's odd that `ElementByteSizeObservableIterable::iterator` adds 
observers within the method body. I assume this is for historic reasons, since 
it doesn't seem to do anything now, and the comment documenting references a 
`setObserver` method that doesn't exist. Anyway, your change looks fine. But we 
should consider cleaning this up.
   
   
https://github.com/apache/beam/blob/6453e859badcb629ae2528b77d84235b7291ff89/sdks/java/core/src/main/java/org/apache/beam/sdk/util/common/ElementByteSizeObservableIterable.java#L49-L61





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] TheNeuralBit commented on pull request #11569: [BEAM-9840] Support for Parameterized Types when converting from HCat…

2020-05-04 Thread GitBox


TheNeuralBit commented on pull request #11569:
URL: https://github.com/apache/beam/pull/11569#issuecomment-623636289


   @akedin isn't very involved with Beam anymore. I think I can help review 
this instead



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] lostluck commented on a change in pull request #11517: [BEAM-9643] Adding Go SDF Documentation.

2020-05-04 Thread GitBox


lostluck commented on a change in pull request #11517:
URL: https://github.com/apache/beam/pull/11517#discussion_r419635469



##
File path: sdks/go/pkg/beam/core/sdf/sdf.go
##
@@ -13,63 +13,64 @@
 // See the License for the specific language governing permissions and
 // limitations under the License.
 
-// Package sdf is experimental, incomplete, and not yet meant for general 
usage.
+// Package contains interfaces used specifically for splittable DoFns.
+//
+// Warning: Splittable DoFns are still experimental, largely untested, and
+// likely to have bugs.
 package sdf
 
 // RTracker is an interface used to interact with restrictions while 
processing elements in
-// SplittableDoFns. Each implementation of RTracker is expected to be used for 
tracking a single
-// restriction type, which is the type that should be used to create the 
RTracker, and output by
-// TrySplit.
+// splittable DoFns (specifically, in the ProcessElement method). Each 
RTracker tracks the progress
+// of a single restriction.
 type RTracker interface {
-   // TryClaim attempts to claim the block of work in the current 
restriction located at a given
-   // position. This method must be used in the ProcessElement method of 
Splittable DoFns to claim
-   // work before performing it. If no work is claimed, the ProcessElement 
is not allowed to perform
-   // work or emit outputs. If the claim is successful, the DoFn must 
process the entire block. If
-   // the claim is unsuccessful the ProcessElement method of the DoFn must 
return without performing
-   // any additional work or emitting any outputs.
-   //
-   // TryClaim accepts an arbitrary value that can be interpreted as the 
position of a block, and
-   // returns a boolean indicating whether the claim succeeded.
+   // TryClaim attempts to claim the block of work located in the given 
position of the
+   // restriction. This method must be called in ProcessElement to claim 
work before it can be
+   // processed. Processing work without claiming it first can lead to 
incorrect output.
//
-   // If the claim fails due to an error, that error can be retrieved with 
GetError.
+   // If the claim is successful, the DoFn must process the entire block. 
If the claim is
+   // unsuccessful ProcessElement method of the DoFn must return without 
performing
+   // any additional work or emitting any outputs.
//
-   // For SDFs to work properly, claims must always be monotonically 
increasing in reference to the
-   // restriction's start and end points, and every block of work in a 
restriction must be claimed.
+   // If the claim fails due to an error, that error is stored and will be 
automatically emitted
+   // when the RTracker is validated, or can be manually retrieved with 
GetError.
//
// This pseudocode example illustrates the typical usage of TryClaim:
//
-   //  pos = position of first block after restriction.start
+   //  pos = position of first block within the restriction
//  for TryClaim(pos) == true {
//  // Do all work in the claimed block and emit outputs.
-   //  pos = position of next block
+   //  pos = position of next block within the restriction
//  }
//  return
TryClaim(pos interface{}) (ok bool)
 
-   // GetError returns the error that made this RTracker stop executing, 
and it returns nil if no
-   // error occurred. If IsDone fails while validating this RTracker, this 
method will be
-   // called to log the error.
+   // GetError returns the error that made this RTracker stop executing, 
and returns nil if no
+   // error occurred. This is the error that is emitted if automated 
validation fails.
GetError() error
 
-   // TrySplit splits the current restriction into a primary and residual 
based on a fraction of the
-   // work remaining. The split is performed along the first valid split 
point located after the
-   // given fraction of the remainder. This method is called by the SDK 
harness when receiving a
-   // split request by the runner.
+   // TrySplit splits the current restriction into a primary (currently 
executing work) and
+   // residual (work to be split off) based on a fraction of work 
remaining. The split is performed
+   // at the first valid split point located after the given fraction of 
remaining work.
+   //
+   // For example, a fraction of 0.5 means to split at the halfway point 
of remaining work only. If
+   // 50% of work is done and 50% remaining, then a fraction of 0.5 would 
split after 75% of work.
+   //
+   // This method modifies the underlying restriction in the RTracker to 
reflect the primary. It
+   // then returns a copy of the newly modified restriction as a primary, 
and returns a new
+

[GitHub] [beam] damondouglas edited a comment on pull request #11564: [Beam-9679] Add Core Transforms section / Map lesson to the Go SDK katas

2020-05-04 Thread GitBox


damondouglas edited a comment on pull request #11564:
URL: https://github.com/apache/beam/pull/11564#issuecomment-623621585


   > Retest this please
   
   Is this addressed to me?  Would you like me to retest something?
   
   That's "addressed" to the jenkins bot to retest the PR via whatever 
automated selection of tests it requires. Since the bot is particular, I needed 
to separate it from my other comment so it would work.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] lostluck commented on a change in pull request #11564: [Beam-9679] Add Core Transforms section / Map lesson to the Go SDK katas

2020-05-04 Thread GitBox


lostluck commented on a change in pull request #11564:
URL: https://github.com/apache/beam/pull/11564#discussion_r419627705



##
File path: learning/katas/go/Core Transforms/Map/ParDo OneToMany/task.md
##
@@ -0,0 +1,32 @@
+
+
+# ParDo - One to Many
+
+In the previous kata we learned that ParDo maps a single element into another 
element.
+In this kata we will map a single element into many by splitting a sentence 
into words.
+
+**Kata:** Please write a ParDo that maps each input sentence into words 
tokenized by whitespace (" ").
+
+
+  Use https://godoc.org/github.com/apache/beam/sdks/go/pkg/beam#ParDo";>
+  ParDo
+  with https://godoc.org/github.com/apache/beam/sdks/go/pkg/beam#hdr-DoFns";>

Review comment:
   DoFns are generally a specific thing, so an article is handy.
   ```suggestion
 with a https://godoc.org/github.com/apache/beam/sdks/go/pkg/beam#hdr-DoFns";>
   ```

##
File path: learning/katas/go/Core Transforms/Map/ParDo OneToMany/task.md
##
@@ -0,0 +1,32 @@
+
+
+# ParDo - One to Many
+
+In the previous kata we learned that ParDo maps a single element into another 
element.
+In this kata we will map a single element into many by splitting a sentence 
into words.
+
+**Kata:** Please write a ParDo that maps each input sentence into words 
tokenized by whitespace (" ").
+
+
+  Use https://godoc.org/github.com/apache/beam/sdks/go/pkg/beam#ParDo";>
+  ParDo

Review comment:
   Consider using the qualifed import name, like they'd see in their Go 
code (eg. beam.ParDo) , rather than just the single method. It would look odd 
to other languages, but the generally explicit package/provenance of 
identifiers is a hallmark of Go.

##
File path: learning/katas/go/Core Transforms/Map/ParDo/task.md
##
@@ -0,0 +1,39 @@
+
+
+# ParDo
+
+ParDo is a Beam transform for generic parallel processing. The ParDo 
processing paradigm is similar to the “Map” 
+phase of a Map/Shuffle/Reduce-style algorithm: a ParDo transform considers 
each element in the input PCollection,
+performs some processing function (your user code) on that element, and emits 
zero, one, or multiple elements to an 
+output PCollection.
+
+**Kata:** Please write a simple ParDo that maps the input element by 
multiplying it by 10.

Review comment:
   A possible adjacent task/step is converting a func DoFn into a 
Structural DoFn. Yes, this is relatively simple, but from a learning 
standpoint, it makes the distinction pretty clear, while not asking users too 
much else that might conflate with it.  (eg. Getting the wrong idea that funcs 
must be 1:1 vs 1:many/none, vs structs etc).
   
   Not necessary to do it in this PR.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] mxm commented on a change in pull request #11597: [BEAM-9874] Support clearing timers in portable batch mode (Spark/Flink)

2020-05-04 Thread GitBox


mxm commented on a change in pull request #11597:
URL: https://github.com/apache/beam/pull/11597#discussion_r419632632



##
File path: 
runners/core-java/src/main/java/org/apache/beam/runners/core/InMemoryTimerInternals.java
##
@@ -167,20 +165,17 @@ public void deleteTimer(StateNamespace namespace, String 
timerId, TimeDomain tim
   @Deprecated
   @Override
   public void deleteTimer(StateNamespace namespace, String timerId, String 
timerFamilyId) {
-TimerData existing = existingTimers.get(namespace, timerId + '+' + 
timerFamilyId);
-if (existing != null) {
-  deleteTimer(existing);
+TimerData removedTimer = existingTimers.remove(namespace, timerId + '+' + 
timerFamilyId);
+if (removedTimer != null) {
+  timersForDomain(removedTimer.getDomain()).remove(removedTimer);
 }
   }
 
   /** @deprecated use {@link #deleteTimer(StateNamespace, String, 
TimeDomain)}. */

Review comment:
   yeah, actually that might have been a mistake when the timer family was 
added because the old way was to use `TimerData` and the new way is to use 
timerId/timerFamily.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] damondouglas commented on pull request #11564: [Beam-9679] Add Core Transforms section / Map lesson to the Go SDK katas

2020-05-04 Thread GitBox


damondouglas commented on pull request #11564:
URL: https://github.com/apache/beam/pull/11564#issuecomment-623621585


   > Retest this please
   
   Is this addressed to me?  Would you like me to retest something?



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] chadrik commented on pull request #11038: [BEAM-7746] More typing fixes

2020-05-04 Thread GitBox


chadrik commented on pull request #11038:
URL: https://github.com/apache/beam/pull/11038#issuecomment-623619389


   Hi everyone, I have some availability to finish this PR off now.  I'm going 
to rebase it soon.  @udim  do you have the time to help me get this through 
review?
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] lukecwik commented on pull request #11598: [BEAM-9877] Estimate sizes of group-by-key values behind a key lazily only.

2020-05-04 Thread GitBox


lukecwik commented on pull request #11598:
URL: https://github.com/apache/beam/pull/11598#issuecomment-623619474


   retest this please



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] chadrik commented on a change in pull request #11038: [BEAM-7746] More typing fixes

2020-05-04 Thread GitBox


chadrik commented on a change in pull request #11038:
URL: https://github.com/apache/beam/pull/11038#discussion_r419627186



##
File path: sdks/python/apache_beam/transforms/core.py
##
@@ -1300,12 +1300,13 @@ def to_runner_api_parameter(self, context):
   common_urns.requirements.REQUIRES_STATEFUL_PROCESSING.urn)
 from apache_beam.runners.common import DoFnSignature
 sig = DoFnSignature(self.fn)
-is_splittable = sig.is_splittable_dofn()

Review comment:
   > It is an error to say is_splittable_dofn is True without returning a 
restriction coder as well and vice versa.
   
   This seems to validate my earlier assessment that a None result from this 
`get_restriction_coder` means "is not splittable", and therefore that my 
proposed change is valid.  
   
   





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] lostluck commented on pull request #11564: [Beam-9679] Add Core Transforms section / Map lesson to the Go SDK katas

2020-05-04 Thread GitBox


lostluck commented on pull request #11564:
URL: https://github.com/apache/beam/pull/11564#issuecomment-623617721


   (sadly, only comitters can trigger the tests even after the first time. Your 
commands were correct)
   I'm doing this review now. Thank you for your patience!



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] lostluck commented on pull request #11564: [Beam-9679] Add Core Transforms section / Map lesson to the Go SDK katas

2020-05-04 Thread GitBox


lostluck commented on pull request #11564:
URL: https://github.com/apache/beam/pull/11564#issuecomment-623617273


   Retest this please



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] lostluck commented on a change in pull request #11553: [BEAM-9799] Automated validation for RTrackers.

2020-05-04 Thread GitBox


lostluck commented on a change in pull request #11553:
URL: https://github.com/apache/beam/pull/11553#discussion_r419623182



##
File path: sdks/go/pkg/beam/core/runtime/exec/pardo.go
##
@@ -162,6 +168,14 @@ func (n *ParDo) processMainInput(mainIn *MainInput) error {
return nil
 }
 
+func rtErrHelper(err error) error {
+   if err != nil {
+   return err
+   } else {
+   return errors.New("RTracker IsDone failed for unspecified 
reason")
+   }

Review comment:
   ```suggestion
}
return errors.New("RTracker.IsDone() failed for unspecified reason")
   ```

##
File path: sdks/go/pkg/beam/core/runtime/exec/pardo.go
##
@@ -162,6 +168,14 @@ func (n *ParDo) processMainInput(mainIn *MainInput) error {
return nil
 }
 
+func rtErrHelper(err error) error {
+   if err != nil {
+   return err
+   } else {
+   return errors.New("RTracker IsDone failed for unspecified 
reason")
+   }

Review comment:
   Also, in both places this is called, error or no, we don't know IsDone 
failed(), we know that the DoFn terminated without fully processing the 
restriction (If I understand IsDone's semantics correctly). Perhaps the message 
should indicate that? 





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] bntnam commented on pull request #11554: [BEAM-9876] Migrate the Beam website from Jekyll to Hugo to enable localization of the site content

2020-05-04 Thread GitBox


bntnam commented on pull request #11554:
URL: https://github.com/apache/beam/pull/11554#issuecomment-62360


   > @bntnam Do we have a JIRA issue for this PR where we can refer to for 
background and context information ?
   
   @manuzhang : Hey hey, here you have it. [1]
   [1] https://issues.apache.org/jira/browse/BEAM-9876



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] ibzib commented on a change in pull request #11597: [BEAM-9874] Support clearing timers in portable batch mode (Spark/Flink)

2020-05-04 Thread GitBox


ibzib commented on a change in pull request #11597:
URL: https://github.com/apache/beam/pull/11597#discussion_r419607989



##
File path: 
runners/core-java/src/main/java/org/apache/beam/runners/core/InMemoryTimerInternals.java
##
@@ -167,20 +165,17 @@ public void deleteTimer(StateNamespace namespace, String 
timerId, TimeDomain tim
   @Deprecated
   @Override
   public void deleteTimer(StateNamespace namespace, String timerId, String 
timerFamilyId) {
-TimerData existing = existingTimers.get(namespace, timerId + '+' + 
timerFamilyId);
-if (existing != null) {
-  deleteTimer(existing);
+TimerData removedTimer = existingTimers.remove(namespace, timerId + '+' + 
timerFamilyId);
+if (removedTimer != null) {
+  timersForDomain(removedTimer.getDomain()).remove(removedTimer);
 }
   }
 
   /** @deprecated use {@link #deleteTimer(StateNamespace, String, 
TimeDomain)}. */

Review comment:
   It seems strange that the deprecated method is the one being used, and 
the recommended method is not implemented.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] robertwb commented on pull request #11521: [BEAM-9577] Update Java Runners to handle dependency-based artifact staging.

2020-05-04 Thread GitBox


robertwb commented on pull request #11521:
URL: https://github.com/apache/beam/pull/11521#issuecomment-623600647


   Ping @ihji 



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] tudorm commented on pull request #11598: Do not attempt to estimate sizes of group-by-key values behind a key.

2020-05-04 Thread GitBox


tudorm commented on pull request #11598:
URL: https://github.com/apache/beam/pull/11598#issuecomment-623600578


   Updated the change to perform lazy estimation instead of side-stepping the 
estimation altogether.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] Akshay-Iyangar commented on pull request #10078: [BEAM-8542] Change write to async in AWS SNS IO & remove retry logic

2020-05-04 Thread GitBox


Akshay-Iyangar commented on pull request #10078:
URL: https://github.com/apache/beam/pull/10078#issuecomment-623598544


   @aromanenko-dev - Done
   
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] ibzib commented on pull request #11549: [BEAM-9136] Update doc about docker image license pulling

2020-05-04 Thread GitBox


ibzib commented on pull request #11549:
URL: https://github.com/apache/beam/pull/11549#issuecomment-623596078


   The folks working on the website migration have requested that we pause 
website changes until they are finished with #11554 ([email 
thread](https://lists.apache.org/thread.html/r6b999b6d7d1f6cbb94e16bb2deed2b65098a6b14c4ac98707fe0c36a%40%3Cdev.beam.apache.org%3E)).
 @Hannah-Jiang can you please split this PR into 2 separate PRs for release 
notes and website changes?



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] ibzib commented on pull request #11585: [BEAM-9860] Make job_endpoint required for PortableRunner

2020-05-04 Thread GitBox


ibzib commented on pull request #11585:
URL: https://github.com/apache/beam/pull/11585#issuecomment-623592467


   Thanks @boyuanzz, I will take care of it.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] boyuanzz commented on pull request #11585: [BEAM-9860] Make job_endpoint required for PortableRunner

2020-05-04 Thread GitBox


boyuanzz commented on pull request #11585:
URL: https://github.com/apache/beam/pull/11585#issuecomment-623585130


   This PR breaks  
:sdks:python:test-suites:portable:py2:crossLanguagePortableWordCount and  
:sdks:python:test-suites:portable:py2:crossLanguagePythonJavaFlink with the 
same error message:
   ```
   
   File "apache_beam/pipeline.py", line 525, in run |  
   -- | --
     | return self.runner.run_pipeline(self, self._options) |  
     | File "apache_beam/runners/portability/portable_runner.py", line 422, in 
run_pipeline |  
     | job_service_handle = self.create_job_service(options) |  
     | File "apache_beam/runners/portability/portable_runner.py", line 328, in 
create_job_service |  
     | server = self.default_job_server(options) |  
     | File "apache_beam/runners/portability/portable_runner.py", line 307, in 
default_job_server |  
     | 'You must specify a --job_endpoint when using --runner=PortableRunner. ' 
|  
     | NotImplementedError: You must specify a --job_endpoint when using 
--runner=PortableRunner. Alternatively, you may specify which portable runner 
you intend to use, such as --runner=FlinkRunner or --runner=SparkRunner.
   ```
   We should update `sdks/python/apache_beam/examples/wordcount_xlang.py` with 
the same requirement.
   
   File issue here: https://issues.apache.org/jira/browse/BEAM-9875



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] Akshay-Iyangar commented on pull request #10078: [BEAM-8542] Change write to async in AWS SNS IO & remove retry logic

2020-05-04 Thread GitBox


Akshay-Iyangar commented on pull request #10078:
URL: https://github.com/apache/beam/pull/10078#issuecomment-623584264


   @aromanenko-dev - cool !! fixed the typos



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] aromanenko-dev commented on pull request #10078: [BEAM-8542] Change write to async in AWS SNS IO & remove retry logic

2020-05-04 Thread GitBox


aromanenko-dev commented on pull request #10078:
URL: https://github.com/apache/beam/pull/10078#issuecomment-623584314


   @Akshay-Iyangar Could you squash all your commits and I'll merge it?



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] reuvenlax opened a new pull request #11600: Allow accessing window parameters in finishBundle

2020-05-04 Thread GitBox


reuvenlax opened a new pull request #11600:
URL: https://github.com/apache/beam/pull/11600


   A bundle in Beam today can contain many windows. This often makes using 
finishBundle correctly tricky. Users are sometimes seen keeping maps in their 
DoFn of windows seen in a bundle so they can properly process them in 
finishBundle. We also don't support injecting an OutputReceiver in 
finishBundle, as there's no good way to associate a window to the output.
   
   This PR allows injecting a BoundedWindow or OutputReceiver to the 
finishBundle function. In this case, the finishBundle will be invoked once per 
window seen in the bundle. If there is no such parameter, then the previous 
behavior of invoking finishBundle once is preserved.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] aromanenko-dev commented on pull request #10078: [BEAM-8542] Change write to async in AWS SNS IO & remove retry logic

2020-05-04 Thread GitBox


aromanenko-dev commented on pull request #10078:
URL: https://github.com/apache/beam/pull/10078#issuecomment-623583082


   Well, it LGTM for except small typos I mentioned above. 



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] aromanenko-dev commented on a change in pull request #10078: [BEAM-8542] Change write to async in AWS SNS IO & remove retry logic

2020-05-04 Thread GitBox


aromanenko-dev commented on a change in pull request #10078:
URL: https://github.com/apache/beam/pull/10078#discussion_r419580372



##
File path: 
sdks/java/io/amazon-web-services2/src/test/java/org/apache/beam/sdk/io/aws2/sns/SnsResponseCoderTest.java
##
@@ -0,0 +1,77 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.io.aws2.sns;
+
+import java.io.ByteArrayInputStream;
+import java.io.ByteArrayOutputStream;
+import java.io.IOException;
+import java.util.Optional;
+import java.util.OptionalInt;
+import org.apache.beam.sdk.coders.StringUtf8Coder;
+import org.junit.Assert;
+import org.junit.Test;
+
+public class SnsResponseCoderTest {
+
+  @Test
+  public void verifyResponseWithStatusCodeAndText() throws IOException {
+
+SnsResponse expected =
+SnsResponse.create("test-1", OptionalInt.of(200), Optional.of("OK"));
+
+SnsResponseCoder coder = SnsResponseCoder.of(StringUtf8Coder.of());
+ByteArrayOutputStream output = new ByteArrayOutputStream();
+coder.encode(expected, output);
+
+ByteArrayInputStream in = new ByteArrayInputStream(output.toByteArray());
+SnsResponse actual = coder.decode(in);
+
+Assert.assertEquals(expected, actual);
+  }
+
+  @Test
+  public void verifyResponseWithStatusAndNoText() throws IOException {
+SnsResponse expected =
+SnsResponse.create("test-2", OptionalInt.of(200), Optional.empty());
+
+SnsResponseCoder coder = SnsResponseCoder.of(StringUtf8Coder.of());
+ByteArrayOutputStream output = new ByteArrayOutputStream();
+coder.encode(expected, output);
+
+ByteArrayInputStream in = new ByteArrayInputStream(output.toByteArray());
+SnsResponse actual = coder.decode(in);
+
+Assert.assertEquals(expected, actual);
+  }
+
+  @Test
+  public void verifyResponseWIthNoStatusCodeAndText() throws IOException {

Review comment:
   nit: "WIth" type capital "I"

##
File path: 
sdks/java/io/amazon-web-services2/src/test/java/org/apache/beam/sdk/io/aws2/sns/SnsResponseCoderTest.java
##
@@ -0,0 +1,77 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.io.aws2.sns;
+
+import java.io.ByteArrayInputStream;
+import java.io.ByteArrayOutputStream;
+import java.io.IOException;
+import java.util.Optional;
+import java.util.OptionalInt;
+import org.apache.beam.sdk.coders.StringUtf8Coder;
+import org.junit.Assert;
+import org.junit.Test;
+
+public class SnsResponseCoderTest {
+
+  @Test
+  public void verifyResponseWithStatusCodeAndText() throws IOException {
+
+SnsResponse expected =
+SnsResponse.create("test-1", OptionalInt.of(200), Optional.of("OK"));
+
+SnsResponseCoder coder = SnsResponseCoder.of(StringUtf8Coder.of());
+ByteArrayOutputStream output = new ByteArrayOutputStream();
+coder.encode(expected, output);
+
+ByteArrayInputStream in = new ByteArrayInputStream(output.toByteArray());
+SnsResponse actual = coder.decode(in);
+
+Assert.assertEquals(expected, actual);
+  }
+
+  @Test
+  public void verifyResponseWithStatusAndNoText() throws IOException {
+SnsResponse expected =
+SnsResponse.create("test-2", OptionalInt.of(200), Optional.empty());
+
+SnsResponseCoder coder = SnsResponseCoder.of(StringUtf8Coder.of());
+ByteArrayOutputStream output = new ByteArrayOutputStream();
+coder.encode(expected, output);
+
+ByteArrayInputStream in = new ByteArrayInputStream(out

[GitHub] [beam] reuvenlax commented on pull request #11559: [BEAM-9836] Excluding spark runner for KeyTests

2020-05-04 Thread GitBox


reuvenlax commented on pull request #11559:
URL: https://github.com/apache/beam/pull/11559#issuecomment-623580541


   Key also makes sense in OnWindowExpiration, and it might make sense in 
processElements as well. However for test exclusion categories we can create 
overly-specific category names.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] aromanenko-dev commented on a change in pull request #10078: [BEAM-8542] Change write to async in AWS SNS IO & remove retry logic

2020-05-04 Thread GitBox


aromanenko-dev commented on a change in pull request #10078:
URL: https://github.com/apache/beam/pull/10078#discussion_r419578585



##
File path: 
sdks/java/io/amazon-web-services2/src/main/java/org/apache/beam/sdk/io/aws2/sns/SnsIO.java
##
@@ -504,18 +504,18 @@ public void tearDown() {
   checkArgument(getCoder() != null, "withElementCoder() needs to called");
 
   return input
-  .apply(ParDo.of(new SnsAsyncWriterFn<>(this)))
+  .apply(ParDo.of(new SnsWriteAsyncrFn<>(this)))
   .setCoder(SnsResponseCoder.of(getCoder()));
 }
 
-private static class SnsAsyncWriterFn extends DoFn> {
+private static class SnsWriteAsyncrFn extends DoFn> {
 
-  private static final Logger LOG = 
LoggerFactory.getLogger(SnsAsyncWriterFn.class);
+  private static final Logger LOG = 
LoggerFactory.getLogger(SnsWriteAsyncrFn.class);
 
-  private final AsyncWrite spec;
+  private final WriteAsync spec;
   private transient SnsAsyncClient client;
 
-  SnsAsyncWriterFn(AsyncWrite spec) {
+  SnsWriteAsyncrFn(WriteAsync spec) {

Review comment:
   nit: `SnsWriteAsyncrFn` - I guess a letter "r" before "Fn" is redundant. 
Please remove it.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] mxm removed a comment on pull request #11558: [BEAM-8742] Add stateful and timely processing benchmarks

2020-05-04 Thread GitBox


mxm removed a comment on pull request #11558:
URL: https://github.com/apache/beam/pull/11558#issuecomment-623577185







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] mxm commented on pull request #11558: [BEAM-8742] Add stateful and timely processing benchmarks

2020-05-04 Thread GitBox


mxm commented on pull request #11558:
URL: https://github.com/apache/beam/pull/11558#issuecomment-623577363


   Run Seed Job



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




  1   2   >