[GitHub] [beam] ihji commented on pull request #11607: [BEAM-9430] Makes sure the watermarks returned by estimators are within bounds
ihji commented on pull request #11607: URL: https://github.com/apache/beam/pull/11607#issuecomment-623860331 Shouldn't we also disable `IllegalArgumentException`s in the constructors? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] ihji commented on pull request #11574: [BEAM-9449] Pass PipelineOptions through expansion service
ihji commented on pull request #11574: URL: https://github.com/apache/beam/pull/11574#issuecomment-623816157 Thanks. Looks good to me overall. I think we should also consider adding optional `pipeline_options` argument to `ExternalTransform` given that each different expansion service needs different pipeline options. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] ihji commented on a change in pull request #11039: [BEAM-9383] Staging Dataflow artifacts from environment
ihji commented on a change in pull request #11039: URL: https://github.com/apache/beam/pull/11039#discussion_r419841344 ## File path: sdks/python/apache_beam/runners/dataflow/dataflow_runner.py ## @@ -462,7 +462,8 @@ def run_pipeline(self, pipeline, options): use_fnapi = apiclient._use_fnapi(options) from apache_beam.transforms import environments default_environment = environments.DockerEnvironment.from_container_image( -apiclient.get_container_image_from_options(options)) +apiclient.get_container_image_from_options(options), +artifacts=environments.python_sdk_dependencies(options)) Review comment: We need pipeline option to populate artifacts. I think we could either use `from_options` instead and override container image or just leave as is. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] ihji commented on a change in pull request #11039: [BEAM-9383] Staging Dataflow artifacts from environment
ihji commented on a change in pull request #11039: URL: https://github.com/apache/beam/pull/11039#discussion_r419839927 ## File path: sdks/python/apache_beam/runners/dataflow/dataflow_runner_test.py ## @@ -102,7 +102,8 @@ def setUp(self): '--staging_location=ignored', '--temp_location=/dev/null', '--no_auth', -'--dry_run=True' +'--dry_run=True', +'--sdk_location=container' Review comment: The test tries to download a dev version of apache-beam dependency (which indeed does not exist in pypi) when it constructs the environment. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] henryken commented on pull request #11564: [Beam-9679] Add Core Transforms section / Map lesson to the Go SDK katas
henryken commented on pull request #11564: URL: https://github.com/apache/beam/pull/11564#issuecomment-623808517 Awesome! This PR can be merged now. Thanks @damondouglas! This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] ihji commented on a change in pull request #11039: [BEAM-9383] Staging Dataflow artifacts from environment
ihji commented on a change in pull request #11039: URL: https://github.com/apache/beam/pull/11039#discussion_r419838456 ## File path: runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/util/PackageUtil.java ## @@ -336,25 +323,26 @@ public DataflowPackage stageToFile( final AtomicInteger numCached = new AtomicInteger(0); List> destinationPackages = new ArrayList<>(); -for (String classpathElement : classpathElements) { - DataflowPackage sourcePackage = new DataflowPackage(); - if (classpathElement.contains("=")) { -String[] components = classpathElement.split("=", 2); -sourcePackage.setName(components[0]); -sourcePackage.setLocation(components[1]); - } else { -sourcePackage.setName(null); -sourcePackage.setLocation(classpathElement); +for (StagedFile classpathElement : classpathElements) { + DataflowPackage targetPackage = classpathElement.getStagedPackage(); + String source = classpathElement.getSource(); + if (source.contains("=")) { Review comment: removed. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] ihji commented on a change in pull request #11039: [BEAM-9383] Staging Dataflow artifacts from environment
ihji commented on a change in pull request #11039: URL: https://github.com/apache/beam/pull/11039#discussion_r419838363 ## File path: model/pipeline/src/main/proto/beam_runner_api.proto ## @@ -1271,6 +1271,11 @@ message DeferredArtifactPayload { message ArtifactStagingToRolePayload { // A generated staged name (relative path under staging directory). string staged_name = 1; + + // (Optional) An artifact name when a runner supports it. + // For example, DataflowRunner requires predefined names for some artifacts + // such as "dataflow-worker.jar", "windmill_main". + string alias_name = 2; Review comment: This is Dataflow specific requirement. `DataflowPackage` model has two separate fields for `location` and `name`. `staged_name` and `alias_name` correspond to `location` and `name` respectively. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] ihji commented on a change in pull request #11039: [BEAM-9383] Staging Dataflow artifacts from environment
ihji commented on a change in pull request #11039: URL: https://github.com/apache/beam/pull/11039#discussion_r419832339 ## File path: runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/DataflowRunner.java ## @@ -772,6 +783,88 @@ private Debuggee registerDebuggee(CloudDebugger debuggerClient, String uniquifie } } + private List stageArtifacts(RunnerApi.Pipeline pipeline) { +ImmutableList.Builder filesToStageBuilder = ImmutableList.builder(); +for (Map.Entry entry : +pipeline.getComponents().getEnvironmentsMap().entrySet()) { + for (RunnerApi.ArtifactInformation info : entry.getValue().getDependenciesList()) { +if (!BeamUrns.getUrn(RunnerApi.StandardArtifacts.Types.FILE).equals(info.getTypeUrn())) { + throw new RuntimeException( + String.format("unsupported artifact type %s", info.getTypeUrn())); +} +RunnerApi.ArtifactFilePayload filePayload; +try { + filePayload = RunnerApi.ArtifactFilePayload.parseFrom(info.getTypePayload()); +} catch (InvalidProtocolBufferException e) { + throw new RuntimeException("Error parsing artifact file payload.", e); +} +if (!BeamUrns.getUrn(RunnerApi.StandardArtifacts.Roles.STAGING_TO) +.equals(info.getRoleUrn())) { + throw new RuntimeException( + String.format("unsupported artifact role %s", info.getRoleUrn())); +} +RunnerApi.ArtifactStagingToRolePayload stagingPayload; +try { + stagingPayload = RunnerApi.ArtifactStagingToRolePayload.parseFrom(info.getRolePayload()); +} catch (InvalidProtocolBufferException e) { + throw new RuntimeException("Error parsing artifact staging_to role payload.", e); +} +DataflowPackage target = new DataflowPackage(); +target.setLocation(stagingPayload.getStagedName()); +if (!Strings.isNullOrEmpty(stagingPayload.getAliasName())) { + target.setName(stagingPayload.getAliasName()); +} +filesToStageBuilder.add(StagedFile.of(filePayload.getPath(), target)); + } +} +return options.getStager().stageFiles(filesToStageBuilder.build()); + } + + private List getDefaultArtifacts() { +ImmutableList.Builder pathsToStageBuilder = ImmutableList.builder(); +ImmutableMap.Builder aliasMapBuilder = ImmutableMap.builder(); +String windmillBinary = + options.as(DataflowPipelineDebugOptions.class).getOverrideWindmillBinary(); +String dataflowWorkerJar = options.getDataflowWorkerJar(); +if (dataflowWorkerJar != null && !dataflowWorkerJar.isEmpty()) { + // Put the user specified worker jar at the start of the classpath, to be consistent with the + // built in worker order. + pathsToStageBuilder.add(dataflowWorkerJar); + aliasMapBuilder.put(dataflowWorkerJar, "dataflow-worker.jar"); +} +for (String path : options.getFilesToStage()) { + if (path.contains("=")) { Review comment: Yes. This syntax is only supported in Dataflow runner. `DataflowPackage` has a separate field `name` in addition to `location` and "=" separator allows to prefix `name` to the location of the source e.g. "dataflow.jar=/tmp/foo.jar". I could remove this special syntax but I decided to keep it since it's already exposed to users via `--filesToStage` option so removing it may cause backward compatibility issue. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] aaltay commented on pull request #11210: [BEAM-8949] SpannerIO integration tests
aaltay commented on pull request #11210: URL: https://github.com/apache/beam/pull/11210#issuecomment-623801499 @mszb - Do you know why the test is failing? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] pabloem commented on pull request #11086: [BEAM-8910] Make custom BQ source read from Avro
pabloem commented on pull request #11086: URL: https://github.com/apache/beam/pull/11086#issuecomment-623800613 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] ihji commented on a change in pull request #11039: [BEAM-9383] Staging Dataflow artifacts from environment
ihji commented on a change in pull request #11039: URL: https://github.com/apache/beam/pull/11039#discussion_r419829971 ## File path: runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/DataflowRunner.java ## @@ -784,7 +877,25 @@ public DataflowPipelineJob run(Pipeline pipeline) { "Executing pipeline on the Dataflow Service, which will have billing implications " + "related to Google Compute Engine usage and other Google Cloud Services."); -List packages = options.getStager().stageDefaultFiles(); +// Capture the sdkComponents for look up during step translations +SdkComponents sdkComponents = SdkComponents.create(); + +DataflowPipelineOptions dataflowOptions = options.as(DataflowPipelineOptions.class); +String workerHarnessContainerImageURL = DataflowRunner.getContainerImageForJob(dataflowOptions); +RunnerApi.Environment defaultEnvironmentForDataflow = +Environments.createDockerEnvironment(workerHarnessContainerImageURL); + +sdkComponents.registerEnvironment( +defaultEnvironmentForDataflow +.toBuilder() +.addAllDependencies(getDefaultArtifacts()) +.build()); + +RunnerApi.Pipeline pipelineProto = PipelineTranslation.toProto(pipeline, sdkComponents, true); + +LOG.debug("Portable pipeline proto:\n{}", TextFormat.printToString(pipelineProto)); Review comment: This debug log is not new. It's just relocated. Do you think it would be better to remove this? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] ihji commented on a change in pull request #11039: [BEAM-9383] Staging Dataflow artifacts from environment
ihji commented on a change in pull request #11039: URL: https://github.com/apache/beam/pull/11039#discussion_r419829659 ## File path: runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/Environments.java ## @@ -210,56 +209,55 @@ public static Environment createProcessEnvironment( } } - private static List getArtifacts(List stagingFiles) { -Set pathsToStage = Sets.newHashSet(stagingFiles); + public static List getArtifacts( + List stagingFiles, StagingFileNameGenerator generator) { ImmutableList.Builder artifactsBuilder = ImmutableList.builder(); -for (String path : pathsToStage) { +for (String path : ImmutableSet.copyOf(stagingFiles)) { Review comment: `ImmutableSet` preserves the order but I think we don't need to make a copy here. Will use `LinkedHashSet` instead. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] damondouglas commented on pull request #11564: [Beam-9679] Add Core Transforms section / Map lesson to the Go SDK katas
damondouglas commented on pull request #11564: URL: https://github.com/apache/beam/pull/11564#issuecomment-623799315 @henryken and @lostluck I updated [the Stepik course](https://stepik.org/course/70387) and commited the `*-remote-info.yaml` files. Thank you both for your help. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] rahul8383 commented on a change in pull request #11569: [BEAM-9840] Support for Parameterized Types when converting from HCat…
rahul8383 commented on a change in pull request #11569: URL: https://github.com/apache/beam/pull/11569#discussion_r419827790 ## File path: sdks/java/io/hcatalog/src/test/java/org/apache/beam/sdk/io/hcatalog/SchemaUtilsTest.java ## @@ -0,0 +1,45 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.sdk.io.hcatalog; + +import java.util.ArrayList; +import java.util.List; +import org.apache.beam.sdk.schemas.Schema; +import org.apache.hadoop.hive.metastore.api.FieldSchema; +import org.junit.Assert; +import org.junit.Test; + +public class SchemaUtilsTest { + @Test + public void testParameterizedTypesToBeamTypes() { +List listOfFieldSchema = new ArrayList<>(); +listOfFieldSchema.add(new FieldSchema("parameterizedChar", "char(10)", null)); +listOfFieldSchema.add(new FieldSchema("parameterizedVarchar", "varchar(100)", null)); +listOfFieldSchema.add(new FieldSchema("parameterizedDecimal", "decimal(30,16)", null)); + +Schema expectedSchema = +Schema.builder() +.addNullableField("parameterizedChar", Schema.FieldType.STRING) +.addNullableField("parameterizedVarchar", Schema.FieldType.STRING) +.addNullableField("parameterizedDecimal", Schema.FieldType.DECIMAL) Review comment: Thanks @TheNeuralBit for the review. I am adding logical types in `schemas.logicaltypes` called `VariableLengthBytes`, `FixedLengthString`, `VariableLengthString`, `LogicalDecimal` as part of #11581 . I will take up the task of mapping these to logical types once my other PR gets merged. I also hope that #11272 get merged by then, so that I can use `SqlTypes.DATE` logical type. Can you please create a JIRA ticket and assign it to me. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] aaltay commented on a change in pull request #11554: [BEAM-9876] Migrate the Beam website from Jekyll to Hugo to enable localization of the site content
aaltay commented on a change in pull request #11554: URL: https://github.com/apache/beam/pull/11554#discussion_r419826968 ## File path: website/www/site/content/en/community/contact-us.md ## @@ -0,0 +1,47 @@ +--- +title: "Contact Us" +aliases: + - /community/ + - /use/issue-tracking/ + - /use/mailing-lists/ + - /get-started/support/ +--- + + +# Contact Us + +There are many ways to reach the Beam user and developer communities - use +whichever one seems best. + + Review comment: What is the problem here? What is markdownify and what is superscripts ? ## File path: website/www/site/content/en/documentation/transforms/java/elementwise/regex.md ## @@ -0,0 +1,34 @@ +--- +title: "Regex" +--- + +# Regex + +https://beam.apache.org/releases/javadoc/current/index.html?org/apache/beam/sdk/transforms/Regex.html";> + https://beam.apache.org/images/logos/sdks/java.png"; width="20px" height="20px" + alt="Javadoc" /> + Javadoc + + + Review comment: Why do we multiple breaks here? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] robertwb commented on a change in pull request #11554: [BEAM-9876] Migrate the Beam website from Jekyll to Hugo to enable localization of the site content
robertwb commented on a change in pull request #11554: URL: https://github.com/apache/beam/pull/11554#discussion_r419822575 ## File path: website/www/site/content/en/documentation/transforms/python/elementwise/pardo.md ## @@ -0,0 +1,142 @@ +--- +title: "Partition" Review comment: Another copy of "Partiton." (There may be others, we should verify we haven't lost content in the move.) ## File path: website/www/site/content/en/documentation/transforms/python/elementwise/map.md ## @@ -1,8 +1,5 @@ --- -layout: section -title: "Map" -permalink: /documentation/transforms/python/elementwise/map/ -section_menu: section-menu/documentation.html +title: "Partition" Review comment: Is this Partition or Map? ## File path: website/www/site/content/en/documentation/transforms/python/elementwise/kvswap.md ## @@ -0,0 +1,142 @@ +--- +title: "Partition" Review comment: This seems to be the wrong page. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] jaketf commented on pull request #11596: [BEAM-9856] [*WIP DO NOT MERGE*] Optimization/hl7v2 io list messages
jaketf commented on pull request #11596: URL: https://github.com/apache/beam/pull/11596#issuecomment-623793025 @chamikaramj @lukecwik Apologies, I believe the the NPE was a user error on my part. I've been able to revert my changes to to OffsetRangeTracker without reintroducing the NPE. To help future users as foolish as me, to get a less confusing NPE, I suggest we do something like I put in 8891292. But this is definitely not a core issue in OffsetRangeTracker. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] robertwb opened a new pull request #11608: Migration of Jekylle to Hugo
robertwb opened a new pull request #11608: URL: https://github.com/apache/beam/pull/11608 This should be the same as #11554, but with the massive commit split up into (1) Infrastructure changes (2) Automated refactoring. (3) Manual refactoring. Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily: - [ ] [**Choose reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and mention them in a comment (`R: @username`). - [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue. - [ ] Update `CHANGES.md` with noteworthy changes. - [ ] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.pdf). See the [Contributor Guide](https://beam.apache.org/contribute) for more tips on [how to make review process smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier). Post-Commit Tests Status (on master branch) Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark --- | --- | --- | --- | --- | --- | --- | --- Go | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/) | --- | --- | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/) | --- | --- | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/) Java | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Java11/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Java11/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/) Python | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Python2/lastComp
[GitHub] [beam] henryken commented on pull request #11564: [Beam-9679] Add Core Transforms section / Map lesson to the Go SDK katas
henryken commented on pull request #11564: URL: https://github.com/apache/beam/pull/11564#issuecomment-623783144 Please wait for the course uploading before merging this PR. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] jaketf edited a comment on pull request #11596: [BEAM-9856] [*WIP DO NOT MERGE*] Optimization/hl7v2 io list messages
jaketf edited a comment on pull request #11596: URL: https://github.com/apache/beam/pull/11596#issuecomment-623766796 @chamikaramj thanks for the suggestion. I will look into using BoundedSource API in a separate PR and we can compare. Unfortunately, regular DoFns don't cut it because a single elements outputs are committed atomically (see this [conversation](https://github.com/apache/beam/pull/11538#discussion_r416927740)). Basically we have one input element (HL7v2 store) exploding to many, many output elements (all the messages in that store) in a single ProcessElement call. I'm trying to explore strategies for splitting up this listing (because due to pagination nature of messages.list it is single threaded). I originally chose splittable DoFn over BoundedSource based off the sentiment of this statement: > **Coding against the Source API involves a lot of boilerplate and is error-prone**, and it does not compose well with the rest of the Beam model because a Source can appear only at the root of a pipeline. - https://beam.apache.org/blog/2017/08/16/splittable-do-fn.html Using splittable DoFn also made reading from multiple HL7v2 stores come for free (e.g. if you had several regional HL7v2 stores and your use case was to read from them all and write to a single multi-regional store). This admittedly a rather contrived use case. The blog also mentions - A Source can not emit an additional output (for example, records that failed to parse). - Healthcare customers feeding requirements for this plugin want DLQ on all sinks and sources. To be consistent with the streaming API provided in `HL7v2IO.Read` I wanted to provide DLQ in `HLv2IO.ListMessages`. However, I believe this is more of a nice to have for batch use cases (because there's no room for passing ListMessages bad messages IDs like there is in HL7v2IO.Read). This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] aaltay commented on a change in pull request #11582: [BEAM-9650] Add ReadAllFromBigQuery PTransform
aaltay commented on a change in pull request #11582: URL: https://github.com/apache/beam/pull/11582#discussion_r419808099 ## File path: sdks/python/apache_beam/io/gcp/bigquery.py ## @@ -283,6 +284,8 @@ def compute_table_name(row): 'BigQuerySink', 'WriteToBigQuery', 'ReadFromBigQuery', +'ReadAllFromBigQueryRequest', Review comment: Does the java api use a similar concept? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] henryken commented on a change in pull request #11564: [Beam-9679] Add Core Transforms section / Map lesson to the Go SDK katas
henryken commented on a change in pull request #11564: URL: https://github.com/apache/beam/pull/11564#discussion_r419807983 ## File path: learning/katas/go/Core Transforms/Map/ParDo struct/pkg/task/task.go ## @@ -18,10 +18,7 @@ package task import "github.com/apache/beam/sdks/go/pkg/beam" func ApplyTransform(s beam.Scope, input beam.PCollection) beam.PCollection { - processFn := &multiplyByFn{ - Factor: 5, - } - return beam.ParDo(s, processFn, input) + return beam.ParDo(s, &multiplyByFn{Factor: 5}, input) Review comment: This looks good now! Thanks @damondouglas! This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] henryken commented on a change in pull request #11564: [Beam-9679] Add Core Transforms section / Map lesson to the Go SDK katas
henryken commented on a change in pull request #11564: URL: https://github.com/apache/beam/pull/11564#discussion_r419807137 ## File path: learning/katas/go/Core Transforms/Map/ParDo/task.md ## @@ -0,0 +1,39 @@ + + +# ParDo + +ParDo is a Beam transform for generic parallel processing. The ParDo processing paradigm is similar to the “Map” +phase of a Map/Shuffle/Reduce-style algorithm: a ParDo transform considers each element in the input PCollection, +performs some processing function (your user code) on that element, and emits zero, one, or multiple elements to an +output PCollection. + +**Kata:** Please write a simple ParDo that maps the input element by multiplying it by 10. Review comment: @lostluck, the space will not create any problem. It is friendlier for the learner to see the lesson and task names in the natural way versus using underscore name. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] pabloem commented on a change in pull request #11560: Auto-inferring project for ReadFromBigQuery
pabloem commented on a change in pull request #11560: URL: https://github.com/apache/beam/pull/11560#discussion_r419805319 ## File path: sdks/python/apache_beam/io/gcp/bigquery.py ## @@ -526,7 +526,7 @@ def reader(self, test_bigquery_client=None): def _to_bool(value): - return value == 'true' + return value Review comment: ahhh what a great catch. thanks Yichi! This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] pabloem commented on pull request #11560: Auto-inferring project for ReadFromBigQuery
pabloem commented on pull request #11560: URL: https://github.com/apache/beam/pull/11560#issuecomment-623779892 Run Python 3.7 PostCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] y1chi commented on a change in pull request #11560: Auto-inferring project for ReadFromBigQuery
y1chi commented on a change in pull request #11560: URL: https://github.com/apache/beam/pull/11560#discussion_r419804731 ## File path: sdks/python/apache_beam/io/gcp/bigquery.py ## @@ -526,7 +526,7 @@ def reader(self, test_bigquery_client=None): def _to_bool(value): - return value == 'true' + return value Review comment: Do we need the _to_bool function then, would the bool casting suffice? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] ibzib commented on pull request #11603: [BEAM-9875] Use FlinkRunner instead of PortableRunner in cross-langua…
ibzib commented on pull request #11603: URL: https://github.com/apache/beam/pull/11603#issuecomment-623772817 R: @ihji This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] ibzib commented on pull request #11603: [BEAM-9875] Use FlinkRunner instead of PortableRunner in cross-langua…
ibzib commented on pull request #11603: URL: https://github.com/apache/beam/pull/11603#issuecomment-623772242 Python precommit is failing due to BEAM-9767 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] jaketf edited a comment on pull request #11596: [BEAM-9856] [*WIP DO NOT MERGE*] Optimization/hl7v2 io list messages
jaketf edited a comment on pull request #11596: URL: https://github.com/apache/beam/pull/11596#issuecomment-623766796 @chamikaramj thanks for the suggestion. I will look into using BoundedSource API in a separate PR and we can compare. Unfortunately, regular DoFns don't cut it because a single elements outputs are committed atomically (see this [conversation](https://github.com/apache/beam/pull/11538#discussion_r416927740)). Basically we have one input element (HL7v2 store) exploding to many, many output elements (all the messages in that store) in a single ProcessElement call. I'm trying to explore strategies for splitting up this listing. I originally chose splittable DoFn over BoundedSource based off the sentiment of this statement: > **Coding against the Source API involves a lot of boilerplate and is error-prone**, and it does not compose well with the rest of the Beam model because a Source can appear only at the root of a pipeline. - https://beam.apache.org/blog/2017/08/16/splittable-do-fn.html Using splittable DoFn also made reading from multiple HL7v2 stores come for free (e.g. if you had several regional HL7v2 stores and your use case was to read from them all and write to a single multi-regional store). This admittedly a rather contrived use case. The blog also mentions - A Source can not emit an additional output (for example, records that failed to parse). - Healthcare customers feeding requirements for this plugin want DLQ on all sinks and sources. To be consistent with the streaming API provided in `HL7v2IO.Read` I wanted to provide DLQ in `HLv2IO.ListMessages`. However, I believe this is more of a nice to have for batch use cases (because there's no room for passing ListMessages bad messages IDs like there is in HL7v2IO.Read). This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] jaketf edited a comment on pull request #11596: [BEAM-9856] [*WIP DO NOT MERGE*] Optimization/hl7v2 io list messages
jaketf edited a comment on pull request #11596: URL: https://github.com/apache/beam/pull/11596#issuecomment-623766796 @chamikaramj thanks for the suggestion. I will look into using BoundedSource API in a separate PR and we can compare. Unfortunately, regular DoFns don't cut it because a single elements outputs are committed atomically (see this [conversation](https://github.com/apache/beam/pull/11538#discussion_r416927740)). Basically we have one input element (HL7v2 store) exploding to many, many output elements (all the messages in that store) in a single ProcessElement call. I'm trying to explore strategies for splitting up this listing. I originally chose splittable DoFn over BoundedSource based off the sentiment of this statement: > **Coding against the Source API involves a lot of boilerplate and is error-prone**, and it does not compose well with the rest of the Beam model because a Source can appear only at the root of a pipeline. - https://beam.apache.org/blog/2017/08/16/splittable-do-fn.html The blog also mentions - A Source can not emit an additional output (for example, records that failed to parse). - Healthcare customers feeding requirements for this plugin want DLQ on all sinks and sources. To be consistent with the streaming API provided in `HL7v2IO.Read` I wanted to provide DLQ in `HLv2IO.ListMessages`. However, I believe this is more of a nice to have for batch use cases (because there's no room for passing ListMessages bad messages IDs like there is in HL7v2IO.Read). This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] jaketf edited a comment on pull request #11596: [BEAM-9856] [*WIP DO NOT MERGE*] Optimization/hl7v2 io list messages
jaketf edited a comment on pull request #11596: URL: https://github.com/apache/beam/pull/11596#issuecomment-623766796 @chamikaramj thanks for the suggestion. I will look into using BoundedSource API. Unfortunately, regular DoFns don't cut it because a single elements outputs are committed atomically (see this [conversation](https://github.com/apache/beam/pull/11538#discussion_r416927740)). Basically we have one input element (HL7v2 store) exploding to many, many output elements (all the messages in that store) in a single ProcessElement call. I'm trying to explore strategies for splitting up this listing. I originally chose splittable DoFn over BoundedSource based off the sentiment of this statement: > **Coding against the Source API involves a lot of boilerplate and is error-prone**, and it does not compose well with the rest of the Beam model because a Source can appear only at the root of a pipeline. - https://beam.apache.org/blog/2017/08/16/splittable-do-fn.html The blog also mentions - A Source can not emit an additional output (for example, records that failed to parse). - Healthcare customers feeding requirements for this plugin want DLQ on all sinks and sources. To be consistent with the streaming API provided in `HL7v2IO.Read` I wanted to provide DLQ in `HLv2IO.ListMessages`. However, I believe this is more of a nice to have for batch use cases (because there's no room for passing ListMessages bad messages IDs like there is in HL7v2IO.Read). This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] ibzib commented on pull request #11403: [DO NOT MERGE] Run all PostCommit and PreCommit Tests against Release Branch
ibzib commented on pull request #11403: URL: https://github.com/apache/beam/pull/11403#issuecomment-623767557 Run Java PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] ibzib commented on pull request #11403: [DO NOT MERGE] Run all PostCommit and PreCommit Tests against Release Branch
ibzib commented on pull request #11403: URL: https://github.com/apache/beam/pull/11403#issuecomment-623766941 Run Python 2 PostCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] jaketf commented on pull request #11596: [BEAM-9856] [*WIP DO NOT MERGE*] Optimization/hl7v2 io list messages
jaketf commented on pull request #11596: URL: https://github.com/apache/beam/pull/11596#issuecomment-623766796 @chamikaramj thanks for the suggestion. I will look into using BoundedSource API. Unfortunately, regular DoFns don't cut it because a single elements outputs are committed atomically (see this [conversation](https://github.com/apache/beam/pull/11538#discussion_r416927740)). Basically we have one input element (HL7v2 store) exploding to many, many output elements (all the messages in that store) in a single ProcessElement call. I'm trying to explore strategies for splitting up this listing. I originally chose splittable DoFn over BoundedSource based off the sentiment of this statement: > **Coding against the Source API involves a lot of boilerplate and is error-prone**, and it does not compose well with the rest of the Beam model because a Source can appear only at the root of a pipeline. - https://beam.apache.org/blog/2017/08/16/splittable-do-fn.html The blog also mentions - A Source can not emit an additional output (for example, records that failed to parse). - Healthcare customers feeding requirements for this plugin want DLQ on all sinks and sources. To be consistent with the streaming API provided in `HL7v2IO.Read` I wanted to provide DLQ in `HLv2IO.ListMessages`. However, I believe this is more of a nice to have for batch use cases. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] ibzib commented on pull request #11603: [BEAM-9875] Use FlinkRunner instead of PortableRunner in cross-langua…
ibzib commented on pull request #11603: URL: https://github.com/apache/beam/pull/11603#issuecomment-623752060 Run Python 2 PostCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] chamikaramj opened a new pull request #11607: [BEAM-9430] Updates the watermark estimators to make sure the returned watermark …
chamikaramj opened a new pull request #11607: URL: https://github.com/apache/beam/pull/11607 …is within bounds **Please** add a meaningful description for your change here Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily: - [ ] [**Choose reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and mention them in a comment (`R: @username`). - [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue. - [ ] Update `CHANGES.md` with noteworthy changes. - [ ] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.pdf). See the [Contributor Guide](https://beam.apache.org/contribute) for more tips on [how to make review process smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier). Post-Commit Tests Status (on master branch) Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark --- | --- | --- | --- | --- | --- | --- | --- Go | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/) | --- | --- | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/) | --- | --- | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/) Java | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Java11/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Java11/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/) Python | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Python2/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python2/la
[GitHub] [beam] chamikaramj edited a comment on pull request #11596: [BEAM-9856] [*WIP DO NOT MERGE*] Optimization/hl7v2 io list messages
chamikaramj edited a comment on pull request #11596: URL: https://github.com/apache/beam/pull/11596#issuecomment-623741333 Is using SplittableDoFn API here intentional ? I think this API is being updated. cc: @lukecwik If you need dynamic work rebalancing, consider using the BoundedSource interface. Otherwise we can just implement the source using regular DoFns and wait for SplittableDoFn API to stabilize before adding support for dynamic work rebalancing. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] chamikaramj commented on pull request #11596: [BEAM-9856] [*WIP DO NOT MERGE*] Optimization/hl7v2 io list messages
chamikaramj commented on pull request #11596: URL: https://github.com/apache/beam/pull/11596#issuecomment-623741333 Is using SplittableDoFn API here intentional ? I think this API is being updated. cc: @lukecwik If you need dynamic work rebalancing, consider using the BoundedSource interface. Otherwise we can just implement the source as a regular and wait for SplittableDoFn API to stabilize before adding support for dynamic work rebalancing. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] pabloem commented on a change in pull request #11560: Auto-inferring project for ReadFromBigQuery
pabloem commented on a change in pull request #11560: URL: https://github.com/apache/beam/pull/11560#discussion_r419765325 ## File path: sdks/python/apache_beam/io/gcp/bigquery.py ## @@ -526,7 +526,7 @@ def reader(self, test_bigquery_client=None): def _to_bool(value): - return value == 'true' + return value Review comment: the coder incorrectly expected boolean types to be encoded as strings. This is incorrect, as JSON supports boolean 'natively', like this: https://json-schema.org/understanding-json-schema/reference/boolean.html This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] jaketf commented on pull request #11596: [BEAM-9856] [*WIP DO NOT MERGE*] Optimization/hl7v2 io list messages
jaketf commented on pull request #11596: URL: https://github.com/apache/beam/pull/11596#issuecomment-623737955 > The NPE is caused by `OffsetRangeTracker.lastAttemptedOffset` being unboxed in `OffsetRangeTracker::checkDone` [here](https://github.com/apache/beam/blob/0291976d6029b4cf4d72caa03bd3836900fb6328/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/splittabledofn/OffsetRangeTracker.java#L98). > > [all](https://github.com/apache/beam/blob/0291976d6029b4cf4d72caa03bd3836900fb6328/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/splittabledofn/OffsetRangeTracker.java#L53) [other](https://github.com/apache/beam/blob/0291976d6029b4cf4d72caa03bd3836900fb6328/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/splittabledofn/OffsetRangeTracker.java#L77) [uses](https://github.com/apache/beam/blob/0291976d6029b4cf4d72caa03bd3836900fb6328/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/splittabledofn/OffsetRangeTracker.java#L119) check that `lastAttemptedOffset` is not null. > > I'm not sure if this was intentional in the implementation of OffsetRangeTracker. @chamikaramj pablo said you might know about this. should check done have some conditional on `lastAttemptedOffset != null` e.g. ```java @Override public void checkDone() throws IllegalStateException { if (range.getFrom() == range.getTo()) { return; } if (lastAttemptedOffset != null) { checkState( lastAttemptedOffset >= range.getTo() - 1, "Last attempted offset was %s in range %s, claiming work in [%s, %s) was not attempted", lastAttemptedOffset, range, lastAttemptedOffset + 1, range.getTo()); } } ``` I'm not really familiar with what checkDone should do in the case that lastAttemptedOffset was null. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] ibzib commented on pull request #11603: [BEAM-9875] Use FlinkRunner instead of PortableRunner in cross-langua…
ibzib commented on pull request #11603: URL: https://github.com/apache/beam/pull/11603#issuecomment-623736044 Run Python 2 PostCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] pabloem commented on pull request #11596: [BEAM-9856] [*WIP DO NOT MERGE*] Optimization/hl7v2 io list messages
pabloem commented on pull request #11596: URL: https://github.com/apache/beam/pull/11596#issuecomment-623733179 @chamikaramj can you take a look at this? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] pabloem commented on pull request #11600: Allow accessing window parameters in finishBundle
pabloem commented on pull request #11600: URL: https://github.com/apache/beam/pull/11600#issuecomment-623732876 can you please create a JIRA? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] y1chi commented on a change in pull request #11560: Auto-inferring project for ReadFromBigQuery
y1chi commented on a change in pull request #11560: URL: https://github.com/apache/beam/pull/11560#discussion_r419751064 ## File path: sdks/python/apache_beam/io/gcp/bigquery.py ## @@ -526,7 +526,7 @@ def reader(self, test_bigquery_client=None): def _to_bool(value): - return value == 'true' + return value Review comment: could you explain why this change is needed? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] Hannah-Jiang opened a new pull request #11606: [BEAM-9880] fix cannot touch issue
Hannah-Jiang opened a new pull request #11606: URL: https://github.com/apache/beam/pull/11606 R: @ibzib Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily: - [ ] [**Choose reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and mention them in a comment (`R: @username`). - [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue. - [ ] Update `CHANGES.md` with noteworthy changes. - [ ] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.pdf). See the [Contributor Guide](https://beam.apache.org/contribute) for more tips on [how to make review process smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier). Post-Commit Tests Status (on master branch) Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark --- | --- | --- | --- | --- | --- | --- | --- Go | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/) | --- | --- | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/) | --- | --- | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/) Java | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Java11/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Java11/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/) Python | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Python2/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python2/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_Po
[GitHub] [beam] youngoli commented on a change in pull request #11553: [BEAM-9799] Automated validation for RTrackers.
youngoli commented on a change in pull request #11553: URL: https://github.com/apache/beam/pull/11553#discussion_r419743023 ## File path: sdks/go/pkg/beam/core/runtime/exec/pardo.go ## @@ -162,6 +168,14 @@ func (n *ParDo) processMainInput(mainIn *MainInput) error { return nil } +func rtErrHelper(err error) error { + if err != nil { + return err + } else { + return errors.New("RTracker IsDone failed for unspecified reason") + } Review comment: Yeah good point. I'll rephrase it. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] jaketf commented on pull request #11596: [BEAM-9856] [*WIP DO NOT MERGE*] Optimization/hl7v2 io list messages
jaketf commented on pull request #11596: URL: https://github.com/apache/beam/pull/11596#issuecomment-623718338 The NPE is caused by `OffsetRangeTracker.lastAttemptedOffset` being unboxed in `OffsetRangeTracker::checkDone` [here](https://github.com/apache/beam/blob/0291976d6029b4cf4d72caa03bd3836900fb6328/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/splittabledofn/OffsetRangeTracker.java#L98). [all](https://github.com/apache/beam/blob/0291976d6029b4cf4d72caa03bd3836900fb6328/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/splittabledofn/OffsetRangeTracker.java#L53) [other](https://github.com/apache/beam/blob/0291976d6029b4cf4d72caa03bd3836900fb6328/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/splittabledofn/OffsetRangeTracker.java#L77) [uses](https://github.com/apache/beam/blob/0291976d6029b4cf4d72caa03bd3836900fb6328/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/splittabledofn/OffsetRangeTracker.java#L119) check that `lastAttemptedOffset` is not null. I'm not sure if this was intentional in the implementation of OffsetRangeTracker. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] ibzib commented on pull request #11403: [DO NOT MERGE] Run all PostCommit and PreCommit Tests against Release Branch
ibzib commented on pull request #11403: URL: https://github.com/apache/beam/pull/11403#issuecomment-623717447 Run Java PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] ibzib commented on pull request #11403: [DO NOT MERGE] Run all PostCommit and PreCommit Tests against Release Branch
ibzib commented on pull request #11403: URL: https://github.com/apache/beam/pull/11403#issuecomment-623716216 Run Java PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] pabloem commented on pull request #10055: [BEAM-8603] Add Python SqlTransform
pabloem commented on pull request #10055: URL: https://github.com/apache/beam/pull/10055#issuecomment-623714745 yoohooo This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] TheNeuralBit commented on pull request #11574: [BEAM-9449] Pass PipelineOptions through expansion service
TheNeuralBit commented on pull request #11574: URL: https://github.com/apache/beam/pull/11574#issuecomment-623713457 Synced this up now that #11571 is merged. This is ready for review now. @ihji do you have time to review? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] youngoli commented on pull request #11605: [BEAM-9883] Refactor SDF test restrictions.
youngoli commented on pull request #11605: URL: https://github.com/apache/beam/pull/11605#issuecomment-623713567 R: @lostluck This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] youngoli opened a new pull request #11605: [BEAM-9883] Refactor SDF test restrictions.
youngoli opened a new pull request #11605: URL: https://github.com/apache/beam/pull/11605 Refactoring the restriction used for testing SDFs. Instead of having some obtuse behavior that we can validate, it now just contains a bunch of flags we can flip to track that it was used in each method. Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily: - [x] [**Choose reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and mention them in a comment (`R: @username`). - [x] Format the pull request title like `[BEAM-XXX] Fixes bug in ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue. - [ ] Update `CHANGES.md` with noteworthy changes. - [x] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.pdf). See the [Contributor Guide](https://beam.apache.org/contribute) for more tips on [how to make review process smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier). Post-Commit Tests Status (on master branch) Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark --- | --- | --- | --- | --- | --- | --- | --- Go | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/) | --- | --- | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/) | --- | --- | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/) Java | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Java11/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Java11/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/) Python | [![Build Status](https://builds.apache
[GitHub] [beam] robertwb commented on a change in pull request #11554: [BEAM-9876] Migrate the Beam website from Jekyll to Hugo to enable localization of the site content
robertwb commented on a change in pull request #11554: URL: https://github.com/apache/beam/pull/11554#discussion_r419717683 ## File path: website/www/site/content/en/blog/splittable-do-fn.md ## @@ -475,8 +471,7 @@ IO connectors. However, a large amount of work is in progress or planned. As of August 2017, SDF is available for use in the Beam Java Direct runner and Dataflow Streaming runner, and implementation is in progress in the Flink and -Apex runners; see [capability matrix]({{ site.baseurl -}}/documentation/runners/capability-matrix/) for the current status. Support +Apex runners; see [capability matrix]({/documentation/runners/capability-matrix/) for the current status. Support Review comment: Stray {? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] ibzib commented on pull request #11403: [DO NOT MERGE] Run all PostCommit and PreCommit Tests against Release Branch
ibzib commented on pull request #11403: URL: https://github.com/apache/beam/pull/11403#issuecomment-623692174 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] ibzib commented on pull request #11601: [BEAM-9877] [cherry-pick] Make BatchGroupAlsoByWindowViaIteratorsFn extend the Elem…
ibzib commented on pull request #11601: URL: https://github.com/apache/beam/pull/11601#issuecomment-623684159 Java test failing due to BEAM-9164 (known flake). This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] apilloud commented on pull request #11604: [BEAM-9659][BEAM-9657][BEAM-9664] Reject unsupported unnest joins
apilloud commented on pull request #11604: URL: https://github.com/apache/beam/pull/11604#issuecomment-623679534 R: @robinyqiu This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] apilloud opened a new pull request #11604: [BEAM-9659][BEAM-9657][BEAM-9664] Reject unsupported unnest joins
apilloud opened a new pull request #11604: URL: https://github.com/apache/beam/pull/11604 There is no trivial path to fixing these, so just ensure they return a sensible error for now. Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily: - [ ] [**Choose reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and mention them in a comment (`R: @username`). - [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue. - [ ] Update `CHANGES.md` with noteworthy changes. - [ ] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.pdf). See the [Contributor Guide](https://beam.apache.org/contribute) for more tips on [how to make review process smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier). Post-Commit Tests Status (on master branch) Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark --- | --- | --- | --- | --- | --- | --- | --- Go | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/) | --- | --- | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/) | --- | --- | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/) Java | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Java11/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Java11/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/) Python | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Python2/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Pyt
[GitHub] [beam] ibzib commented on pull request #11603: [BEAM-9875] Use FlinkRunner instead of PortableRunner in cross-langua…
ibzib commented on pull request #11603: URL: https://github.com/apache/beam/pull/11603#issuecomment-623678292 Run Python 2 PostCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] ibzib opened a new pull request #11603: [BEAM-9875] Use FlinkRunner instead of PortableRunner in cross-langua…
ibzib opened a new pull request #11603: URL: https://github.com/apache/beam/pull/11603 …ge tests. Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily: - [ ] [**Choose reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and mention them in a comment (`R: @username`). - [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue. - [ ] Update `CHANGES.md` with noteworthy changes. - [ ] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.pdf). See the [Contributor Guide](https://beam.apache.org/contribute) for more tips on [how to make review process smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier). Post-Commit Tests Status (on master branch) Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark --- | --- | --- | --- | --- | --- | --- | --- Go | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/) | --- | --- | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/) | --- | --- | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/) Java | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Java11/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Java11/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/) Python | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Python2/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python2/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommi
[GitHub] [beam] pabloem commented on pull request #11560: Auto-inferring project for ReadFromBigQuery
pabloem commented on pull request #11560: URL: https://github.com/apache/beam/pull/11560#issuecomment-623677009 @kamilwu @y1chi PTAL This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] tudorm commented on pull request #11601: [BEAM-9877] Make BatchGroupAlsoByWindowViaIteratorsFn extend the Elem…
tudorm commented on pull request #11601: URL: https://github.com/apache/beam/pull/11601#issuecomment-623674559 LGTM This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] tudorm commented on pull request #11598: [BEAM-9877] Estimate sizes of group-by-key values behind a key lazily only.
tudorm commented on pull request #11598: URL: https://github.com/apache/beam/pull/11598#issuecomment-623673128 LGTM This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] pabloem commented on pull request #11560: Auto-inferring project for ReadFromBigQuery
pabloem commented on pull request #11560: URL: https://github.com/apache/beam/pull/11560#issuecomment-623669571 Run Python2_PVR_Flink PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] apilloud commented on pull request #11602: [BEAM-9661] Fix ORDER BY with LIMIT
apilloud commented on pull request #11602: URL: https://github.com/apache/beam/pull/11602#issuecomment-623663334 R: @robinyqiu This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] apilloud opened a new pull request #11602: [BEAM-9661] Fix ORDER BY with LIMIT
apilloud opened a new pull request #11602: URL: https://github.com/apache/beam/pull/11602 It turns out ZetaSQL column references don't always start at 0, an example of this is on union operators. We probably have a number of bugs in this regard. Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily: - [ ] [**Choose reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and mention them in a comment (`R: @username`). - [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue. - [ ] Update `CHANGES.md` with noteworthy changes. - [ ] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.pdf). See the [Contributor Guide](https://beam.apache.org/contribute) for more tips on [how to make review process smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier). Post-Commit Tests Status (on master branch) Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark --- | --- | --- | --- | --- | --- | --- | --- Go | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/) | --- | --- | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/) | --- | --- | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/) Java | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Java11/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Java11/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/) Python | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Python2/lastCompletedBuild/b
[GitHub] [beam] ibzib opened a new pull request #11601: [BEAM-9877] Make BatchGroupAlsoByWindowViaIteratorsFn extend the Elem…
ibzib opened a new pull request #11601: URL: https://github.com/apache/beam/pull/11601 …entByteSizeObservableIterable so that size estimation is lazy Cherry-pick of #11598. (I removed the commit history because I forgot to squash before merging the original PR, and one atomic commit is a lot easier to deal with.) R: @tudorm Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily: - [ ] [**Choose reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and mention them in a comment (`R: @username`). - [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue. - [ ] Update `CHANGES.md` with noteworthy changes. - [ ] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.pdf). See the [Contributor Guide](https://beam.apache.org/contribute) for more tips on [how to make review process smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier). Post-Commit Tests Status (on master branch) Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark --- | --- | --- | --- | --- | --- | --- | --- Go | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/) | --- | --- | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/) | --- | --- | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/) Java | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Java11/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Java11/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/) Python | [![
[GitHub] [beam] TheNeuralBit commented on a change in pull request #11569: [BEAM-9840] Support for Parameterized Types when converting from HCat…
TheNeuralBit commented on a change in pull request #11569: URL: https://github.com/apache/beam/pull/11569#discussion_r419653421 ## File path: sdks/java/io/hcatalog/src/test/java/org/apache/beam/sdk/io/hcatalog/SchemaUtilsTest.java ## @@ -0,0 +1,45 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.sdk.io.hcatalog; + +import java.util.ArrayList; +import java.util.List; +import org.apache.beam.sdk.schemas.Schema; +import org.apache.hadoop.hive.metastore.api.FieldSchema; +import org.junit.Assert; +import org.junit.Test; + +public class SchemaUtilsTest { + @Test + public void testParameterizedTypesToBeamTypes() { +List listOfFieldSchema = new ArrayList<>(); +listOfFieldSchema.add(new FieldSchema("parameterizedChar", "char(10)", null)); +listOfFieldSchema.add(new FieldSchema("parameterizedVarchar", "varchar(100)", null)); +listOfFieldSchema.add(new FieldSchema("parameterizedDecimal", "decimal(30,16)", null)); + +Schema expectedSchema = +Schema.builder() +.addNullableField("parameterizedChar", Schema.FieldType.STRING) +.addNullableField("parameterizedVarchar", Schema.FieldType.STRING) +.addNullableField("parameterizedDecimal", Schema.FieldType.DECIMAL) Review comment: I think these should map to logical types instead of to primitives so we don't lose the information from the parameter. Unfortunately we don't (yet) have good logical types in `schemas.logicaltypes` to map them to, but maybe we will after your other PR, https://github.com/apache/beam/pull/11581 (or you could just add the relevant ones here). `char(10)` looks like it could map to a `FixedLengthString` logical type, `varchar(100)` probably deserves its own type, maybe just called `Varchar`? and I've been meaning to add a logical type for DECIMAL parameterized by precision and scale as part of [BEAM-7554](https://issues.apache.org/jira/browse/BEAM-7554) (and deprecate the primitive one). This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] robertwb commented on a change in pull request #11554: [BEAM-9876] Migrate the Beam website from Jekyll to Hugo to enable localization of the site content
robertwb commented on a change in pull request #11554: URL: https://github.com/apache/beam/pull/11554#discussion_r419661024 ## File path: website/www/site/content/en/contribute/release-guide.md ## @@ -218,7 +214,7 @@ docker login docker.io After successful login, authorization info will be stored at ~/.docker/config.json file. For example, ``` "https://index.docker.io/v1/": { - "auth": "xx" + "auth": "aGFubmFoamlhbmc6cmtkdGpmZ2hrMTIxMw==" Review comment: Probably don't want to check this in. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] ibzib commented on a change in pull request #11597: [BEAM-9874] Support clearing timers in portable batch mode (Spark/Flink)
ibzib commented on a change in pull request #11597: URL: https://github.com/apache/beam/pull/11597#discussion_r419658069 ## File path: runners/core-java/src/main/java/org/apache/beam/runners/core/InMemoryTimerInternals.java ## @@ -167,20 +165,17 @@ public void deleteTimer(StateNamespace namespace, String timerId, TimeDomain tim @Deprecated @Override public void deleteTimer(StateNamespace namespace, String timerId, String timerFamilyId) { -TimerData existing = existingTimers.get(namespace, timerId + '+' + timerFamilyId); -if (existing != null) { - deleteTimer(existing); +TimerData removedTimer = existingTimers.remove(namespace, timerId + '+' + timerFamilyId); +if (removedTimer != null) { + timersForDomain(removedTimer.getDomain()).remove(removedTimer); } } /** @deprecated use {@link #deleteTimer(StateNamespace, String, TimeDomain)}. */ Review comment: Maybe worth a JIRA/refactor? I'll leave it up to you since I am not as familiar with this code. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] ibzib commented on a change in pull request #11598: [BEAM-9877] Estimate sizes of group-by-key values behind a key lazily only.
ibzib commented on a change in pull request #11598: URL: https://github.com/apache/beam/pull/11598#discussion_r419649174 ## File path: runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/util/BatchGroupAlsoByWindowViaIteratorsFn.java ## @@ -165,12 +168,17 @@ public WindowReiterable( } @Override -public Reiterator iterator() { +public WindowReiterator iterator() { Review comment: It's odd that `ElementByteSizeObservableIterable::iterator` adds observers within the method body. I assume this is for historic reasons, since it doesn't seem to do anything now, and the comment documenting references a `setObserver` method that doesn't exist. Anyway, your change looks fine. But we should consider cleaning this up. https://github.com/apache/beam/blob/6453e859badcb629ae2528b77d84235b7291ff89/sdks/java/core/src/main/java/org/apache/beam/sdk/util/common/ElementByteSizeObservableIterable.java#L49-L61 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] TheNeuralBit commented on pull request #11569: [BEAM-9840] Support for Parameterized Types when converting from HCat…
TheNeuralBit commented on pull request #11569: URL: https://github.com/apache/beam/pull/11569#issuecomment-623636289 @akedin isn't very involved with Beam anymore. I think I can help review this instead This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] lostluck commented on a change in pull request #11517: [BEAM-9643] Adding Go SDF Documentation.
lostluck commented on a change in pull request #11517: URL: https://github.com/apache/beam/pull/11517#discussion_r419635469 ## File path: sdks/go/pkg/beam/core/sdf/sdf.go ## @@ -13,63 +13,64 @@ // See the License for the specific language governing permissions and // limitations under the License. -// Package sdf is experimental, incomplete, and not yet meant for general usage. +// Package contains interfaces used specifically for splittable DoFns. +// +// Warning: Splittable DoFns are still experimental, largely untested, and +// likely to have bugs. package sdf // RTracker is an interface used to interact with restrictions while processing elements in -// SplittableDoFns. Each implementation of RTracker is expected to be used for tracking a single -// restriction type, which is the type that should be used to create the RTracker, and output by -// TrySplit. +// splittable DoFns (specifically, in the ProcessElement method). Each RTracker tracks the progress +// of a single restriction. type RTracker interface { - // TryClaim attempts to claim the block of work in the current restriction located at a given - // position. This method must be used in the ProcessElement method of Splittable DoFns to claim - // work before performing it. If no work is claimed, the ProcessElement is not allowed to perform - // work or emit outputs. If the claim is successful, the DoFn must process the entire block. If - // the claim is unsuccessful the ProcessElement method of the DoFn must return without performing - // any additional work or emitting any outputs. - // - // TryClaim accepts an arbitrary value that can be interpreted as the position of a block, and - // returns a boolean indicating whether the claim succeeded. + // TryClaim attempts to claim the block of work located in the given position of the + // restriction. This method must be called in ProcessElement to claim work before it can be + // processed. Processing work without claiming it first can lead to incorrect output. // - // If the claim fails due to an error, that error can be retrieved with GetError. + // If the claim is successful, the DoFn must process the entire block. If the claim is + // unsuccessful ProcessElement method of the DoFn must return without performing + // any additional work or emitting any outputs. // - // For SDFs to work properly, claims must always be monotonically increasing in reference to the - // restriction's start and end points, and every block of work in a restriction must be claimed. + // If the claim fails due to an error, that error is stored and will be automatically emitted + // when the RTracker is validated, or can be manually retrieved with GetError. // // This pseudocode example illustrates the typical usage of TryClaim: // - // pos = position of first block after restriction.start + // pos = position of first block within the restriction // for TryClaim(pos) == true { // // Do all work in the claimed block and emit outputs. - // pos = position of next block + // pos = position of next block within the restriction // } // return TryClaim(pos interface{}) (ok bool) - // GetError returns the error that made this RTracker stop executing, and it returns nil if no - // error occurred. If IsDone fails while validating this RTracker, this method will be - // called to log the error. + // GetError returns the error that made this RTracker stop executing, and returns nil if no + // error occurred. This is the error that is emitted if automated validation fails. GetError() error - // TrySplit splits the current restriction into a primary and residual based on a fraction of the - // work remaining. The split is performed along the first valid split point located after the - // given fraction of the remainder. This method is called by the SDK harness when receiving a - // split request by the runner. + // TrySplit splits the current restriction into a primary (currently executing work) and + // residual (work to be split off) based on a fraction of work remaining. The split is performed + // at the first valid split point located after the given fraction of remaining work. + // + // For example, a fraction of 0.5 means to split at the halfway point of remaining work only. If + // 50% of work is done and 50% remaining, then a fraction of 0.5 would split after 75% of work. + // + // This method modifies the underlying restriction in the RTracker to reflect the primary. It + // then returns a copy of the newly modified restriction as a primary, and returns a new +
[GitHub] [beam] damondouglas edited a comment on pull request #11564: [Beam-9679] Add Core Transforms section / Map lesson to the Go SDK katas
damondouglas edited a comment on pull request #11564: URL: https://github.com/apache/beam/pull/11564#issuecomment-623621585 > Retest this please Is this addressed to me? Would you like me to retest something? That's "addressed" to the jenkins bot to retest the PR via whatever automated selection of tests it requires. Since the bot is particular, I needed to separate it from my other comment so it would work. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] lostluck commented on a change in pull request #11564: [Beam-9679] Add Core Transforms section / Map lesson to the Go SDK katas
lostluck commented on a change in pull request #11564: URL: https://github.com/apache/beam/pull/11564#discussion_r419627705 ## File path: learning/katas/go/Core Transforms/Map/ParDo OneToMany/task.md ## @@ -0,0 +1,32 @@ + + +# ParDo - One to Many + +In the previous kata we learned that ParDo maps a single element into another element. +In this kata we will map a single element into many by splitting a sentence into words. + +**Kata:** Please write a ParDo that maps each input sentence into words tokenized by whitespace (" "). + + + Use https://godoc.org/github.com/apache/beam/sdks/go/pkg/beam#ParDo";> + ParDo + with https://godoc.org/github.com/apache/beam/sdks/go/pkg/beam#hdr-DoFns";> Review comment: DoFns are generally a specific thing, so an article is handy. ```suggestion with a https://godoc.org/github.com/apache/beam/sdks/go/pkg/beam#hdr-DoFns";> ``` ## File path: learning/katas/go/Core Transforms/Map/ParDo OneToMany/task.md ## @@ -0,0 +1,32 @@ + + +# ParDo - One to Many + +In the previous kata we learned that ParDo maps a single element into another element. +In this kata we will map a single element into many by splitting a sentence into words. + +**Kata:** Please write a ParDo that maps each input sentence into words tokenized by whitespace (" "). + + + Use https://godoc.org/github.com/apache/beam/sdks/go/pkg/beam#ParDo";> + ParDo Review comment: Consider using the qualifed import name, like they'd see in their Go code (eg. beam.ParDo) , rather than just the single method. It would look odd to other languages, but the generally explicit package/provenance of identifiers is a hallmark of Go. ## File path: learning/katas/go/Core Transforms/Map/ParDo/task.md ## @@ -0,0 +1,39 @@ + + +# ParDo + +ParDo is a Beam transform for generic parallel processing. The ParDo processing paradigm is similar to the “Map” +phase of a Map/Shuffle/Reduce-style algorithm: a ParDo transform considers each element in the input PCollection, +performs some processing function (your user code) on that element, and emits zero, one, or multiple elements to an +output PCollection. + +**Kata:** Please write a simple ParDo that maps the input element by multiplying it by 10. Review comment: A possible adjacent task/step is converting a func DoFn into a Structural DoFn. Yes, this is relatively simple, but from a learning standpoint, it makes the distinction pretty clear, while not asking users too much else that might conflate with it. (eg. Getting the wrong idea that funcs must be 1:1 vs 1:many/none, vs structs etc). Not necessary to do it in this PR. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] mxm commented on a change in pull request #11597: [BEAM-9874] Support clearing timers in portable batch mode (Spark/Flink)
mxm commented on a change in pull request #11597: URL: https://github.com/apache/beam/pull/11597#discussion_r419632632 ## File path: runners/core-java/src/main/java/org/apache/beam/runners/core/InMemoryTimerInternals.java ## @@ -167,20 +165,17 @@ public void deleteTimer(StateNamespace namespace, String timerId, TimeDomain tim @Deprecated @Override public void deleteTimer(StateNamespace namespace, String timerId, String timerFamilyId) { -TimerData existing = existingTimers.get(namespace, timerId + '+' + timerFamilyId); -if (existing != null) { - deleteTimer(existing); +TimerData removedTimer = existingTimers.remove(namespace, timerId + '+' + timerFamilyId); +if (removedTimer != null) { + timersForDomain(removedTimer.getDomain()).remove(removedTimer); } } /** @deprecated use {@link #deleteTimer(StateNamespace, String, TimeDomain)}. */ Review comment: yeah, actually that might have been a mistake when the timer family was added because the old way was to use `TimerData` and the new way is to use timerId/timerFamily. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] damondouglas commented on pull request #11564: [Beam-9679] Add Core Transforms section / Map lesson to the Go SDK katas
damondouglas commented on pull request #11564: URL: https://github.com/apache/beam/pull/11564#issuecomment-623621585 > Retest this please Is this addressed to me? Would you like me to retest something? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] chadrik commented on pull request #11038: [BEAM-7746] More typing fixes
chadrik commented on pull request #11038: URL: https://github.com/apache/beam/pull/11038#issuecomment-623619389 Hi everyone, I have some availability to finish this PR off now. I'm going to rebase it soon. @udim do you have the time to help me get this through review? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] lukecwik commented on pull request #11598: [BEAM-9877] Estimate sizes of group-by-key values behind a key lazily only.
lukecwik commented on pull request #11598: URL: https://github.com/apache/beam/pull/11598#issuecomment-623619474 retest this please This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] chadrik commented on a change in pull request #11038: [BEAM-7746] More typing fixes
chadrik commented on a change in pull request #11038: URL: https://github.com/apache/beam/pull/11038#discussion_r419627186 ## File path: sdks/python/apache_beam/transforms/core.py ## @@ -1300,12 +1300,13 @@ def to_runner_api_parameter(self, context): common_urns.requirements.REQUIRES_STATEFUL_PROCESSING.urn) from apache_beam.runners.common import DoFnSignature sig = DoFnSignature(self.fn) -is_splittable = sig.is_splittable_dofn() Review comment: > It is an error to say is_splittable_dofn is True without returning a restriction coder as well and vice versa. This seems to validate my earlier assessment that a None result from this `get_restriction_coder` means "is not splittable", and therefore that my proposed change is valid. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] lostluck commented on pull request #11564: [Beam-9679] Add Core Transforms section / Map lesson to the Go SDK katas
lostluck commented on pull request #11564: URL: https://github.com/apache/beam/pull/11564#issuecomment-623617721 (sadly, only comitters can trigger the tests even after the first time. Your commands were correct) I'm doing this review now. Thank you for your patience! This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] lostluck commented on pull request #11564: [Beam-9679] Add Core Transforms section / Map lesson to the Go SDK katas
lostluck commented on pull request #11564: URL: https://github.com/apache/beam/pull/11564#issuecomment-623617273 Retest this please This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] lostluck commented on a change in pull request #11553: [BEAM-9799] Automated validation for RTrackers.
lostluck commented on a change in pull request #11553: URL: https://github.com/apache/beam/pull/11553#discussion_r419623182 ## File path: sdks/go/pkg/beam/core/runtime/exec/pardo.go ## @@ -162,6 +168,14 @@ func (n *ParDo) processMainInput(mainIn *MainInput) error { return nil } +func rtErrHelper(err error) error { + if err != nil { + return err + } else { + return errors.New("RTracker IsDone failed for unspecified reason") + } Review comment: ```suggestion } return errors.New("RTracker.IsDone() failed for unspecified reason") ``` ## File path: sdks/go/pkg/beam/core/runtime/exec/pardo.go ## @@ -162,6 +168,14 @@ func (n *ParDo) processMainInput(mainIn *MainInput) error { return nil } +func rtErrHelper(err error) error { + if err != nil { + return err + } else { + return errors.New("RTracker IsDone failed for unspecified reason") + } Review comment: Also, in both places this is called, error or no, we don't know IsDone failed(), we know that the DoFn terminated without fully processing the restriction (If I understand IsDone's semantics correctly). Perhaps the message should indicate that? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] bntnam commented on pull request #11554: [BEAM-9876] Migrate the Beam website from Jekyll to Hugo to enable localization of the site content
bntnam commented on pull request #11554: URL: https://github.com/apache/beam/pull/11554#issuecomment-62360 > @bntnam Do we have a JIRA issue for this PR where we can refer to for background and context information ? @manuzhang : Hey hey, here you have it. [1] [1] https://issues.apache.org/jira/browse/BEAM-9876 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] ibzib commented on a change in pull request #11597: [BEAM-9874] Support clearing timers in portable batch mode (Spark/Flink)
ibzib commented on a change in pull request #11597: URL: https://github.com/apache/beam/pull/11597#discussion_r419607989 ## File path: runners/core-java/src/main/java/org/apache/beam/runners/core/InMemoryTimerInternals.java ## @@ -167,20 +165,17 @@ public void deleteTimer(StateNamespace namespace, String timerId, TimeDomain tim @Deprecated @Override public void deleteTimer(StateNamespace namespace, String timerId, String timerFamilyId) { -TimerData existing = existingTimers.get(namespace, timerId + '+' + timerFamilyId); -if (existing != null) { - deleteTimer(existing); +TimerData removedTimer = existingTimers.remove(namespace, timerId + '+' + timerFamilyId); +if (removedTimer != null) { + timersForDomain(removedTimer.getDomain()).remove(removedTimer); } } /** @deprecated use {@link #deleteTimer(StateNamespace, String, TimeDomain)}. */ Review comment: It seems strange that the deprecated method is the one being used, and the recommended method is not implemented. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] robertwb commented on pull request #11521: [BEAM-9577] Update Java Runners to handle dependency-based artifact staging.
robertwb commented on pull request #11521: URL: https://github.com/apache/beam/pull/11521#issuecomment-623600647 Ping @ihji This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] tudorm commented on pull request #11598: Do not attempt to estimate sizes of group-by-key values behind a key.
tudorm commented on pull request #11598: URL: https://github.com/apache/beam/pull/11598#issuecomment-623600578 Updated the change to perform lazy estimation instead of side-stepping the estimation altogether. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] Akshay-Iyangar commented on pull request #10078: [BEAM-8542] Change write to async in AWS SNS IO & remove retry logic
Akshay-Iyangar commented on pull request #10078: URL: https://github.com/apache/beam/pull/10078#issuecomment-623598544 @aromanenko-dev - Done This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] ibzib commented on pull request #11549: [BEAM-9136] Update doc about docker image license pulling
ibzib commented on pull request #11549: URL: https://github.com/apache/beam/pull/11549#issuecomment-623596078 The folks working on the website migration have requested that we pause website changes until they are finished with #11554 ([email thread](https://lists.apache.org/thread.html/r6b999b6d7d1f6cbb94e16bb2deed2b65098a6b14c4ac98707fe0c36a%40%3Cdev.beam.apache.org%3E)). @Hannah-Jiang can you please split this PR into 2 separate PRs for release notes and website changes? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] ibzib commented on pull request #11585: [BEAM-9860] Make job_endpoint required for PortableRunner
ibzib commented on pull request #11585: URL: https://github.com/apache/beam/pull/11585#issuecomment-623592467 Thanks @boyuanzz, I will take care of it. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] boyuanzz commented on pull request #11585: [BEAM-9860] Make job_endpoint required for PortableRunner
boyuanzz commented on pull request #11585: URL: https://github.com/apache/beam/pull/11585#issuecomment-623585130 This PR breaks :sdks:python:test-suites:portable:py2:crossLanguagePortableWordCount and :sdks:python:test-suites:portable:py2:crossLanguagePythonJavaFlink with the same error message: ``` File "apache_beam/pipeline.py", line 525, in run | -- | -- | return self.runner.run_pipeline(self, self._options) | | File "apache_beam/runners/portability/portable_runner.py", line 422, in run_pipeline | | job_service_handle = self.create_job_service(options) | | File "apache_beam/runners/portability/portable_runner.py", line 328, in create_job_service | | server = self.default_job_server(options) | | File "apache_beam/runners/portability/portable_runner.py", line 307, in default_job_server | | 'You must specify a --job_endpoint when using --runner=PortableRunner. ' | | NotImplementedError: You must specify a --job_endpoint when using --runner=PortableRunner. Alternatively, you may specify which portable runner you intend to use, such as --runner=FlinkRunner or --runner=SparkRunner. ``` We should update `sdks/python/apache_beam/examples/wordcount_xlang.py` with the same requirement. File issue here: https://issues.apache.org/jira/browse/BEAM-9875 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] Akshay-Iyangar commented on pull request #10078: [BEAM-8542] Change write to async in AWS SNS IO & remove retry logic
Akshay-Iyangar commented on pull request #10078: URL: https://github.com/apache/beam/pull/10078#issuecomment-623584264 @aromanenko-dev - cool !! fixed the typos This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] aromanenko-dev commented on pull request #10078: [BEAM-8542] Change write to async in AWS SNS IO & remove retry logic
aromanenko-dev commented on pull request #10078: URL: https://github.com/apache/beam/pull/10078#issuecomment-623584314 @Akshay-Iyangar Could you squash all your commits and I'll merge it? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] reuvenlax opened a new pull request #11600: Allow accessing window parameters in finishBundle
reuvenlax opened a new pull request #11600: URL: https://github.com/apache/beam/pull/11600 A bundle in Beam today can contain many windows. This often makes using finishBundle correctly tricky. Users are sometimes seen keeping maps in their DoFn of windows seen in a bundle so they can properly process them in finishBundle. We also don't support injecting an OutputReceiver in finishBundle, as there's no good way to associate a window to the output. This PR allows injecting a BoundedWindow or OutputReceiver to the finishBundle function. In this case, the finishBundle will be invoked once per window seen in the bundle. If there is no such parameter, then the previous behavior of invoking finishBundle once is preserved. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] aromanenko-dev commented on pull request #10078: [BEAM-8542] Change write to async in AWS SNS IO & remove retry logic
aromanenko-dev commented on pull request #10078: URL: https://github.com/apache/beam/pull/10078#issuecomment-623583082 Well, it LGTM for except small typos I mentioned above. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] aromanenko-dev commented on a change in pull request #10078: [BEAM-8542] Change write to async in AWS SNS IO & remove retry logic
aromanenko-dev commented on a change in pull request #10078: URL: https://github.com/apache/beam/pull/10078#discussion_r419580372 ## File path: sdks/java/io/amazon-web-services2/src/test/java/org/apache/beam/sdk/io/aws2/sns/SnsResponseCoderTest.java ## @@ -0,0 +1,77 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.sdk.io.aws2.sns; + +import java.io.ByteArrayInputStream; +import java.io.ByteArrayOutputStream; +import java.io.IOException; +import java.util.Optional; +import java.util.OptionalInt; +import org.apache.beam.sdk.coders.StringUtf8Coder; +import org.junit.Assert; +import org.junit.Test; + +public class SnsResponseCoderTest { + + @Test + public void verifyResponseWithStatusCodeAndText() throws IOException { + +SnsResponse expected = +SnsResponse.create("test-1", OptionalInt.of(200), Optional.of("OK")); + +SnsResponseCoder coder = SnsResponseCoder.of(StringUtf8Coder.of()); +ByteArrayOutputStream output = new ByteArrayOutputStream(); +coder.encode(expected, output); + +ByteArrayInputStream in = new ByteArrayInputStream(output.toByteArray()); +SnsResponse actual = coder.decode(in); + +Assert.assertEquals(expected, actual); + } + + @Test + public void verifyResponseWithStatusAndNoText() throws IOException { +SnsResponse expected = +SnsResponse.create("test-2", OptionalInt.of(200), Optional.empty()); + +SnsResponseCoder coder = SnsResponseCoder.of(StringUtf8Coder.of()); +ByteArrayOutputStream output = new ByteArrayOutputStream(); +coder.encode(expected, output); + +ByteArrayInputStream in = new ByteArrayInputStream(output.toByteArray()); +SnsResponse actual = coder.decode(in); + +Assert.assertEquals(expected, actual); + } + + @Test + public void verifyResponseWIthNoStatusCodeAndText() throws IOException { Review comment: nit: "WIth" type capital "I" ## File path: sdks/java/io/amazon-web-services2/src/test/java/org/apache/beam/sdk/io/aws2/sns/SnsResponseCoderTest.java ## @@ -0,0 +1,77 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.sdk.io.aws2.sns; + +import java.io.ByteArrayInputStream; +import java.io.ByteArrayOutputStream; +import java.io.IOException; +import java.util.Optional; +import java.util.OptionalInt; +import org.apache.beam.sdk.coders.StringUtf8Coder; +import org.junit.Assert; +import org.junit.Test; + +public class SnsResponseCoderTest { + + @Test + public void verifyResponseWithStatusCodeAndText() throws IOException { + +SnsResponse expected = +SnsResponse.create("test-1", OptionalInt.of(200), Optional.of("OK")); + +SnsResponseCoder coder = SnsResponseCoder.of(StringUtf8Coder.of()); +ByteArrayOutputStream output = new ByteArrayOutputStream(); +coder.encode(expected, output); + +ByteArrayInputStream in = new ByteArrayInputStream(output.toByteArray()); +SnsResponse actual = coder.decode(in); + +Assert.assertEquals(expected, actual); + } + + @Test + public void verifyResponseWithStatusAndNoText() throws IOException { +SnsResponse expected = +SnsResponse.create("test-2", OptionalInt.of(200), Optional.empty()); + +SnsResponseCoder coder = SnsResponseCoder.of(StringUtf8Coder.of()); +ByteArrayOutputStream output = new ByteArrayOutputStream(); +coder.encode(expected, output); + +ByteArrayInputStream in = new ByteArrayInputStream(out
[GitHub] [beam] reuvenlax commented on pull request #11559: [BEAM-9836] Excluding spark runner for KeyTests
reuvenlax commented on pull request #11559: URL: https://github.com/apache/beam/pull/11559#issuecomment-623580541 Key also makes sense in OnWindowExpiration, and it might make sense in processElements as well. However for test exclusion categories we can create overly-specific category names. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] aromanenko-dev commented on a change in pull request #10078: [BEAM-8542] Change write to async in AWS SNS IO & remove retry logic
aromanenko-dev commented on a change in pull request #10078: URL: https://github.com/apache/beam/pull/10078#discussion_r419578585 ## File path: sdks/java/io/amazon-web-services2/src/main/java/org/apache/beam/sdk/io/aws2/sns/SnsIO.java ## @@ -504,18 +504,18 @@ public void tearDown() { checkArgument(getCoder() != null, "withElementCoder() needs to called"); return input - .apply(ParDo.of(new SnsAsyncWriterFn<>(this))) + .apply(ParDo.of(new SnsWriteAsyncrFn<>(this))) .setCoder(SnsResponseCoder.of(getCoder())); } -private static class SnsAsyncWriterFn extends DoFn> { +private static class SnsWriteAsyncrFn extends DoFn> { - private static final Logger LOG = LoggerFactory.getLogger(SnsAsyncWriterFn.class); + private static final Logger LOG = LoggerFactory.getLogger(SnsWriteAsyncrFn.class); - private final AsyncWrite spec; + private final WriteAsync spec; private transient SnsAsyncClient client; - SnsAsyncWriterFn(AsyncWrite spec) { + SnsWriteAsyncrFn(WriteAsync spec) { Review comment: nit: `SnsWriteAsyncrFn` - I guess a letter "r" before "Fn" is redundant. Please remove it. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] mxm removed a comment on pull request #11558: [BEAM-8742] Add stateful and timely processing benchmarks
mxm removed a comment on pull request #11558: URL: https://github.com/apache/beam/pull/11558#issuecomment-623577185 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] mxm commented on pull request #11558: [BEAM-8742] Add stateful and timely processing benchmarks
mxm commented on pull request #11558: URL: https://github.com/apache/beam/pull/11558#issuecomment-623577363 Run Seed Job This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org