[GitHub] [beam] y1chi opened a new pull request #12096: Change dataflow runner to reify windowed value for batch stateful dof…
y1chi opened a new pull request #12096: URL: https://github.com/apache/beam/pull/12096 …n and restore timestamp This should at least allow the batch stateful dofn to access the original timestamp of the windowed value. Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily: - [ ] [**Choose reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and mention them in a comment (`R: @username`). - [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue. - [ ] Update `CHANGES.md` with noteworthy changes. - [ ] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.pdf). See the [Contributor Guide](https://beam.apache.org/contribute) for more tips on [how to make review process smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier). Post-Commit Tests Status (on master branch) Lang | SDK | Dataflow | Flink | Samza | Spark --- | --- | --- | --- | --- | --- Go | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/) | --- | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/) | --- | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/) Java | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/) | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/)[![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/) | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Java11/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Java11/lastCompletedBuild/)[![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)[![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/) | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/) | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/)[![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/)[![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/) Python | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Python2/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Python2/lastCompletedBuild/)[![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Python35/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Python35/lastCompletedBuild/)[![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Python36/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Python36/lastCompletedBuild/)[![Build Status](https
[GitHub] [beam] ZijieSong946 commented on a change in pull request #12054: [BEAM-10219] Support ZetaSQL TIME functions in BeamSQL
ZijieSong946 commented on a change in pull request #12054: URL: https://github.com/apache/beam/pull/12054#discussion_r445974593 ## File path: sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/BeamComplexTypeTest.java ## @@ -452,21 +412,32 @@ public void testNullDatetimeFields() { .addNullableField("year_with_null", FieldType.INT64) .addField("mm", FieldType.INT64) .addNullableField("month_with_null", FieldType.INT64) -.addField("time_with_hour_added", FieldType.DATETIME) Review comment: Yeah. But it seems nothing wrong with these test cases. I think we could just leave it here right now and update them when we implementing TIMESTAMP type translation. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] ZijieSong946 commented on a change in pull request #12054: [BEAM-10219] Support ZetaSQL TIME functions in BeamSQL
ZijieSong946 commented on a change in pull request #12054: URL: https://github.com/apache/beam/pull/12054#discussion_r445971273 ## File path: sdks/java/extensions/sql/zetasql/src/test/java/org/apache/beam/sdk/extensions/sql/zetasql/ZetaSQLDialectSpecTest.java ## @@ -2457,6 +2458,256 @@ public void testDateFromUnixInt64() { pipeline.run().waitUntilFinish(Duration.standardMinutes(PIPELINE_EXECUTION_WAITTIME_MINUTES)); } + / + // TIME type tests + / + + @Test + public void testTimeLiteral() { +String sql = "SELECT TIME '15:30:00'"; Review comment: TIME literal for millisecond and microsecond precision tests added. Nanosecond precision are not supported at this point. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] ZijieSong946 commented on a change in pull request #12054: [BEAM-10219] Support ZetaSQL TIME functions in BeamSQL
ZijieSong946 commented on a change in pull request #12054: URL: https://github.com/apache/beam/pull/12054#discussion_r445972137 ## File path: sdks/java/extensions/sql/zetasql/src/test/java/org/apache/beam/sdk/extensions/sql/zetasql/ZetaSQLDialectSpecTest.java ## @@ -2457,6 +2458,256 @@ public void testDateFromUnixInt64() { pipeline.run().waitUntilFinish(Duration.standardMinutes(PIPELINE_EXECUTION_WAITTIME_MINUTES)); } + / + // TIME type tests + / + + @Test + public void testTimeLiteral() { +String sql = "SELECT TIME '15:30:00'"; + +ZetaSQLQueryPlanner zetaSQLQueryPlanner = new ZetaSQLQueryPlanner(config); +BeamRelNode beamRelNode = zetaSQLQueryPlanner.convertToBeamRel(sql); +PCollection stream = BeamSqlRelUtils.toPCollection(pipeline, beamRelNode); + +PAssert.that(stream) +.containsInAnyOrder( +Row.withSchema(Schema.builder().addLogicalTypeField("f_time", SqlTypes.TIME).build()) +.addValues(LocalTime.of(15, 30, 0, 0)) +.build()); + pipeline.run().waitUntilFinish(Duration.standardMinutes(PIPELINE_EXECUTION_WAITTIME_MINUTES)); + } + + @Test + public void testTimeColumn() { +String sql = "SELECT FORMAT_TIME('%T', time_field) FROM table_with_time"; + +ZetaSQLQueryPlanner zetaSQLQueryPlanner = new ZetaSQLQueryPlanner(config); +BeamRelNode beamRelNode = zetaSQLQueryPlanner.convertToBeamRel(sql); +PCollection stream = BeamSqlRelUtils.toPCollection(pipeline, beamRelNode); + +PAssert.that(stream) +.containsInAnyOrder( + Row.withSchema(Schema.builder().addStringField("f_time_str").build()) +.addValues("15:30:00") +.build(), + Row.withSchema(Schema.builder().addStringField("f_time_str").build()) +.addValues("23:35:59") +.build()); + pipeline.run().waitUntilFinish(Duration.standardMinutes(PIPELINE_EXECUTION_WAITTIME_MINUTES)); + } + + // TODO[BEAM-9166]: Add a test for CURRENT_TIME function ("SELECT CURRENT_TIME()") + + @Test + public void testExtractTime() { +String sql = +"SELECT " ++ "EXTRACT(HOUR FROM TIME '15:30:35') as hour, " ++ "EXTRACT(MINUTE FROM TIME '15:30:35') as minute, " ++ "EXTRACT(SECOND FROM TIME '15:30:35') as second, " ++ "EXTRACT(MILLISECOND FROM TIME '15:30:35') as millisecond, " ++ "EXTRACT(MICROSECOND FROM TIME '15:30:35') as microsecond "; Review comment: Extract for millisecond and microsecond precision tests added. Nanosecond precision are not supported at this point. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] ZijieSong946 commented on a change in pull request #12054: [BEAM-10219] Support ZetaSQL TIME functions in BeamSQL
ZijieSong946 commented on a change in pull request #12054: URL: https://github.com/apache/beam/pull/12054#discussion_r445971273 ## File path: sdks/java/extensions/sql/zetasql/src/test/java/org/apache/beam/sdk/extensions/sql/zetasql/ZetaSQLDialectSpecTest.java ## @@ -2457,6 +2458,256 @@ public void testDateFromUnixInt64() { pipeline.run().waitUntilFinish(Duration.standardMinutes(PIPELINE_EXECUTION_WAITTIME_MINUTES)); } + / + // TIME type tests + / + + @Test + public void testTimeLiteral() { +String sql = "SELECT TIME '15:30:00'"; Review comment: Tests with sub-second components added, including: - millisecond - microsecond Nanosecond precision are not supported for TIME type at this point. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] ZijieSong946 commented on a change in pull request #12054: [BEAM-10219] Support ZetaSQL TIME functions in BeamSQL
ZijieSong946 commented on a change in pull request #12054: URL: https://github.com/apache/beam/pull/12054#discussion_r445971273 ## File path: sdks/java/extensions/sql/zetasql/src/test/java/org/apache/beam/sdk/extensions/sql/zetasql/ZetaSQLDialectSpecTest.java ## @@ -2457,6 +2458,256 @@ public void testDateFromUnixInt64() { pipeline.run().waitUntilFinish(Duration.standardMinutes(PIPELINE_EXECUTION_WAITTIME_MINUTES)); } + / + // TIME type tests + / + + @Test + public void testTimeLiteral() { +String sql = "SELECT TIME '15:30:00'"; Review comment: Tests with sub-second components added, including: - Millisecond - Microsecond Nanosecond precision are not supported for TIME type at this point. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] ZijieSong946 commented on a change in pull request #12054: [BEAM-10219] Support ZetaSQL TIME functions in BeamSQL
ZijieSong946 commented on a change in pull request #12054: URL: https://github.com/apache/beam/pull/12054#discussion_r445971273 ## File path: sdks/java/extensions/sql/zetasql/src/test/java/org/apache/beam/sdk/extensions/sql/zetasql/ZetaSQLDialectSpecTest.java ## @@ -2457,6 +2458,256 @@ public void testDateFromUnixInt64() { pipeline.run().waitUntilFinish(Duration.standardMinutes(PIPELINE_EXECUTION_WAITTIME_MINUTES)); } + / + // TIME type tests + / + + @Test + public void testTimeLiteral() { +String sql = "SELECT TIME '15:30:00'"; Review comment: Tests with sub-second components added, including: - MILLISECOND - MICROSECOND NANOSECOND precision are not supported for TIME type at this point. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] ZijieSong946 commented on pull request #12054: [BEAM-10219] Support ZetaSQL TIME functions in BeamSQL
ZijieSong946 commented on pull request #12054: URL: https://github.com/apache/beam/pull/12054#issuecomment-649971677 > This ([robinyqiu@06c49a1](https://github.com/robinyqiu/beam/commit/06c49a15ba36f23cf3dac54669d5d3fa1f5873c2)) should fix the precision issue with TIME literal. Please patch it to your PR. > > The default ZetaSQL analyzer setup supports microsecond precision (if wants nanoseconds you need to set an language option). I believe it's enough for us to support microsecond precision. Other system like BigQuery also support microsecond. Fixed. Thanks for figuring out the solution. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] ZijieSong946 commented on a change in pull request #12054: [BEAM-10219] Support ZetaSQL TIME functions in BeamSQL
ZijieSong946 commented on a change in pull request #12054: URL: https://github.com/apache/beam/pull/12054#discussion_r445965828 ## File path: sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rel/BeamCalcRel.java ## @@ -315,34 +317,38 @@ private static Expression castOutput(Expression value, FieldType toType) { private static Expression castOutputTime(Expression value, FieldType toType) { Expression valueDateTime = value; -// First, convert to millis (except for DATE type) +// Convert TIMESTAMP to joda Instant Review comment: Done. Thanks. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] ZijieSong946 commented on a change in pull request #12054: [BEAM-10219] Support ZetaSQL TIME functions in BeamSQL
ZijieSong946 commented on a change in pull request #12054: URL: https://github.com/apache/beam/pull/12054#discussion_r445965109 ## File path: sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rel/BeamCalcRel.java ## @@ -431,11 +437,11 @@ private static Expression value( private static Expression value(Expression value, Schema.FieldType type) { if (type.getTypeName().isLogicalType()) { String logicalId = type.getLogicalType().getIdentifier(); -if (TimeType.IDENTIFIER.equals(logicalId)) { +if (SqlTypes.TIME.getIdentifier().equals(logicalId)) { return nullOr( - value, Expressions.convert_(Expressions.call(value, "getMillis"), int.class)); + value, Expressions.divide(value, Expressions.constant(NANOS_PER_MILLISECOND))); } else if (SqlTypes.DATE.getIdentifier().equals(logicalId)) { - value = nullOr(value, value); + return nullOr(value, value); Review comment: Yeah. It works if we just return value here. So I have changed it to make it simpler. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] ZijieSong946 commented on a change in pull request #12054: [BEAM-10219] Support ZetaSQL TIME functions in BeamSQL
ZijieSong946 commented on a change in pull request #12054: URL: https://github.com/apache/beam/pull/12054#discussion_r445965109 ## File path: sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rel/BeamCalcRel.java ## @@ -431,11 +437,11 @@ private static Expression value( private static Expression value(Expression value, Schema.FieldType type) { if (type.getTypeName().isLogicalType()) { String logicalId = type.getLogicalType().getIdentifier(); -if (TimeType.IDENTIFIER.equals(logicalId)) { +if (SqlTypes.TIME.getIdentifier().equals(logicalId)) { return nullOr( - value, Expressions.convert_(Expressions.call(value, "getMillis"), int.class)); + value, Expressions.divide(value, Expressions.constant(NANOS_PER_MILLISECOND))); } else if (SqlTypes.DATE.getIdentifier().equals(logicalId)) { - value = nullOr(value, value); + return nullOr(value, value); Review comment: Yeah. It works if we just return value here. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] udim commented on a change in pull request #12009: [BEAM-10258] Support type hint annotations on PTransform's expand()
udim commented on a change in pull request #12009: URL: https://github.com/apache/beam/pull/12009#discussion_r445942319 ## File path: sdks/python/apache_beam/typehints/typehints_test_py3.py ## @@ -46,11 +51,59 @@ class MyDoFn(DoFn): def process(self, element: int) -> Iterable[str]: pass -print(MyDoFn().get_type_hints()) th = MyDoFn().get_type_hints() self.assertEqual(th.input_types, ((int, ), {})) self.assertEqual(th.output_types, ((str, ), {})) +class TestPTransformAnnotations(unittest.TestCase): + def test_pep484_annotations(self): +class MyPTransform(PTransform): + def expand(self, pcoll: PCollection[int]) -> PCollection[str]: +return pcoll | Map(lambda num: str(num)) + +th = MyPTransform().get_type_hints() +self.assertEqual(th.input_types, ((int, ), {})) +self.assertEqual(th.output_types, ((str, ), {})) + + def test_annotations_without_pcollection_wrapper(self): +class MyPTransform(PTransform): + def expand(self, pcoll: int) -> str: +return pcoll | Map(lambda num: str(num)) + +error_str = 'An input typehint to a PTransform must be a single (or nested) type wrapped by a PCollection or PBegin. ' + +with self.assertRaisesRegex(TypeCheckError, error_str): Review comment: Probably because there are unescaped regex control characters in that string: `().` This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] udim commented on pull request #12009: [BEAM-10258] Support type hint annotations on PTransform's expand()
udim commented on pull request #12009: URL: https://github.com/apache/beam/pull/12009#issuecomment-649924273 > The two functions **strip_pcoll_input** and **strip_pcoll_output_** are very similar. Could be refactored into one function. What do you think Yeah, they do look similar enough to merge. It's your call. If they do get much longer though they probably should be merged or using a shared method. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] udim commented on a change in pull request #12009: [BEAM-10258] Support type hint annotations on PTransform's expand()
udim commented on a change in pull request #12009: URL: https://github.com/apache/beam/pull/12009#discussion_r445941065 ## File path: sdks/python/apache_beam/typehints/decorators.py ## @@ -378,6 +379,61 @@ def has_simple_output_type(self): self.output_types and len(self.output_types[0]) == 1 and not self.output_types[1]) + def strip_pcoll_input(self): +# type: () -> IOTypeHints + +error_str = 'An input typehint to a PTransform must be a single (or nested) type wrapped by a PCollection or ' \ +'PBegin. ' + +if any(element is None for element in [self.input_types, self.input_types[0], self.input_types[0][0]]): Review comment: This line will raise an exception when self.input_types is None, compared to: ```py if self.input_types is None or len(self.input_types[0]) != 1: ``` It doesn't take advantage of short-circuit evaluation. `self.input_types = None` is a valid value. It means nothing was set. (We should get rid of that and always have ((), {}) as the input/output_types value if there are no type hints, but I haven't had the chance to do that.) Same goes for having zero type hints: it's valid and there's nothing to do. Might be useful to reuse: `self._has_input_types`, `self.has_simple_output_type`. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] pulasthi commented on pull request #10888: [BEAM-7304] Twister2 Beam runner
pulasthi commented on pull request #10888: URL: https://github.com/apache/beam/pull/10888#issuecomment-649920371 @iemejia Just made the changes, Sorry about the delay, Let me know if anything else is needed from my end, I will start work on the document PR's This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] pulasthi commented on a change in pull request #10888: [BEAM-7304] Twister2 Beam runner
pulasthi commented on a change in pull request #10888: URL: https://github.com/apache/beam/pull/10888#discussion_r445938844 ## File path: runners/twister2/src/main/java/org/apache/beam/runners/twister2/Twister2LegacyRunner.java ## @@ -0,0 +1,342 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.runners.twister2; + +import static org.apache.beam.runners.core.construction.resources.PipelineResources.detectClassPathResourcesToStage; + +import edu.iu.dsc.tws.api.JobConfig; +import edu.iu.dsc.tws.api.Twister2Job; +import edu.iu.dsc.tws.api.config.Config; +import edu.iu.dsc.tws.api.driver.DriverJobState; +import edu.iu.dsc.tws.api.exceptions.Twister2RuntimeException; +import edu.iu.dsc.tws.api.scheduler.Twister2JobState; +import edu.iu.dsc.tws.api.tset.sets.TSet; +import edu.iu.dsc.tws.api.tset.sets.batch.BatchTSet; +import edu.iu.dsc.tws.local.LocalSubmitter; +import edu.iu.dsc.tws.rsched.core.ResourceAllocator; +import edu.iu.dsc.tws.rsched.job.Twister2Submitter; +import java.io.File; +import java.io.FileInputStream; +import java.io.FileNotFoundException; +import java.io.FileOutputStream; +import java.io.IOException; +import java.util.ArrayList; +import java.util.HashMap; +import java.util.HashSet; +import java.util.LinkedHashMap; +import java.util.List; +import java.util.Map; +import java.util.Set; +import java.util.logging.LogManager; +import java.util.logging.Logger; +import java.util.stream.Collectors; +import java.util.zip.ZipEntry; +import java.util.zip.ZipOutputStream; +import org.apache.beam.runners.core.construction.PTransformMatchers; +import org.apache.beam.runners.core.construction.PTransformTranslation; +import org.apache.beam.runners.core.construction.SplittableParDo; +import org.apache.beam.runners.core.construction.SplittableParDoNaiveBounded; +import org.apache.beam.runners.core.construction.resources.PipelineResources; +import org.apache.beam.sdk.Pipeline; +import org.apache.beam.sdk.PipelineResult; +import org.apache.beam.sdk.PipelineRunner; +import org.apache.beam.sdk.options.PipelineOptions; +import org.apache.beam.sdk.options.PipelineOptionsValidator; +import org.apache.beam.sdk.runners.PTransformOverride; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.MoreObjects; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableList; + +/** + * A {@link PipelineRunner} that executes the operations in the pipeline by first translating them + * to a Twister2 Plan and then executing them either locally or on a Twister2 cluster, depending on + * the configuration. + */ +public class Twister2LegacyRunner extends PipelineRunner { Review comment: Renamed it to Twister2Runner This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] henryken commented on a change in pull request #12066: [BEAM-9679] Add Side Output to Core Transform Go SDK katas
henryken commented on a change in pull request #12066: URL: https://github.com/apache/beam/pull/12066#discussion_r445934652 ## File path: learning/katas/go/Core Transforms/Side Output/Side Output/pkg/task/task.go ## @@ -0,0 +1,30 @@ +// Licensed to the Apache Software Foundation (ASF) under one or more +// contributor license agreements. See the NOTICE file distributed with +// this work for additional information regarding copyright ownership. +// The ASF licenses this file to You under the Apache License, Version 2.0 +// (the "License"); you may not use this file except in compliance with +// the License. You may obtain a copy of the License at +// +//http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. + +package task + +import ( + "github.com/apache/beam/sdks/go/pkg/beam" +) + +func ApplyTransform(s beam.Scope, input beam.PCollection) (beam.PCollection, beam.PCollection) { + return beam.ParDo2(s, func(element int, numBelow100, numAbove100 func(int)) { + if element <= 100 { + numBelow100(element) + return + } + numAbove100(element) Review comment: While this works too, maybe you would want to consider using `else` statement. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] udim commented on a change in pull request #12088: Add output typehints to GroupIntoBatches
udim commented on a change in pull request #12088: URL: https://github.com/apache/beam/pull/12088#discussion_r445927300 ## File path: sdks/python/apache_beam/transforms/util.py ## @@ -741,6 +741,7 @@ def WithKeys(pcoll, k): @experimental() @typehints.with_input_types(Tuple[K, V]) +@typehints.with_output_types(Tuple[K, List[V]]) Review comment: The Java implementation splits (key, value) pairs. https://github.com/apache/beam/blob/eaa41cc4cbcc4f94d0ec1a36ff2b0f3fcee962f9/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/GroupIntoBatches.java#L173-L174 I don't see that in Python - is the runner supposed to do that? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] udim commented on a change in pull request #12088: Add output typehints to GroupIntoBatches
udim commented on a change in pull request #12088: URL: https://github.com/apache/beam/pull/12088#discussion_r445925136 ## File path: sdks/python/apache_beam/transforms/util.py ## @@ -741,6 +741,7 @@ def WithKeys(pcoll, k): @experimental() @typehints.with_input_types(Tuple[K, V]) +@typehints.with_output_types(Tuple[K, List[V]]) Review comment: The test passes because 2 batches/elements are output. And I was running the test via pytest as usual. I don't follow the state api reasoning, and this is not documented in GroupIntoBatches. If you get a different result in DirectRunner vs DataflowRunner, it's a bug. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] TheNeuralBit commented on pull request #12067: [BEAM-10308] Make component ID assignments consistent across PipelineContext instances
TheNeuralBit commented on pull request #12067: URL: https://github.com/apache/beam/pull/12067#issuecomment-649898941 I'll see if I can add a simple unit test tomorrow. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] TheNeuralBit commented on a change in pull request #12090: Added schema aware abstraction SchemaIO and implemented for pubsub
TheNeuralBit commented on a change in pull request #12090: URL: https://github.com/apache/beam/pull/12090#discussion_r445908191 ## File path: sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/meta/provider/pubsub/PubsubJsonTableProvider.java ## @@ -47,32 +44,40 @@ @Experimental @AutoService(TableProvider.class) public class PubsubJsonTableProvider extends InMemoryMetaTableProvider { + static final String TIMESTAMP_FIELD = "event_timestamp"; + static final String ATTRIBUTES_FIELD = "attributes"; + static final String PAYLOAD_FIELD = "payload"; @Override public String getTableType() { return "pubsub"; } @Override - public BeamSqlTable buildBeamSqlTable(Table tableDefintion) { -JSONObject tableProperties = tableDefintion.getProperties(); + public BeamSqlTable buildBeamSqlTable(Table tableDefinition) { +JSONObject tableProperties = tableDefinition.getProperties(); String timestampAttributeKey = tableProperties.getString("timestampAttributeKey"); String deadLetterQueue = tableProperties.getString("deadLetterQueue"); -validateDlq(deadLetterQueue); -Schema schema = tableDefintion.getSchema(); +Schema schema = tableDefinition.getSchema(); +String location = tableDefinition.getLocation(); +Schema dataSchema = tableDefinition.getSchema(); + +validateDlq(deadLetterQueue); validateEventTimestamp(schema); Review comment: Could you move this validation to the new implementation (probably in the `from` method)? We'll want to make sure we still do the validation when this is used outside of SQL (e.g. for cross-language). ## File path: sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/pubsub/PubsubSchemaIO.java ## @@ -0,0 +1,131 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.sdk.io.gcp.pubsub; + +import static org.apache.beam.sdk.io.gcp.pubsub.PubsubMessageToRow.DLQ_TAG; +import static org.apache.beam.sdk.io.gcp.pubsub.PubsubMessageToRow.MAIN_TAG; + +import java.io.Serializable; +import org.apache.beam.sdk.annotations.Internal; +import org.apache.beam.sdk.schemas.Schema; +import org.apache.beam.sdk.schemas.io.SchemaIO; +import org.apache.beam.sdk.transforms.PTransform; +import org.apache.beam.sdk.values.PBegin; +import org.apache.beam.sdk.values.PCollection; +import org.apache.beam.sdk.values.PCollectionTuple; +import org.apache.beam.sdk.values.POutput; +import org.apache.beam.sdk.values.Row; + +/** An abstraction to create schema aware IOs. */ +@Internal +public class PubsubSchemaIO implements SchemaIO, Serializable { Review comment: I think we should make this a `private static` inner class within `PubsubSchemaCapableIOProvider` so no one tries to use it on its own. ## File path: sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/io/package-info.java ## @@ -0,0 +1,20 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +/** Provides a new schema aware IO abstraction interface. */ Review comment: nit: it won't be new forever :) ```suggestion /** Provides abstractions for schema-aware IOs. */ ``` ## File path: sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/meta/provider/pubsub/PubsubJsonTableProvider.java ## @@ -47,32 +44,40 @@ @Experimental @AutoService(TableProvider.class) public class PubsubJsonTable
[GitHub] [beam] epicfaace commented on pull request #11824: [BEAM-10101] Add HttpIO / HttpFileSystem (Python)
epicfaace commented on pull request #11824: URL: https://github.com/apache/beam/pull/11824#issuecomment-649895785 Oh, it appears the test failures are only on Python 2 -- perhaps I didn't define the class properly for Python 2 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] chamikaramj commented on pull request #12085: [BEAM-10318] fix uninitialized grpc_server in FnApiRunner
chamikaramj commented on pull request #12085: URL: https://github.com/apache/beam/pull/12085#issuecomment-649895565 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] chamikaramj commented on pull request #12010: [BEAM-10259] Use ref-counted connection to Spanner to prevent multiple connections.
chamikaramj commented on pull request #12010: URL: https://github.com/apache/beam/pull/12010#issuecomment-649895281 Run Java PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] ajamato commented on pull request #12083: [BEAM-10317] Java - Update BigQueryIO to tag BigQuery Jobs with the Dataflow Job ID
ajamato commented on pull request #12083: URL: https://github.com/apache/beam/pull/12083#issuecomment-649889368 retest this please This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] ajamato commented on pull request #12084: [BEAM-10317] Python - Update BigQueryIO to tag BigQuery Jobs with the Dataflow Job ID
ajamato commented on pull request #12084: URL: https://github.com/apache/beam/pull/12084#issuecomment-649888199 retest this please This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] chamikaramj commented on pull request #12076: [BEAM-10143] Add test verifying external windowing
chamikaramj commented on pull request #12076: URL: https://github.com/apache/beam/pull/12076#issuecomment-649888111 Closing this to avoid confusion. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] chamikaramj closed pull request #12076: [BEAM-10143] Add test verifying external windowing
chamikaramj closed pull request #12076: URL: https://github.com/apache/beam/pull/12076 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] aaltay commented on pull request #11967: [BEAM-9992] | use Sets transform in BeamSQL
aaltay commented on pull request #11967: URL: https://github.com/apache/beam/pull/11967#issuecomment-649888075 Run Java PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] chamikaramj commented on pull request #12067: [BEAM-10308] Make component ID assignments consistent across PipelineContext instances
chamikaramj commented on pull request #12067: URL: https://github.com/apache/beam/pull/12067#issuecomment-649888003 Do you think we can add another unit test that shows how current state will result in an invalid proto during cross-language expansion ? That will help us understand the scope of the issue. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] chamikaramj commented on pull request #12076: [BEAM-10143] Add test verifying external windowing
chamikaramj commented on pull request #12076: URL: https://github.com/apache/beam/pull/12076#issuecomment-649887627 Ah, sorry about that. Will look at the other one. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] angoenka commented on a change in pull request #12032: [BEAM-10275] Don't eagerly parse pipeline options in sdk_worker_main.py
angoenka commented on a change in pull request #12032: URL: https://github.com/apache/beam/pull/12032#discussion_r445910583 ## File path: sdks/python/apache_beam/runners/worker/sdk_worker_main_test.py ## @@ -86,6 +92,20 @@ def test_parse_pipeline_options(self): '{"options": {"eam:option:m_option:v":"mock_val"}}'). get_all_options(drop_default=True)) + @unittest.skip( + 'BEAM-10274: type=json.loads pipeline options cannot be parsed in the ' + 'python SDK harness') + def test_parse_json_object(self): Review comment: Should we enable this tests after the fix. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] youngoli commented on pull request #12095: [BEAM-10311] Sharing restriction trackers between SDF and DataSource.
youngoli commented on pull request #12095: URL: https://github.com/apache/beam/pull/12095#issuecomment-649885158 R: @lostluck Ended up being a really small PR. I tried to think of a way to test it (like testing that the channel gets set in the DataSource maybe), but it seems too specific of an interaction to test, and any test for it would be a bit complicated. For context of what I'm intending when it comes to the actual splitting: The idea is the DataSource will receive the restriction tracker from the channel when it needs to check progress or perform a split, and then return the restriction tracker to the channel when done with it. The goal is to make sure there's only one copy of that tracker being sent around, so that nothing else is using the tracker when processing is done and the tracker is getting deleted. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] TheNeuralBit commented on pull request #12090: Added schema aware abstraction SchemaIO and implemented for pubsub
TheNeuralBit commented on pull request #12090: URL: https://github.com/apache/beam/pull/12090#issuecomment-649884063 Run SQL PostCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] youngoli opened a new pull request #12095: [BEAM-10311] Sharing restriction trackers between SDF and DataSource.
youngoli opened a new pull request #12095: URL: https://github.com/apache/beam/pull/12095 Adding a channel that can be used to retrieve the current restriction tracker from the DataSource. Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily: - [x] [**Choose reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and mention them in a comment (`R: @username`). - [x] Format the pull request title like `[BEAM-XXX] Fixes bug in ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue. - [ ] Update `CHANGES.md` with noteworthy changes. - [x] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.pdf). See the [Contributor Guide](https://beam.apache.org/contribute) for more tips on [how to make review process smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier). Post-Commit Tests Status (on master branch) Lang | SDK | Dataflow | Flink | Samza | Spark --- | --- | --- | --- | --- | --- Go | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/) | --- | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/) | --- | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/) Java | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/) | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/)[![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/) | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Java11/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Java11/lastCompletedBuild/)[![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)[![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/) | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/) | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/)[![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/)[![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/) Python | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Python2/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Python2/lastCompletedBuild/)[![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Python35/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Python35/lastCompletedBuild/)[![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Python36/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Python36/lastCompletedBuild/)[![Build Status](https://ci-beam.apache.org/job/beam_Post
[GitHub] [beam] TheNeuralBit commented on pull request #12076: [BEAM-10143] Add test verifying external windowing
TheNeuralBit commented on pull request #12076: URL: https://github.com/apache/beam/pull/12076#issuecomment-649880575 @chamikaramj unfortunately I don't think this PR will work. It breaks a lot of tests that rely on components getting a specific component id. Sorry for some reason I broke this up into two PRs, I left a comment to that effect in the other one: https://github.com/apache/beam/pull/12067#issuecomment-649046980 I don't think something like this will work unless I can scope the id cache/assigner to the pipeline. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] pabloem commented on pull request #12082: Standardizing BigQuery job names in Beam Python and Java SDKs
pabloem commented on pull request #12082: URL: https://github.com/apache/beam/pull/12082#issuecomment-649880288 retest this please This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] TheNeuralBit commented on a change in pull request #12076: [BEAM-10143] Add test verifying external windowing
TheNeuralBit commented on a change in pull request #12076: URL: https://github.com/apache/beam/pull/12076#discussion_r445899639 ## File path: sdks/python/apache_beam/transforms/sql_test.py ## @@ -157,6 +157,21 @@ def test_zetasql_generate_data(self): dialect="zetasql") assert_that(out, equal_to([(1, "foo", 3.14)])) + def test_windowing_before_sql(self): +with TestPipeline() as p: + windowed = ( + p | beam.Create([ + SimpleRow(5, "foo", 1.), + SimpleRow(15, "bar", 2.), + SimpleRow(25, "baz", 3.) + ]) + | beam.Map(lambda v: beam.window.TimestampedValue(v, v.id)). + with_output_types(SimpleRow) + | beam.WindowInto( + beam.window.FixedWindows(10)).with_output_types(SimpleRow) + | SqlTransform("SELECT COUNT(*) as `count` FROM PCOLLECTION")) + assert_that(out, equal_to([(1, ), (1, ), (1, )])) Review comment: Before the fix it replicates the failure here: https://issues.apache.org/jira/browse/BEAM-10143 It fails at execution time because it expects a global window, but receives a (correct) interval window This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] darshanj commented on pull request #11967: [BEAM-9992] | use Sets transform in BeamSQL
darshanj commented on pull request #11967: URL: https://github.com/apache/beam/pull/11967#issuecomment-649873766 They are not related. Can you please retest this This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] pabloem commented on pull request #12082: Standardizing BigQuery job names in Beam Python and Java SDKs
pabloem commented on pull request #12082: URL: https://github.com/apache/beam/pull/12082#issuecomment-649873640 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] damgad commented on pull request #12072: Adding Jenkins job cleaning the /tmp directory on Jenkins node machine
damgad commented on pull request #12072: URL: https://github.com/apache/beam/pull/12072#issuecomment-649872679 Works like a charm ;) https://ci-beam.apache.org/job/beam_Clean_tmp_directory/3/console This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] robinyqiu commented on a change in pull request #12054: [BEAM-10219] Support ZetaSQL TIME functions in BeamSQL
robinyqiu commented on a change in pull request #12054: URL: https://github.com/apache/beam/pull/12054#discussion_r445897849 ## File path: sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rel/BeamCalcRel.java ## @@ -431,11 +437,11 @@ private static Expression value( private static Expression value(Expression value, Schema.FieldType type) { if (type.getTypeName().isLogicalType()) { String logicalId = type.getLogicalType().getIdentifier(); -if (TimeType.IDENTIFIER.equals(logicalId)) { +if (SqlTypes.TIME.getIdentifier().equals(logicalId)) { return nullOr( - value, Expressions.convert_(Expressions.call(value, "getMillis"), int.class)); + value, Expressions.divide(value, Expressions.constant(NANOS_PER_MILLISECOND))); } else if (SqlTypes.DATE.getIdentifier().equals(logicalId)) { - value = nullOr(value, value); + return nullOr(value, value); Review comment: `nullOr()` also does boxing for `value` if it is not null. Is it necessary in this case? I tried to remove the call to `nullOr()` and it works fine as well. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] robinyqiu commented on pull request #12054: [BEAM-10219] Support ZetaSQL TIME functions in BeamSQL
robinyqiu commented on pull request #12054: URL: https://github.com/apache/beam/pull/12054#issuecomment-649871710 This (https://github.com/robinyqiu/beam/commit/06c49a15ba36f23cf3dac54669d5d3fa1f5873c2) should fix the precision issue with TIME literal. Please patch it to your PR. The default ZetaSQL analyzer setup supports microsecond precision (if wants nanoseconds you need to set an language option). I believe it's enough for us to support microsecond precision. Other system like BigQuery also support microsecond. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] chamikaramj commented on pull request #12076: [BEAM-10143] Add test verifying external windowing
chamikaramj commented on pull request #12076: URL: https://github.com/apache/beam/pull/12076#issuecomment-649870795 cc: @robertwb This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] chamikaramj commented on pull request #12076: [BEAM-10143] Add test verifying external windowing
chamikaramj commented on pull request #12076: URL: https://github.com/apache/beam/pull/12076#issuecomment-649870730 Also, please make sure that existing x-lang tests pass for this. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] chamikaramj commented on a change in pull request #12076: [BEAM-10143] Add test verifying external windowing
chamikaramj commented on a change in pull request #12076: URL: https://github.com/apache/beam/pull/12076#discussion_r445895416 ## File path: sdks/python/apache_beam/transforms/sql_test.py ## @@ -157,6 +157,21 @@ def test_zetasql_generate_data(self): dialect="zetasql") assert_that(out, equal_to([(1, "foo", 3.14)])) + def test_windowing_before_sql(self): Review comment: Do you think we could come up with a test where PTransforms get assigned to wrong IDs during pipeline expansion (resulting in an invalid pipeline proto) ? ## File path: sdks/python/apache_beam/transforms/sql_test.py ## @@ -157,6 +157,21 @@ def test_zetasql_generate_data(self): dialect="zetasql") assert_that(out, equal_to([(1, "foo", 3.14)])) + def test_windowing_before_sql(self): +with TestPipeline() as p: + windowed = ( + p | beam.Create([ + SimpleRow(5, "foo", 1.), + SimpleRow(15, "bar", 2.), + SimpleRow(25, "baz", 3.) + ]) + | beam.Map(lambda v: beam.window.TimestampedValue(v, v.id)). + with_output_types(SimpleRow) + | beam.WindowInto( + beam.window.FixedWindows(10)).with_output_types(SimpleRow) + | SqlTransform("SELECT COUNT(*) as `count` FROM PCOLLECTION")) + assert_that(out, equal_to([(1, ), (1, ), (1, )])) Review comment: Just curios, Is this failing before the fix or does it pass with a wrong pipeline ? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] aaltay commented on a change in pull request #12088: Add output typehints to GroupIntoBatches
aaltay commented on a change in pull request #12088: URL: https://github.com/apache/beam/pull/12088#discussion_r445895986 ## File path: sdks/python/apache_beam/transforms/util.py ## @@ -741,6 +741,7 @@ def WithKeys(pcoll, k): @experimental() @typehints.with_input_types(Tuple[K, V]) +@typehints.with_output_types(Tuple[K, List[V]]) Review comment: Thank you. This might be a bug with whatever runner Udi used (one of the direct runners). I am still puzzled by what would be a good implementation of this case in batch. Would a naive implementation be always insert a shuffle, and use an memory backed state store for the batch case? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] chamikaramj commented on a change in pull request #12076: [BEAM-10143] Add test verifying external windowing
chamikaramj commented on a change in pull request #12076: URL: https://github.com/apache/beam/pull/12076#discussion_r445894984 ## File path: sdks/python/apache_beam/runners/pipeline_context_test.py ## @@ -53,6 +53,18 @@ def test_serialization(self): self.assertEqual( coders.BytesCoder(), context2.coders.get_by_id(bytes_coder_ref)) + def test_ref_assignment_spans_multiple_instances(self): +context = pipeline_context.PipelineContext() +float_coder_ref = context.coders.get_id(coders.FloatCoder()) +bytes_coder_ref = context.coders.get_id(coders.BytesCoder()) +context2 = pipeline_context.PipelineContext() + Review comment: Nit: remove extra line This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] kennknowles commented on a change in pull request #12088: Add output typehints to GroupIntoBatches
kennknowles commented on a change in pull request #12088: URL: https://github.com/apache/beam/pull/12088#discussion_r445894269 ## File path: sdks/python/apache_beam/transforms/util.py ## @@ -741,6 +741,7 @@ def WithKeys(pcoll, k): @experimental() @typehints.with_input_types(Tuple[K, V]) +@typehints.with_output_types(Tuple[K, List[V]]) Review comment: A very real situation is that it might already be partitioned however the runner wants, via analyzing the pipeline. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] kennknowles commented on a change in pull request #12088: Add output typehints to GroupIntoBatches
kennknowles commented on a change in pull request #12088: URL: https://github.com/apache/beam/pull/12088#discussion_r445894084 ## File path: sdks/python/apache_beam/transforms/util.py ## @@ -741,6 +741,7 @@ def WithKeys(pcoll, k): @experimental() @typehints.with_input_types(Tuple[K, V]) +@typehints.with_output_types(Tuple[K, List[V]]) Review comment: Correct. It is up to the runner how to implement, but per key and window must be executed serially. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] pabloem commented on pull request #11824: [BEAM-10101] Add HttpIO / HttpFileSystem (Python)
pabloem commented on pull request #11824: URL: https://github.com/apache/beam/pull/11824#issuecomment-649867747 okay let me figure out why that's happening... This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] damondouglas commented on pull request #12066: [BEAM-9679] Add Side Output to Core Transform Go SDK katas
damondouglas commented on pull request #12066: URL: https://github.com/apache/beam/pull/12066#issuecomment-649865228 No problem at all. Please take your time and thank you. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] pabloem commented on pull request #12083: [BEAM-10317] Java - Update BigQueryIO to tag BigQuery Jobs with the Dataflow Job ID
pabloem commented on pull request #12083: URL: https://github.com/apache/beam/pull/12083#issuecomment-649864733 retest this please This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] lukecwik commented on pull request #12093: [BEAM-10303] Add support for the non-window observing optimization to DoFn execution.
lukecwik commented on pull request #12093: URL: https://github.com/apache/beam/pull/12093#issuecomment-649863134 CC: @reuvenlax This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] lukecwik commented on pull request #12093: [BEAM-10303] Add support for the non-window observing optimization to DoFn execution.
lukecwik commented on pull request #12093: URL: https://github.com/apache/beam/pull/12093#issuecomment-649862731 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] TheNeuralBit commented on a change in pull request #10900: [BEAM-9335] update hard-coded coder id when translating Java external transforms
TheNeuralBit commented on a change in pull request #10900: URL: https://github.com/apache/beam/pull/10900#discussion_r445880888 ## File path: runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/External.java ## @@ -222,6 +226,20 @@ public OutputT expand(InputT input) { }); externalPCollectionIdMap = externalPCollectionIdMapBuilder.build(); + Map externalCoderIdMapBuilder = new HashMap<>(); + expandedComponents + .getPcollectionsMap() + .forEach( + (pcolId, pCol) -> { +try { + Coder coder = rehydratedComponents.getCoder(pCol.getCoderId()); + externalCoderIdMapBuilder.putIfAbsent(coder, pCol.getCoderId()); +} catch (IOException e) { + throw new RuntimeException("cannot rehydrate Coder."); Review comment: Won't it often be the case that we can't rehydrate the coder, because it's specific to another SDK? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] damgad closed pull request #12094: Jenkins rocks
damgad closed pull request #12094: URL: https://github.com/apache/beam/pull/12094 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] damgad opened a new pull request #12094: Jenkins rocks
damgad opened a new pull request #12094: URL: https://github.com/apache/beam/pull/12094 **Please** add a meaningful description for your change here Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily: - [ ] [**Choose reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and mention them in a comment (`R: @username`). - [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue. - [ ] Update `CHANGES.md` with noteworthy changes. - [ ] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.pdf). See the [Contributor Guide](https://beam.apache.org/contribute) for more tips on [how to make review process smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier). Post-Commit Tests Status (on master branch) Lang | SDK | Dataflow | Flink | Samza | Spark --- | --- | --- | --- | --- | --- Go | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/) | --- | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/) | --- | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/) Java | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/) | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/)[![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/) | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Java11/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Java11/lastCompletedBuild/)[![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)[![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/) | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/) | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/)[![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/)[![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/) Python | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Python2/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Python2/lastCompletedBuild/)[![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Python35/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Python35/lastCompletedBuild/)[![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Python36/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Python36/lastCompletedBuild/)[![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Python37/lastCompletedBuild/badge/ic
[GitHub] [beam] saavan-google-intern commented on a change in pull request #12009: [BEAM-10258] Support type hint annotations on PTransform's expand()
saavan-google-intern commented on a change in pull request #12009: URL: https://github.com/apache/beam/pull/12009#discussion_r445877050 ## File path: sdks/python/apache_beam/typehints/decorators.py ## @@ -378,6 +379,61 @@ def has_simple_output_type(self): self.output_types and len(self.output_types[0]) == 1 and not self.output_types[1]) + def strip_pcoll_input(self): +# type: () -> IOTypeHints + +error_str = 'An input typehint to a PTransform must be a single (or nested) type wrapped by a PCollection or ' \ +'PBegin. ' + +if any(element is None for element in [self.input_types, self.input_types[0], self.input_types[0][0]]): Review comment: Now that I think about it this can be a simple try/catch IndexError instead of manually checking for None This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] angoenka merged pull request #11993: Change time granularity to seconds in ParDoTest TestStream timer test…
angoenka merged pull request #11993: URL: https://github.com/apache/beam/pull/11993 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] lukecwik commented on pull request #12093: [BEAM-10303] Add support for the non-window observing optimization to DoFn execution.
lukecwik commented on pull request #12093: URL: https://github.com/apache/beam/pull/12093#issuecomment-649850539 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] saavan-google-intern commented on a change in pull request #12009: [BEAM-10258] Support type hint annotations on PTransform's expand()
saavan-google-intern commented on a change in pull request #12009: URL: https://github.com/apache/beam/pull/12009#discussion_r445871780 ## File path: sdks/python/apache_beam/typehints/typehints_test_py3.py ## @@ -46,11 +51,59 @@ class MyDoFn(DoFn): def process(self, element: int) -> Iterable[str]: pass -print(MyDoFn().get_type_hints()) th = MyDoFn().get_type_hints() self.assertEqual(th.input_types, ((int, ), {})) self.assertEqual(th.output_types, ((str, ), {})) +class TestPTransformAnnotations(unittest.TestCase): + def test_pep484_annotations(self): +class MyPTransform(PTransform): + def expand(self, pcoll: PCollection[int]) -> PCollection[str]: +return pcoll | Map(lambda num: str(num)) + +th = MyPTransform().get_type_hints() +self.assertEqual(th.input_types, ((int, ), {})) +self.assertEqual(th.output_types, ((str, ), {})) + + def test_annotations_without_pcollection_wrapper(self): +class MyPTransform(PTransform): + def expand(self, pcoll: int) -> str: +return pcoll | Map(lambda num: str(num)) + +error_str = 'An input typehint to a PTransform must be a single (or nested) type wrapped by a PCollection or PBegin. ' + +with self.assertRaisesRegex(TypeCheckError, error_str): Review comment: Any idea why this test is failing? It says the strings don't match in stdout but they appear to match. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] lukecwik opened a new pull request #12093: [BEAM-10303] Add support for the non-window observing optimization to DoFn execution.
lukecwik opened a new pull request #12093: URL: https://github.com/apache/beam/pull/12093 This covers all but the splittable DoFn processElements call since I wanted to limit the size of the change. Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily: - [ ] [**Choose reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and mention them in a comment (`R: @username`). - [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue. - [ ] Update `CHANGES.md` with noteworthy changes. - [ ] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.pdf). See the [Contributor Guide](https://beam.apache.org/contribute) for more tips on [how to make review process smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier). Post-Commit Tests Status (on master branch) Lang | SDK | Dataflow | Flink | Samza | Spark --- | --- | --- | --- | --- | --- Go | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/) | --- | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/) | --- | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/) Java | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/) | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/)[![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/) | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Java11/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Java11/lastCompletedBuild/)[![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)[![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/) | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/) | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/)[![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/)[![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/) Python | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Python2/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Python2/lastCompletedBuild/)[![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Python35/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Python35/lastCompletedBuild/)[![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Python36/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Python36/lastCompletedBuild/)[![Build Status](https://ci-beam.apache.org/job/b
[GitHub] [beam] lukecwik commented on pull request #12093: [BEAM-10303] Add support for the non-window observing optimization to DoFn execution.
lukecwik commented on pull request #12093: URL: https://github.com/apache/beam/pull/12093#issuecomment-649849369 R: @ibzib This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] saavan-google-intern commented on pull request #12009: [BEAM-10258] Support type hint annotations on PTransform's expand()
saavan-google-intern commented on pull request #12009: URL: https://github.com/apache/beam/pull/12009#issuecomment-649848772 The two functions **strip_pcoll_input** and **strip_pcoll_output_** are very similar. Could be refactored into one function. What do you think This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] damgad commented on pull request #12072: Adding Jenkins job cleaning the /tmp directory on Jenkins node machine
damgad commented on pull request #12072: URL: https://github.com/apache/beam/pull/12072#issuecomment-649844629 Sure, I will :+1: This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] aaltay merged pull request #12072: Adding Jenkins job cleaning the /tmp directory on Jenkins node machine
aaltay merged pull request #12072: URL: https://github.com/apache/beam/pull/12072 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] aaltay commented on pull request #12072: Adding Jenkins job cleaning the /tmp directory on Jenkins node machine
aaltay commented on pull request #12072: URL: https://github.com/apache/beam/pull/12072#issuecomment-649844411 Let's merge. Could you test it once it is merged? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] damgad commented on pull request #12072: Adding Jenkins job cleaning the /tmp directory on Jenkins node machine
damgad commented on pull request #12072: URL: https://github.com/apache/beam/pull/12072#issuecomment-649843232 @aaltay The cron version (already deployed on the nodes, today) is less strict. It executes once a week and deletes files that were not accessed during at least two days. That should be fine for most cases, and safer for the executing builds. Here, the mechanism is more powerful, on-demand, and configurable. By specifying the `unaccessed_for` to `0` you can even delete all the tmp files. I've also considered having only that job with a cron trigger along with the manual one. But that's less reliable. In that case, if the node was for some reason overloaded or builds were stuck the cleanup wouldn't happen. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] amaliujia opened a new pull request #12092: [DO NOT MERGE] Prototype TPCDS Benchmark for BeamSQL
amaliujia opened a new pull request #12092: URL: https://github.com/apache/beam/pull/12092 **Please** add a meaningful description for your change here Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily: - [ ] [**Choose reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and mention them in a comment (`R: @username`). - [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue. - [ ] Update `CHANGES.md` with noteworthy changes. - [ ] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.pdf). See the [Contributor Guide](https://beam.apache.org/contribute) for more tips on [how to make review process smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier). Post-Commit Tests Status (on master branch) Lang | SDK | Dataflow | Flink | Samza | Spark --- | --- | --- | --- | --- | --- Go | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/) | --- | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/) | --- | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/) Java | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/) | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/)[![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/) | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Java11/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Java11/lastCompletedBuild/)[![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)[![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/) | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/) | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/)[![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/)[![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/) Python | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Python2/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Python2/lastCompletedBuild/)[![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Python35/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Python35/lastCompletedBuild/)[![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Python36/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Python36/lastCompletedBuild/)[![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Python37/lastCompletedBuild/b
[GitHub] [beam] saavan-google-intern commented on a change in pull request #12009: [BEAM-10258] Support type hint annotations on PTransform's expand()
saavan-google-intern commented on a change in pull request #12009: URL: https://github.com/apache/beam/pull/12009#discussion_r445866150 ## File path: sdks/python/apache_beam/typehints/typed_pipeline_test_py3.py ## @@ -40,6 +40,14 @@ def process(self, element: int) -> typehints.Tuple[str]: with self.assertRaisesRegex(typehints.TypeCheckError, r'requires.*int.*got.*str'): _ = ['a', 'b', 'c'] | beam.ParDo(MyDoFn()) + def test_pardo_dofn(self): Review comment: That was an accident, thanks This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] aaltay commented on pull request #12073: [BEAM-9543] Support Match Recognition in Beam SQL
aaltay commented on pull request #12073: URL: https://github.com/apache/beam/pull/12073#issuecomment-649841664 > cc @aaltay Nice, thank you for cc'ing me. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] y1chi commented on pull request #11993: Change time granularity to seconds in ParDoTest TestStream timer test…
y1chi commented on pull request #11993: URL: https://github.com/apache/beam/pull/11993#issuecomment-649823298 retest this please This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] apilloud commented on a change in pull request #12054: [BEAM-10219] Support ZetaSQL TIME functions in BeamSQL
apilloud commented on a change in pull request #12054: URL: https://github.com/apache/beam/pull/12054#discussion_r445832856 ## File path: sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/BeamComplexTypeTest.java ## @@ -452,21 +412,32 @@ public void testNullDatetimeFields() { .addNullableField("year_with_null", FieldType.INT64) .addField("mm", FieldType.INT64) .addNullableField("month_with_null", FieldType.INT64) -.addField("time_with_hour_added", FieldType.DATETIME) Review comment: If I remember right, `FieldType.DATETIME` is the SQL type of `TIMESTAMP`. Looks like all the test cases for `TIMESTAMP` have been dropped? ## File path: sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rel/BeamCalcRel.java ## @@ -315,34 +317,38 @@ private static Expression castOutput(Expression value, FieldType toType) { private static Expression castOutputTime(Expression value, FieldType toType) { Expression valueDateTime = value; -// First, convert to millis (except for DATE type) +// Convert TIMESTAMP to joda Instant Review comment: nit: Can you move each of these comments into the appropriate if block? ## File path: sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rel/BeamCalcRel.java ## @@ -315,34 +317,38 @@ private static Expression castOutput(Expression value, FieldType toType) { private static Expression castOutputTime(Expression value, FieldType toType) { Expression valueDateTime = value; -// First, convert to millis (except for DATE type) +// Convert TIMESTAMP to joda Instant +// Convert DATE to LocalDate +// Convert TIME to LocalTime if (CalciteUtils.TIMESTAMP.typesEqual(toType) || CalciteUtils.NULLABLE_TIMESTAMP.typesEqual(toType)) { if (value.getType() == java.sql.Timestamp.class) { valueDateTime = Expressions.call(BuiltInMethod.TIMESTAMP_TO_LONG.method, valueDateTime); } + valueDateTime = Expressions.new_(Instant.class, valueDateTime); } else if (CalciteUtils.TIME.typesEqual(toType) || CalciteUtils.NULLABLE_TIME.typesEqual(toType)) { if (value.getType() == java.sql.Time.class) { valueDateTime = Expressions.call(BuiltInMethod.TIME_TO_INT.method, valueDateTime); + } else if (value.getType() == Long.class) { Review comment: nit: There is no test case for this. ## File path: sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rel/BeamCalcRel.java ## @@ -431,11 +437,11 @@ private static Expression value( private static Expression value(Expression value, Schema.FieldType type) { if (type.getTypeName().isLogicalType()) { String logicalId = type.getLogicalType().getIdentifier(); -if (TimeType.IDENTIFIER.equals(logicalId)) { +if (SqlTypes.TIME.getIdentifier().equals(logicalId)) { return nullOr( - value, Expressions.convert_(Expressions.call(value, "getMillis"), int.class)); + value, Expressions.divide(value, Expressions.constant(NANOS_PER_MILLISECOND))); } else if (SqlTypes.DATE.getIdentifier().equals(logicalId)) { - value = nullOr(value, value); + return nullOr(value, value); Review comment: This is going to generate the code `value == null ? null : value`, which does nothing. drop it all together? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] y1chi commented on a change in pull request #11993: Change time granularity to seconds in ParDoTest TestStream timer test…
y1chi commented on a change in pull request #11993: URL: https://github.com/apache/beam/pull/11993#discussion_r445836802 ## File path: sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/ParDoTest.java ## @@ -3817,24 +3817,15 @@ public void onTimer( TestStream> stream = TestStream.create(KvCoder.of(StringUtf8Coder.of(), VarIntCoder.of())) // See GlobalWindow, - // END_OF_GLOBAL_WINDOW is TIMESTAMP_MAX_VALUE.minus(Duration.standardDays(1)), - // minus 775 milliseconds to make the grainularity to second as required by dataflow - // for TestStream. - .advanceWatermarkTo( - BoundedWindow.TIMESTAMP_MAX_VALUE - .minus(Duration.standardDays(1)) - .minus(BoundedWindow.TIMESTAMP_MAX_VALUE.getMillis() % 1000)) + // END_OF_GLOBAL_WINDOW is TIMESTAMP_MAX_VALUE.minus(Duration.standardDays(1)) + .advanceWatermarkTo(BoundedWindow.TIMESTAMP_MAX_VALUE.minus(Duration.standardDays(1))) Review comment: done. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] epicfaace commented on pull request #11824: [BEAM-10101] Add HttpIO / HttpFileSystem (Python)
epicfaace commented on pull request #11824: URL: https://github.com/apache/beam/pull/11824#issuecomment-649815907 I'm not sure why exactly those failures are happening, since `HttpIO` _does_ take `client` as a constructor. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] epicfaace edited a comment on pull request #11824: [BEAM-10101] Add HttpIO / HttpFileSystem (Python)
epicfaace edited a comment on pull request #11824: URL: https://github.com/apache/beam/pull/11824#issuecomment-649815907 I'm not sure why exactly those failures are happening, since `HttpIO` _does_ take `client` in its constructor. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] amaliujia commented on pull request #12073: [BEAM-9543] Support Match Recognition in Beam SQL
amaliujia commented on pull request #12073: URL: https://github.com/apache/beam/pull/12073#issuecomment-649815316 cc @aaltay This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] epicfaace commented on a change in pull request #11824: [BEAM-10101] Add HttpIO / HttpFileSystem (Python)
epicfaace commented on a change in pull request #11824: URL: https://github.com/apache/beam/pull/11824#discussion_r445836086 ## File path: sdks/python/apache_beam/io/httpio.py ## @@ -0,0 +1,172 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +#http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# + +"""This class implements methods to interact with files at HTTP URLs. + +This I/O only implements methods to read with files at HTTP URLs, because +of the variability in methods by which HTTP content can be written +to a server. If you need to write your results to an HTTP endpoint, +you might want to make your own I/O or use another, more specific, +I/O connector. + +""" + +# pytype: skip-file + +from __future__ import absolute_import + +import io +from builtins import object + +from apache_beam.io.filesystem import BeamIOError +from apache_beam.io.filesystemio import Downloader +from apache_beam.io.filesystemio import DownloaderStream +from apache_beam.internal.http_client import get_new_http +import sys +from httplib2 import HttpLib2Error + +REQUEST_FAILED_ERROR_MSG = "HTTP request failed for URL {}: {}" +UNEXPECTED_STATUS_CODE_ERROR_MSG = "Unexpected status code received for URL {}: {} {}" + + +class HttpIO(object): Review comment: ```suggestion class HttpIO: ``` This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] Ardagan commented on pull request #12091: dummy change
Ardagan commented on pull request #12091: URL: https://github.com/apache/beam/pull/12091#issuecomment-649812079 Run PythonFormatter PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] Ardagan removed a comment on pull request #12091: dummy change
Ardagan removed a comment on pull request #12091: URL: https://github.com/apache/beam/pull/12091#issuecomment-649810198 Run PythonFormatter PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] Ardagan commented on pull request #12091: dummy change
Ardagan commented on pull request #12091: URL: https://github.com/apache/beam/pull/12091#issuecomment-649810198 Run PythonFormatter PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] lukecwik commented on a change in pull request #11993: Change time granularity to seconds in ParDoTest TestStream timer test…
lukecwik commented on a change in pull request #11993: URL: https://github.com/apache/beam/pull/11993#discussion_r445826286 ## File path: sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/ParDoTest.java ## @@ -3817,24 +3817,15 @@ public void onTimer( TestStream> stream = TestStream.create(KvCoder.of(StringUtf8Coder.of(), VarIntCoder.of())) // See GlobalWindow, - // END_OF_GLOBAL_WINDOW is TIMESTAMP_MAX_VALUE.minus(Duration.standardDays(1)), - // minus 775 milliseconds to make the grainularity to second as required by dataflow - // for TestStream. - .advanceWatermarkTo( - BoundedWindow.TIMESTAMP_MAX_VALUE - .minus(Duration.standardDays(1)) - .minus(BoundedWindow.TIMESTAMP_MAX_VALUE.getMillis() % 1000)) + // END_OF_GLOBAL_WINDOW is TIMESTAMP_MAX_VALUE.minus(Duration.standardDays(1)) + .advanceWatermarkTo(BoundedWindow.TIMESTAMP_MAX_VALUE.minus(Duration.standardDays(1))) Review comment: nit: use GlobalWindow.INSTANCE.maxTimestamp(): https://github.com/apache/beam/blob/7269e3f8e2b7fb185347f8fcb80f0cd5afcbaeb6/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/windowing/GlobalWindow.java#L42 Here and below instead of calculating it. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] Ardagan opened a new pull request #12091: dummy change
Ardagan opened a new pull request #12091: URL: https://github.com/apache/beam/pull/12091 **Please** add a meaningful description for your change here Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily: - [ ] [**Choose reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and mention them in a comment (`R: @username`). - [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue. - [ ] Update `CHANGES.md` with noteworthy changes. - [ ] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.pdf). See the [Contributor Guide](https://beam.apache.org/contribute) for more tips on [how to make review process smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier). Post-Commit Tests Status (on master branch) Lang | SDK | Dataflow | Flink | Samza | Spark --- | --- | --- | --- | --- | --- Go | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/) | --- | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/) | --- | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/) Java | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/) | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/)[![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/) | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Java11/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Java11/lastCompletedBuild/)[![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)[![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/) | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/) | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/)[![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/)[![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/) Python | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Python2/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Python2/lastCompletedBuild/)[![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Python35/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Python35/lastCompletedBuild/)[![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Python36/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Python36/lastCompletedBuild/)[![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Python37/lastCompletedBuild/badge/i
[GitHub] [beam] lukecwik commented on a change in pull request #11993: Change time granularity to seconds in ParDoTest TestStream timer test…
lukecwik commented on a change in pull request #11993: URL: https://github.com/apache/beam/pull/11993#discussion_r445826286 ## File path: sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/ParDoTest.java ## @@ -3817,24 +3817,15 @@ public void onTimer( TestStream> stream = TestStream.create(KvCoder.of(StringUtf8Coder.of(), VarIntCoder.of())) // See GlobalWindow, - // END_OF_GLOBAL_WINDOW is TIMESTAMP_MAX_VALUE.minus(Duration.standardDays(1)), - // minus 775 milliseconds to make the grainularity to second as required by dataflow - // for TestStream. - .advanceWatermarkTo( - BoundedWindow.TIMESTAMP_MAX_VALUE - .minus(Duration.standardDays(1)) - .minus(BoundedWindow.TIMESTAMP_MAX_VALUE.getMillis() % 1000)) + // END_OF_GLOBAL_WINDOW is TIMESTAMP_MAX_VALUE.minus(Duration.standardDays(1)) + .advanceWatermarkTo(BoundedWindow.TIMESTAMP_MAX_VALUE.minus(Duration.standardDays(1))) Review comment: nit: use GlobalWindow.INSTANCE.maxTimestamp(): https://github.com/apache/beam/blob/7269e3f8e2b7fb185347f8fcb80f0cd5afcbaeb6/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/windowing/GlobalWindow.java#L42 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] chamikaramj commented on pull request #11850: [BEAM-1438] Allow 0 shards on WriteFiles streaming
chamikaramj commented on pull request #11850: URL: https://github.com/apache/beam/pull/11850#issuecomment-649804337 Looks like Reuven's last comment in the JIRA is from 2017 :) This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] chamikaramj commented on pull request #11850: [BEAM-1438] Allow 0 shards on WriteFiles streaming
chamikaramj commented on pull request #11850: URL: https://github.com/apache/beam/pull/11850#issuecomment-649803784 I think we should at least change some of the benchmarks to not set this flag to determine if there's a performance impact. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] saavan-google-intern commented on pull request #11939: [BEAM-10197] Support typehints for Python's frozenset
saavan-google-intern commented on pull request #11939: URL: https://github.com/apache/beam/pull/11939#issuecomment-649803985 It's going good! I think there are some internal google3 tests failing because frozenset was used incorrectly as a typehint in those tests but @udim is super nice and is fixing them since I don't have access, and it should be ready for merge after that This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] pabloem merged pull request #11929: [BEAM-10201] Add deadletter support to JsonToRow
pabloem merged pull request #11929: URL: https://github.com/apache/beam/pull/11929 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] pabloem commented on pull request #12087: [BEAM-10321] retain environments in flatten for preventing it from be…
pabloem commented on pull request #12087: URL: https://github.com/apache/beam/pull/12087#issuecomment-649801292 Run Python PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] pabloem commented on pull request #11765: [BEAM-9322] Remove passthrough_pcollection_output_ids and force_generated_pcollection_output_ids flags
pabloem commented on pull request #11765: URL: https://github.com/apache/beam/pull/11765#issuecomment-649800364 hm I don't know why this triggered all tests... This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] pabloem commented on pull request #11765: [BEAM-9322] Remove passthrough_pcollection_output_ids and force_generated_pcollection_output_ids flags
pabloem commented on pull request #11765: URL: https://github.com/apache/beam/pull/11765#issuecomment-649800068 Run Python 3.7 PostCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] lukecwik commented on a change in pull request #11993: Change time granularity to seconds in ParDoTest TestStream timer test…
lukecwik commented on a change in pull request #11993: URL: https://github.com/apache/beam/pull/11993#discussion_r445817767 ## File path: sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/ParDoTest.java ## @@ -3808,15 +3817,24 @@ public void onTimer( TestStream> stream = TestStream.create(KvCoder.of(StringUtf8Coder.of(), VarIntCoder.of())) // See GlobalWindow, - // END_OF_GLOBAL_WINDOW is TIMESTAMP_MAX_VALUE.minus(Duration.standardDays(1)) - .advanceWatermarkTo(BoundedWindow.TIMESTAMP_MAX_VALUE.minus(Duration.standardDays(1))) + // END_OF_GLOBAL_WINDOW is TIMESTAMP_MAX_VALUE.minus(Duration.standardDays(1)), Review comment: Lets leave this one tests as is and remove it from the change. We can update the others with the changes since they won't change the semantics of what is being tested. For Dataflow's TestStream implementation, to simulate production, it should round all the advance timestampts to the closest `>=` internal timestamp. It should also make sure that no two advance timestamps that aren't the same are not rounded to the same value. I believe this would make this test work and it would simulate what is happening in production. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] pabloem commented on pull request #11850: [BEAM-1438] Allow 0 shards on WriteFiles streaming
pabloem commented on pull request #11850: URL: https://github.com/apache/beam/pull/11850#issuecomment-649797891 I think this was already reviewed and discussed by Reuven in BEAM-1438 - the only issue was that the *gate* was not removed from the feature. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] pabloem commented on pull request #12084: [BEAM-10317] Python - Update BigQueryIO to tag BigQuery Jobs with the Dataflow Job ID
pabloem commented on pull request #12084: URL: https://github.com/apache/beam/pull/12084#issuecomment-649793787 can you address formatter / lint / rat / test errors? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] pabloem commented on pull request #11824: [BEAM-10101] Add HttpIO / HttpFileSystem (Python)
pabloem commented on pull request #11824: URL: https://github.com/apache/beam/pull/11824#issuecomment-649793262 sorry about the delay - it looks like many of the tests are failing: https://ci-beam.apache.org/job/beam_PreCommit_Python_Phrase/1920/ Do you htink you could fix those so we can review? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] aaltay commented on pull request #10187: [BEAM-8376] Initial version of firestore connector JavaSDK
aaltay commented on pull request #10187: URL: https://github.com/apache/beam/pull/10187#issuecomment-649792398 What is the state of this PR? Do you need help? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] aaltay commented on pull request #11967: [BEAM-9992] | use Sets transform in BeamSQL
aaltay commented on pull request #11967: URL: https://github.com/apache/beam/pull/11967#issuecomment-649791917 Are the test errors related to this PR, or are they flakes? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] aaltay commented on pull request #10487: [BEAM-5504] Introduce PubsubAvroTable
aaltay commented on pull request #10487: URL: https://github.com/apache/beam/pull/10487#issuecomment-649791695 @milantracy - How is this PR going? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] aaltay commented on pull request #11939: [BEAM-10197] Support typehints for Python's frozenset
aaltay commented on pull request #11939: URL: https://github.com/apache/beam/pull/11939#issuecomment-649791420 How is this PR going? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org