[jira] [Work logged] (BEAM-7274) Protobuf Beam Schema support
[ https://issues.apache.org/jira/browse/BEAM-7274?focusedWorklogId=392327&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-392327 ] ASF GitHub Bot logged work on BEAM-7274: Author: ASF GitHub Bot Created on: 25/Feb/20 07:05 Start Date: 25/Feb/20 07:05 Worklog Time Spent: 10m Work Description: alexvanboxel commented on pull request #10502: [BEAM-7274] Add DynamicMessage Schema support URL: https://github.com/apache/beam/pull/10502#discussion_r383691619 ## File path: sdks/java/extensions/protobuf/src/main/java/org/apache/beam/sdk/extensions/protobuf/ProtoDynamicMessageSchema.java ## @@ -0,0 +1,852 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.sdk.extensions.protobuf; + +import static org.apache.beam.sdk.extensions.protobuf.ProtoSchemaTranslator.getFieldNumber; +import static org.apache.beam.sdk.extensions.protobuf.ProtoSchemaTranslator.getMapKeyMessageName; +import static org.apache.beam.sdk.extensions.protobuf.ProtoSchemaTranslator.getMapValueMessageName; +import static org.apache.beam.sdk.extensions.protobuf.ProtoSchemaTranslator.getMessageName; +import static org.apache.beam.sdk.extensions.protobuf.ProtoSchemaTranslator.withFieldNumber; +import static org.apache.beam.sdk.extensions.protobuf.ProtoSchemaTranslator.withMessageName; + +import com.google.protobuf.ByteString; +import com.google.protobuf.Descriptors; +import com.google.protobuf.Descriptors.FieldDescriptor; +import com.google.protobuf.DynamicMessage; +import com.google.protobuf.Message; +import java.io.Serializable; +import java.time.Duration; +import java.time.Instant; +import java.util.ArrayList; +import java.util.HashMap; +import java.util.Iterator; +import java.util.List; +import java.util.Map; +import org.apache.beam.sdk.annotations.Experimental; +import org.apache.beam.sdk.schemas.Schema; +import org.apache.beam.sdk.schemas.SchemaCoder; +import org.apache.beam.sdk.schemas.logicaltypes.EnumerationType; +import org.apache.beam.sdk.schemas.logicaltypes.NanosDuration; +import org.apache.beam.sdk.schemas.logicaltypes.NanosInstant; +import org.apache.beam.sdk.schemas.logicaltypes.OneOfType; +import org.apache.beam.sdk.transforms.SerializableFunction; +import org.apache.beam.sdk.values.Row; +import org.apache.beam.sdk.values.TypeDescriptor; + +@Experimental(Experimental.Kind.SCHEMAS) +public class ProtoDynamicMessageSchema implements Serializable { + public static final long serialVersionUID = 1L; + + /** + * Context of the schema, the context can be generated from a source schema or descriptors. The + * ability of converting back from Row to proto depends on the type of context. + */ + private final Context context; + + /** THe SchemaCoder holds the resolved schema and to/fromRow functions. */ + private transient SchemaCoder schemaCoder; + + /** List of field converters for each field in the row. */ + private transient List converters; + + private ProtoDynamicMessageSchema(String messageName, ProtoDomain domain) { +this.context = new DescriptorContext(messageName, domain); +readResolve(); + } + + private ProtoDynamicMessageSchema(Context context) { +this.context = context; +readResolve(); + } + + /** + * Create a new ProtoDynamicMessageSchema from a {@link ProtoDomain} and for a message. The + * message need to be in the domain and needs to be the fully qualified name. + */ + public static ProtoDynamicMessageSchema forDescriptor(ProtoDomain domain, String messageName) { +return new ProtoDynamicMessageSchema(messageName, domain); + } + + /** + * Create a new ProtoDynamicMessageSchema from a {@link ProtoDomain} and for a descriptor. The + * descriptor is only used for it's name, that name will be used for a search in the domain. + */ + public static ProtoDynamicMessageSchema forDescriptor( + ProtoDomain domain, Descriptors.Descriptor descriptor) { +return new ProtoDynamicMessageSchema<>(descriptor.getFullName(), domain); + } + + static ProtoDynamicMessageSchema forContext(Context context, Schema.Field field) { +return new
[jira] [Work logged] (BEAM-8618) Tear down unused DoFns periodically in Python SDK harness
[ https://issues.apache.org/jira/browse/BEAM-8618?focusedWorklogId=392320&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-392320 ] ASF GitHub Bot logged work on BEAM-8618: Author: ASF GitHub Bot Created on: 25/Feb/20 06:44 Start Date: 25/Feb/20 06:44 Worklog Time Spent: 10m Work Description: sunjincheng121 commented on issue #10655: [BEAM-8618] Tear down unused DoFns periodically in Python SDK harness. URL: https://github.com/apache/beam/pull/10655#issuecomment-590710428 Run PythonLint PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 392320) Time Spent: 5h (was: 4h 50m) > Tear down unused DoFns periodically in Python SDK harness > - > > Key: BEAM-8618 > URL: https://issues.apache.org/jira/browse/BEAM-8618 > Project: Beam > Issue Type: Improvement > Components: sdk-py-harness >Reporter: sunjincheng >Assignee: sunjincheng >Priority: Major > Fix For: 2.20.0 > > Time Spent: 5h > Remaining Estimate: 0h > > Per the discussion in the ML, detail can be found [1], the teardown of DoFns > should be supported in the portability framework. It happens at two places: > 1) Upon the control service termination > 2) Tear down the unused DoFns periodically > The aim of this JIRA is to add support for tear down the unused DoFns > periodically in Python SDK harness. > [1] > https://lists.apache.org/thread.html/0c4a4cf83cf2e35c3dfeb9d906e26cd82d3820968ba6f862f91739e4@%3Cdev.beam.apache.org%3E -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-2822) Add support for progress reporting in fn API
[ https://issues.apache.org/jira/browse/BEAM-2822?focusedWorklogId=392299&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-392299 ] ASF GitHub Bot logged work on BEAM-2822: Author: ASF GitHub Bot Created on: 25/Feb/20 05:54 Start Date: 25/Feb/20 05:54 Worklog Time Spent: 10m Work Description: HuangLED commented on issue #10956: [BEAM-2822, BEAM-2939, BEAM-6189, BEAM-4374] Enable passing completed and remaining amounts of work using new metrics format for splittable dofns in the Python SDK. URL: https://github.com/apache/beam/pull/10956#issuecomment-590697229 LGTM. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 392299) Time Spent: 40m (was: 0.5h) > Add support for progress reporting in fn API > > > Key: BEAM-2822 > URL: https://issues.apache.org/jira/browse/BEAM-2822 > Project: Beam > Issue Type: New Feature > Components: beam-model >Reporter: Vikas Kedigehalli >Priority: Minor > Labels: portability > Time Spent: 40m > Remaining Estimate: 0h > > https://s.apache.org/beam-fn-api-progress-reporting > Note that the ULR reference implementation, when ready, should be useful for > every runner. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-2822) Add support for progress reporting in fn API
[ https://issues.apache.org/jira/browse/BEAM-2822?focusedWorklogId=392301&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-392301 ] ASF GitHub Bot logged work on BEAM-2822: Author: ASF GitHub Bot Created on: 25/Feb/20 05:54 Start Date: 25/Feb/20 05:54 Worklog Time Spent: 10m Work Description: HuangLED commented on pull request #10956: [BEAM-2822, BEAM-2939, BEAM-6189, BEAM-4374] Enable passing completed and remaining amounts of work using new metrics format for splittable dofns in the Python SDK. URL: https://github.com/apache/beam/pull/10956#discussion_r383671515 ## File path: sdks/python/apache_beam/runners/worker/operations.py ## @@ -812,6 +813,36 @@ def progress_metrics(self): current_element_progress.fraction_remaining) return metrics + def monitoring_infos(self, transform_id): Review comment: a bit surprised there is no testing file corresponding to operations.py. Maybe I missed something? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 392301) Time Spent: 1h (was: 50m) > Add support for progress reporting in fn API > > > Key: BEAM-2822 > URL: https://issues.apache.org/jira/browse/BEAM-2822 > Project: Beam > Issue Type: New Feature > Components: beam-model >Reporter: Vikas Kedigehalli >Priority: Minor > Labels: portability > Time Spent: 1h > Remaining Estimate: 0h > > https://s.apache.org/beam-fn-api-progress-reporting > Note that the ULR reference implementation, when ready, should be useful for > every runner. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-2822) Add support for progress reporting in fn API
[ https://issues.apache.org/jira/browse/BEAM-2822?focusedWorklogId=392300&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-392300 ] ASF GitHub Bot logged work on BEAM-2822: Author: ASF GitHub Bot Created on: 25/Feb/20 05:54 Start Date: 25/Feb/20 05:54 Worklog Time Spent: 10m Work Description: HuangLED commented on issue #10956: [BEAM-2822, BEAM-2939, BEAM-6189, BEAM-4374] Enable passing completed and remaining amounts of work using new metrics format for splittable dofns in the Python SDK. URL: https://github.com/apache/beam/pull/10956#issuecomment-590697229 LGTM. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 392300) Time Spent: 50m (was: 40m) > Add support for progress reporting in fn API > > > Key: BEAM-2822 > URL: https://issues.apache.org/jira/browse/BEAM-2822 > Project: Beam > Issue Type: New Feature > Components: beam-model >Reporter: Vikas Kedigehalli >Priority: Minor > Labels: portability > Time Spent: 50m > Remaining Estimate: 0h > > https://s.apache.org/beam-fn-api-progress-reporting > Note that the ULR reference implementation, when ready, should be useful for > every runner. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Closed] (BEAM-9240) Check for Nullability in typesEqual() method of FieldType class
[ https://issues.apache.org/jira/browse/BEAM-9240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rui Wang closed BEAM-9240. -- Resolution: Fixed > Check for Nullability in typesEqual() method of FieldType class > --- > > Key: BEAM-9240 > URL: https://issues.apache.org/jira/browse/BEAM-9240 > Project: Beam > Issue Type: Bug > Components: dsl-sql >Affects Versions: 2.18.0 >Reporter: Rahul Patwari >Assignee: Rahul Patwari >Priority: Major > Fix For: 2.20.0 > > Time Spent: 3h 40m > Remaining Estimate: 0h > > {{If two schemas are created like this:}} > {{Schema schema1 = Schema.builder().addStringField("col1").build();}} > {{Schema schema2 = Schema.builder().addNullableField("col1", > FieldType.STRING).build();}} > > {{schema1.typeEquals(schema2) returns "true" even though the schemas differ > by Nullability}} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (BEAM-9240) Check for Nullability in typesEqual() method of FieldType class
[ https://issues.apache.org/jira/browse/BEAM-9240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17044129#comment-17044129 ] Rui Wang commented on BEAM-9240: I see. I was waiting for [~reuvenlax] to comment. But since it's been a while, I have just merged it. > Check for Nullability in typesEqual() method of FieldType class > --- > > Key: BEAM-9240 > URL: https://issues.apache.org/jira/browse/BEAM-9240 > Project: Beam > Issue Type: Bug > Components: dsl-sql >Affects Versions: 2.18.0 >Reporter: Rahul Patwari >Assignee: Rahul Patwari >Priority: Major > Fix For: 2.20.0 > > Time Spent: 3h 40m > Remaining Estimate: 0h > > {{If two schemas are created like this:}} > {{Schema schema1 = Schema.builder().addStringField("col1").build();}} > {{Schema schema2 = Schema.builder().addNullableField("col1", > FieldType.STRING).build();}} > > {{schema1.typeEquals(schema2) returns "true" even though the schemas differ > by Nullability}} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9240) Check for Nullability in typesEqual() method of FieldType class
[ https://issues.apache.org/jira/browse/BEAM-9240?focusedWorklogId=392288&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-392288 ] ASF GitHub Bot logged work on BEAM-9240: Author: ASF GitHub Bot Created on: 25/Feb/20 05:32 Start Date: 25/Feb/20 05:32 Worklog Time Spent: 10m Work Description: amaliujia commented on issue #10744: [BEAM-9240]: Check for Nullability in typesEqual() method of FieldTyp… URL: https://github.com/apache/beam/pull/10744#issuecomment-590691794 I will merge this PR seems it's been a while. @reuvenlax feel free to comment anything in the future. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 392288) Time Spent: 3.5h (was: 3h 20m) > Check for Nullability in typesEqual() method of FieldType class > --- > > Key: BEAM-9240 > URL: https://issues.apache.org/jira/browse/BEAM-9240 > Project: Beam > Issue Type: Bug > Components: dsl-sql >Affects Versions: 2.18.0 >Reporter: Rahul Patwari >Assignee: Rahul Patwari >Priority: Major > Fix For: 2.20.0 > > Time Spent: 3.5h > Remaining Estimate: 0h > > {{If two schemas are created like this:}} > {{Schema schema1 = Schema.builder().addStringField("col1").build();}} > {{Schema schema2 = Schema.builder().addNullableField("col1", > FieldType.STRING).build();}} > > {{schema1.typeEquals(schema2) returns "true" even though the schemas differ > by Nullability}} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9240) Check for Nullability in typesEqual() method of FieldType class
[ https://issues.apache.org/jira/browse/BEAM-9240?focusedWorklogId=392289&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-392289 ] ASF GitHub Bot logged work on BEAM-9240: Author: ASF GitHub Bot Created on: 25/Feb/20 05:32 Start Date: 25/Feb/20 05:32 Worklog Time Spent: 10m Work Description: amaliujia commented on pull request #10744: [BEAM-9240]: Check for Nullability in typesEqual() method of FieldTyp… URL: https://github.com/apache/beam/pull/10744 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 392289) Time Spent: 3h 40m (was: 3.5h) > Check for Nullability in typesEqual() method of FieldType class > --- > > Key: BEAM-9240 > URL: https://issues.apache.org/jira/browse/BEAM-9240 > Project: Beam > Issue Type: Bug > Components: dsl-sql >Affects Versions: 2.18.0 >Reporter: Rahul Patwari >Assignee: Rahul Patwari >Priority: Major > Fix For: 2.20.0 > > Time Spent: 3h 40m > Remaining Estimate: 0h > > {{If two schemas are created like this:}} > {{Schema schema1 = Schema.builder().addStringField("col1").build();}} > {{Schema schema2 = Schema.builder().addNullableField("col1", > FieldType.STRING).build();}} > > {{schema1.typeEquals(schema2) returns "true" even though the schemas differ > by Nullability}} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (BEAM-8620) Tear down unused DoFns periodically in Java SDK harness
[ https://issues.apache.org/jira/browse/BEAM-8620?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sunjincheng updated BEAM-8620: -- Fix Version/s: (was: 2.20.0) > Tear down unused DoFns periodically in Java SDK harness > --- > > Key: BEAM-8620 > URL: https://issues.apache.org/jira/browse/BEAM-8620 > Project: Beam > Issue Type: Improvement > Components: sdk-java-harness >Reporter: sunjincheng >Assignee: sunjincheng >Priority: Major > > Per the discussion in the ML the detail can be found here[1], the teardown of > DoFns should be supported in the portability framework. It happens at two > places: > 1) Upon the control service termination > 2) Tear down the unused DoFns periodically > The aim of this JIRA is to add support for tear down the unused DoFns > periodically in Java SDK harness. > [1] > https://lists.apache.org/thread.html/0c4a4cf83cf2e35c3dfeb9d906e26cd82d3820968ba6f862f91739e4@%3Cdev.beam.apache.org%3E -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (BEAM-8620) Tear down unused DoFns periodically in Java SDK harness
[ https://issues.apache.org/jira/browse/BEAM-8620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17044119#comment-17044119 ] sunjincheng commented on BEAM-8620: --- Thanks for the reminder, I would like to reset the fix version to 2.21.0, and appreciate if you(or some one) can add unrelease version of 2.21.0. :) for now, reset it as None. > Tear down unused DoFns periodically in Java SDK harness > --- > > Key: BEAM-8620 > URL: https://issues.apache.org/jira/browse/BEAM-8620 > Project: Beam > Issue Type: Improvement > Components: sdk-java-harness >Reporter: sunjincheng >Assignee: sunjincheng >Priority: Major > Fix For: 2.20.0 > > > Per the discussion in the ML the detail can be found here[1], the teardown of > DoFns should be supported in the portability framework. It happens at two > places: > 1) Upon the control service termination > 2) Tear down the unused DoFns periodically > The aim of this JIRA is to add support for tear down the unused DoFns > periodically in Java SDK harness. > [1] > https://lists.apache.org/thread.html/0c4a4cf83cf2e35c3dfeb9d906e26cd82d3820968ba6f862f91739e4@%3Cdev.beam.apache.org%3E -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-7274) Protobuf Beam Schema support
[ https://issues.apache.org/jira/browse/BEAM-7274?focusedWorklogId=392266&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-392266 ] ASF GitHub Bot logged work on BEAM-7274: Author: ASF GitHub Bot Created on: 25/Feb/20 03:15 Start Date: 25/Feb/20 03:15 Worklog Time Spent: 10m Work Description: reuvenlax commented on pull request #10502: [BEAM-7274] Add DynamicMessage Schema support URL: https://github.com/apache/beam/pull/10502#discussion_r383637559 ## File path: sdks/java/extensions/protobuf/src/main/java/org/apache/beam/sdk/extensions/protobuf/ProtoDynamicMessageSchema.java ## @@ -0,0 +1,852 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.sdk.extensions.protobuf; + +import static org.apache.beam.sdk.extensions.protobuf.ProtoSchemaTranslator.getFieldNumber; +import static org.apache.beam.sdk.extensions.protobuf.ProtoSchemaTranslator.getMapKeyMessageName; +import static org.apache.beam.sdk.extensions.protobuf.ProtoSchemaTranslator.getMapValueMessageName; +import static org.apache.beam.sdk.extensions.protobuf.ProtoSchemaTranslator.getMessageName; +import static org.apache.beam.sdk.extensions.protobuf.ProtoSchemaTranslator.withFieldNumber; +import static org.apache.beam.sdk.extensions.protobuf.ProtoSchemaTranslator.withMessageName; + +import com.google.protobuf.ByteString; +import com.google.protobuf.Descriptors; +import com.google.protobuf.Descriptors.FieldDescriptor; +import com.google.protobuf.DynamicMessage; +import com.google.protobuf.Message; +import java.io.Serializable; +import java.time.Duration; +import java.time.Instant; +import java.util.ArrayList; +import java.util.HashMap; +import java.util.Iterator; +import java.util.List; +import java.util.Map; +import org.apache.beam.sdk.annotations.Experimental; +import org.apache.beam.sdk.schemas.Schema; +import org.apache.beam.sdk.schemas.SchemaCoder; +import org.apache.beam.sdk.schemas.logicaltypes.EnumerationType; +import org.apache.beam.sdk.schemas.logicaltypes.NanosDuration; +import org.apache.beam.sdk.schemas.logicaltypes.NanosInstant; +import org.apache.beam.sdk.schemas.logicaltypes.OneOfType; +import org.apache.beam.sdk.transforms.SerializableFunction; +import org.apache.beam.sdk.values.Row; +import org.apache.beam.sdk.values.TypeDescriptor; + +@Experimental(Experimental.Kind.SCHEMAS) +public class ProtoDynamicMessageSchema implements Serializable { + public static final long serialVersionUID = 1L; + + /** + * Context of the schema, the context can be generated from a source schema or descriptors. The + * ability of converting back from Row to proto depends on the type of context. + */ + private final Context context; + + /** THe SchemaCoder holds the resolved schema and to/fromRow functions. */ + private transient SchemaCoder schemaCoder; + + /** List of field converters for each field in the row. */ + private transient List converters; + + private ProtoDynamicMessageSchema(String messageName, ProtoDomain domain) { +this.context = new DescriptorContext(messageName, domain); +readResolve(); + } + + private ProtoDynamicMessageSchema(Context context) { +this.context = context; +readResolve(); + } + + /** + * Create a new ProtoDynamicMessageSchema from a {@link ProtoDomain} and for a message. The + * message need to be in the domain and needs to be the fully qualified name. + */ + public static ProtoDynamicMessageSchema forDescriptor(ProtoDomain domain, String messageName) { +return new ProtoDynamicMessageSchema(messageName, domain); + } + + /** + * Create a new ProtoDynamicMessageSchema from a {@link ProtoDomain} and for a descriptor. The + * descriptor is only used for it's name, that name will be used for a search in the domain. + */ + public static ProtoDynamicMessageSchema forDescriptor( + ProtoDomain domain, Descriptors.Descriptor descriptor) { +return new ProtoDynamicMessageSchema<>(descriptor.getFullName(), domain); + } + + static ProtoDynamicMessageSchema forContext(Context context, Schema.Field field) { +return new Pro
[jira] [Work logged] (BEAM-9357) Bump upper end of Google Bigquery dependencies for python
[ https://issues.apache.org/jira/browse/BEAM-9357?focusedWorklogId=392251&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-392251 ] ASF GitHub Bot logged work on BEAM-9357: Author: ASF GitHub Bot Created on: 25/Feb/20 02:43 Start Date: 25/Feb/20 02:43 Worklog Time Spent: 10m Work Description: drubinstein commented on issue #10929: [BEAM-9357] Bump google cloud bigquery to 1.24.0 URL: https://github.com/apache/beam/pull/10929#issuecomment-590653468 Thanks for merging this @aaltay ! This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 392251) Time Spent: 1h 10m (was: 1h) > Bump upper end of Google Bigquery dependencies for python > - > > Key: BEAM-9357 > URL: https://issues.apache.org/jira/browse/BEAM-9357 > Project: Beam > Issue Type: Bug > Components: sdk-py-harness > Environment: Python >Reporter: David Rubinstein >Priority: Minor > Time Spent: 1h 10m > Remaining Estimate: 0h > > I am trying to use Dataflow with Beam Python and a package that depends on > google-resumable-media 0.5.0. The current google-cloud-bigquery (which is > only used for testing) depends on google-resumable-media <= 0.4.1. The upper > bound on the google-cloud-bigquery version should be loosened to solve > possible transitive dependency issues. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (BEAM-9298) Drop support for Flink 1.7
[ https://issues.apache.org/jira/browse/BEAM-9298?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sunjincheng updated BEAM-9298: -- Fix Version/s: (was: 2.20.0) > Drop support for Flink 1.7 > --- > > Key: BEAM-9298 > URL: https://issues.apache.org/jira/browse/BEAM-9298 > Project: Beam > Issue Type: Task > Components: runner-flink >Reporter: sunjincheng >Assignee: sunjincheng >Priority: Major > Time Spent: 40m > Remaining Estimate: 0h > > With Flink 1.10 around the corner, more detail can be found in BEAM-9295, we > should consider dropping support for Flink 1.7. Then dropping 1.7 will also > decrease the build time. > What do you think? -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (BEAM-9299) Upgrade Flink Runner to 1.8.3 and 1.9.2
[ https://issues.apache.org/jira/browse/BEAM-9299?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sunjincheng updated BEAM-9299: -- Fix Version/s: (was: 2.20.0) > Upgrade Flink Runner to 1.8.3 and 1.9.2 > --- > > Key: BEAM-9299 > URL: https://issues.apache.org/jira/browse/BEAM-9299 > Project: Beam > Issue Type: Task > Components: runner-flink >Reporter: sunjincheng >Assignee: sunjincheng >Priority: Major > Time Spent: 1h 20m > Remaining Estimate: 0h > > I would like to Upgrade Flink Runner to 18.3 and 1.9.2 due to both the Apache > Flink 1.8.3 and Apache Flink 1.9.2 have been released [1]. > What do you think? > [1] https://dist.apache.org/repos/dist/release/flink/ -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Comment Edited] (BEAM-9295) Add Flink 1.10 build target and Make FlinkRunner compatible with Flink 1.10
[ https://issues.apache.org/jira/browse/BEAM-9295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17044074#comment-17044074 ] sunjincheng edited comment on BEAM-9295 at 2/25/20 2:41 AM: I would like reset the fix version to 2.21. Thank you . [~amaliujia] Could you please add the un-release version 2.21.0 ? was (Author: sunjincheng121): I would like reset the fix version to 2.21. Thank you . [~amaliujia] > Add Flink 1.10 build target and Make FlinkRunner compatible with Flink 1.10 > --- > > Key: BEAM-9295 > URL: https://issues.apache.org/jira/browse/BEAM-9295 > Project: Beam > Issue Type: New Feature > Components: runner-flink >Reporter: sunjincheng >Assignee: sunjincheng >Priority: Major > Time Spent: 50m > Remaining Estimate: 0h > > Apache Flink 1.10 has completed the final release vote, see [1]. So, I would > like to add Flink 1.10 build target and make Flink Runner compatible with > Flink 1.10. > And I appreciate it if you can leave your suggestions or comments! > [1] > https://lists.apache.org/thread.html/r97672d4d1e47372cebf23e6643a6cc30a06bfbdf3f277b0be3695b15%40%3Cdev.flink.apache.org%3E -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (BEAM-9295) Add Flink 1.10 build target and Make FlinkRunner compatible with Flink 1.10
[ https://issues.apache.org/jira/browse/BEAM-9295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sunjincheng updated BEAM-9295: -- Fix Version/s: (was: 2.20.0) > Add Flink 1.10 build target and Make FlinkRunner compatible with Flink 1.10 > --- > > Key: BEAM-9295 > URL: https://issues.apache.org/jira/browse/BEAM-9295 > Project: Beam > Issue Type: New Feature > Components: runner-flink >Reporter: sunjincheng >Assignee: sunjincheng >Priority: Major > Time Spent: 50m > Remaining Estimate: 0h > > Apache Flink 1.10 has completed the final release vote, see [1]. So, I would > like to add Flink 1.10 build target and make Flink Runner compatible with > Flink 1.10. > And I appreciate it if you can leave your suggestions or comments! > [1] > https://lists.apache.org/thread.html/r97672d4d1e47372cebf23e6643a6cc30a06bfbdf3f277b0be3695b15%40%3Cdev.flink.apache.org%3E -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (BEAM-9295) Add Flink 1.10 build target and Make FlinkRunner compatible with Flink 1.10
[ https://issues.apache.org/jira/browse/BEAM-9295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17044074#comment-17044074 ] sunjincheng commented on BEAM-9295: --- I would like reset the fix version to 2.21. Thank you . [~amaliujia] > Add Flink 1.10 build target and Make FlinkRunner compatible with Flink 1.10 > --- > > Key: BEAM-9295 > URL: https://issues.apache.org/jira/browse/BEAM-9295 > Project: Beam > Issue Type: New Feature > Components: runner-flink >Reporter: sunjincheng >Assignee: sunjincheng >Priority: Major > Fix For: 2.20.0 > > Time Spent: 50m > Remaining Estimate: 0h > > Apache Flink 1.10 has completed the final release vote, see [1]. So, I would > like to add Flink 1.10 build target and make Flink Runner compatible with > Flink 1.10. > And I appreciate it if you can leave your suggestions or comments! > [1] > https://lists.apache.org/thread.html/r97672d4d1e47372cebf23e6643a6cc30a06bfbdf3f277b0be3695b15%40%3Cdev.flink.apache.org%3E -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (BEAM-9359) Use DataCatalog client libraries rather than gRPC stubs
[ https://issues.apache.org/jira/browse/BEAM-9359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17044073#comment-17044073 ] Rui Wang commented on BEAM-9359: Thank you! > Use DataCatalog client libraries rather than gRPC stubs > --- > > Key: BEAM-9359 > URL: https://issues.apache.org/jira/browse/BEAM-9359 > Project: Beam > Issue Type: Improvement > Components: dsl-sql >Reporter: Brian Hulette >Assignee: Brian Hulette >Priority: Major > Fix For: 2.20.0 > > Time Spent: 40m > Remaining Estimate: 0h > > The [GCP docs|https://cloud.google.com/data-catalog/docs/reference/libraries] > indicate this is the preferred way to use the service. > The client library sets some headers in requests that ensure they are > consistently routed properly. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam
[ https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=392247&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-392247 ] ASF GitHub Bot logged work on BEAM-8335: Author: ASF GitHub Bot Created on: 25/Feb/20 02:33 Start Date: 25/Feb/20 02:33 Worklog Time Spent: 10m Work Description: aaltay commented on issue #10899: [BEAM-8335] Background Caching job URL: https://github.com/apache/beam/pull/10899#issuecomment-590651248 retest this please This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 392247) Time Spent: 72h (was: 71h 50m) > Add streaming support to Interactive Beam > - > > Key: BEAM-8335 > URL: https://issues.apache.org/jira/browse/BEAM-8335 > Project: Beam > Issue Type: Improvement > Components: runner-py-interactive >Reporter: Sam Rohde >Assignee: Sam Rohde >Priority: Major > Time Spent: 72h > Remaining Estimate: 0h > > This issue tracks the work items to introduce streaming support to the > Interactive Beam experience. This will allow users to: > * Write and run a streaming job in IPython > * Automatically cache records from unbounded sources > * Add a replay experience that replays all cached records to simulate the > original pipeline execution > * Add controls to play/pause/stop/step individual elements from the cached > records > * Add ability to inspect/visualize unbounded PCollections -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam
[ https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=392246&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-392246 ] ASF GitHub Bot logged work on BEAM-8335: Author: ASF GitHub Bot Created on: 25/Feb/20 02:33 Start Date: 25/Feb/20 02:33 Worklog Time Spent: 10m Work Description: aaltay commented on issue #10899: [BEAM-8335] Background Caching job URL: https://github.com/apache/beam/pull/10899#issuecomment-590651211 retest this please This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 392246) Time Spent: 71h 50m (was: 71h 40m) > Add streaming support to Interactive Beam > - > > Key: BEAM-8335 > URL: https://issues.apache.org/jira/browse/BEAM-8335 > Project: Beam > Issue Type: Improvement > Components: runner-py-interactive >Reporter: Sam Rohde >Assignee: Sam Rohde >Priority: Major > Time Spent: 71h 50m > Remaining Estimate: 0h > > This issue tracks the work items to introduce streaming support to the > Interactive Beam experience. This will allow users to: > * Write and run a streaming job in IPython > * Automatically cache records from unbounded sources > * Add a replay experience that replays all cached records to simulate the > original pipeline execution > * Add controls to play/pause/stop/step individual elements from the cached > records > * Add ability to inspect/visualize unbounded PCollections -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam
[ https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=392245&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-392245 ] ASF GitHub Bot logged work on BEAM-8335: Author: ASF GitHub Bot Created on: 25/Feb/20 02:32 Start Date: 25/Feb/20 02:32 Worklog Time Spent: 10m Work Description: KevinGG commented on issue #10899: [BEAM-8335] Background Caching job URL: https://github.com/apache/beam/pull/10899#issuecomment-590650918 Rebased to resolve merge conflict. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 392245) Time Spent: 71h 40m (was: 71.5h) > Add streaming support to Interactive Beam > - > > Key: BEAM-8335 > URL: https://issues.apache.org/jira/browse/BEAM-8335 > Project: Beam > Issue Type: Improvement > Components: runner-py-interactive >Reporter: Sam Rohde >Assignee: Sam Rohde >Priority: Major > Time Spent: 71h 40m > Remaining Estimate: 0h > > This issue tracks the work items to introduce streaming support to the > Interactive Beam experience. This will allow users to: > * Write and run a streaming job in IPython > * Automatically cache records from unbounded sources > * Add a replay experience that replays all cached records to simulate the > original pipeline execution > * Add controls to play/pause/stop/step individual elements from the cached > records > * Add ability to inspect/visualize unbounded PCollections -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (BEAM-9359) Use DataCatalog client libraries rather than gRPC stubs
[ https://issues.apache.org/jira/browse/BEAM-9359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17044060#comment-17044060 ] Brian Hulette commented on BEAM-9359: - PR #10917 fixed this, and I just forgot to close it. This is ready for 2.20 release > Use DataCatalog client libraries rather than gRPC stubs > --- > > Key: BEAM-9359 > URL: https://issues.apache.org/jira/browse/BEAM-9359 > Project: Beam > Issue Type: Improvement > Components: dsl-sql >Reporter: Brian Hulette >Assignee: Brian Hulette >Priority: Major > Fix For: 2.20.0 > > Time Spent: 40m > Remaining Estimate: 0h > > The [GCP docs|https://cloud.google.com/data-catalog/docs/reference/libraries] > indicate this is the preferred way to use the service. > The client library sets some headers in requests that ensure they are > consistently routed properly. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (BEAM-9359) Use DataCatalog client libraries rather than gRPC stubs
[ https://issues.apache.org/jira/browse/BEAM-9359?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brian Hulette resolved BEAM-9359. - Resolution: Fixed > Use DataCatalog client libraries rather than gRPC stubs > --- > > Key: BEAM-9359 > URL: https://issues.apache.org/jira/browse/BEAM-9359 > Project: Beam > Issue Type: Improvement > Components: dsl-sql >Reporter: Brian Hulette >Assignee: Brian Hulette >Priority: Major > Fix For: 2.20.0 > > Time Spent: 40m > Remaining Estimate: 0h > > The [GCP docs|https://cloud.google.com/data-catalog/docs/reference/libraries] > indicate this is the preferred way to use the service. > The client library sets some headers in requests that ensure they are > consistently routed properly. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam
[ https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=392242&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-392242 ] ASF GitHub Bot logged work on BEAM-8335: Author: ASF GitHub Bot Created on: 25/Feb/20 02:12 Start Date: 25/Feb/20 02:12 Worklog Time Spent: 10m Work Description: KevinGG commented on issue #10899: [BEAM-8335] Background Caching job URL: https://github.com/apache/beam/pull/10899#issuecomment-590645959 > Can you resolve the merge conflicts? On it! Introduced by the other PR just got merged. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 392242) Time Spent: 71.5h (was: 71h 20m) > Add streaming support to Interactive Beam > - > > Key: BEAM-8335 > URL: https://issues.apache.org/jira/browse/BEAM-8335 > Project: Beam > Issue Type: Improvement > Components: runner-py-interactive >Reporter: Sam Rohde >Assignee: Sam Rohde >Priority: Major > Time Spent: 71.5h > Remaining Estimate: 0h > > This issue tracks the work items to introduce streaming support to the > Interactive Beam experience. This will allow users to: > * Write and run a streaming job in IPython > * Automatically cache records from unbounded sources > * Add a replay experience that replays all cached records to simulate the > original pipeline execution > * Add controls to play/pause/stop/step individual elements from the cached > records > * Add ability to inspect/visualize unbounded PCollections -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (BEAM-9240) Check for Nullability in typesEqual() method of FieldType class
[ https://issues.apache.org/jira/browse/BEAM-9240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17044052#comment-17044052 ] Rahul Patwari commented on BEAM-9240: - Hi [~amaliujia], I have committed the required changes to fix the bug. The review is pending though. I have added you as a reviewer. Can you please review and merge the PR. If not, can you suggest other committers that I can add as a reviewer? I have raised the bug before 2.19.0 release, but couldn't commit the required changes. I would prefer to fix the bug with 2.20.0 release. If the PR cannot be merged by then, it can be pushed back to 2.21.0. > Check for Nullability in typesEqual() method of FieldType class > --- > > Key: BEAM-9240 > URL: https://issues.apache.org/jira/browse/BEAM-9240 > Project: Beam > Issue Type: Bug > Components: dsl-sql >Affects Versions: 2.18.0 >Reporter: Rahul Patwari >Assignee: Rahul Patwari >Priority: Major > Fix For: 2.20.0 > > Time Spent: 3h 20m > Remaining Estimate: 0h > > {{If two schemas are created like this:}} > {{Schema schema1 = Schema.builder().addStringField("col1").build();}} > {{Schema schema2 = Schema.builder().addNullableField("col1", > FieldType.STRING).build();}} > > {{schema1.typeEquals(schema2) returns "true" even though the schemas differ > by Nullability}} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam
[ https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=392235&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-392235 ] ASF GitHub Bot logged work on BEAM-8335: Author: ASF GitHub Bot Created on: 25/Feb/20 02:00 Start Date: 25/Feb/20 02:00 Worklog Time Spent: 10m Work Description: aaltay commented on issue #10899: [BEAM-8335] Background Caching job URL: https://github.com/apache/beam/pull/10899#issuecomment-590642996 Can you resolve the merge conflicts? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 392235) Time Spent: 71h 20m (was: 71h 10m) > Add streaming support to Interactive Beam > - > > Key: BEAM-8335 > URL: https://issues.apache.org/jira/browse/BEAM-8335 > Project: Beam > Issue Type: Improvement > Components: runner-py-interactive >Reporter: Sam Rohde >Assignee: Sam Rohde >Priority: Major > Time Spent: 71h 20m > Remaining Estimate: 0h > > This issue tracks the work items to introduce streaming support to the > Interactive Beam experience. This will allow users to: > * Write and run a streaming job in IPython > * Automatically cache records from unbounded sources > * Add a replay experience that replays all cached records to simulate the > original pipeline execution > * Add controls to play/pause/stop/step individual elements from the cached > records > * Add ability to inspect/visualize unbounded PCollections -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-7926) Show PCollection with Interactive Beam in a data-centric user flow
[ https://issues.apache.org/jira/browse/BEAM-7926?focusedWorklogId=392234&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-392234 ] ASF GitHub Bot logged work on BEAM-7926: Author: ASF GitHub Bot Created on: 25/Feb/20 01:59 Start Date: 25/Feb/20 01:59 Worklog Time Spent: 10m Work Description: aaltay commented on pull request #10731: [BEAM-7926] Data-centric Interactive Part3 URL: https://github.com/apache/beam/pull/10731 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 392234) Time Spent: 48h 50m (was: 48h 40m) > Show PCollection with Interactive Beam in a data-centric user flow > -- > > Key: BEAM-7926 > URL: https://issues.apache.org/jira/browse/BEAM-7926 > Project: Beam > Issue Type: New Feature > Components: runner-py-interactive >Reporter: Ning Kang >Assignee: Ning Kang >Priority: Major > Time Spent: 48h 50m > Remaining Estimate: 0h > > Support auto plotting / charting of materialized data of a given PCollection > with Interactive Beam. > Say an Interactive Beam pipeline defined as > > {code:java} > p = beam.Pipeline(InteractiveRunner()) > pcoll = p | 'Transform' >> transform() > pcoll2 = ... > pcoll3 = ...{code} > The use can call a single function and get auto-magical charting of the data. > e.g., > {code:java} > show(pcoll, pcoll2) > {code} > Throughout the process, a pipeline fragment is built to include only > transforms necessary to produce the desired pcolls (pcoll and pcoll2) and > execute that fragment. > This makes the Interactive Beam user flow data-centric. > > Detailed > [design|https://docs.google.com/document/d/1DYWrT6GL_qDCXhRMoxpjinlVAfHeVilK5Mtf8gO6zxQ/edit#heading=h.v6k2o3roarzz]. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (BEAM-9322) Python SDK ignores manually set PCollection tags
[ https://issues.apache.org/jira/browse/BEAM-9322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17044050#comment-17044050 ] Rui Wang commented on BEAM-9322: Thanks [~rohdesam] Do you expect PR#10934 be merged into 2.20.0 release as a fix and the full version of fix to this Jira will be moved to future release? > Python SDK ignores manually set PCollection tags > > > Key: BEAM-9322 > URL: https://issues.apache.org/jira/browse/BEAM-9322 > Project: Beam > Issue Type: Bug > Components: sdk-py-core >Reporter: Sam Rohde >Assignee: Sam Rohde >Priority: Major > Fix For: 2.20.0 > > > The Python SDK currently ignores any tags set on PCollections manually when > applying PTransforms when adding the PCollection to the PTransform > [outputs|[https://github.com/apache/beam/blob/688a4ea53f315ec2aa2d37602fd78496fca8bb4f/sdks/python/apache_beam/pipeline.py#L595]]. > In the > [add_output|[https://github.com/apache/beam/blob/688a4ea53f315ec2aa2d37602fd78496fca8bb4f/sdks/python/apache_beam/pipeline.py#L872]] > method, the tag is set to None for all PValues, meaning the output tags are > set to an enumeration index over the PCollection outputs. The tags are not > propagated to correctly which can be a problem on relying on the output > PCollection tags to match the user set values. > The fix is to correct BEAM-1833, and always pass in the tags. However, that > doesn't fix the problem for nested PCollections. If you have a dict of lists > of PCollections, what should their tags be correctly set to? In order to fix > this, first propagate the correct tag then talk with the community about the > best auto-generated tags. > Some users may rely on the old implementation, so a flag will be created: > "force_generated_pcollection_output_ids" and be default set to False. If > True, this will go to the old implementation and generate tags for > PCollections. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (BEAM-4948) Beam Dependency Update Request: com.google.guava
[ https://issues.apache.org/jira/browse/BEAM-4948?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17044041#comment-17044041 ] Luke Cwik commented on BEAM-4948: - Why the change to invalid? > Beam Dependency Update Request: com.google.guava > > > Key: BEAM-4948 > URL: https://issues.apache.org/jira/browse/BEAM-4948 > Project: Beam > Issue Type: Bug > Components: dependencies >Reporter: Beam JIRA Bot >Assignee: Luke Cwik >Priority: Major > Fix For: 2.15.0 > > Time Spent: 5h 10m > Remaining Estimate: 0h > > 2018-07-25 20:28:03.628639 > Please review and upgrade the com.google.guava to the latest version > None > > cc: -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Closed] (BEAM-6309) Build race: Can't find bundle for base name com.google.errorprone.errors
[ https://issues.apache.org/jira/browse/BEAM-6309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kenneth Knowles closed BEAM-6309. - Fix Version/s: Not applicable Resolution: Cannot Reproduce > Build race: Can't find bundle for base name com.google.errorprone.errors > > > Key: BEAM-6309 > URL: https://issues.apache.org/jira/browse/BEAM-6309 > Project: Beam > Issue Type: Bug > Components: build-system >Reporter: Andrew Pilloud >Priority: Major > Fix For: Not applicable > > > https://builds.apache.org/job/beam_PreCommit_Java_Cron/741/ > https://scans.gradle.com/s/2b2qh7to734m4 > {code:java} > org.gradle.api.tasks.TaskExecutionException > : > Execution failed for task ':beam-sdks-java-io-hadoop-format:compileTestJava'. > Open stacktrace > Caused by: > org.gradle.internal.UncheckedException > : > java.lang.reflect.InvocationTargetException > Open stacktrace > Caused by: > java.lang.reflect.InvocationTargetException > : > (No message provided) > Open stacktrace > Caused by: > java.util.MissingResourceException > : > Can't find bundle for base name com.google.errorprone.errors, locale en_US > Open stacktrace > Caused by: > java.lang.NullPointerException > : > (No message provided) > Close stacktrace > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Closed] (BEAM-5559) Beam Dependency Update Request: com.google.guava:guava
[ https://issues.apache.org/jira/browse/BEAM-5559?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kenneth Knowles closed BEAM-5559. - Fix Version/s: Not applicable Assignee: Tomo Suzuki Resolution: Won't Fix > Beam Dependency Update Request: com.google.guava:guava > -- > > Key: BEAM-5559 > URL: https://issues.apache.org/jira/browse/BEAM-5559 > Project: Beam > Issue Type: Sub-task > Components: dependencies >Reporter: Beam JIRA Bot >Assignee: Tomo Suzuki >Priority: Major > Fix For: Not applicable > > > - 2018-10-01 19:30:53.471497 > - > Please consider upgrading the dependency com.google.guava:guava. > The current version is 20.0. The latest version is 26.0-jre > cc: > Please refer to [Beam Dependency Guide > |https://beam.apache.org/contribute/dependencies/]for more information. > Do Not Modify The Description Above. > - 2018-10-08 12:18:05.174889 > - > Please consider upgrading the dependency com.google.guava:guava. > The current version is 20.0. The latest version is 26.0-jre > cc: > Please refer to [Beam Dependency Guide > |https://beam.apache.org/contribute/dependencies/]for more information. > Do Not Modify The Description Above. > - 2019-04-15 12:32:27.737694 > - > Please consider upgrading the dependency com.google.guava:guava. > The current version is 20.0. The latest version is 27.1-jre > cc: > Please refer to [Beam Dependency Guide > |https://beam.apache.org/contribute/dependencies/]for more information. > Do Not Modify The Description Above. > - 2019-04-22 12:10:18.539470 > - > Please consider upgrading the dependency com.google.guava:guava. > The current version is 20.0. The latest version is 27.1-jre > cc: > Please refer to [Beam Dependency Guide > |https://beam.apache.org/contribute/dependencies/]for more information. > Do Not Modify The Description Above. > - 2019-11-12 22:48:00.063941 > - > Please consider upgrading the dependency com.google.guava:guava. > The current version is 26.0-jre. The latest version is 28.1-jre > cc: > Please refer to [Beam Dependency Guide > |https://beam.apache.org/contribute/dependencies/]for more information. > Do Not Modify The Description Above. > - 2019-11-19 21:06:09.552946 > - > Please consider upgrading the dependency com.google.guava:guava. > The current version is 26.0-jre. The latest version is 28.1-jre > cc: > Please refer to [Beam Dependency Guide > |https://beam.apache.org/contribute/dependencies/]for more information. > Do Not Modify The Description Above. > - 2019-12-02 12:11:56.870028 > - > Please consider upgrading the dependency com.google.guava:guava. > The current version is 26.0-jre. The latest version is 28.1-jre > cc: > Please refer to [Beam Dependency Guide > |https://beam.apache.org/contribute/dependencies/]for more information. > Do Not Modify The Description Above. > - 2019-12-09 12:11:09.244912 > - > Please consider upgrading the dependency com.google.guava:guava. > The current version is 26.0-jre. The latest version is 28.1-jre > cc: > Please refer to [Beam Dependency Guide > |https://beam.apache.org/contribute/dependencies/]for more information. > Do Not Modify The Description Above. > - 2019-12-23 12:11:03.393600 > - > Please consider upgrading the dependency com.google.guava:guava. > The current version is 26.0-jre. The latest version is 28.1-jre > cc: > Please refer to [Beam Dependency Guide > |https://beam.apache.org/contribute/dependencies/]for more information. > Do Not Modify The Description Above. > - 2019-12-30 14:06:35.828547 > - > Please consider upgrading the dependency com.google.guava:guava. > The current version is 26.0-jre. The latest version is 28.2-jre > cc: > Please refer to [Beam Dependency Guide > |https://beam.apache.org/contribute/dependencies/]for more information. > Do Not Modify The Description Above. > - 2020-01-06 12:10:10.446048 > - > Please consider upgrading the dependency com.google.guava:guava. > The
[jira] [Commented] (BEAM-5559) Beam Dependency Update Request: com.google.guava:guava
[ https://issues.apache.org/jira/browse/BEAM-5559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17044037#comment-17044037 ] Kenneth Knowles commented on BEAM-5559: --- So is this now a wontfix, based on compatibility? > Beam Dependency Update Request: com.google.guava:guava > -- > > Key: BEAM-5559 > URL: https://issues.apache.org/jira/browse/BEAM-5559 > Project: Beam > Issue Type: Sub-task > Components: dependencies >Reporter: Beam JIRA Bot >Priority: Major > > - 2018-10-01 19:30:53.471497 > - > Please consider upgrading the dependency com.google.guava:guava. > The current version is 20.0. The latest version is 26.0-jre > cc: > Please refer to [Beam Dependency Guide > |https://beam.apache.org/contribute/dependencies/]for more information. > Do Not Modify The Description Above. > - 2018-10-08 12:18:05.174889 > - > Please consider upgrading the dependency com.google.guava:guava. > The current version is 20.0. The latest version is 26.0-jre > cc: > Please refer to [Beam Dependency Guide > |https://beam.apache.org/contribute/dependencies/]for more information. > Do Not Modify The Description Above. > - 2019-04-15 12:32:27.737694 > - > Please consider upgrading the dependency com.google.guava:guava. > The current version is 20.0. The latest version is 27.1-jre > cc: > Please refer to [Beam Dependency Guide > |https://beam.apache.org/contribute/dependencies/]for more information. > Do Not Modify The Description Above. > - 2019-04-22 12:10:18.539470 > - > Please consider upgrading the dependency com.google.guava:guava. > The current version is 20.0. The latest version is 27.1-jre > cc: > Please refer to [Beam Dependency Guide > |https://beam.apache.org/contribute/dependencies/]for more information. > Do Not Modify The Description Above. > - 2019-11-12 22:48:00.063941 > - > Please consider upgrading the dependency com.google.guava:guava. > The current version is 26.0-jre. The latest version is 28.1-jre > cc: > Please refer to [Beam Dependency Guide > |https://beam.apache.org/contribute/dependencies/]for more information. > Do Not Modify The Description Above. > - 2019-11-19 21:06:09.552946 > - > Please consider upgrading the dependency com.google.guava:guava. > The current version is 26.0-jre. The latest version is 28.1-jre > cc: > Please refer to [Beam Dependency Guide > |https://beam.apache.org/contribute/dependencies/]for more information. > Do Not Modify The Description Above. > - 2019-12-02 12:11:56.870028 > - > Please consider upgrading the dependency com.google.guava:guava. > The current version is 26.0-jre. The latest version is 28.1-jre > cc: > Please refer to [Beam Dependency Guide > |https://beam.apache.org/contribute/dependencies/]for more information. > Do Not Modify The Description Above. > - 2019-12-09 12:11:09.244912 > - > Please consider upgrading the dependency com.google.guava:guava. > The current version is 26.0-jre. The latest version is 28.1-jre > cc: > Please refer to [Beam Dependency Guide > |https://beam.apache.org/contribute/dependencies/]for more information. > Do Not Modify The Description Above. > - 2019-12-23 12:11:03.393600 > - > Please consider upgrading the dependency com.google.guava:guava. > The current version is 26.0-jre. The latest version is 28.1-jre > cc: > Please refer to [Beam Dependency Guide > |https://beam.apache.org/contribute/dependencies/]for more information. > Do Not Modify The Description Above. > - 2019-12-30 14:06:35.828547 > - > Please consider upgrading the dependency com.google.guava:guava. > The current version is 26.0-jre. The latest version is 28.2-jre > cc: > Please refer to [Beam Dependency Guide > |https://beam.apache.org/contribute/dependencies/]for more information. > Do Not Modify The Description Above. > - 2020-01-06 12:10:10.446048 > - > Please consider upgrading the dependency com.google.guava:guava. > The current version is 26.0-jre. The latest version is 28.2-jre
[jira] [Closed] (BEAM-4948) Beam Dependency Update Request: com.google.guava
[ https://issues.apache.org/jira/browse/BEAM-4948?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kenneth Knowles closed BEAM-4948. - Resolution: Invalid > Beam Dependency Update Request: com.google.guava > > > Key: BEAM-4948 > URL: https://issues.apache.org/jira/browse/BEAM-4948 > Project: Beam > Issue Type: Bug > Components: dependencies >Reporter: Beam JIRA Bot >Assignee: Luke Cwik >Priority: Major > Fix For: 2.15.0 > > Time Spent: 5h 10m > Remaining Estimate: 0h > > 2018-07-25 20:28:03.628639 > Please review and upgrade the com.google.guava to the latest version > None > > cc: -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8965) WriteToBigQuery failed in BundleBasedDirectRunner
[ https://issues.apache.org/jira/browse/BEAM-8965?focusedWorklogId=392224&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-392224 ] ASF GitHub Bot logged work on BEAM-8965: Author: ASF GitHub Bot Created on: 25/Feb/20 01:18 Start Date: 25/Feb/20 01:18 Worklog Time Spent: 10m Work Description: bobingm commented on issue #10901: [BEAM-8965] Remove duplicate sideinputs in ConsumerTrackingPipelineVisitor URL: https://github.com/apache/beam/pull/10901#issuecomment-590631677 Run Python Postcommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 392224) Time Spent: 0.5h (was: 20m) > WriteToBigQuery failed in BundleBasedDirectRunner > - > > Key: BEAM-8965 > URL: https://issues.apache.org/jira/browse/BEAM-8965 > Project: Beam > Issue Type: Bug > Components: io-py-gcp >Affects Versions: 2.16.0, 2.17.0, 2.18.0, 2.19.0 >Reporter: Wenbing Bai >Assignee: Wenbing Bai >Priority: Major > Time Spent: 0.5h > Remaining Estimate: 0h > > *{{WriteToBigQuery}}* fails in *{{BundleBasedDirectRunner}}* with error > {{PCollection of size 2 with more than one element accessed as a singleton > view.}} > Here is the code > > {code:python} > with Pipeline() as p: > query_results = ( > p > | beam.io.Read(beam.io.BigQuerySource( > query='SELECT ... FROM ...') > ) > query_results | beam.io.gcp.WriteToBigQuery( > table=, > method=WriteToBigQuery.Method.FILE_LOADS, > schema={"fields": []} > ) > {code} > > Here is the error > > {code:none} > File "apache_beam/runners/common.py", line 778, in > apache_beam.runners.common.DoFnRunner.process > def process(self, windowed_value): > File "apache_beam/runners/common.py", line 782, in > apache_beam.runners.common.DoFnRunner.process > self._reraise_augmented(exn) > File "apache_beam/runners/common.py", line 849, in > apache_beam.runners.common.DoFnRunner._reraise_augmented > raise_with_traceback(new_exn) > File "apache_beam/runners/common.py", line 780, in > apache_beam.runners.common.DoFnRunner.process > return self.do_fn_invoker.invoke_process(windowed_value) > File "apache_beam/runners/common.py", line 587, in > apache_beam.runners.common.PerWindowInvoker.invoke_process > self._invoke_process_per_window( > File "apache_beam/runners/common.py", line 610, in > apache_beam.runners.common.PerWindowInvoker._invoke_process_per_window > [si[global_window] for si in self.side_inputs])) > File > "/home/wbai/terra/terra_py2/local/lib/python2.7/site-packages/apache_beam/transforms/sideinputs.py", > line 65, in __getitem__ > _FilteringIterable(self._iterable, target_window), self._view_options) > File > "/home/wbai/terra/terra_py2/local/lib/python2.7/site-packages/apache_beam/pvalue.py", > line 443, in _from_runtime_iterable > len(head), str(head[0]), str(head[1]))) > ValueError: PCollection of size 2 with more than one element accessed as a > singleton view. First two elements encountered are > "gs://temp-dev/temp/bq_load/3edbf2172dd540edb5c8e9597206b10f", > "gs://temp-dev/temp/bq_load/3edbf2172dd540edb5c8e9597206b10f". [while running > 'WriteToBigQuery/BigQueryBatchFileLoads/ParDo(WriteRecordsToFile)/ParDo(WriteRecordsToFile)'] > {code} > > > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9228) _SDFBoundedSourceWrapper doesn't distribute data to multiple workers
[ https://issues.apache.org/jira/browse/BEAM-9228?focusedWorklogId=392212&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-392212 ] ASF GitHub Bot logged work on BEAM-9228: Author: ASF GitHub Bot Created on: 25/Feb/20 00:28 Start Date: 25/Feb/20 00:28 Worklog Time Spent: 10m Work Description: robertwb commented on pull request #10847: [BEAM-9228] Support further partition for FnApi ListBuffer URL: https://github.com/apache/beam/pull/10847 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 392212) Time Spent: 3h 20m (was: 3h 10m) > _SDFBoundedSourceWrapper doesn't distribute data to multiple workers > > > Key: BEAM-9228 > URL: https://issues.apache.org/jira/browse/BEAM-9228 > Project: Beam > Issue Type: Bug > Components: sdk-py-core >Affects Versions: 2.16.0, 2.18.0, 2.19.0 >Reporter: Hannah Jiang >Assignee: Hannah Jiang >Priority: Major > Fix For: 2.20.0 > > Time Spent: 3h 20m > Remaining Estimate: 0h > > A user reported following issue. > - > I have a set of tfrecord files, obtained by converting parquet files with > Spark. Each file is roughly 1GB and I have 11 of those. > I would expect simple statistics gathering (ie counting number of items of > all files) to scale linearly with respect to the number of cores on my system. > I am able to reproduce the issue with the minimal snippet below > {code:java} > import apache_beam as beam > from apache_beam.options.pipeline_options import PipelineOptions > from apache_beam.runners.portability import fn_api_runner > from apache_beam.portability.api import beam_runner_api_pb2 > from apache_beam.portability import python_urns > import sys > pipeline_options = PipelineOptions(['--direct_num_workers', '4']) > file_pattern = 'part-r-00* > runner=fn_api_runner.FnApiRunner( > default_environment=beam_runner_api_pb2.Environment( > urn=python_urns.SUBPROCESS_SDK, > payload=b'%s -m apache_beam.runners.worker.sdk_worker_main' > % sys.executable.encode('ascii'))) > p = beam.Pipeline(runner=runner, options=pipeline_options) > lines = (p | 'read' >> beam.io.tfrecordio.ReadFromTFRecord(file_pattern) > | beam.combiners.Count.Globally() > | beam.io.WriteToText('/tmp/output')) > p.run() > {code} > Only one combination of apache_beam revision / worker type seems to work (I > refer to https://beam.apache.org/documentation/runners/direct/ for the worker > types) > * beam 2.16; neither multithread nor multiprocess achieve high cpu usage on > multiple cores > * beam 2.17: able to achieve high cpu usage on all 4 cores > * beam 2.18: not tested the mulithreaded mode but the multiprocess mode fails > when trying to serialize the Environment instance most likely because of a > change from 2.17 to 2.18. > I also tried briefly SparkRunner with version 2.16 but was no able to achieve > any throughput. > What is the recommnended way to achieve what I am trying to ? How can I > troubleshoot ? > -- > This is caused by [this > PR|https://github.com/apache/beam/commit/02f8ad4eee3ec0ea8cbdc0f99c1dad29f00a9f60]. > A [workaround|https://github.com/apache/beam/pull/10729] is tried, which is > rolling back iobase.py not to use _SDFBoundedSourceWrapper. This confirmed > that data is distributed to multiple workers, however, there are some > regressions with SDF wrapper tests. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (BEAM-9322) Python SDK ignores manually set PCollection tags
[ https://issues.apache.org/jira/browse/BEAM-9322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17043991#comment-17043991 ] Sam Rohde commented on BEAM-9322: - No this Jira won't be able to be solved by by the 2.20.0 version. But I will have a small fix in [https://github.com/apache/beam/pull/10934]. > Python SDK ignores manually set PCollection tags > > > Key: BEAM-9322 > URL: https://issues.apache.org/jira/browse/BEAM-9322 > Project: Beam > Issue Type: Bug > Components: sdk-py-core >Reporter: Sam Rohde >Assignee: Sam Rohde >Priority: Major > Fix For: 2.20.0 > > > The Python SDK currently ignores any tags set on PCollections manually when > applying PTransforms when adding the PCollection to the PTransform > [outputs|[https://github.com/apache/beam/blob/688a4ea53f315ec2aa2d37602fd78496fca8bb4f/sdks/python/apache_beam/pipeline.py#L595]]. > In the > [add_output|[https://github.com/apache/beam/blob/688a4ea53f315ec2aa2d37602fd78496fca8bb4f/sdks/python/apache_beam/pipeline.py#L872]] > method, the tag is set to None for all PValues, meaning the output tags are > set to an enumeration index over the PCollection outputs. The tags are not > propagated to correctly which can be a problem on relying on the output > PCollection tags to match the user set values. > The fix is to correct BEAM-1833, and always pass in the tags. However, that > doesn't fix the problem for nested PCollections. If you have a dict of lists > of PCollections, what should their tags be correctly set to? In order to fix > this, first propagate the correct tag then talk with the community about the > best auto-generated tags. > Some users may rely on the old implementation, so a flag will be created: > "force_generated_pcollection_output_ids" and be default set to False. If > True, this will go to the old implementation and generate tags for > PCollections. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9263) Bump python sdk fnapi version to enable status reporting
[ https://issues.apache.org/jira/browse/BEAM-9263?focusedWorklogId=392209&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-392209 ] ASF GitHub Bot logged work on BEAM-9263: Author: ASF GitHub Bot Created on: 25/Feb/20 00:10 Start Date: 25/Feb/20 00:10 Worklog Time Spent: 10m Work Description: tvalentyn commented on issue #10870: [BEAM-9263] Bump up sdk dataflow environment major versions URL: https://github.com/apache/beam/pull/10870#issuecomment-590613101 test This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 392209) Time Spent: 20m (was: 10m) > Bump python sdk fnapi version to enable status reporting > > > Key: BEAM-9263 > URL: https://issues.apache.org/jira/browse/BEAM-9263 > Project: Beam > Issue Type: Task > Components: sdk-py-core >Affects Versions: 2.20.0 >Reporter: Yichi Zhang >Assignee: Yichi Zhang >Priority: Minor > Time Spent: 20m > Remaining Estimate: 0h > > Bump python sdk fn api environment version to 8 for roll out the status > feature for sdk harness status reporting. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9228) _SDFBoundedSourceWrapper doesn't distribute data to multiple workers
[ https://issues.apache.org/jira/browse/BEAM-9228?focusedWorklogId=392208&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-392208 ] ASF GitHub Bot logged work on BEAM-9228: Author: ASF GitHub Bot Created on: 25/Feb/20 00:07 Start Date: 25/Feb/20 00:07 Worklog Time Spent: 10m Work Description: robertwb commented on pull request #10847: [BEAM-9228] Support further partition for FnApi ListBuffer URL: https://github.com/apache/beam/pull/10847#discussion_r383587744 ## File path: sdks/python/apache_beam/runners/portability/fn_api_runner_test.py ## @@ -1603,23 +1603,6 @@ def restriction_size(self, element, restriction): return restriction.size() -class FnApiRunnerSplitTestWithMultiWorkers(FnApiRunnerSplitTest): Review comment: Is there any loss of coverage in removing these tests? I can see how perhaps they're too strict. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 392208) Time Spent: 3h 10m (was: 3h) > _SDFBoundedSourceWrapper doesn't distribute data to multiple workers > > > Key: BEAM-9228 > URL: https://issues.apache.org/jira/browse/BEAM-9228 > Project: Beam > Issue Type: Bug > Components: sdk-py-core >Affects Versions: 2.16.0, 2.18.0, 2.19.0 >Reporter: Hannah Jiang >Assignee: Hannah Jiang >Priority: Major > Fix For: 2.20.0 > > Time Spent: 3h 10m > Remaining Estimate: 0h > > A user reported following issue. > - > I have a set of tfrecord files, obtained by converting parquet files with > Spark. Each file is roughly 1GB and I have 11 of those. > I would expect simple statistics gathering (ie counting number of items of > all files) to scale linearly with respect to the number of cores on my system. > I am able to reproduce the issue with the minimal snippet below > {code:java} > import apache_beam as beam > from apache_beam.options.pipeline_options import PipelineOptions > from apache_beam.runners.portability import fn_api_runner > from apache_beam.portability.api import beam_runner_api_pb2 > from apache_beam.portability import python_urns > import sys > pipeline_options = PipelineOptions(['--direct_num_workers', '4']) > file_pattern = 'part-r-00* > runner=fn_api_runner.FnApiRunner( > default_environment=beam_runner_api_pb2.Environment( > urn=python_urns.SUBPROCESS_SDK, > payload=b'%s -m apache_beam.runners.worker.sdk_worker_main' > % sys.executable.encode('ascii'))) > p = beam.Pipeline(runner=runner, options=pipeline_options) > lines = (p | 'read' >> beam.io.tfrecordio.ReadFromTFRecord(file_pattern) > | beam.combiners.Count.Globally() > | beam.io.WriteToText('/tmp/output')) > p.run() > {code} > Only one combination of apache_beam revision / worker type seems to work (I > refer to https://beam.apache.org/documentation/runners/direct/ for the worker > types) > * beam 2.16; neither multithread nor multiprocess achieve high cpu usage on > multiple cores > * beam 2.17: able to achieve high cpu usage on all 4 cores > * beam 2.18: not tested the mulithreaded mode but the multiprocess mode fails > when trying to serialize the Environment instance most likely because of a > change from 2.17 to 2.18. > I also tried briefly SparkRunner with version 2.16 but was no able to achieve > any throughput. > What is the recommnended way to achieve what I am trying to ? How can I > troubleshoot ? > -- > This is caused by [this > PR|https://github.com/apache/beam/commit/02f8ad4eee3ec0ea8cbdc0f99c1dad29f00a9f60]. > A [workaround|https://github.com/apache/beam/pull/10729] is tried, which is > rolling back iobase.py not to use _SDFBoundedSourceWrapper. This confirmed > that data is distributed to multiple workers, however, there are some > regressions with SDF wrapper tests. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9228) _SDFBoundedSourceWrapper doesn't distribute data to multiple workers
[ https://issues.apache.org/jira/browse/BEAM-9228?focusedWorklogId=392207&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-392207 ] ASF GitHub Bot logged work on BEAM-9228: Author: ASF GitHub Bot Created on: 25/Feb/20 00:07 Start Date: 25/Feb/20 00:07 Worklog Time Spent: 10m Work Description: robertwb commented on pull request #10847: [BEAM-9228] Support further partition for FnApi ListBuffer URL: https://github.com/apache/beam/pull/10847#discussion_r383587172 ## File path: sdks/python/apache_beam/runners/portability/fn_api_runner.py ## @@ -932,10 +994,14 @@ def get_buffer(buffer_id): return pcoll_buffers[buffer_id] def get_input_coder_impl(transform_id): - return context.coders[ - safe_coders[beam_fn_api_pb2.RemoteGrpcPort.FromString( - process_bundle_descriptor.transforms[transform_id].spec.payload). - coder_id]].get_impl() + coder_id = beam_fn_api_pb2.RemoteGrpcPort.FromString( + process_bundle_descriptor.transforms[transform_id].spec.payload + ).coder_id + assert coder_id + if coder_id in safe_coders: Review comment: An alternative would be `safe_coders.get(coder_id, coder_id)` which means look up the key (first argument) and if it doesn't exist return the second argument. This would eliminate some of the repetition of logic between the lines as well. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 392207) Time Spent: 3h 10m (was: 3h) > _SDFBoundedSourceWrapper doesn't distribute data to multiple workers > > > Key: BEAM-9228 > URL: https://issues.apache.org/jira/browse/BEAM-9228 > Project: Beam > Issue Type: Bug > Components: sdk-py-core >Affects Versions: 2.16.0, 2.18.0, 2.19.0 >Reporter: Hannah Jiang >Assignee: Hannah Jiang >Priority: Major > Fix For: 2.20.0 > > Time Spent: 3h 10m > Remaining Estimate: 0h > > A user reported following issue. > - > I have a set of tfrecord files, obtained by converting parquet files with > Spark. Each file is roughly 1GB and I have 11 of those. > I would expect simple statistics gathering (ie counting number of items of > all files) to scale linearly with respect to the number of cores on my system. > I am able to reproduce the issue with the minimal snippet below > {code:java} > import apache_beam as beam > from apache_beam.options.pipeline_options import PipelineOptions > from apache_beam.runners.portability import fn_api_runner > from apache_beam.portability.api import beam_runner_api_pb2 > from apache_beam.portability import python_urns > import sys > pipeline_options = PipelineOptions(['--direct_num_workers', '4']) > file_pattern = 'part-r-00* > runner=fn_api_runner.FnApiRunner( > default_environment=beam_runner_api_pb2.Environment( > urn=python_urns.SUBPROCESS_SDK, > payload=b'%s -m apache_beam.runners.worker.sdk_worker_main' > % sys.executable.encode('ascii'))) > p = beam.Pipeline(runner=runner, options=pipeline_options) > lines = (p | 'read' >> beam.io.tfrecordio.ReadFromTFRecord(file_pattern) > | beam.combiners.Count.Globally() > | beam.io.WriteToText('/tmp/output')) > p.run() > {code} > Only one combination of apache_beam revision / worker type seems to work (I > refer to https://beam.apache.org/documentation/runners/direct/ for the worker > types) > * beam 2.16; neither multithread nor multiprocess achieve high cpu usage on > multiple cores > * beam 2.17: able to achieve high cpu usage on all 4 cores > * beam 2.18: not tested the mulithreaded mode but the multiprocess mode fails > when trying to serialize the Environment instance most likely because of a > change from 2.17 to 2.18. > I also tried briefly SparkRunner with version 2.16 but was no able to achieve > any throughput. > What is the recommnended way to achieve what I am trying to ? How can I > troubleshoot ? > -- > This is caused by [this > PR|https://github.com/apache/beam/commit/02f8ad4eee3ec0ea8cbdc0f99c1dad29f00a9f60]. > A [workaround|https://github.com/apache/beam/pull/10729] is tried, which is > rolling back iobase.py n
[jira] [Work logged] (BEAM-3342) Create a Cloud Bigtable IO connector for Python
[ https://issues.apache.org/jira/browse/BEAM-3342?focusedWorklogId=392205&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-392205 ] ASF GitHub Bot logged work on BEAM-3342: Author: ASF GitHub Bot Created on: 25/Feb/20 00:02 Start Date: 25/Feb/20 00:02 Worklog Time Spent: 10m Work Description: mf2199 commented on issue #8457: [BEAM-3342] Create a Cloud Bigtable IO connector for Python URL: https://github.com/apache/beam/pull/8457#issuecomment-590610828 @chamikaramj PTAL. The long standing issue now seems to be resolved. The reason was the namespace conflict between the existing `bigtableio.py` file and the extra package necessary for running the integration test. As it turns out, the two must have different names, otherwise Dataflow discards the package and uses the existing file, which of course does not have the newly added classes and hence the `AttributeError: 'module' object has no attribute...` error. Apparently this was not the case when this PR was originally created, so now we also have a caveat: - Until the code is merged, the only way to run the test is to change the package name in the import directive, `from bigtableio import ReadFromBigtable`, to something different, and use the external tarball package named accordingly. Upon merge, using an extra package will no longer be necessary and the test should run as-is. This was confirmed by running a sequence of nearly identical tests back-to-back, the instructions to which I can provide separately. In an attempt to reduce the code changes to a bare minimum, the write part of the test has been discarded, as it would test an already accepted code anyway. Finally, I'd also suggest closing this PR and open a new one, so to make things cleaner. Let me know if this is a viable option. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 392205) Time Spent: 44h 10m (was: 44h) > Create a Cloud Bigtable IO connector for Python > --- > > Key: BEAM-3342 > URL: https://issues.apache.org/jira/browse/BEAM-3342 > Project: Beam > Issue Type: Bug > Components: sdk-py-core >Reporter: Solomon Duskis >Assignee: Solomon Duskis >Priority: Major > Time Spent: 44h 10m > Remaining Estimate: 0h > > I would like to create a Cloud Bigtable python connector. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9056) Staging artifacts from environment
[ https://issues.apache.org/jira/browse/BEAM-9056?focusedWorklogId=392204&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-392204 ] ASF GitHub Bot logged work on BEAM-9056: Author: ASF GitHub Bot Created on: 25/Feb/20 00:00 Start Date: 25/Feb/20 00:00 Worklog Time Spent: 10m Work Description: ihji commented on issue #10621: [BEAM-9056] Staging artifacts from environment URL: https://github.com/apache/beam/pull/10621#issuecomment-590610385 @chamikaramj @robertwb Friendly ping for the review. Thanks! This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 392204) Time Spent: 2h 40m (was: 2.5h) > Staging artifacts from environment > -- > > Key: BEAM-9056 > URL: https://issues.apache.org/jira/browse/BEAM-9056 > Project: Beam > Issue Type: Sub-task > Components: java-fn-execution >Reporter: Heejong Lee >Assignee: Heejong Lee >Priority: Major > Time Spent: 2h 40m > Remaining Estimate: 0h > > staging artifacts from artifact information embedded in environment proto. > detail: > https://docs.google.com/document/d/1L7MJcfyy9mg2Ahfw5XPhUeBe-dyvAPMOYOiFA1-kAog -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9296) Add typing annotation to python SDF
[ https://issues.apache.org/jira/browse/BEAM-9296?focusedWorklogId=392203&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-392203 ] ASF GitHub Bot logged work on BEAM-9296: Author: ASF GitHub Bot Created on: 24/Feb/20 23:57 Start Date: 24/Feb/20 23:57 Worklog Time Spent: 10m Work Description: boyuanzz commented on pull request #10935: [BEAM-9296] Clean up and add type-hints to SDF API URL: https://github.com/apache/beam/pull/10935 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 392203) Time Spent: 1.5h (was: 1h 20m) > Add typing annotation to python SDF > --- > > Key: BEAM-9296 > URL: https://issues.apache.org/jira/browse/BEAM-9296 > Project: Beam > Issue Type: Improvement > Components: sdk-py-core, sdk-py-harness >Reporter: Boyuan Zhang >Assignee: Boyuan Zhang >Priority: Major > Time Spent: 1.5h > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (BEAM-9347) Remove default image for Unified Worker
[ https://issues.apache.org/jira/browse/BEAM-9347?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ankur Goenka resolved BEAM-9347. Resolution: Fixed > Remove default image for Unified Worker > --- > > Key: BEAM-9347 > URL: https://issues.apache.org/jira/browse/BEAM-9347 > Project: Beam > Issue Type: Test > Components: runner-dataflow >Reporter: Ankur Goenka >Assignee: Ankur Goenka >Priority: Major > Fix For: 2.20.0 > > Time Spent: 2h 50m > Remaining Estimate: 0h > > The runner will choose the Runner Harness image for UW so we don't need to > overwrite the image in default behavior. > Also, this will help us distinguish between user requested overwrites for the > default overwrites(which is not used). -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (BEAM-9347) Remove default image for Unified Worker
[ https://issues.apache.org/jira/browse/BEAM-9347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17043982#comment-17043982 ] Ankur Goenka commented on BEAM-9347: Yes, The PR is merged. > Remove default image for Unified Worker > --- > > Key: BEAM-9347 > URL: https://issues.apache.org/jira/browse/BEAM-9347 > Project: Beam > Issue Type: Test > Components: runner-dataflow >Reporter: Ankur Goenka >Assignee: Ankur Goenka >Priority: Major > Fix For: 2.20.0 > > Time Spent: 2h 50m > Remaining Estimate: 0h > > The runner will choose the Runner Harness image for UW so we don't need to > overwrite the image in default behavior. > Also, this will help us distinguish between user requested overwrites for the > default overwrites(which is not used). -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (BEAM-9322) Python SDK ignores manually set PCollection tags
[ https://issues.apache.org/jira/browse/BEAM-9322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17043976#comment-17043976 ] Rui Wang commented on BEAM-9322: I am going to cut 2.20.0 branch on 02/26/2020. Can this Jira be finished by that date? Will you able to push this back to 2.21.0? > Python SDK ignores manually set PCollection tags > > > Key: BEAM-9322 > URL: https://issues.apache.org/jira/browse/BEAM-9322 > Project: Beam > Issue Type: Bug > Components: sdk-py-core >Reporter: Sam Rohde >Assignee: Sam Rohde >Priority: Major > Fix For: 2.20.0 > > > The Python SDK currently ignores any tags set on PCollections manually when > applying PTransforms when adding the PCollection to the PTransform > [outputs|[https://github.com/apache/beam/blob/688a4ea53f315ec2aa2d37602fd78496fca8bb4f/sdks/python/apache_beam/pipeline.py#L595]]. > In the > [add_output|[https://github.com/apache/beam/blob/688a4ea53f315ec2aa2d37602fd78496fca8bb4f/sdks/python/apache_beam/pipeline.py#L872]] > method, the tag is set to None for all PValues, meaning the output tags are > set to an enumeration index over the PCollection outputs. The tags are not > propagated to correctly which can be a problem on relying on the output > PCollection tags to match the user set values. > The fix is to correct BEAM-1833, and always pass in the tags. However, that > doesn't fix the problem for nested PCollections. If you have a dict of lists > of PCollections, what should their tags be correctly set to? In order to fix > this, first propagate the correct tag then talk with the community about the > best auto-generated tags. > Some users may rely on the old implementation, so a flag will be created: > "force_generated_pcollection_output_ids" and be default set to False. If > True, this will go to the old implementation and generate tags for > PCollections. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9296) Add typing annotation to python SDF
[ https://issues.apache.org/jira/browse/BEAM-9296?focusedWorklogId=392202&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-392202 ] ASF GitHub Bot logged work on BEAM-9296: Author: ASF GitHub Bot Created on: 24/Feb/20 23:54 Start Date: 24/Feb/20 23:54 Worklog Time Spent: 10m Work Description: boyuanzz commented on issue #10935: [BEAM-9296] Clean up and add type-hints to SDF API URL: https://github.com/apache/beam/pull/10935#issuecomment-590608697 All tests passed. I'm going to merge it. Thanks, Chad! This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 392202) Time Spent: 1h 20m (was: 1h 10m) > Add typing annotation to python SDF > --- > > Key: BEAM-9296 > URL: https://issues.apache.org/jira/browse/BEAM-9296 > Project: Beam > Issue Type: Improvement > Components: sdk-py-core, sdk-py-harness >Reporter: Boyuan Zhang >Assignee: Boyuan Zhang >Priority: Major > Time Spent: 1h 20m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (BEAM-9298) Drop support for Flink 1.7
[ https://issues.apache.org/jira/browse/BEAM-9298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17043974#comment-17043974 ] Rui Wang commented on BEAM-9298: I am going to cut 2.20.0 branch on 02/26/2020. Can this Jira be finished by that date? Will you able to push this back to 2.21.0? > Drop support for Flink 1.7 > --- > > Key: BEAM-9298 > URL: https://issues.apache.org/jira/browse/BEAM-9298 > Project: Beam > Issue Type: Task > Components: runner-flink >Reporter: sunjincheng >Assignee: sunjincheng >Priority: Major > Fix For: 2.20.0 > > Time Spent: 40m > Remaining Estimate: 0h > > With Flink 1.10 around the corner, more detail can be found in BEAM-9295, we > should consider dropping support for Flink 1.7. Then dropping 1.7 will also > decrease the build time. > What do you think? -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (BEAM-9359) Use DataCatalog client libraries rather than gRPC stubs
[ https://issues.apache.org/jira/browse/BEAM-9359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17043979#comment-17043979 ] Rui Wang commented on BEAM-9359: I am going to cut 2.20.0 branch on 02/26/2020. Can this Jira be finished by that date? Will you able to push this back to 2.21.0? > Use DataCatalog client libraries rather than gRPC stubs > --- > > Key: BEAM-9359 > URL: https://issues.apache.org/jira/browse/BEAM-9359 > Project: Beam > Issue Type: Improvement > Components: dsl-sql >Reporter: Brian Hulette >Assignee: Brian Hulette >Priority: Major > Fix For: 2.20.0 > > Time Spent: 40m > Remaining Estimate: 0h > > The [GCP docs|https://cloud.google.com/data-catalog/docs/reference/libraries] > indicate this is the preferred way to use the service. > The client library sets some headers in requests that ensure they are > consistently routed properly. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (BEAM-9352) Ensure consistent usage of jackson version brought in by transitive dependencies
[ https://issues.apache.org/jira/browse/BEAM-9352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17043980#comment-17043980 ] Rui Wang commented on BEAM-9352: I am going to cut 2.20.0 branch on 02/26/2020. Can this Jira be finished by that date? Will you able to push this back to 2.21.0? > Ensure consistent usage of jackson version brought in by transitive > dependencies > > > Key: BEAM-9352 > URL: https://issues.apache.org/jira/browse/BEAM-9352 > Project: Beam > Issue Type: Improvement > Components: build-system, sdk-java-core >Reporter: Luke Cwik >Assignee: Ismaël Mejía >Priority: Minor > Fix For: 2.20.0 > > Time Spent: 3h > Remaining Estimate: 0h > > Jackson relies on delivering a set of packages at a specific version which > are internally compatible with each other. Some of Apache Beam's transitive > dependencies bring in versions which may not be compatible with the version > used with Apache Beam. > > Analysis on [https://github.com/apache/beam/pull/10643] found that these are > some of those inconsistencies: > > {noformat} > > jackson-dataformat-xml-2.8.7.jar is at: > > org.apache.beam:beam-sdks-java-extensions-sql:2.20.0-SNAPSHOT (compile) / > > com.alibaba:fastjson:1.2.49 (compile) / > > org.springframework:spring-webmvc:4.3.7.RELEASE (provided, optional) / > > com.fasterxml.jackson.dataformat:jackson-dataformat-xml:2.8.7 (compile, > > optional) > > jackson-dataformat-xml-2.9.9.jar is at: > > org.apache.beam:beam-sdks-java-io-rabbitmq:2.20.0-SNAPSHOT (compile) / > > com.rabbitmq:amqp-client:5.7.3 (compile) / > > io.micrometer:micrometer-core:1.2.0 (compile, optional) / > > org.apache.logging.log4j:log4j-core:2.12.0 (compile, optional) / > > com.fasterxml.jackson.dataformat:jackson-dataformat-xml:2.9.9 (compile, > > optional) > < jackson-dataformat-csv-2.10.0.jar is at: > org.apache.beam:beam-sdks-java-io-kafka:2.20.0-SNAPSHOT (compile) / > io.confluent:kafka-avro-serializer:5.3.2 (compile) / > org.apache.kafka:kafka_2.12:5.3.2-ccs (provided) / > com.fasterxml.jackson.dataformat:jackson-dataformat-csv:2.10.0 (provided) > {noformat} > > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (BEAM-9347) Remove default image for Unified Worker
[ https://issues.apache.org/jira/browse/BEAM-9347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17043978#comment-17043978 ] Rui Wang commented on BEAM-9347: I am going to cut 2.20.0 branch on 02/26/2020. Can this Jira be finished by that date? Will you able to push this back to 2.21.0? > Remove default image for Unified Worker > --- > > Key: BEAM-9347 > URL: https://issues.apache.org/jira/browse/BEAM-9347 > Project: Beam > Issue Type: Test > Components: runner-dataflow >Reporter: Ankur Goenka >Assignee: Ankur Goenka >Priority: Major > Fix For: 2.20.0 > > Time Spent: 2h 50m > Remaining Estimate: 0h > > The runner will choose the Runner Harness image for UW so we don't need to > overwrite the image in default behavior. > Also, this will help us distinguish between user requested overwrites for the > default overwrites(which is not used). -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (BEAM-9299) Upgrade Flink Runner to 1.8.3 and 1.9.2
[ https://issues.apache.org/jira/browse/BEAM-9299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17043975#comment-17043975 ] Rui Wang commented on BEAM-9299: I am going to cut 2.20.0 branch on 02/26/2020. Can this Jira be finished by that date? Will you able to push this back to 2.21.0? > Upgrade Flink Runner to 1.8.3 and 1.9.2 > --- > > Key: BEAM-9299 > URL: https://issues.apache.org/jira/browse/BEAM-9299 > Project: Beam > Issue Type: Task > Components: runner-flink >Reporter: sunjincheng >Assignee: sunjincheng >Priority: Major > Fix For: 2.20.0 > > Time Spent: 1h 20m > Remaining Estimate: 0h > > I would like to Upgrade Flink Runner to 18.3 and 1.9.2 due to both the Apache > Flink 1.8.3 and Apache Flink 1.9.2 have been released [1]. > What do you think? > [1] https://dist.apache.org/repos/dist/release/flink/ -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (BEAM-9240) Check for Nullability in typesEqual() method of FieldType class
[ https://issues.apache.org/jira/browse/BEAM-9240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17043971#comment-17043971 ] Rui Wang commented on BEAM-9240: I am going to cut 2.20.0 branch on 02/26/2020. Can this Jira be finished by that date? Will you able to push this back to 2.21.0? > Check for Nullability in typesEqual() method of FieldType class > --- > > Key: BEAM-9240 > URL: https://issues.apache.org/jira/browse/BEAM-9240 > Project: Beam > Issue Type: Bug > Components: dsl-sql >Affects Versions: 2.18.0 >Reporter: Rahul Patwari >Assignee: Rahul Patwari >Priority: Major > Fix For: 2.20.0 > > Time Spent: 3h 20m > Remaining Estimate: 0h > > {{If two schemas are created like this:}} > {{Schema schema1 = Schema.builder().addStringField("col1").build();}} > {{Schema schema2 = Schema.builder().addNullableField("col1", > FieldType.STRING).build();}} > > {{schema1.typeEquals(schema2) returns "true" even though the schemas differ > by Nullability}} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (BEAM-9295) Add Flink 1.10 build target and Make FlinkRunner compatible with Flink 1.10
[ https://issues.apache.org/jira/browse/BEAM-9295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17043973#comment-17043973 ] Rui Wang commented on BEAM-9295: I am going to cut 2.20.0 branch on 02/26/2020. Can this Jira be finished by that date? Will you able to push this back to 2.21.0? > Add Flink 1.10 build target and Make FlinkRunner compatible with Flink 1.10 > --- > > Key: BEAM-9295 > URL: https://issues.apache.org/jira/browse/BEAM-9295 > Project: Beam > Issue Type: New Feature > Components: runner-flink >Reporter: sunjincheng >Assignee: sunjincheng >Priority: Major > Fix For: 2.20.0 > > Time Spent: 50m > Remaining Estimate: 0h > > Apache Flink 1.10 has completed the final release vote, see [1]. So, I would > like to add Flink 1.10 build target and make Flink Runner compatible with > Flink 1.10. > And I appreciate it if you can leave your suggestions or comments! > [1] > https://lists.apache.org/thread.html/r97672d4d1e47372cebf23e6643a6cc30a06bfbdf3f277b0be3695b15%40%3Cdev.flink.apache.org%3E -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (BEAM-9225) Flink uber jar job server hangs
[ https://issues.apache.org/jira/browse/BEAM-9225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17043970#comment-17043970 ] Rui Wang commented on BEAM-9225: I am going to cut 2.20.0 branch on 02/26/2020. Can this Jira be finished by that date? Will you able to push this back to 2.21.0? > Flink uber jar job server hangs > --- > > Key: BEAM-9225 > URL: https://issues.apache.org/jira/browse/BEAM-9225 > Project: Beam > Issue Type: Bug > Components: runner-flink >Reporter: Kyle Weaver >Assignee: Kyle Weaver >Priority: Major > Labels: portability-flink > Fix For: 2.20.0 > > Time Spent: 1h 10m > Remaining Estimate: 0h > > This was observed on Kubernetes. I suspect this behavior might also be the > reason beam_PostCommit_PortableJar_Flink is timing out. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (BEAM-8620) Tear down unused DoFns periodically in Java SDK harness
[ https://issues.apache.org/jira/browse/BEAM-8620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17043969#comment-17043969 ] Rui Wang commented on BEAM-8620: I am going to cut 2.20.0 branch on 02/26/2020. Can this Jira be finished by that date? Will you able to push this back to 2.21.0? > Tear down unused DoFns periodically in Java SDK harness > --- > > Key: BEAM-8620 > URL: https://issues.apache.org/jira/browse/BEAM-8620 > Project: Beam > Issue Type: Improvement > Components: sdk-java-harness >Reporter: sunjincheng >Assignee: sunjincheng >Priority: Major > Fix For: 2.20.0 > > > Per the discussion in the ML the detail can be found here[1], the teardown of > DoFns should be supported in the portability framework. It happens at two > places: > 1) Upon the control service termination > 2) Tear down the unused DoFns periodically > The aim of this JIRA is to add support for tear down the unused DoFns > periodically in Java SDK harness. > [1] > https://lists.apache.org/thread.html/0c4a4cf83cf2e35c3dfeb9d906e26cd82d3820968ba6f862f91739e4@%3Cdev.beam.apache.org%3E -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (BEAM-8618) Tear down unused DoFns periodically in Python SDK harness
[ https://issues.apache.org/jira/browse/BEAM-8618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17043968#comment-17043968 ] Rui Wang commented on BEAM-8618: I am going to cut 2.20.0 branch on 02/26/2020. Can this Jira be finished by that date? Will you able to push this back to 2.21.0? > Tear down unused DoFns periodically in Python SDK harness > - > > Key: BEAM-8618 > URL: https://issues.apache.org/jira/browse/BEAM-8618 > Project: Beam > Issue Type: Improvement > Components: sdk-py-harness >Reporter: sunjincheng >Assignee: sunjincheng >Priority: Major > Fix For: 2.20.0 > > Time Spent: 4h 50m > Remaining Estimate: 0h > > Per the discussion in the ML, detail can be found [1], the teardown of DoFns > should be supported in the portability framework. It happens at two places: > 1) Upon the control service termination > 2) Tear down the unused DoFns periodically > The aim of this JIRA is to add support for tear down the unused DoFns > periodically in Python SDK harness. > [1] > https://lists.apache.org/thread.html/0c4a4cf83cf2e35c3dfeb9d906e26cd82d3820968ba6f862f91739e4@%3Cdev.beam.apache.org%3E -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-7926) Show PCollection with Interactive Beam in a data-centric user flow
[ https://issues.apache.org/jira/browse/BEAM-7926?focusedWorklogId=392201&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-392201 ] ASF GitHub Bot logged work on BEAM-7926: Author: ASF GitHub Bot Created on: 24/Feb/20 23:53 Start Date: 24/Feb/20 23:53 Worklog Time Spent: 10m Work Description: aaltay commented on issue #10731: [BEAM-7926] Data-centric Interactive Part3 URL: https://github.com/apache/beam/pull/10731#issuecomment-590608326 Run Python PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 392201) Time Spent: 48h 40m (was: 48.5h) > Show PCollection with Interactive Beam in a data-centric user flow > -- > > Key: BEAM-7926 > URL: https://issues.apache.org/jira/browse/BEAM-7926 > Project: Beam > Issue Type: New Feature > Components: runner-py-interactive >Reporter: Ning Kang >Assignee: Ning Kang >Priority: Major > Time Spent: 48h 40m > Remaining Estimate: 0h > > Support auto plotting / charting of materialized data of a given PCollection > with Interactive Beam. > Say an Interactive Beam pipeline defined as > > {code:java} > p = beam.Pipeline(InteractiveRunner()) > pcoll = p | 'Transform' >> transform() > pcoll2 = ... > pcoll3 = ...{code} > The use can call a single function and get auto-magical charting of the data. > e.g., > {code:java} > show(pcoll, pcoll2) > {code} > Throughout the process, a pipeline fragment is built to include only > transforms necessary to produce the desired pcolls (pcoll and pcoll2) and > execute that fragment. > This makes the Interactive Beam user flow data-centric. > > Detailed > [design|https://docs.google.com/document/d/1DYWrT6GL_qDCXhRMoxpjinlVAfHeVilK5Mtf8gO6zxQ/edit#heading=h.v6k2o3roarzz]. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (BEAM-9252) Problem shading Beam pipeline with Beam 2.20.0-SNAPSHOT
[ https://issues.apache.org/jira/browse/BEAM-9252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17043966#comment-17043966 ] Rui Wang commented on BEAM-9252: Just FYI, I am going to cut 2.20.0 branch on 02/26/2020. > Problem shading Beam pipeline with Beam 2.20.0-SNAPSHOT > --- > > Key: BEAM-9252 > URL: https://issues.apache.org/jira/browse/BEAM-9252 > Project: Beam > Issue Type: Bug > Components: build-system >Affects Versions: 2.20.0 >Reporter: Ismaël Mejía >Assignee: Luke Cwik >Priority: Critical > Fix For: 2.20.0 > > Attachments: ArrayIndexOutOfBoundsException.png, > image-2020-02-21-13-22-01-341.png > > Time Spent: 2h 10m > Remaining Estimate: 0h > > I was checking today a pipeline against the latest 2.20.0-SNAPSHOT and I > found that it works perfectly with version 2.19.0, but it is failing with a > shade related exception that refers to grpc 1.26.0: > {{[ERROR] Failed to execute goal > org.apache.maven.plugins:maven-shade-plugin:3.2.1:shade (default) on project > EventsToIOs: Error creating shaded jar: Problem shading JAR > /home/ismael/.m2/repository/org/apache/beam/beam-vendor-grpc-1_26_0/0.1/beam-vendor-grpc-1_26_0-0.1.jar > entry org/apache/beam/vendor/grpc/v1p26p0/org/jboss/modules/Main.class: > org.apache.maven.plugin.MojoExecutionException: Error in ASM processing class > org/apache/beam/vendor/grpc/v1p26p0/org/jboss/modules/Main.class: 65536 -> > [Help 1]}} > {{There is also a warning that is not present in the build against 2.19.0}} > {{[WARNING] Discovered module-info.class. Shading will break its strong > encapsulation.}} > > I wonder if we are not doing something wrong during our vendoring, can > someone take a look please. > This is relatively easy to reproduce with the beam-samples repo, just clone > it and run: > {noformat} > git clone https://github.com/jbonofre/beam-samples > mvn clean verify -Pbeam-release-repo -Dbeam.version=2.20.0-SNAPSHOT > {noformat} > Available logs of the latest run: > [https://github.com/jbonofre/beam-samples/runs/427537544?check_suite_focus=true] > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam
[ https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=392200&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-392200 ] ASF GitHub Bot logged work on BEAM-8335: Author: ASF GitHub Bot Created on: 24/Feb/20 23:52 Start Date: 24/Feb/20 23:52 Worklog Time Spent: 10m Work Description: aaltay commented on issue #10899: [BEAM-8335] Background Caching job URL: https://github.com/apache/beam/pull/10899#issuecomment-590608244 retest this please This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 392200) Time Spent: 71h 10m (was: 71h) > Add streaming support to Interactive Beam > - > > Key: BEAM-8335 > URL: https://issues.apache.org/jira/browse/BEAM-8335 > Project: Beam > Issue Type: Improvement > Components: runner-py-interactive >Reporter: Sam Rohde >Assignee: Sam Rohde >Priority: Major > Time Spent: 71h 10m > Remaining Estimate: 0h > > This issue tracks the work items to introduce streaming support to the > Interactive Beam experience. This will allow users to: > * Write and run a streaming job in IPython > * Automatically cache records from unbounded sources > * Add a replay experience that replays all cached records to simulate the > original pipeline execution > * Add controls to play/pause/stop/step individual elements from the cached > records > * Add ability to inspect/visualize unbounded PCollections -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (BEAM-9288) Conscrypt shaded dependency
[ https://issues.apache.org/jira/browse/BEAM-9288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17043967#comment-17043967 ] Rui Wang commented on BEAM-9288: I am going to cut 2.20.0 branch on 02/26/2020. Can this Jira be finished by that date? > Conscrypt shaded dependency > --- > > Key: BEAM-9288 > URL: https://issues.apache.org/jira/browse/BEAM-9288 > Project: Beam > Issue Type: Bug > Components: build-system >Reporter: Esun Kim >Assignee: sunjincheng >Priority: Critical > Fix For: 2.20.0 > > Time Spent: 1h 50m > Remaining Estimate: 0h > > Conscrypt is not designed to be shaded properly mainly because of so files. I > happened to see BEAM-9030 (*1) creating a new vendored gRPC shading Conscrypt > (*2) in it. I think this could make a problem when new Conscrypt is brought > by new gcsio depending on gRPC-alts (*4) in a dependency chain. (*5) In this > case, it may have a conflict when finding proper so files for Conscrypt. > *1: https://issues.apache.org/jira/browse/BEAM-9030 > *2: > [https://github.com/apache/beam/blob/e24d1e51cbabe27cb3cc381fd95b334db639c45d/buildSrc/src/main/groovy/org/apache/beam/gradle/GrpcVendoring_1_26_0.groovy#L78] > *3: https://issues.apache.org/jira/browse/BEAM-6136 > *4: [https://mvnrepository.com/artifact/io.grpc/grpc-alts/1.27.0] > *5: https://issues.apache.org/jira/browse/BEAM-8889 > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Comment Edited] (BEAM-8989) Backwards incompatible change in ParDo.getSideInputs (caught by failure when running Apache Nemo quickstart)
[ https://issues.apache.org/jira/browse/BEAM-8989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17043964#comment-17043964 ] Rui Wang edited comment on BEAM-8989 at 2/24/20 11:51 PM: -- I am going to cut 2.20.0 branch on 02/26/2020. Can this Jira be finished by that date? Will you able to push this back to 2.21.0? was (Author: amaliujia): I am going to cut 2.20.0 branch on 02/26/2020. Will you able to push this back to 2.21.0? > Backwards incompatible change in ParDo.getSideInputs (caught by failure when > running Apache Nemo quickstart) > > > Key: BEAM-8989 > URL: https://issues.apache.org/jira/browse/BEAM-8989 > Project: Beam > Issue Type: Bug > Components: sdk-java-core >Affects Versions: 2.16.0, 2.17.0, 2.18.0, 2.19.0 >Reporter: Luke Cwik >Assignee: Reuven Lax >Priority: Critical > Fix For: 2.20.0 > > > [PR/9275|https://github.com/apache/beam/pull/9275] changed > *ParDo.getSideInputs* from *List* to *Map PCollectionView>* which is backwards incompatible change and was released as > part of Beam 2.16.0 erroneously. > Running the Apache Nemo Quickstart fails with: > > {code:java} > Exception in thread "main" java.lang.RuntimeException: Translator private > static void > org.apache.nemo.compiler.frontend.beam.PipelineTranslator.parDoMultiOutputTranslator(org.apache.nemo.compiler.frontend.beam.PipelineTranslationContext,org.apache.beam.sdk.runners.TransformHierarchy$Node,org.apache.beam.sdk.transforms.ParDo$MultiOutput) > have failed to translate > org.apache.beam.examples.WordCount$ExtractWordsFn@600b9d27Exception in thread > "main" java.lang.RuntimeException: Translator private static void > org.apache.nemo.compiler.frontend.beam.PipelineTranslator.parDoMultiOutputTranslator(org.apache.nemo.compiler.frontend.beam.PipelineTranslationContext,org.apache.beam.sdk.runners.TransformHierarchy$Node,org.apache.beam.sdk.transforms.ParDo$MultiOutput) > have failed to translate > org.apache.beam.examples.WordCount$ExtractWordsFn@600b9d27 at > org.apache.nemo.compiler.frontend.beam.PipelineTranslator.translatePrimitive(PipelineTranslator.java:113) > at > org.apache.nemo.compiler.frontend.beam.PipelineVisitor.visitPrimitiveTransform(PipelineVisitor.java:46) > at > org.apache.beam.sdk.runners.TransformHierarchy$Node.visit(TransformHierarchy.java:665) > at > org.apache.beam.sdk.runners.TransformHierarchy$Node.visit(TransformHierarchy.java:657) > at > org.apache.beam.sdk.runners.TransformHierarchy$Node.visit(TransformHierarchy.java:657) > at > org.apache.beam.sdk.runners.TransformHierarchy$Node.visit(TransformHierarchy.java:657) > at > org.apache.beam.sdk.runners.TransformHierarchy$Node.access$600(TransformHierarchy.java:317) > at > org.apache.beam.sdk.runners.TransformHierarchy.visit(TransformHierarchy.java:251) > at org.apache.beam.sdk.Pipeline.traverseTopologically(Pipeline.java:460) at > org.apache.nemo.compiler.frontend.beam.NemoRunner.run(NemoRunner.java:80) at > org.apache.nemo.compiler.frontend.beam.NemoRunner.run(NemoRunner.java:31) at > org.apache.beam.sdk.Pipeline.run(Pipeline.java:315) at > org.apache.beam.sdk.Pipeline.run(Pipeline.java:301) at > org.apache.beam.examples.WordCount.runWordCount(WordCount.java:185) at > org.apache.beam.examples.WordCount.main(WordCount.java:192)Caused by: > java.lang.reflect.InvocationTargetException at > sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) at > org.apache.nemo.compiler.frontend.beam.PipelineTranslator.translatePrimitive(PipelineTranslator.java:109) > ... 14 moreCaused by: java.lang.NoSuchMethodError: > org.apache.beam.sdk.transforms.ParDo$MultiOutput.getSideInputs()Ljava/util/List; > at > org.apache.nemo.compiler.frontend.beam.PipelineTranslator.parDoMultiOutputTranslator(PipelineTranslator.java:236) > ... 19 more{code} > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (BEAM-8989) Backwards incompatible change in ParDo.getSideInputs (caught by failure when running Apache Nemo quickstart)
[ https://issues.apache.org/jira/browse/BEAM-8989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17043964#comment-17043964 ] Rui Wang commented on BEAM-8989: I am going to cut 2.20.0 branch on 02/26/2020. Will you able to push this back to 2.21.0? > Backwards incompatible change in ParDo.getSideInputs (caught by failure when > running Apache Nemo quickstart) > > > Key: BEAM-8989 > URL: https://issues.apache.org/jira/browse/BEAM-8989 > Project: Beam > Issue Type: Bug > Components: sdk-java-core >Affects Versions: 2.16.0, 2.17.0, 2.18.0, 2.19.0 >Reporter: Luke Cwik >Assignee: Reuven Lax >Priority: Critical > Fix For: 2.20.0 > > > [PR/9275|https://github.com/apache/beam/pull/9275] changed > *ParDo.getSideInputs* from *List* to *Map PCollectionView>* which is backwards incompatible change and was released as > part of Beam 2.16.0 erroneously. > Running the Apache Nemo Quickstart fails with: > > {code:java} > Exception in thread "main" java.lang.RuntimeException: Translator private > static void > org.apache.nemo.compiler.frontend.beam.PipelineTranslator.parDoMultiOutputTranslator(org.apache.nemo.compiler.frontend.beam.PipelineTranslationContext,org.apache.beam.sdk.runners.TransformHierarchy$Node,org.apache.beam.sdk.transforms.ParDo$MultiOutput) > have failed to translate > org.apache.beam.examples.WordCount$ExtractWordsFn@600b9d27Exception in thread > "main" java.lang.RuntimeException: Translator private static void > org.apache.nemo.compiler.frontend.beam.PipelineTranslator.parDoMultiOutputTranslator(org.apache.nemo.compiler.frontend.beam.PipelineTranslationContext,org.apache.beam.sdk.runners.TransformHierarchy$Node,org.apache.beam.sdk.transforms.ParDo$MultiOutput) > have failed to translate > org.apache.beam.examples.WordCount$ExtractWordsFn@600b9d27 at > org.apache.nemo.compiler.frontend.beam.PipelineTranslator.translatePrimitive(PipelineTranslator.java:113) > at > org.apache.nemo.compiler.frontend.beam.PipelineVisitor.visitPrimitiveTransform(PipelineVisitor.java:46) > at > org.apache.beam.sdk.runners.TransformHierarchy$Node.visit(TransformHierarchy.java:665) > at > org.apache.beam.sdk.runners.TransformHierarchy$Node.visit(TransformHierarchy.java:657) > at > org.apache.beam.sdk.runners.TransformHierarchy$Node.visit(TransformHierarchy.java:657) > at > org.apache.beam.sdk.runners.TransformHierarchy$Node.visit(TransformHierarchy.java:657) > at > org.apache.beam.sdk.runners.TransformHierarchy$Node.access$600(TransformHierarchy.java:317) > at > org.apache.beam.sdk.runners.TransformHierarchy.visit(TransformHierarchy.java:251) > at org.apache.beam.sdk.Pipeline.traverseTopologically(Pipeline.java:460) at > org.apache.nemo.compiler.frontend.beam.NemoRunner.run(NemoRunner.java:80) at > org.apache.nemo.compiler.frontend.beam.NemoRunner.run(NemoRunner.java:31) at > org.apache.beam.sdk.Pipeline.run(Pipeline.java:315) at > org.apache.beam.sdk.Pipeline.run(Pipeline.java:301) at > org.apache.beam.examples.WordCount.runWordCount(WordCount.java:185) at > org.apache.beam.examples.WordCount.main(WordCount.java:192)Caused by: > java.lang.reflect.InvocationTargetException at > sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) at > org.apache.nemo.compiler.frontend.beam.PipelineTranslator.translatePrimitive(PipelineTranslator.java:109) > ... 14 moreCaused by: java.lang.NoSuchMethodError: > org.apache.beam.sdk.transforms.ParDo$MultiOutput.getSideInputs()Ljava/util/List; > at > org.apache.nemo.compiler.frontend.beam.PipelineTranslator.parDoMultiOutputTranslator(PipelineTranslator.java:236) > ... 19 more{code} > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8537) Provide WatermarkEstimatorProvider for different types of WatermarkEstimator
[ https://issues.apache.org/jira/browse/BEAM-8537?focusedWorklogId=392198&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-392198 ] ASF GitHub Bot logged work on BEAM-8537: Author: ASF GitHub Bot Created on: 24/Feb/20 23:50 Start Date: 24/Feb/20 23:50 Worklog Time Spent: 10m Work Description: boyuanzz commented on pull request #10933: [BEAM-8537] Update docstring of ManualWatermarkEstimator.set_watermark() URL: https://github.com/apache/beam/pull/10933 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 392198) Time Spent: 18h 10m (was: 18h) > Provide WatermarkEstimatorProvider for different types of WatermarkEstimator > > > Key: BEAM-8537 > URL: https://issues.apache.org/jira/browse/BEAM-8537 > Project: Beam > Issue Type: Improvement > Components: sdk-py-core, sdk-py-harness >Reporter: Boyuan Zhang >Assignee: Boyuan Zhang >Priority: Major > Time Spent: 18h 10m > Remaining Estimate: 0h > > This is a follow up for in-progress PR: > https://github.com/apache/beam/pull/9794. > Current implementation in PR9794 provides a default implementation of > WatermarkEstimator. For further work, we want to let WatermarkEstimator to be > a pure Interface. We'll provide a WatermarkEstimatorProvider to be able to > create a custom WatermarkEstimator per windowed value. It should be similar > to how we track restriction for SDF: > WatermarkEstimator <---> RestrictionTracker > WatermarkEstimatorProvider <---> RestrictionTrackerProvider > WatermarkEstimatorParam <---> RestrictionDoFnParam -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8537) Provide WatermarkEstimatorProvider for different types of WatermarkEstimator
[ https://issues.apache.org/jira/browse/BEAM-8537?focusedWorklogId=392195&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-392195 ] ASF GitHub Bot logged work on BEAM-8537: Author: ASF GitHub Bot Created on: 24/Feb/20 23:50 Start Date: 24/Feb/20 23:50 Worklog Time Spent: 10m Work Description: boyuanzz commented on issue #10933: [BEAM-8537] Update docstring of ManualWatermarkEstimator.set_watermark() URL: https://github.com/apache/beam/pull/10933#issuecomment-590607584 All tests passed. I'm going to merge it. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 392195) Time Spent: 18h (was: 17h 50m) > Provide WatermarkEstimatorProvider for different types of WatermarkEstimator > > > Key: BEAM-8537 > URL: https://issues.apache.org/jira/browse/BEAM-8537 > Project: Beam > Issue Type: Improvement > Components: sdk-py-core, sdk-py-harness >Reporter: Boyuan Zhang >Assignee: Boyuan Zhang >Priority: Major > Time Spent: 18h > Remaining Estimate: 0h > > This is a follow up for in-progress PR: > https://github.com/apache/beam/pull/9794. > Current implementation in PR9794 provides a default implementation of > WatermarkEstimator. For further work, we want to let WatermarkEstimator to be > a pure Interface. We'll provide a WatermarkEstimatorProvider to be able to > create a custom WatermarkEstimator per windowed value. It should be similar > to how we track restriction for SDF: > WatermarkEstimator <---> RestrictionTracker > WatermarkEstimatorProvider <---> RestrictionTrackerProvider > WatermarkEstimatorParam <---> RestrictionDoFnParam -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (BEAM-9378) Support nested Row after upgrading to Calcite 1.22.0
[ https://issues.apache.org/jira/browse/BEAM-9378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rui Wang updated BEAM-9378: --- Status: Open (was: Triage Needed) > Support nested Row after upgrading to Calcite 1.22.0 > - > > Key: BEAM-9378 > URL: https://issues.apache.org/jira/browse/BEAM-9378 > Project: Beam > Issue Type: Bug > Components: dsl-sql >Reporter: Rui Wang >Priority: Major > > CALCITE-3138 improves the struct flattener in Calcite after upgrading > Calcite, we will have better environment to test nested row support in BeamSQL -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (BEAM-9379) Upgrade to Calcite 1.22.0
[ https://issues.apache.org/jira/browse/BEAM-9379?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rui Wang updated BEAM-9379: --- Status: Open (was: Triage Needed) > Upgrade to Calcite 1.22.0 > - > > Key: BEAM-9379 > URL: https://issues.apache.org/jira/browse/BEAM-9379 > Project: Beam > Issue Type: Task > Components: dsl-sql >Reporter: Rui Wang >Assignee: Rui Wang >Priority: Major > > Upgrade to Calcite 1.22.0 after it gets released (expected by end of Feb > 2020). -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (BEAM-9378) Support nested Row after upgrading to Calcite 1.22.0
[ https://issues.apache.org/jira/browse/BEAM-9378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rui Wang updated BEAM-9378: --- Description: CALCITE-3138 improves the struct flattener in Calcite. So after upgrading Calcite, we will have better environment to test nested row support in BeamSQL (was: CALCITE-3138 improves the struct flattener in Calcite after upgrading Calcite, we will have better environment to test nested row support in BeamSQL) > Support nested Row after upgrading to Calcite 1.22.0 > - > > Key: BEAM-9378 > URL: https://issues.apache.org/jira/browse/BEAM-9378 > Project: Beam > Issue Type: Bug > Components: dsl-sql >Reporter: Rui Wang >Priority: Major > > CALCITE-3138 improves the struct flattener in Calcite. So after upgrading > Calcite, we will have better environment to test nested row support in BeamSQL -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (BEAM-9379) Upgrade to Calcite 1.22.0
Rui Wang created BEAM-9379: -- Summary: Upgrade to Calcite 1.22.0 Key: BEAM-9379 URL: https://issues.apache.org/jira/browse/BEAM-9379 Project: Beam Issue Type: Task Components: dsl-sql Reporter: Rui Wang Assignee: Rui Wang Upgrade to Calcite 1.22.0 after it gets released (expected by end of Feb 2020). -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (BEAM-9378) Support nested Row after upgrading to Calcite 1.22.0
Rui Wang created BEAM-9378: -- Summary: Support nested Row after upgrading to Calcite 1.22.0 Key: BEAM-9378 URL: https://issues.apache.org/jira/browse/BEAM-9378 Project: Beam Issue Type: Bug Components: dsl-sql Reporter: Rui Wang CALCITE-3138 improves the struct flattener in Calcite after upgrading Calcite, we will have better environment to test nested row support in BeamSQL -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-1833) Restructure Python pipeline construction to better follow the Runner API
[ https://issues.apache.org/jira/browse/BEAM-1833?focusedWorklogId=392183&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-392183 ] ASF GitHub Bot logged work on BEAM-1833: Author: ASF GitHub Bot Created on: 24/Feb/20 23:33 Start Date: 24/Feb/20 23:33 Worklog Time Spent: 10m Work Description: aaltay commented on issue #10934: [BEAM-1833] Broke some people, setting the default to have the experiment be disabled URL: https://github.com/apache/beam/pull/10934#issuecomment-590602666 LGTM. @lukecwik - Are you fine with this change? Or do you prefer a different fix? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 392183) Time Spent: 4h 10m (was: 4h) > Restructure Python pipeline construction to better follow the Runner API > > > Key: BEAM-1833 > URL: https://issues.apache.org/jira/browse/BEAM-1833 > Project: Beam > Issue Type: Improvement > Components: sdk-py-core >Reporter: Robert Bradshaw >Assignee: Sam Rohde >Priority: Major > Fix For: 2.20.0 > > Time Spent: 4h 10m > Remaining Estimate: 0h > > The most important part is removing the runner.apply overrides, but there are > also various other improvements (e.g. all inputs and outputs should be named). -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9296) Add typing annotation to python SDF
[ https://issues.apache.org/jira/browse/BEAM-9296?focusedWorklogId=392180&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-392180 ] ASF GitHub Bot logged work on BEAM-9296: Author: ASF GitHub Bot Created on: 24/Feb/20 23:25 Start Date: 24/Feb/20 23:25 Worklog Time Spent: 10m Work Description: chadrik commented on issue #10935: [BEAM-9296] Clean up and add type-hints to SDF API URL: https://github.com/apache/beam/pull/10935#issuecomment-590600160 LGTM, assuming the tests pass. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 392180) Time Spent: 1h 10m (was: 1h) > Add typing annotation to python SDF > --- > > Key: BEAM-9296 > URL: https://issues.apache.org/jira/browse/BEAM-9296 > Project: Beam > Issue Type: Improvement > Components: sdk-py-core, sdk-py-harness >Reporter: Boyuan Zhang >Assignee: Boyuan Zhang >Priority: Major > Time Spent: 1h 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-2822) Add support for progress reporting in fn API
[ https://issues.apache.org/jira/browse/BEAM-2822?focusedWorklogId=392175&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-392175 ] ASF GitHub Bot logged work on BEAM-2822: Author: ASF GitHub Bot Created on: 24/Feb/20 23:20 Start Date: 24/Feb/20 23:20 Worklog Time Spent: 10m Work Description: ananvay commented on issue #10956: [BEAM-2822, BEAM-2939, BEAM-6189, BEAM-4374] Enable passing completed and remaining amounts of work using new metrics format for splittable dofns in the Python SDK. URL: https://github.com/apache/beam/pull/10956#issuecomment-590598664 Thanks a lot Luke! LGTM. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 392175) Time Spent: 0.5h (was: 20m) > Add support for progress reporting in fn API > > > Key: BEAM-2822 > URL: https://issues.apache.org/jira/browse/BEAM-2822 > Project: Beam > Issue Type: New Feature > Components: beam-model >Reporter: Vikas Kedigehalli >Priority: Minor > Labels: portability > Time Spent: 0.5h > Remaining Estimate: 0h > > https://s.apache.org/beam-fn-api-progress-reporting > Note that the ULR reference implementation, when ready, should be useful for > every runner. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9296) Add typing annotation to python SDF
[ https://issues.apache.org/jira/browse/BEAM-9296?focusedWorklogId=392176&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-392176 ] ASF GitHub Bot logged work on BEAM-9296: Author: ASF GitHub Bot Created on: 24/Feb/20 23:20 Start Date: 24/Feb/20 23:20 Worklog Time Spent: 10m Work Description: chadrik commented on pull request #10935: [BEAM-9296] Clean up and add type-hints to SDF API URL: https://github.com/apache/beam/pull/10935#discussion_r383572809 ## File path: sdks/python/apache_beam/runners/common.py ## @@ -333,6 +336,7 @@ def is_splittable_dofn(self): return self.get_restriction_provider() is not None def get_restriction_coder(self): +# type: () -> Optional[TupleCoder] Review comment: correct. my comment about avoiding the declaration of `Optional[TupleCoder]` refers to the variable (which my edit does away with), not the the return type. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 392176) Time Spent: 1h (was: 50m) > Add typing annotation to python SDF > --- > > Key: BEAM-9296 > URL: https://issues.apache.org/jira/browse/BEAM-9296 > Project: Beam > Issue Type: Improvement > Components: sdk-py-core, sdk-py-harness >Reporter: Boyuan Zhang >Assignee: Boyuan Zhang >Priority: Major > Time Spent: 1h > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8019) Support cross-language transforms for DataflowRunner
[ https://issues.apache.org/jira/browse/BEAM-8019?focusedWorklogId=392172&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-392172 ] ASF GitHub Bot logged work on BEAM-8019: Author: ASF GitHub Bot Created on: 24/Feb/20 23:16 Start Date: 24/Feb/20 23:16 Worklog Time Spent: 10m Work Description: chamikaramj commented on issue #10955: [BEAM-8019] Updates Dataflow client URL: https://github.com/apache/beam/pull/10955#issuecomment-590597368 There's documentation available internally. Not sure if this can be easily generate purely using Beam repo. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 392172) Time Spent: 6h 10m (was: 6h) > Support cross-language transforms for DataflowRunner > > > Key: BEAM-8019 > URL: https://issues.apache.org/jira/browse/BEAM-8019 > Project: Beam > Issue Type: New Feature > Components: sdk-py-core >Reporter: Chamikara Madhusanka Jayalath >Assignee: Chamikara Madhusanka Jayalath >Priority: Major > Time Spent: 6h 10m > Remaining Estimate: 0h > > This is to capture the Beam changes needed for this task. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-2822) Add support for progress reporting in fn API
[ https://issues.apache.org/jira/browse/BEAM-2822?focusedWorklogId=392161&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-392161 ] ASF GitHub Bot logged work on BEAM-2822: Author: ASF GitHub Bot Created on: 24/Feb/20 23:12 Start Date: 24/Feb/20 23:12 Worklog Time Spent: 10m Work Description: lukecwik commented on issue #10956: [BEAM-2822, BEAM-2939, BEAM-6189, BEAM-4374] Enable passing completed and remaining amounts of work using new metrics format for splittable dofns in the Python SDK. URL: https://github.com/apache/beam/pull/10956#issuecomment-590596115 R: @ananvay @HuangLED CC: @robertwb This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 392161) Time Spent: 20m (was: 10m) > Add support for progress reporting in fn API > > > Key: BEAM-2822 > URL: https://issues.apache.org/jira/browse/BEAM-2822 > Project: Beam > Issue Type: New Feature > Components: beam-model >Reporter: Vikas Kedigehalli >Priority: Minor > Labels: portability > Time Spent: 20m > Remaining Estimate: 0h > > https://s.apache.org/beam-fn-api-progress-reporting > Note that the ULR reference implementation, when ready, should be useful for > every runner. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-2822) Add support for progress reporting in fn API
[ https://issues.apache.org/jira/browse/BEAM-2822?focusedWorklogId=392160&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-392160 ] ASF GitHub Bot logged work on BEAM-2822: Author: ASF GitHub Bot Created on: 24/Feb/20 23:09 Start Date: 24/Feb/20 23:09 Worklog Time Spent: 10m Work Description: lukecwik commented on pull request #10956: [BEAM-2822, BEAM-2939, BEAM-6189, BEAM-4374] Enable passing completed and remaining amounts of work using new metrics format for splittable dofns in the Python SDK. URL: https://github.com/apache/beam/pull/10956 This provides the equivalent to the `metrics.active_elements.fraction_remaining` using the new monitoring APIs. Without these, we cannot enable autoscaling using the new metrics. Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily: - [ ] [**Choose reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and mention them in a comment (`R: @username`). - [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue. - [ ] Update `CHANGES.md` with noteworthy changes. - [ ] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.pdf). See the [Contributor Guide](https://beam.apache.org/contribute) for more tips on [how to make review process smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier). Post-Commit Tests Status (on master branch) Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark --- | --- | --- | --- | --- | --- | --- | --- Go | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/) | --- | --- | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/) | --- | --- | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/) Java | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_
[jira] [Work logged] (BEAM-9353) ByteBuddy Schema code does not properly handle null values
[ https://issues.apache.org/jira/browse/BEAM-9353?focusedWorklogId=392159&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-392159 ] ASF GitHub Bot logged work on BEAM-9353: Author: ASF GitHub Bot Created on: 24/Feb/20 23:09 Start Date: 24/Feb/20 23:09 Worklog Time Spent: 10m Work Description: reuvenlax commented on pull request #10926: [BEAM-9353] Fix bytebuddy nullable URL: https://github.com/apache/beam/pull/10926 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 392159) Time Spent: 50m (was: 40m) > ByteBuddy Schema code does not properly handle null values > -- > > Key: BEAM-9353 > URL: https://issues.apache.org/jira/browse/BEAM-9353 > Project: Beam > Issue Type: Sub-task > Components: sdk-java-core >Reporter: Reuven Lax >Priority: Major > Time Spent: 50m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8019) Support cross-language transforms for DataflowRunner
[ https://issues.apache.org/jira/browse/BEAM-8019?focusedWorklogId=392158&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-392158 ] ASF GitHub Bot logged work on BEAM-8019: Author: ASF GitHub Bot Created on: 24/Feb/20 23:07 Start Date: 24/Feb/20 23:07 Worklog Time Spent: 10m Work Description: robertwb commented on issue #10955: [BEAM-8019] Updates Dataflow client URL: https://github.com/apache/beam/pull/10955#issuecomment-590594650 Thanks! Any notes as to how this generated? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 392158) Time Spent: 6h (was: 5h 50m) > Support cross-language transforms for DataflowRunner > > > Key: BEAM-8019 > URL: https://issues.apache.org/jira/browse/BEAM-8019 > Project: Beam > Issue Type: New Feature > Components: sdk-py-core >Reporter: Chamikara Madhusanka Jayalath >Assignee: Chamikara Madhusanka Jayalath >Priority: Major > Time Spent: 6h > Remaining Estimate: 0h > > This is to capture the Beam changes needed for this task. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9353) ByteBuddy Schema code does not properly handle null values
[ https://issues.apache.org/jira/browse/BEAM-9353?focusedWorklogId=392149&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-392149 ] ASF GitHub Bot logged work on BEAM-9353: Author: ASF GitHub Bot Created on: 24/Feb/20 22:59 Start Date: 24/Feb/20 22:59 Worklog Time Spent: 10m Work Description: rezarokni commented on issue #10926: [BEAM-9353] Fix bytebuddy nullable URL: https://github.com/apache/beam/pull/10926#issuecomment-590592042 Yes this fixed the bug thanx! LGTM This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 392149) Time Spent: 40m (was: 0.5h) > ByteBuddy Schema code does not properly handle null values > -- > > Key: BEAM-9353 > URL: https://issues.apache.org/jira/browse/BEAM-9353 > Project: Beam > Issue Type: Sub-task > Components: sdk-java-core >Reporter: Reuven Lax >Priority: Major > Time Spent: 40m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9296) Add typing annotation to python SDF
[ https://issues.apache.org/jira/browse/BEAM-9296?focusedWorklogId=392146&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-392146 ] ASF GitHub Bot logged work on BEAM-9296: Author: ASF GitHub Bot Created on: 24/Feb/20 22:58 Start Date: 24/Feb/20 22:58 Worklog Time Spent: 10m Work Description: boyuanzz commented on pull request #10935: [BEAM-9296] Clean up and add type-hints to SDF API URL: https://github.com/apache/beam/pull/10935#discussion_r383564971 ## File path: sdks/python/apache_beam/runners/sdf_utils.py ## @@ -38,6 +38,8 @@ if TYPE_CHECKING: from apache_beam.io.iobase import RestrictionTracker + from apache_beam.io.iobase import RestrictionProgress Review comment: Done. Thanks for mentioning that! This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 392146) Time Spent: 50m (was: 40m) > Add typing annotation to python SDF > --- > > Key: BEAM-9296 > URL: https://issues.apache.org/jira/browse/BEAM-9296 > Project: Beam > Issue Type: Improvement > Components: sdk-py-core, sdk-py-harness >Reporter: Boyuan Zhang >Assignee: Boyuan Zhang >Priority: Major > Time Spent: 50m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9357) Bump upper end of Google Bigquery dependencies for python
[ https://issues.apache.org/jira/browse/BEAM-9357?focusedWorklogId=392145&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-392145 ] ASF GitHub Bot logged work on BEAM-9357: Author: ASF GitHub Bot Created on: 24/Feb/20 22:58 Start Date: 24/Feb/20 22:58 Worklog Time Spent: 10m Work Description: aaltay commented on pull request #10929: [BEAM-9357] Bump google cloud bigquery to 1.24.0 URL: https://github.com/apache/beam/pull/10929 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 392145) Time Spent: 50m (was: 40m) > Bump upper end of Google Bigquery dependencies for python > - > > Key: BEAM-9357 > URL: https://issues.apache.org/jira/browse/BEAM-9357 > Project: Beam > Issue Type: Bug > Components: sdk-py-harness > Environment: Python >Reporter: David Rubinstein >Priority: Minor > Time Spent: 50m > Remaining Estimate: 0h > > I am trying to use Dataflow with Beam Python and a package that depends on > google-resumable-media 0.5.0. The current google-cloud-bigquery (which is > only used for testing) depends on google-resumable-media <= 0.4.1. The upper > bound on the google-cloud-bigquery version should be loosened to solve > possible transitive dependency issues. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9357) Bump upper end of Google Bigquery dependencies for python
[ https://issues.apache.org/jira/browse/BEAM-9357?focusedWorklogId=392147&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-392147 ] ASF GitHub Bot logged work on BEAM-9357: Author: ASF GitHub Bot Created on: 24/Feb/20 22:58 Start Date: 24/Feb/20 22:58 Worklog Time Spent: 10m Work Description: aaltay commented on issue #10929: [BEAM-9357] Bump google cloud bigquery to 1.24.0 URL: https://github.com/apache/beam/pull/10929#issuecomment-590591947 > @aaltay Since I'm not aware of how PRs get merged around here, who do I have to mention to get the merge? I merged it. Thanks for the ping. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 392147) Time Spent: 1h (was: 50m) > Bump upper end of Google Bigquery dependencies for python > - > > Key: BEAM-9357 > URL: https://issues.apache.org/jira/browse/BEAM-9357 > Project: Beam > Issue Type: Bug > Components: sdk-py-harness > Environment: Python >Reporter: David Rubinstein >Priority: Minor > Time Spent: 1h > Remaining Estimate: 0h > > I am trying to use Dataflow with Beam Python and a package that depends on > google-resumable-media 0.5.0. The current google-cloud-bigquery (which is > only used for testing) depends on google-resumable-media <= 0.4.1. The upper > bound on the google-cloud-bigquery version should be loosened to solve > possible transitive dependency issues. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9296) Add typing annotation to python SDF
[ https://issues.apache.org/jira/browse/BEAM-9296?focusedWorklogId=392144&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-392144 ] ASF GitHub Bot logged work on BEAM-9296: Author: ASF GitHub Bot Created on: 24/Feb/20 22:58 Start Date: 24/Feb/20 22:58 Worklog Time Spent: 10m Work Description: boyuanzz commented on pull request #10935: [BEAM-9296] Clean up and add type-hints to SDF API URL: https://github.com/apache/beam/pull/10935#discussion_r383564824 ## File path: sdks/python/apache_beam/runners/common.py ## @@ -333,6 +336,7 @@ def is_splittable_dofn(self): return self.get_restriction_provider() is not None def get_restriction_coder(self): +# type: () -> Optional[TupleCoder] Review comment: Done. I think the return type should still be `Optional[TupleCoder]` given that it also returns `None`. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 392144) Time Spent: 40m (was: 0.5h) > Add typing annotation to python SDF > --- > > Key: BEAM-9296 > URL: https://issues.apache.org/jira/browse/BEAM-9296 > Project: Beam > Issue Type: Improvement > Components: sdk-py-core, sdk-py-harness >Reporter: Boyuan Zhang >Assignee: Boyuan Zhang >Priority: Major > Time Spent: 40m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8019) Support cross-language transforms for DataflowRunner
[ https://issues.apache.org/jira/browse/BEAM-8019?focusedWorklogId=392143&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-392143 ] ASF GitHub Bot logged work on BEAM-8019: Author: ASF GitHub Bot Created on: 24/Feb/20 22:55 Start Date: 24/Feb/20 22:55 Worklog Time Spent: 10m Work Description: chamikaramj commented on pull request #10955: [BEAM-8019] Updates Dataflow client URL: https://github.com/apache/beam/pull/10955 **Please** add a meaningful description for your change here Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily: - [ ] [**Choose reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and mention them in a comment (`R: @username`). - [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue. - [ ] Update `CHANGES.md` with noteworthy changes. - [ ] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.pdf). See the [Contributor Guide](https://beam.apache.org/contribute) for more tips on [how to make review process smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier). Post-Commit Tests Status (on master branch) Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark --- | --- | --- | --- | --- | --- | --- | --- Go | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/) | --- | --- | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/) | --- | --- | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/) Java | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStruct
[jira] [Work logged] (BEAM-9373) Flink/Spark portable jar tests failing
[ https://issues.apache.org/jira/browse/BEAM-9373?focusedWorklogId=392142&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-392142 ] ASF GitHub Bot logged work on BEAM-9373: Author: ASF GitHub Bot Created on: 24/Feb/20 22:49 Start Date: 24/Feb/20 22:49 Worklog Time Spent: 10m Work Description: ibzib commented on issue #10954: [BEAM-9373] Spark/Flink tests fix string concat URL: https://github.com/apache/beam/pull/10954#issuecomment-590588718 Run PortableJar_Spark PostCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 392142) Time Spent: 50m (was: 40m) > Flink/Spark portable jar tests failing > -- > > Key: BEAM-9373 > URL: https://issues.apache.org/jira/browse/BEAM-9373 > Project: Beam > Issue Type: Bug > Components: test-failures >Reporter: Kyle Weaver >Assignee: Kyle Weaver >Priority: Major > Labels: currently-failing > Time Spent: 50m > Remaining Estimate: 0h > > *[https://builds.apache.org/job/beam_PostCommit_PortableJar_Flink|https://builds.apache.org/job/beam_PostCommit_PortableJar_Flink] > > *[https://builds.apache.org/job/beam_PostCommit_PortableJar_Spark|https://builds.apache.org/job/beam_PostCommit_PortableJar_Spark] > *14:13:27* 1: Task failed with an exception.*14:13:27* ---*14:13:27* > * Where:*14:13:27* Script > '/home/jenkins/jenkins-slave/workspace/beam_PostCommit_PortableJar_Flink/src/runners/flink/job-server/flink_job_server.gradle' > line: 266*14:13:27* *14:13:27* * What went wrong:*14:13:27* Execution failed > for task ':runners:flink:1.9:job-server:testFlinkUberJarPy37'.*14:13:27* > > Could not find method $() for arguments > [flink_job_server_70sh27mqy1m8ejqzz2sd1kql1$_addTestFlinkUberJarPy_closure15$_closure28$_closure29$_closure30@230aeda1] > on object of type org.gradle.process.internal.DefaultExecAction_Decorated. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9373) Flink/Spark portable jar tests failing
[ https://issues.apache.org/jira/browse/BEAM-9373?focusedWorklogId=392141&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-392141 ] ASF GitHub Bot logged work on BEAM-9373: Author: ASF GitHub Bot Created on: 24/Feb/20 22:48 Start Date: 24/Feb/20 22:48 Worklog Time Spent: 10m Work Description: ibzib commented on issue #10954: [BEAM-9373] Spark/Flink tests fix string concat URL: https://github.com/apache/beam/pull/10954#issuecomment-590588404 Run PortableJar_Flink PostCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 392141) Time Spent: 40m (was: 0.5h) > Flink/Spark portable jar tests failing > -- > > Key: BEAM-9373 > URL: https://issues.apache.org/jira/browse/BEAM-9373 > Project: Beam > Issue Type: Bug > Components: test-failures >Reporter: Kyle Weaver >Assignee: Kyle Weaver >Priority: Major > Labels: currently-failing > Time Spent: 40m > Remaining Estimate: 0h > > *[https://builds.apache.org/job/beam_PostCommit_PortableJar_Flink|https://builds.apache.org/job/beam_PostCommit_PortableJar_Flink] > > *[https://builds.apache.org/job/beam_PostCommit_PortableJar_Spark|https://builds.apache.org/job/beam_PostCommit_PortableJar_Spark] > *14:13:27* 1: Task failed with an exception.*14:13:27* ---*14:13:27* > * Where:*14:13:27* Script > '/home/jenkins/jenkins-slave/workspace/beam_PostCommit_PortableJar_Flink/src/runners/flink/job-server/flink_job_server.gradle' > line: 266*14:13:27* *14:13:27* * What went wrong:*14:13:27* Execution failed > for task ':runners:flink:1.9:job-server:testFlinkUberJarPy37'.*14:13:27* > > Could not find method $() for arguments > [flink_job_server_70sh27mqy1m8ejqzz2sd1kql1$_addTestFlinkUberJarPy_closure15$_closure28$_closure29$_closure30@230aeda1] > on object of type org.gradle.process.internal.DefaultExecAction_Decorated. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam
[ https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=392138&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-392138 ] ASF GitHub Bot logged work on BEAM-8335: Author: ASF GitHub Bot Created on: 24/Feb/20 22:41 Start Date: 24/Feb/20 22:41 Worklog Time Spent: 10m Work Description: rohdesamuel commented on issue #10892: [BEAM-8335] Make TestStream to/from runner_api include the output_tags property. URL: https://github.com/apache/beam/pull/10892#issuecomment-590586019 retest this please This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 392138) Time Spent: 70h 50m (was: 70h 40m) > Add streaming support to Interactive Beam > - > > Key: BEAM-8335 > URL: https://issues.apache.org/jira/browse/BEAM-8335 > Project: Beam > Issue Type: Improvement > Components: runner-py-interactive >Reporter: Sam Rohde >Assignee: Sam Rohde >Priority: Major > Time Spent: 70h 50m > Remaining Estimate: 0h > > This issue tracks the work items to introduce streaming support to the > Interactive Beam experience. This will allow users to: > * Write and run a streaming job in IPython > * Automatically cache records from unbounded sources > * Add a replay experience that replays all cached records to simulate the > original pipeline execution > * Add controls to play/pause/stop/step individual elements from the cached > records > * Add ability to inspect/visualize unbounded PCollections -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam
[ https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=392139&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-392139 ] ASF GitHub Bot logged work on BEAM-8335: Author: ASF GitHub Bot Created on: 24/Feb/20 22:41 Start Date: 24/Feb/20 22:41 Worklog Time Spent: 10m Work Description: rohdesamuel commented on issue #10892: [BEAM-8335] Make TestStream to/from runner_api include the output_tags property. URL: https://github.com/apache/beam/pull/10892#issuecomment-590586019 retest this please EDIT: dang, doesn't work yet This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 392139) Time Spent: 71h (was: 70h 50m) > Add streaming support to Interactive Beam > - > > Key: BEAM-8335 > URL: https://issues.apache.org/jira/browse/BEAM-8335 > Project: Beam > Issue Type: Improvement > Components: runner-py-interactive >Reporter: Sam Rohde >Assignee: Sam Rohde >Priority: Major > Time Spent: 71h > Remaining Estimate: 0h > > This issue tracks the work items to introduce streaming support to the > Interactive Beam experience. This will allow users to: > * Write and run a streaming job in IPython > * Automatically cache records from unbounded sources > * Add a replay experience that replays all cached records to simulate the > original pipeline execution > * Add controls to play/pause/stop/step individual elements from the cached > records > * Add ability to inspect/visualize unbounded PCollections -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9347) Remove default image for Unified Worker
[ https://issues.apache.org/jira/browse/BEAM-9347?focusedWorklogId=392133&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-392133 ] ASF GitHub Bot logged work on BEAM-9347: Author: ASF GitHub Bot Created on: 24/Feb/20 22:35 Start Date: 24/Feb/20 22:35 Worklog Time Spent: 10m Work Description: angoenka commented on issue #10919: [BEAM-9347] Don't overwrite default runner harness for unified worker URL: https://github.com/apache/beam/pull/10919#issuecomment-590584036 Thanks @tvalentyn This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 392133) Time Spent: 2h 50m (was: 2h 40m) > Remove default image for Unified Worker > --- > > Key: BEAM-9347 > URL: https://issues.apache.org/jira/browse/BEAM-9347 > Project: Beam > Issue Type: Test > Components: runner-dataflow >Reporter: Ankur Goenka >Assignee: Ankur Goenka >Priority: Major > Fix For: 2.20.0 > > Time Spent: 2h 50m > Remaining Estimate: 0h > > The runner will choose the Runner Harness image for UW so we don't need to > overwrite the image in default behavior. > Also, this will help us distinguish between user requested overwrites for the > default overwrites(which is not used). -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9347) Remove default image for Unified Worker
[ https://issues.apache.org/jira/browse/BEAM-9347?focusedWorklogId=392132&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-392132 ] ASF GitHub Bot logged work on BEAM-9347: Author: ASF GitHub Bot Created on: 24/Feb/20 22:35 Start Date: 24/Feb/20 22:35 Worklog Time Spent: 10m Work Description: angoenka commented on pull request #10919: [BEAM-9347] Don't overwrite default runner harness for unified worker URL: https://github.com/apache/beam/pull/10919 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 392132) Time Spent: 2h 40m (was: 2.5h) > Remove default image for Unified Worker > --- > > Key: BEAM-9347 > URL: https://issues.apache.org/jira/browse/BEAM-9347 > Project: Beam > Issue Type: Test > Components: runner-dataflow >Reporter: Ankur Goenka >Assignee: Ankur Goenka >Priority: Major > Fix For: 2.20.0 > > Time Spent: 2h 40m > Remaining Estimate: 0h > > The runner will choose the Runner Harness image for UW so we don't need to > overwrite the image in default behavior. > Also, this will help us distinguish between user requested overwrites for the > default overwrites(which is not used). -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (BEAM-9377) Python typehints: Map wrapper prevents Optional stripping
[ https://issues.apache.org/jira/browse/BEAM-9377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17043916#comment-17043916 ] Udi Meiri commented on BEAM-9377: - Currently, Optional is only stripped when it's the outermost element of the output hint. This needs careful examination, but it may be correct to strip Optional before and/or after Iterable, e.g. Optional[Iterable[Optional[int]]] =>(to ParDo output type) int. cc: [~robertwb] > Python typehints: Map wrapper prevents Optional stripping > - > > Key: BEAM-9377 > URL: https://issues.apache.org/jira/browse/BEAM-9377 > Project: Beam > Issue Type: New Feature > Components: sdk-py-core >Reporter: Udi Meiri >Priority: Major > > This existing test is wrong: > {code} > def test_map_wrapper_optional_output(self): > # Optional does affect output type (Nones are NOT ignored). > def map_fn(unused_element: int) -> typehints.Optional[int]: > return 1 > th = beam.Map(map_fn).get_type_hints() > self.assertEqual(th.input_types, ((int, ), {})) > self.assertEqual(th.output_types, ((typehints.Optional[int], ), {})) > {code} > The resulting output type should be int. > {code} > inital output hint: > Optional[int] > with wrapper: > Iterable[Optional[int]] > with DoFn.default_type_hints: > Optional[int] > {code} > However any Nones returned by a DoFn's process method are dropped, so the > actual element_type returned is plain int. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (BEAM-9377) Python typehints: Map wrapper prevents Optional stripping
Udi Meiri created BEAM-9377: --- Summary: Python typehints: Map wrapper prevents Optional stripping Key: BEAM-9377 URL: https://issues.apache.org/jira/browse/BEAM-9377 Project: Beam Issue Type: New Feature Components: sdk-py-core Reporter: Udi Meiri This existing test is wrong: {code} def test_map_wrapper_optional_output(self): # Optional does affect output type (Nones are NOT ignored). def map_fn(unused_element: int) -> typehints.Optional[int]: return 1 th = beam.Map(map_fn).get_type_hints() self.assertEqual(th.input_types, ((int, ), {})) self.assertEqual(th.output_types, ((typehints.Optional[int], ), {})) {code} The resulting output type should be int. {code} inital output hint: Optional[int] with wrapper: Iterable[Optional[int]] with DoFn.default_type_hints: Optional[int] {code} However any Nones returned by a DoFn's process method are dropped, so the actual element_type returned is plain int. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-1833) Restructure Python pipeline construction to better follow the Runner API
[ https://issues.apache.org/jira/browse/BEAM-1833?focusedWorklogId=392119&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-392119 ] ASF GitHub Bot logged work on BEAM-1833: Author: ASF GitHub Bot Created on: 24/Feb/20 21:59 Start Date: 24/Feb/20 21:59 Worklog Time Spent: 10m Work Description: rohdesamuel commented on issue #10934: [BEAM-1833] Broke some people, setting the default to have the experiment be disabled URL: https://github.com/apache/beam/pull/10934#issuecomment-590570107 > The intention is to fix Python SDK for 2.20 release (Wednesday), Sam do you have enough time to do that? Yep, I believe the PR is good-to-go. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 392119) Time Spent: 4h (was: 3h 50m) > Restructure Python pipeline construction to better follow the Runner API > > > Key: BEAM-1833 > URL: https://issues.apache.org/jira/browse/BEAM-1833 > Project: Beam > Issue Type: Improvement > Components: sdk-py-core >Reporter: Robert Bradshaw >Assignee: Sam Rohde >Priority: Major > Fix For: 2.20.0 > > Time Spent: 4h > Remaining Estimate: 0h > > The most important part is removing the runner.apply overrides, but there are > also various other improvements (e.g. all inputs and outputs should be named). -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-1833) Restructure Python pipeline construction to better follow the Runner API
[ https://issues.apache.org/jira/browse/BEAM-1833?focusedWorklogId=392118&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-392118 ] ASF GitHub Bot logged work on BEAM-1833: Author: ASF GitHub Bot Created on: 24/Feb/20 21:58 Start Date: 24/Feb/20 21:58 Worklog Time Spent: 10m Work Description: rohdesamuel commented on issue #10934: [BEAM-1833] Broke some people, setting the default to have the experiment be disabled URL: https://github.com/apache/beam/pull/10934#issuecomment-590569784 > Could you also update the CHANGES.md? (There is a breaking change note that can be removed now.) Done, removed the breaking changes line. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 392118) Time Spent: 3h 50m (was: 3h 40m) > Restructure Python pipeline construction to better follow the Runner API > > > Key: BEAM-1833 > URL: https://issues.apache.org/jira/browse/BEAM-1833 > Project: Beam > Issue Type: Improvement > Components: sdk-py-core >Reporter: Robert Bradshaw >Assignee: Sam Rohde >Priority: Major > Fix For: 2.20.0 > > Time Spent: 3h 50m > Remaining Estimate: 0h > > The most important part is removing the runner.apply overrides, but there are > also various other improvements (e.g. all inputs and outputs should be named). -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8575) Add more Python validates runner tests
[ https://issues.apache.org/jira/browse/BEAM-8575?focusedWorklogId=392116&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-392116 ] ASF GitHub Bot logged work on BEAM-8575: Author: ASF GitHub Bot Created on: 24/Feb/20 21:46 Start Date: 24/Feb/20 21:46 Worklog Time Spent: 10m Work Description: bumblebee-coming commented on issue #10951: [BEAM-8575] Modified the test to work for different runners. URL: https://github.com/apache/beam/pull/10951#issuecomment-590564815 Retest this please This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 392116) Time Spent: 55h 50m (was: 55h 40m) > Add more Python validates runner tests > -- > > Key: BEAM-8575 > URL: https://issues.apache.org/jira/browse/BEAM-8575 > Project: Beam > Issue Type: Test > Components: sdk-py-core, testing >Reporter: wendy liu >Assignee: wendy liu >Priority: Major > Time Spent: 55h 50m > Remaining Estimate: 0h > > This is the umbrella issue to track the work of adding more Python tests to > improve test coverage. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9056) Staging artifacts from environment
[ https://issues.apache.org/jira/browse/BEAM-9056?focusedWorklogId=392115&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-392115 ] ASF GitHub Bot logged work on BEAM-9056: Author: ASF GitHub Bot Created on: 24/Feb/20 21:42 Start Date: 24/Feb/20 21:42 Worklog Time Spent: 10m Work Description: ihji commented on issue #10621: [BEAM-9056] Staging artifacts from environment URL: https://github.com/apache/beam/pull/10621#issuecomment-590563159 fixed conflicts This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 392115) Time Spent: 2.5h (was: 2h 20m) > Staging artifacts from environment > -- > > Key: BEAM-9056 > URL: https://issues.apache.org/jira/browse/BEAM-9056 > Project: Beam > Issue Type: Sub-task > Components: java-fn-execution >Reporter: Heejong Lee >Assignee: Heejong Lee >Priority: Major > Time Spent: 2.5h > Remaining Estimate: 0h > > staging artifacts from artifact information embedded in environment proto. > detail: > https://docs.google.com/document/d/1L7MJcfyy9mg2Ahfw5XPhUeBe-dyvAPMOYOiFA1-kAog -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9373) Flink/Spark portable jar tests failing
[ https://issues.apache.org/jira/browse/BEAM-9373?focusedWorklogId=392114&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-392114 ] ASF GitHub Bot logged work on BEAM-9373: Author: ASF GitHub Bot Created on: 24/Feb/20 21:40 Start Date: 24/Feb/20 21:40 Worklog Time Spent: 10m Work Description: ibzib commented on issue #10954: [BEAM-9373] Spark/Flink tests fix string concat URL: https://github.com/apache/beam/pull/10954#issuecomment-590562278 Run PortableJar_Spark PostCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 392114) Time Spent: 0.5h (was: 20m) > Flink/Spark portable jar tests failing > -- > > Key: BEAM-9373 > URL: https://issues.apache.org/jira/browse/BEAM-9373 > Project: Beam > Issue Type: Bug > Components: test-failures >Reporter: Kyle Weaver >Assignee: Kyle Weaver >Priority: Major > Labels: currently-failing > Time Spent: 0.5h > Remaining Estimate: 0h > > *[https://builds.apache.org/job/beam_PostCommit_PortableJar_Flink|https://builds.apache.org/job/beam_PostCommit_PortableJar_Flink] > > *[https://builds.apache.org/job/beam_PostCommit_PortableJar_Spark|https://builds.apache.org/job/beam_PostCommit_PortableJar_Spark] > *14:13:27* 1: Task failed with an exception.*14:13:27* ---*14:13:27* > * Where:*14:13:27* Script > '/home/jenkins/jenkins-slave/workspace/beam_PostCommit_PortableJar_Flink/src/runners/flink/job-server/flink_job_server.gradle' > line: 266*14:13:27* *14:13:27* * What went wrong:*14:13:27* Execution failed > for task ':runners:flink:1.9:job-server:testFlinkUberJarPy37'.*14:13:27* > > Could not find method $() for arguments > [flink_job_server_70sh27mqy1m8ejqzz2sd1kql1$_addTestFlinkUberJarPy_closure15$_closure28$_closure29$_closure30@230aeda1] > on object of type org.gradle.process.internal.DefaultExecAction_Decorated. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9373) Flink/Spark portable jar tests failing
[ https://issues.apache.org/jira/browse/BEAM-9373?focusedWorklogId=392112&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-392112 ] ASF GitHub Bot logged work on BEAM-9373: Author: ASF GitHub Bot Created on: 24/Feb/20 21:39 Start Date: 24/Feb/20 21:39 Worklog Time Spent: 10m Work Description: ibzib commented on pull request #10954: [BEAM-9373] Spark/Flink tests fix string concat URL: https://github.com/apache/beam/pull/10954 **Please** add a meaningful description for your change here Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily: - [ ] [**Choose reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and mention them in a comment (`R: @username`). - [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue. - [ ] Update `CHANGES.md` with noteworthy changes. - [ ] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.pdf). See the [Contributor Guide](https://beam.apache.org/contribute) for more tips on [how to make review process smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier). Post-Commit Tests Status (on master branch) Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark --- | --- | --- | --- | --- | --- | --- | --- Go | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/) | --- | --- | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/) | --- | --- | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/) Java | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark
[jira] [Work logged] (BEAM-9373) Flink/Spark portable jar tests failing
[ https://issues.apache.org/jira/browse/BEAM-9373?focusedWorklogId=392113&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-392113 ] ASF GitHub Bot logged work on BEAM-9373: Author: ASF GitHub Bot Created on: 24/Feb/20 21:39 Start Date: 24/Feb/20 21:39 Worklog Time Spent: 10m Work Description: ibzib commented on issue #10954: [BEAM-9373] Spark/Flink tests fix string concat URL: https://github.com/apache/beam/pull/10954#issuecomment-590562207 Run PortableJar_Flink PostCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 392113) Time Spent: 20m (was: 10m) > Flink/Spark portable jar tests failing > -- > > Key: BEAM-9373 > URL: https://issues.apache.org/jira/browse/BEAM-9373 > Project: Beam > Issue Type: Bug > Components: test-failures >Reporter: Kyle Weaver >Assignee: Kyle Weaver >Priority: Major > Labels: currently-failing > Time Spent: 20m > Remaining Estimate: 0h > > *[https://builds.apache.org/job/beam_PostCommit_PortableJar_Flink|https://builds.apache.org/job/beam_PostCommit_PortableJar_Flink] > > *[https://builds.apache.org/job/beam_PostCommit_PortableJar_Spark|https://builds.apache.org/job/beam_PostCommit_PortableJar_Spark] > *14:13:27* 1: Task failed with an exception.*14:13:27* ---*14:13:27* > * Where:*14:13:27* Script > '/home/jenkins/jenkins-slave/workspace/beam_PostCommit_PortableJar_Flink/src/runners/flink/job-server/flink_job_server.gradle' > line: 266*14:13:27* *14:13:27* * What went wrong:*14:13:27* Execution failed > for task ':runners:flink:1.9:job-server:testFlinkUberJarPy37'.*14:13:27* > > Could not find method $() for arguments > [flink_job_server_70sh27mqy1m8ejqzz2sd1kql1$_addTestFlinkUberJarPy_closure15$_closure28$_closure29$_closure30@230aeda1] > on object of type org.gradle.process.internal.DefaultExecAction_Decorated. -- This message was sent by Atlassian Jira (v8.3.4#803005)