[jira] [Work logged] (BEAM-7437) Integration Test for BQ streaming inserts for streaming pipelines
[ https://issues.apache.org/jira/browse/BEAM-7437?focusedWorklogId=275681&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-275681 ]

ASF GitHub Bot logged work on BEAM-7437:
Author: ASF GitHub Bot
Created on: 12/Jul/19 06:58
Start Date: 12/Jul/19 06:58
Worklog Time Spent: 10m

Work Description: ttanay commented on pull request #9044: [BEAM-7437] Raise RuntimeError for PY2 in BigqueryFullResultStreamingMatcher
URL: https://github.com/apache/beam/pull/9044#discussion_r302850735

## File path: sdks/python/apache_beam/io/gcp/tests/bigquery_matcher.py
## @@ -192,7 +193,10 @@ def _get_query_result(self, bigquery_client):
       if len(response) >= len(self.expected_data):
         return response
       time.sleep(1)
-    raise TimeoutError('Timeout exceeded for matcher.')
+    if six.PY3:

Review comment: Thanks @udim. I'll make the change.

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at: us...@infra.apache.org

Issue Time Tracking
-------------------
Worklog Id: (was: 275681)
Time Spent: 6h 50m (was: 6h 40m)

> Integration Test for BQ streaming inserts for streaming pipelines
> -----------------------------------------------------------------
>
>                 Key: BEAM-7437
>                 URL: https://issues.apache.org/jira/browse/BEAM-7437
>             Project: Beam
>          Issue Type: Test
>          Components: io-python-gcp
>    Affects Versions: 2.12.0
>            Reporter: Tanay Tummalapalli
>            Assignee: Tanay Tummalapalli
>            Priority: Minor
>              Labels: test
>          Time Spent: 6h 50m
>  Remaining Estimate: 0h
>
> Integration Test for BigQuery Sink using Streaming Inserts for streaming
> pipelines.
> Integration tests currently exist for batch pipelines; they can also be added
> for streaming pipelines using TestStream. This will be a precursor to the
> failing integration test to be added for
> [BEAM-6611|https://issues.apache.org/jira/browse/BEAM-6611].

--
This message was sent by Atlassian JIRA
(v7.6.14#76016)
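The review above concerns the fact that `TimeoutError` is a builtin only on Python 3, so the matcher cannot raise it unconditionally on Python 2. A minimal sketch of the cross-version pattern the diff hints at, using `sys.version_info` in place of the PR's `six.PY3` check so the example is dependency-free (the fallback exception name here is hypothetical, not the class the PR actually uses):

```python
import sys


class MatcherTimeoutError(Exception):
    """Fallback for Python 2, which has no builtin TimeoutError (hypothetical name)."""


def raise_timeout(message):
    # six.PY3 in the PR is equivalent to this version check.
    if sys.version_info[0] >= 3:
        raise TimeoutError(message)
    raise MatcherTimeoutError(message)
```

On Python 3 the builtin `TimeoutError` is raised directly; on Python 2 the custom class stands in, which is why the PR title instead proposes raising a `RuntimeError` for PY2.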
[jira] [Work logged] (BEAM-7079) Run Chicago Taxi Example on Dataflow
[ https://issues.apache.org/jira/browse/BEAM-7079?focusedWorklogId=275680&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-275680 ]

ASF GitHub Bot logged work on BEAM-7079:
Author: ASF GitHub Bot
Created on: 12/Jul/19 06:56
Start Date: 12/Jul/19 06:56
Worklog Time Spent: 10m

Work Description: mwalenia commented on issue #8939: [BEAM-7079] Add Chicago Taxi Example running on Dataflow
URL: https://github.com/apache/beam/pull/8939#issuecomment-510771008

@angoenka Hi, could you take a look at this? CC: @pabloem

Issue Time Tracking
-------------------
Worklog Id: (was: 275680)
Time Spent: 19h 40m (was: 19.5h)

> Run Chicago Taxi Example on Dataflow
> ------------------------------------
>
>                 Key: BEAM-7079
>                 URL: https://issues.apache.org/jira/browse/BEAM-7079
>             Project: Beam
>          Issue Type: Test
>          Components: testing
>            Reporter: Michal Walenia
>            Assignee: Michal Walenia
>            Priority: Minor
>          Time Spent: 19h 40m
>  Remaining Estimate: 0h
[jira] [Work logged] (BEAM-6675) The JdbcIO sink should accept schemas
[ https://issues.apache.org/jira/browse/BEAM-6675?focusedWorklogId=275674&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-275674 ]

ASF GitHub Bot logged work on BEAM-6675:
Author: ASF GitHub Bot
Created on: 12/Jul/19 06:42
Start Date: 12/Jul/19 06:42
Worklog Time Spent: 10m

Work Description: sehrish-vd commented on issue #8962: [BEAM-6675] Generate JDBC statement and preparedStatementSetter automatically when schema is available
URL: https://github.com/apache/beam/pull/8962#issuecomment-510767335

@amaliujia can you please review the comments again?

Issue Time Tracking
-------------------
Worklog Id: (was: 275674)
Time Spent: 5.5h (was: 5h 20m)

> The JdbcIO sink should accept schemas
> -------------------------------------
>
>                 Key: BEAM-6675
>                 URL: https://issues.apache.org/jira/browse/BEAM-6675
>             Project: Beam
>          Issue Type: Sub-task
>          Components: io-java-jdbc
>            Reporter: Reuven Lax
>            Assignee: Shehzaad Nakhoda
>            Priority: Major
>          Time Spent: 5.5h
>  Remaining Estimate: 0h
>
> If the input has a schema, there should be a default mapping to a
> PreparedStatement for writing based on that schema.
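The BEAM-6675 description asks for a default mapping from a schema to a JDBC `PreparedStatement`. The core of that mapping is mechanical: derive a parameterized `INSERT` from the schema's field names, then bind each row value to its positional `?` placeholder. A minimal illustrative sketch of the statement-generation half (not the PR's actual implementation; the helper name is made up):

```python
def generate_insert_statement(table, field_names):
    """Build a parameterized INSERT suitable for a JDBC PreparedStatement
    from a schema's field names (illustrative helper, not Beam API)."""
    columns = ", ".join(field_names)
    placeholders = ", ".join("?" for _ in field_names)
    return "INSERT INTO %s(%s) VALUES(%s)" % (table, columns, placeholders)
```

A preparedStatementSetter would then walk the same field list in order, calling the type-appropriate `setX(i, value)` for each placeholder, which is what makes the schema-driven default possible.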
[jira] [Work logged] (BEAM-7386) Add Utility BiTemporalStreamJoin
[ https://issues.apache.org/jira/browse/BEAM-7386?focusedWorklogId=275673&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-275673 ]

ASF GitHub Bot logged work on BEAM-7386:
Author: ASF GitHub Bot
Created on: 12/Jul/19 06:40
Start Date: 12/Jul/19 06:40
Worklog Time Spent: 10m

Work Description: amaliujia commented on issue #9032: [BEAM-7386] Bi-Temporal Join
URL: https://github.com/apache/beam/pull/9032#issuecomment-510766791

I am not seeing that the Java precommit tests are triggered. I am less familiar with how that is enabled. Guessing it has to be set here: https://github.com/apache/beam/blob/master/build.gradle#L132

Issue Time Tracking
-------------------
Worklog Id: (was: 275673)
Time Spent: 2h (was: 1h 50m)

> Add Utility BiTemporalStreamJoin
> --------------------------------
>
>                 Key: BEAM-7386
>                 URL: https://issues.apache.org/jira/browse/BEAM-7386
>             Project: Beam
>          Issue Type: Improvement
>          Components: sdk-ideas
>    Affects Versions: 2.12.0
>            Reporter: Reza ardeshir rokni
>            Assignee: Reza ardeshir rokni
>            Priority: Major
>          Time Spent: 2h
>  Remaining Estimate: 0h
>
> Add utility class that enables a temporal join between two streams where
> Stream A is matched to Stream B where
> A.timestamp = (max(b.timestamp) where b.timestamp <= a.timestamp)
> This will use the following overall flow:
> KV(key, Timestamped)
> | Window
> | GBK
> | Stateful DoFn
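The join rule in the BEAM-7386 description — match each element of Stream A to the B element with the greatest timestamp not exceeding A's timestamp — can be sketched outside of Beam in a few lines. This is an illustration of the matching semantics only, not the PR's windowed, stateful DoFn implementation:

```python
def bitemporal_match(a_ts, b_timestamps):
    """Return max(b) over all b in b_timestamps with b <= a_ts,
    i.e. the latest B event at or before the A event; None if no B qualifies."""
    candidates = [b for b in b_timestamps if b <= a_ts]
    return max(candidates) if candidates else None
```

In the streaming version, the stateful DoFn buffers B elements per key (after the Window/GBK steps) and applies exactly this rule as A elements arrive, emitting None-matched results when no earlier B exists.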
[jira] [Work logged] (BEAM-7386) Add Utility BiTemporalStreamJoin
[ https://issues.apache.org/jira/browse/BEAM-7386?focusedWorklogId=275663&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-275663 ]

ASF GitHub Bot logged work on BEAM-7386:
Author: ASF GitHub Bot
Created on: 12/Jul/19 06:31
Start Date: 12/Jul/19 06:31
Worklog Time Spent: 10m

Work Description: amaliujia commented on pull request #9032: [BEAM-7386] Bi-Temporal Join
URL: https://github.com/apache/beam/pull/9032#discussion_r302838285

## File path: sdks/java/extensions/timeseries/src/main/java/org/apache/beam/sdk/extensions/timeseries/joins/BiStreamJoinResultCoder.java
## @@ -0,0 +1,121 @@

/*
 * Licensed to the Apache Software Foundation (ASF) under one
 * or more contributor license agreements. See the NOTICE file
 * distributed with this work for additional information
 * regarding copyright ownership. The ASF licenses this file
 * to you under the Apache License, Version 2.0 (the
 * "License"); you may not use this file except in compliance
 * with the License. You may obtain a copy of the License at
 *
 *     http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */
package org.apache.beam.sdk.extensions.timeseries.joins;

import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.Arrays;
import java.util.List;
import org.apache.beam.sdk.coders.*;
import org.apache.beam.sdk.values.TimestampedValue;
import org.apache.beam.sdk.values.TypeDescriptor;
import org.apache.beam.sdk.values.TypeParameter;

public class BiStreamJoinResultCoder<K, L, R>
    extends StructuredCoder<BiTemporalJoinResult<K, L, R>> {

  private final Coder<L> leftCoder;
  private final Coder<R> rightCoder;
  private final Coder<K> keyCoder;

  public BiStreamJoinResultCoder(Coder<K> keyCoder, Coder<L> leftCoder, Coder<R> rightCoder) {
    this.leftCoder = NullableCoder.of(leftCoder);
    this.rightCoder = NullableCoder.of(rightCoder);
    this.keyCoder = NullableCoder.of(keyCoder);
  }

  public Coder<L> getLeftCoder() {
    return leftCoder;
  }

  public Coder<R> getRightCoder() {
    return rightCoder;
  }

  public Coder<K> getKeyCoder() {
    return keyCoder;
  }

  public static <K, L, R> BiStreamJoinResultCoder<K, L, R> of(
      Coder<K> keyCoder, Coder<L> leftCoder, Coder<R> rightCoder) {

    return new BiStreamJoinResultCoder<>(keyCoder, leftCoder, rightCoder);
  }

  @Override
  public void encode(BiTemporalJoinResult<K, L, R> value, OutputStream outStream)
      throws IOException {

    if (value == null) {

Review comment: key, left and right coders are already `NullableCoder` which tolerates `NULL`?

Issue Time Tracking
-------------------
Worklog Id: (was: 275663)
Time Spent: 1.5h (was: 1h 20m)

> Add Utility BiTemporalStreamJoin
> --------------------------------
>
>                 Key: BEAM-7386
>                 URL: https://issues.apache.org/jira/browse/BEAM-7386
>             Project: Beam
>          Issue Type: Improvement
>          Components: sdk-ideas
>    Affects Versions: 2.12.0
>            Reporter: Reza ardeshir rokni
>            Assignee: Reza ardeshir rokni
>            Priority: Major
>          Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> Add utility class that enables a temporal join between two streams where
> Stream A is matched to Stream B where
> A.timestamp = (max(b.timestamp) where b.timestamp <= a.timestamp)
> This will use the following overall flow:
> KV(key, Timestamped)
> | Window
> | GBK
> | Stateful DoFn
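The review question above turns on how `NullableCoder` works: it wraps another coder so that a `null` value can be encoded, typically by writing a presence marker before the wrapped value. A language-neutral sketch of that idea in Python (the exact byte layout here is an assumption for illustration, not Beam's actual wire format):

```python
def encode_nullable(value, encode_fn):
    """Encode value with a one-byte presence prefix so None round-trips.
    encode_fn encodes a non-null value to bytes.
    (Illustrative; the byte layout is an assumption, not Beam's NullableCoder format.)"""
    if value is None:
        return b"\x00"          # absent marker, no payload
    return b"\x01" + encode_fn(value)  # present marker + encoded payload


def decode_nullable(data, decode_fn):
    """Inverse of encode_nullable; decode_fn decodes the payload bytes."""
    if data[:1] == b"\x00":
        return None
    return decode_fn(data[1:])
```

Because every field coder in `BiStreamJoinResultCoder` is already wrapped this way, the reviewer is pointing out that an extra whole-value null check in `encode` may be redundant.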
[jira] [Work logged] (BEAM-7386) Add Utility BiTemporalStreamJoin
[ https://issues.apache.org/jira/browse/BEAM-7386?focusedWorklogId=275667&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-275667 ]

ASF GitHub Bot logged work on BEAM-7386:
Author: ASF GitHub Bot
Created on: 12/Jul/19 06:31
Start Date: 12/Jul/19 06:31
Worklog Time Spent: 10m

Work Description: amaliujia commented on issue #9032: [BEAM-7386] Bi-Temporal Join
URL: https://github.com/apache/beam/pull/9032#issuecomment-510764741

Retest this please

Issue Time Tracking
-------------------
Worklog Id: (was: 275667)
Time Spent: 1h 50m (was: 1h 40m)

> Add Utility BiTemporalStreamJoin
> --------------------------------
>
>                 Key: BEAM-7386
>                 URL: https://issues.apache.org/jira/browse/BEAM-7386
>             Project: Beam
>          Issue Type: Improvement
>          Components: sdk-ideas
>    Affects Versions: 2.12.0
>            Reporter: Reza ardeshir rokni
>            Assignee: Reza ardeshir rokni
>            Priority: Major
>          Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> Add utility class that enables a temporal join between two streams where
> Stream A is matched to Stream B where
> A.timestamp = (max(b.timestamp) where b.timestamp <= a.timestamp)
> This will use the following overall flow:
> KV(key, Timestamped)
> | Window
> | GBK
> | Stateful DoFn
[jira] [Work logged] (BEAM-7386) Add Utility BiTemporalStreamJoin
[ https://issues.apache.org/jira/browse/BEAM-7386?focusedWorklogId=275665&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-275665 ]

ASF GitHub Bot logged work on BEAM-7386:
Author: ASF GitHub Bot
Created on: 12/Jul/19 06:31
Start Date: 12/Jul/19 06:31
Worklog Time Spent: 10m

Work Description: amaliujia commented on pull request #9032: [BEAM-7386] Bi-Temporal Join
URL: https://github.com/apache/beam/pull/9032#discussion_r302843754

## File path: sdks/java/extensions/timeseries/src/test/java/org/apache/beam/sdk/extensions/timeseries/joins/BiTemporalCacheTestIT.java
## @@ -0,0 +1,150 @@

/*
 * Licensed to the Apache Software Foundation (ASF) under one
 * or more contributor license agreements. See the NOTICE file
 * distributed with this work for additional information
 * regarding copyright ownership. The ASF licenses this file
 * to you under the Apache License, Version 2.0 (the
 * "License"); you may not use this file except in compliance
 * with the License. You may obtain a copy of the License at
 *
 *     http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */
package org.apache.beam.sdk.extensions.timeseries.joins;

import java.io.Serializable;
import org.apache.beam.sdk.extensions.timeseries.joins.BiTemporalTestUtils.QuoteData;
import org.apache.beam.sdk.extensions.timeseries.joins.BiTemporalTestUtils.TradeData;
import org.apache.beam.sdk.testing.NeedsRunner;
import org.apache.beam.sdk.testing.PAssert;
import org.apache.beam.sdk.testing.TestPipeline;
import org.apache.beam.sdk.transforms.Create;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.transforms.ParDo;
import org.apache.beam.sdk.transforms.Sum;
import org.apache.beam.sdk.transforms.join.KeyedPCollectionTuple;
import org.apache.beam.sdk.values.KV;
import org.apache.beam.sdk.values.PCollection;
import org.apache.beam.sdk.values.TupleTag;
import org.joda.time.Duration;
import org.joda.time.Instant;
import org.junit.Rule;
import org.junit.Test;
import org.junit.experimental.categories.Category;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class BiTemporalCacheTestIT implements Serializable {

Review comment: How does this IT run? Will `./gradlew test` trigger it?

Issue Time Tracking
-------------------
Worklog Id: (was: 275665)

> Add Utility BiTemporalStreamJoin
> --------------------------------
>
>                 Key: BEAM-7386
>                 URL: https://issues.apache.org/jira/browse/BEAM-7386
>             Project: Beam
>          Issue Type: Improvement
>          Components: sdk-ideas
>    Affects Versions: 2.12.0
>            Reporter: Reza ardeshir rokni
>            Assignee: Reza ardeshir rokni
>            Priority: Major
>          Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> Add utility class that enables a temporal join between two streams where
> Stream A is matched to Stream B where
> A.timestamp = (max(b.timestamp) where b.timestamp <= a.timestamp)
> This will use the following overall flow:
> KV(key, Timestamped)
> | Window
> | GBK
> | Stateful DoFn
[jira] [Work logged] (BEAM-7386) Add Utility BiTemporalStreamJoin
[ https://issues.apache.org/jira/browse/BEAM-7386?focusedWorklogId=275666&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-275666 ]

ASF GitHub Bot logged work on BEAM-7386:
Author: ASF GitHub Bot
Created on: 12/Jul/19 06:31
Start Date: 12/Jul/19 06:31
Worklog Time Spent: 10m

Work Description: amaliujia commented on pull request #9032: [BEAM-7386] Bi-Temporal Join
URL: https://github.com/apache/beam/pull/9032#discussion_r302798339

## File path: sdks/java/extensions/timeseries/build.gradle
## @@ -0,0 +1,49 @@

/*
 * Licensed to the Apache Software Foundation (ASF) under one
 * or more contributor license agreements. See the NOTICE file
 * distributed with this work for additional information
 * regarding copyright ownership. The ASF licenses this file
 * to you under the Apache License, Version 2.0 (the
 * License); you may not use this file except in compliance
 * with the License. You may obtain a copy of the License at
 *
 *     http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an AS IS BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

plugins { id 'org.apache.beam.module' }
applyJavaNature()

description = "Apache Beam :: SDKs :: Java :: Extensions :: Timeseries"
ext.summary = """Beam TIMESERIES provides helper utils to deal with timeseries processing"""

dependencies {
  compile library.java.guava
  compile project(path: ":sdks:java:core", configuration: "shadow")
  // Needed to run the Example.
  compile project(path: ":runners:direct-java")
  testCompile library.java.hamcrest_core
  testCompile library.java.hamcrest_library
  testCompile library.java.junit
  //testRuntimeOnly project(path: ":runners:direct-java")

Review comment: remove this line?

Issue Time Tracking
-------------------
Worklog Id: (was: 275666)
Time Spent: 1h 40m (was: 1.5h)

> Add Utility BiTemporalStreamJoin
> --------------------------------
>
>                 Key: BEAM-7386
>                 URL: https://issues.apache.org/jira/browse/BEAM-7386
>             Project: Beam
>          Issue Type: Improvement
>          Components: sdk-ideas
>    Affects Versions: 2.12.0
>            Reporter: Reza ardeshir rokni
>            Assignee: Reza ardeshir rokni
>            Priority: Major
>          Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> Add utility class that enables a temporal join between two streams where
> Stream A is matched to Stream B where
> A.timestamp = (max(b.timestamp) where b.timestamp <= a.timestamp)
> This will use the following overall flow:
> KV(key, Timestamped)
> | Window
> | GBK
> | Stateful DoFn
[jira] [Work logged] (BEAM-7386) Add Utility BiTemporalStreamJoin
[ https://issues.apache.org/jira/browse/BEAM-7386?focusedWorklogId=275664&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-275664 ]

ASF GitHub Bot logged work on BEAM-7386:
Author: ASF GitHub Bot
Created on: 12/Jul/19 06:31
Start Date: 12/Jul/19 06:31
Worklog Time Spent: 10m

Work Description: amaliujia commented on pull request #9032: [BEAM-7386] Bi-Temporal Join
URL: https://github.com/apache/beam/pull/9032#discussion_r302843161

## File path: sdks/java/extensions/timeseries/src/main/java/org/apache/beam/sdk/extensions/timeseries/joins/BiStreamJoinResultCoder.java
## @@ -0,0 +1,121 @@

/*
 * Licensed to the Apache Software Foundation (ASF) under one
 * or more contributor license agreements. See the NOTICE file
 * distributed with this work for additional information
 * regarding copyright ownership. The ASF licenses this file
 * to you under the Apache License, Version 2.0 (the
 * "License"); you may not use this file except in compliance
 * with the License. You may obtain a copy of the License at
 *
 *     http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */
package org.apache.beam.sdk.extensions.timeseries.joins;

import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.Arrays;
import java.util.List;
import org.apache.beam.sdk.coders.*;
import org.apache.beam.sdk.values.TimestampedValue;
import org.apache.beam.sdk.values.TypeDescriptor;
import org.apache.beam.sdk.values.TypeParameter;

public class BiStreamJoinResultCoder

Review comment: `BiStreamJoinResultCoder` and `BiTemporalJoinResultCoder` are duplicates?

Issue Time Tracking
-------------------
Worklog Id: (was: 275664)
Time Spent: 1.5h (was: 1h 20m)

> Add Utility BiTemporalStreamJoin
> --------------------------------
>
>                 Key: BEAM-7386
>                 URL: https://issues.apache.org/jira/browse/BEAM-7386
>             Project: Beam
>          Issue Type: Improvement
>          Components: sdk-ideas
>    Affects Versions: 2.12.0
>            Reporter: Reza ardeshir rokni
>            Assignee: Reza ardeshir rokni
>            Priority: Major
>          Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> Add utility class that enables a temporal join between two streams where
> Stream A is matched to Stream B where
> A.timestamp = (max(b.timestamp) where b.timestamp <= a.timestamp)
> This will use the following overall flow:
> KV(key, Timestamped)
> | Window
> | GBK
> | Stateful DoFn
[jira] [Commented] (BEAM-3595) Normalize URNs across SDKs and runners.
[ https://issues.apache.org/jira/browse/BEAM-3595?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16883566#comment-16883566 ]

Chad Dombrova commented on BEAM-3595:

I see that this is marked as fixed, but when using v2.13.0 of the Python SDK I get a bad URN for pardo: "urn:beam:transform:pardo:v1" (also, there are at least a dozen mentions of this ticket in the source code). In order to read my pipeline back into Java I have to remove the "urn:" prefix. The URNs for other transforms don't include the "urn:" prefix.

Here's a bad pardo message:

{noformat}
transforms {
  key: "ref_AppliedPTransform_Sleep_4"
  value {
    spec {
      urn: "urn:beam:transform:pardo:v1"
      payload: "\n\204\002\n\201\002\n beam:dofn:pickled_python_info:v1\032\3Jj0F4"
    }
    inputs {
      key: "0"
      value: "ref_PCollection_PCollection_1"
    }
    outputs {
      key: "None"
      value: "ref_PCollection_PCollection_2"
    }
    unique_name: "Sleep"
  }
}
{noformat}

Additionally, the coders seem to be authored incorrectly, with a double-nested spec:

{noformat}
coders {
  key: "ref_Coder_VarIntCoder_1"
  value {
    spec {
      spec {
        urn: "beam:coder:varint:v1"
      }
    }
  }
}
{noformat}

I have to remove one level of the spec to get it to read properly in Java.

To be clear, the issue that I'm describing occurs when I manually write and read the pipeline via protobufs: everything works fine when I submit using `pipe.run()`. I surmise that there is some function in the Python SDK that fixes up the pipeline message before sending it, otherwise I assume it _would_ be broken on receipt. Is that correct?

> Normalize URNs across SDKs and runners.
> ---------------------------------------
>
>                 Key: BEAM-3595
>                 URL: https://issues.apache.org/jira/browse/BEAM-3595
>             Project: Beam
>          Issue Type: Bug
>          Components: beam-model
>            Reporter: Robert Bradshaw
>            Assignee: Eugene Kirpichov
>            Priority: Major
>             Fix For: 2.5.0

--
This message was sent by Atlassian JIRA
(v7.6.14#76016)
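The workaround the comment describes — stripping the legacy "urn:" prefix so the Java SDK recognizes the transform — is a one-line normalization. A hedged sketch of that fix-up (an illustration of the described workaround, not a function from the Beam SDK):

```python
def normalize_urn(urn):
    """Strip the legacy 'urn:' prefix described in the comment,
    e.g. 'urn:beam:transform:pardo:v1' -> 'beam:transform:pardo:v1'.
    URNs without the prefix pass through unchanged."""
    prefix = "urn:"
    return urn[len(prefix):] if urn.startswith(prefix) else urn
```

The double-nested coder `spec` would need the analogous structural fix-up (replacing `spec { spec { urn: ... } }` with a single `spec`), which is presumably what the pre-submit fix-up function the commenter surmises exists is doing.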
[jira] [Commented] (BEAM-7467) Gearpump Quickstart fails, java.lang.NoClassDefFoundError: com/gs/collections/api/block/procedure/Procedure
[ https://issues.apache.org/jira/browse/BEAM-7467?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16883496#comment-16883496 ]

Manu Zhang commented on BEAM-7467:

[~lcwik], can we close now?

> Gearpump Quickstart fails, java.lang.NoClassDefFoundError:
> com/gs/collections/api/block/procedure/Procedure
> -----------------------------------------------------------
>
>                 Key: BEAM-7467
>                 URL: https://issues.apache.org/jira/browse/BEAM-7467
>             Project: Beam
>          Issue Type: Bug
>          Components: runner-gearpump
>    Affects Versions: 2.13.0
>            Reporter: Luke Cwik
>            Assignee: Manu Zhang
>            Priority: Major
>          Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> After generating the archetype for the 2.13.0 RC2, the following quick start
> command fails:
> {code:java}
> mvn compile exec:java -Dexec.mainClass=org.apache.beam.examples.WordCount
> -Dexec.args="--inputFile=pom.xml --output=counts --runner=GearpumpRunner"
> -Pgearpump-runner{code}
> I also tried:
> {code:java}
> mvn compile exec:java -Dexec.mainClass=org.apache.beam.examples.WordCount
> -Dexec.args="--inputFile=pom.xml --output=counts --runner=GearpumpRunner"
> -Pgearpump-runner{code}
> Log:
> {code:java}
> SLF4J: Class path contains multiple SLF4J bindings.
> SLF4J: Found binding in [jar:file:/usr/local/google/home/lcwik/.m2/repository/org/slf4j/slf4j-jdk14/1.7.25/slf4j-jdk14-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: Found binding in [jar:file:/usr/local/google/home/lcwik/.m2/repository/org/slf4j/slf4j-log4j12/1.7.16/slf4j-log4j12-1.7.16.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
> SLF4J: Actual binding is of type [org.slf4j.impl.JDK14LoggerFactory]
> May 31, 2019 9:38:26 AM akka.event.slf4j.Slf4jLogger$$anonfun$receive$1 applyOrElse
> INFO: Slf4jLogger started
> May 31, 2019 9:38:26 AM akka.event.slf4j.Slf4jLogger$$anonfun$receive$1 $anonfun$applyOrElse$3
> INFO: Starting remoting
> May 31, 2019 9:38:26 AM akka.event.slf4j.Slf4jLogger$$anonfun$receive$1 $anonfun$applyOrElse$3
> INFO: Remoting started; listening on addresses :[akka.tcp://client1820912745@127.0.0.1:38849]
> May 31, 2019 9:38:26 AM io.gearpump.metrics.Metrics$ createExtension
> INFO: Metrics is enabled..., false
> May 31, 2019 9:38:26 AM io.gearpump.cluster.master.MasterProxy
> INFO: Master Proxy is started...
> [WARNING]
> java.lang.NoClassDefFoundError: com/gs/collections/api/block/procedure/Procedure
>     at io.gearpump.streaming.dsl.plan.WindowOp.chain (OP.scala:273)
>     at io.gearpump.streaming.dsl.plan.Planner.merge (Planner.scala:86)
>     at io.gearpump.streaming.dsl.plan.Planner.$anonfun$optimize$2 (Planner.scala:71)
>     at io.gearpump.streaming.dsl.plan.Planner.$anonfun$optimize$2$adapted (Planner.scala:70)
>     at scala.collection.mutable.HashSet.foreach (HashSet.scala:77)
>     at io.gearpump.streaming.dsl.plan.Planner.$anonfun$optimize$1 (Planner.scala:70)
>     at io.gearpump.streaming.dsl.plan.Planner.$anonfun$optimize$1$adapted (Planner.scala:68)
>     at scala.collection.immutable.List.foreach (List.scala:388)
>     at io.gearpump.streaming.dsl.plan.Planner.optimize (Planner.scala:68)
>     at io.gearpump.streaming.dsl.plan.Planner.plan (Planner.scala:48)
>     at io.gearpump.streaming.dsl.scalaapi.StreamApp.plan (StreamApp.scala:59)
>     at io.gearpump.streaming.dsl.scalaapi.StreamApp$.streamAppToApplication (StreamApp.scala:82)
>     at io.gearpump.streaming.dsl.javaapi.JavaStreamApp.submit (JavaStreamApp.scala:44)
>     at org.apache.beam.runners.gearpump.GearpumpRunner.run (GearpumpRunner.java:83)
>     at org.apache.beam.runners.gearpump.GearpumpRunner.run (GearpumpRunner.java:44)
>     at org.apache.beam.sdk.Pipeline.run (Pipeline.java:313)
>     at org.apache.beam.sdk.Pipeline.run (Pipeline.java:299)
>     at org.apache.beam.examples.WordCount.runWordCount (WordCount.java:185)
>     at org.apache.beam.examples.WordCount.main (WordCount.java:192)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0 (Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke (NativeMethodAccessorImpl.java:62)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke (DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke (Method.java:498)
>     at org.codehaus.mojo.exec.ExecJavaMojo$1.run (ExecJavaMojo.java:282)
>     at java.lang.Thread.run (Thread.java:748)
> Caused by: java.lang.ClassNotFoundException: com.gs.collections.api.block.procedure.Procedure
>     at java.net.URLClassLoader.findClass (URLClassLoader.java:381)
>     at java.lang.ClassLoader.loadClass (ClassLoader.java:424)
>     at java.lang.ClassLoader.loadClass (ClassLoader.java:357)
>     at io.gearpump.streaming.dsl.plan.WindowOp.chain (OP.scala:273)
>     at io.gearpump.streaming.
[jira] [Work logged] (BEAM-7719) Ensure that publishing vendored artifacts first validates their contents
[ https://issues.apache.org/jira/browse/BEAM-7719?focusedWorklogId=275627&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-275627 ]

ASF GitHub Bot logged work on BEAM-7719:
Author: ASF GitHub Bot
Created on: 12/Jul/19 02:56
Start Date: 12/Jul/19 02:56
Worklog Time Spent: 10m

Work Description: lukecwik commented on pull request #9036: [BEAM-7719] Ensure that publishing vendored artifacts checks the contents of the jar before publishing.
URL: https://github.com/apache/beam/pull/9036

Issue Time Tracking
-------------------
Worklog Id: (was: 275627)
Time Spent: 1h 20m (was: 1h 10m)

> Ensure that publishing vendored artifacts first validates their contents
> ------------------------------------------------------------------------
>
>                 Key: BEAM-7719
>                 URL: https://issues.apache.org/jira/browse/BEAM-7719
>             Project: Beam
>          Issue Type: Improvement
>          Components: build-system
>            Reporter: Luke Cwik
>            Assignee: Luke Cwik
>            Priority: Minor
>          Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> During the release of vendored guava 26.0, it was discovered that we don't
> check the contents of the jars automatically.
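The kind of pre-publish check BEAM-7719 asks for amounts to opening each vendored jar and verifying that only the expected relocated packages are inside. Since a jar is a zip archive, a minimal illustrative sketch (not the Gradle check the PR adds; names and prefixes here are assumptions) is:

```python
import zipfile


def jar_entries(jar_path):
    """List entry names in a jar; a jar is just a zip archive."""
    with zipfile.ZipFile(jar_path) as jar:
        return jar.namelist()


def unexpected_entries(jar_path, allowed_prefixes):
    """Return file entries whose path does not start with any allowed prefix,
    e.g. classes that leaked in from transitive dependencies."""
    return [
        name
        for name in jar_entries(jar_path)
        if not name.endswith("/")  # skip directory entries
        and not any(name.startswith(p) for p in allowed_prefixes)
    ]
```

A release script would fail the publish step if `unexpected_entries(jar, [...])` is non-empty, which is exactly the class-leak scenario the vendored guava 26.0 release ran into.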
[jira] [Work logged] (BEAM-4948) Beam Dependency Update Request: com.google.guava
[ https://issues.apache.org/jira/browse/BEAM-4948?focusedWorklogId=275628&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-275628 ]

ASF GitHub Bot logged work on BEAM-4948:
Author: ASF GitHub Bot
Created on: 12/Jul/19 02:56
Start Date: 12/Jul/19 02:56
Worklog Time Spent: 10m

Work Description: lukecwik commented on pull request #9038: [BEAM-4948, BEAM-6267, BEAM-5559, BEAM-7289] Fix shading of vendored guava to exclude classes from transitive dependencies which aren't needed at runtime
URL: https://github.com/apache/beam/pull/9038

Issue Time Tracking
-------------------
Worklog Id: (was: 275628)
Time Spent: 2h (was: 1h 50m)

> Beam Dependency Update Request: com.google.guava
> ------------------------------------------------
>
>                 Key: BEAM-4948
>                 URL: https://issues.apache.org/jira/browse/BEAM-4948
>             Project: Beam
>          Issue Type: Bug
>          Components: dependencies
>            Reporter: Beam JIRA Bot
>            Priority: Major
>          Time Spent: 2h
>  Remaining Estimate: 0h
>
> 2018-07-25 20:28:03.628639
> Please review and upgrade the com.google.guava to the latest version
> None
>
> cc:
[jira] [Commented] (BEAM-7730) Add Flink 1.9 build target and Make FlinkRunner compatible with Flink 1.9
[ https://issues.apache.org/jira/browse/BEAM-7730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16883483#comment-16883483 ]

sunjincheng commented on BEAM-7730:

Hi [~mxm], I have seen that the Flink 1.7 and 1.8 patches were contributions from you, so I think you have a lot of experience in this area. I am interested in contributing to Beam on this ticket, so I would appreciate any suggestions and guidance you can give. :)

> Add Flink 1.9 build target and Make FlinkRunner compatible with Flink 1.9
> -------------------------------------------------------------------------
>
>                 Key: BEAM-7730
>                 URL: https://issues.apache.org/jira/browse/BEAM-7730
>             Project: Beam
>          Issue Type: New Feature
>          Components: runner-flink
>            Reporter: sunjincheng
>            Assignee: sunjincheng
>            Priority: Major
>             Fix For: 2.14.0
>
> Apache Flink 1.9 is coming, and it would be good to add a Flink 1.9 build target
> and make the Flink Runner compatible with Flink 1.9.
> I will add the brief changes after Flink 1.9.0 is released.
> And I would appreciate it if you can leave your suggestions or comments!

--
This message was sent by Atlassian JIRA
(v7.6.14#76016)
[jira] [Updated] (BEAM-7730) Add Flink 1.9 build target and Make FlinkRunner compatible with Flink 1.9
[ https://issues.apache.org/jira/browse/BEAM-7730?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sunjincheng updated BEAM-7730: -- Description: Apache Flink 1.9 is coming soon, so it would be good to add a Flink 1.9 build target and make the Flink Runner compatible with Flink 1.9. I will add a brief summary of the changes after Flink 1.9.0 is released. I would appreciate any suggestions or comments! was: Apache Flink 1.9 is coming soon, so it would be good to make the Flink Runner compatible with Flink 1.9. I will add a brief summary of the changes after Flink 1.9.0 is released. I would appreciate any suggestions or comments! > Add Flink 1.9 build target and Make FlinkRunner compatible with Flink 1.9 > - > > Key: BEAM-7730 > URL: https://issues.apache.org/jira/browse/BEAM-7730 > Project: Beam > Issue Type: New Feature > Components: runner-flink >Reporter: sunjincheng >Assignee: sunjincheng >Priority: Major > Fix For: 2.14.0 > > > Apache Flink 1.9 is coming soon, so it would be good to add a Flink 1.9 build target > and make the Flink Runner compatible with Flink 1.9. > I will add a brief summary of the changes after Flink 1.9.0 is released. > I would appreciate any suggestions or comments! -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Updated] (BEAM-7730) Add Flink 1.9 build target and Make FlinkRunner compatible with Flink 1.9
[ https://issues.apache.org/jira/browse/BEAM-7730?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sunjincheng updated BEAM-7730: -- Summary: Add Flink 1.9 build target and Make FlinkRunner compatible with Flink 1.9 (was: Make FlinkRunner compatible with Flink 1.9) > Add Flink 1.9 build target and Make FlinkRunner compatible with Flink 1.9 > - > > Key: BEAM-7730 > URL: https://issues.apache.org/jira/browse/BEAM-7730 > Project: Beam > Issue Type: New Feature > Components: runner-flink >Reporter: sunjincheng >Assignee: sunjincheng >Priority: Major > Fix For: 2.14.0 > > > Apache Flink 1.9 is coming soon, so it would be good to make the Flink Runner compatible > with Flink 1.9. > I will add a brief summary of the changes after Flink 1.9.0 is released. > I would appreciate any suggestions or comments! -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Created] (BEAM-7730) Make FlinkRunner compatible with Flink 1.9
sunjincheng created BEAM-7730: - Summary: Make FlinkRunner compatible with Flink 1.9 Key: BEAM-7730 URL: https://issues.apache.org/jira/browse/BEAM-7730 Project: Beam Issue Type: New Feature Components: runner-flink Reporter: sunjincheng Assignee: sunjincheng Fix For: 2.14.0 Apache Flink 1.9 is coming soon, so it would be good to make the Flink Runner compatible with Flink 1.9. I will add a brief summary of the changes after Flink 1.9.0 is released. I would appreciate any suggestions or comments! -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Work logged] (BEAM-7689) Temporary directory for WriteOperation may not be unique in FileBasedSink
[ https://issues.apache.org/jira/browse/BEAM-7689?focusedWorklogId=275626&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-275626 ] ASF GitHub Bot logged work on BEAM-7689: Author: ASF GitHub Bot Created on: 12/Jul/19 02:20 Start Date: 12/Jul/19 02:20 Worklog Time Spent: 10m Work Description: akedin commented on issue #9039: [BEAM-7689] cherry-picking for 2.14.0 URL: https://github.com/apache/beam/pull/9039#issuecomment-510718962 It doesn't look like a flake. It fails consistently, and the failing tests seem to use TextIO, so this change to FileBasedSink seems relevant. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 275626) Time Spent: 2h 50m (was: 2h 40m) > Temporary directory for WriteOperation may not be unique in FileBasedSink > > > Key: BEAM-7689 > URL: https://issues.apache.org/jira/browse/BEAM-7689 > Project: Beam > Issue Type: Bug > Components: io-java-files >Reporter: Heejong Lee >Assignee: Heejong Lee >Priority: Blocker > Fix For: 2.7.1, 2.14.0 > > Time Spent: 2h 50m > Remaining Estimate: 0h > > The temporary directory for a WriteOperation in FileBasedSink is generated from a > second-granularity timestamp (yyyy-MM-dd_HH-mm-ss) and a unique increasing > index. That granularity is not enough to make temporary directories unique > across different jobs. When two jobs share the same temporary directory, one > job may fail to produce an output file because a temporary file it needs can > be deleted by the other job. -- This message was sent by Atlassian JIRA (v7.6.14#76016)
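The collision described in BEAM-7689 comes from relying on a one-second timestamp alone; adding extra entropy (a UUID, for instance) makes two jobs starting in the same second practically impossible to collide. A minimal sketch of the idea in Python (a hypothetical helper, not FileBasedSink's actual Java naming scheme):

```python
import uuid
from datetime import datetime


def unique_temp_dir(base):
    """Build a temp directory name from a second-granularity timestamp
    plus a random UUID suffix, so that two jobs starting within the
    same second still get distinct directories."""
    stamp = datetime.utcnow().strftime('%Y-%m-%d_%H-%M-%S')
    return '%s/.temp-beam-%s-%s' % (base, stamp, uuid.uuid4().hex)


# Two calls in the same second yield different directories.
a = unique_temp_dir('gs://bucket/output')
b = unique_temp_dir('gs://bucket/output')
assert a != b
```

The timestamp keeps the directory name human-readable for debugging; the UUID carries the actual uniqueness guarantee.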
[jira] [Work logged] (BEAM-7546) Portable WordCount-on-Flink Precommit is flaky - temporary folder not found.
[ https://issues.apache.org/jira/browse/BEAM-7546?focusedWorklogId=275616&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-275616 ] ASF GitHub Bot logged work on BEAM-7546: Author: ASF GitHub Bot Created on: 12/Jul/19 01:56 Start Date: 12/Jul/19 01:56 Worklog Time Spent: 10m Work Description: tvalentyn commented on issue #9046: [BEAM-7546] Increasing environment cache to avoid chances of recreati… URL: https://github.com/apache/beam/pull/9046#issuecomment-510714732 LGTM, thank you This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 275616) Time Spent: 0.5h (was: 20m) > Portable WordCount-on-Flink Precommit is flaky - temporary folder not found. > > > Key: BEAM-7546 > URL: https://issues.apache.org/jira/browse/BEAM-7546 > Project: Beam > Issue Type: Bug > Components: runner-flink >Reporter: Valentyn Tymofieiev >Assignee: Ankur Goenka >Priority: Major > Time Spent: 0.5h > Remaining Estimate: 0h > > On a few occasions I see this test fail due to a temp directory being missing. > Sample scan from > https://builds.apache.org/job/beam_PreCommit_Portable_Python_Cron/745/: > https://scans.gradle.com/s/3ra4xw4hqvlyw/console-log?task=:sdks:python:portableWordCountBatch > {noformat} > [grpc-default-executor-0] ERROR sdk_worker._execute - Error processing > instruction 8. 
Original traceback is > Traceback (most recent call last): > File > "/usr/local/lib/python2.7/site-packages/apache_beam/runners/worker/sdk_worker.py", > line 157, in _execute > response = task() > File > "/usr/local/lib/python2.7/site-packages/apache_beam/runners/worker/sdk_worker.py", > line 190, in > self._execute(lambda: worker.do_instruction(work), work) > File > "/usr/local/lib/python2.7/site-packages/apache_beam/runners/worker/sdk_worker.py", > line 342, in do_instruction > request.instruction_id) > File > "/usr/local/lib/python2.7/site-packages/apache_beam/runners/worker/sdk_worker.py", > line 368, in process_bundle > bundle_processor.process_bundle(instruction_id)) > File > "/usr/local/lib/python2.7/site-packages/apache_beam/runners/worker/bundle_processor.py", > line 589, in process_bundle > ].process_encoded(data.data) > File > "/usr/local/lib/python2.7/site-packages/apache_beam/runners/worker/bundle_processor.py", > line 143, in process_encoded > self.output(decoded_value) > File "apache_beam/runners/worker/operations.py", line 255, in > apache_beam.runners.worker.operations.Operation.output > def output(self, windowed_value, output_index=0): > File "apache_beam/runners/worker/operations.py", line 256, in > apache_beam.runners.worker.operations.Operation.output > cython.cast(Receiver, > self.receivers[output_index]).receive(windowed_value) > File "apache_beam/runners/worker/operations.py", line 143, in > apache_beam.runners.worker.operations.SingletonConsumerSet.receive > self.consumer.process(windowed_value) > File "apache_beam/runners/worker/operations.py", line 593, in > apache_beam.runners.worker.operations.DoOperation.process > with self.scoped_process_state: > File "apache_beam/runners/worker/operations.py", line 594, in > apache_beam.runners.worker.operations.DoOperation.process > delayed_application = self.dofn_receiver.receive(o) > File "apache_beam/runners/common.py", line 778, in > apache_beam.runners.common.DoFnRun > ner.receive > 
self.process(windowed_value) > File "apache_beam/runners/common.py", line 784, in > apache_beam.runners.common.DoFnRunner.process > self._reraise_augmented(exn) > File "apache_beam/runners/common.py", line 851, in > apache_beam.runners.common.DoFnRunner._reraise_augmented > raise_with_traceback(new_exn) > File "apache_beam/runners/common.py", line 782, in > apache_beam.runners.common.DoFnRunner.process > return self.do_fn_invoker.invoke_process(windowed_value) > File "apache_beam/runners/common.py", line 594, in > apache_beam.runners.common.PerWindowInvoker.invoke_process > self._invoke_process_per_window( > File "apache_beam/runners/common.py", line 666, in > apache_beam.runners.common.PerWindowInvoker._invoke_process_per_window > windowed_value, self.process_method(*args_for_process)) > File "/usr/local/lib/python2.7/site-packages/apache_beam/io/iobase.py", > line 1041, in process > self.writer = self.sink.ope
[jira] [Work logged] (BEAM-7546) Portable WordCount-on-Flink Precommit is flaky - temporary folder not found.
[ https://issues.apache.org/jira/browse/BEAM-7546?focusedWorklogId=275617&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-275617 ] ASF GitHub Bot logged work on BEAM-7546: Author: ASF GitHub Bot Created on: 12/Jul/19 01:56 Start Date: 12/Jul/19 01:56 Worklog Time Spent: 10m Work Description: tvalentyn commented on issue #9046: [BEAM-7546] Increasing environment cache to avoid chances of recreati… URL: https://github.com/apache/beam/pull/9046#issuecomment-510714835 cc: @udim This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 275617) Time Spent: 40m (was: 0.5h) > Portable WordCount-on-Flink Precommit is flaky - temporary folder not found. > > > Key: BEAM-7546 > URL: https://issues.apache.org/jira/browse/BEAM-7546 > Project: Beam > Issue Type: Bug > Components: runner-flink >Reporter: Valentyn Tymofieiev >Assignee: Ankur Goenka >Priority: Major > Time Spent: 40m > Remaining Estimate: 0h > > On a few occasions I see this test fail due to a temp directory being missing. > Sample scan from > https://builds.apache.org/job/beam_PreCommit_Portable_Python_Cron/745/: > https://scans.gradle.com/s/3ra4xw4hqvlyw/console-log?task=:sdks:python:portableWordCountBatch > {noformat} > [grpc-default-executor-0] ERROR sdk_worker._execute - Error processing > instruction 8. 
Original traceback is > Traceback (most recent call last): > File > "/usr/local/lib/python2.7/site-packages/apache_beam/runners/worker/sdk_worker.py", > line 157, in _execute > response = task() > File > "/usr/local/lib/python2.7/site-packages/apache_beam/runners/worker/sdk_worker.py", > line 190, in > self._execute(lambda: worker.do_instruction(work), work) > File > "/usr/local/lib/python2.7/site-packages/apache_beam/runners/worker/sdk_worker.py", > line 342, in do_instruction > request.instruction_id) > File > "/usr/local/lib/python2.7/site-packages/apache_beam/runners/worker/sdk_worker.py", > line 368, in process_bundle > bundle_processor.process_bundle(instruction_id)) > File > "/usr/local/lib/python2.7/site-packages/apache_beam/runners/worker/bundle_processor.py", > line 589, in process_bundle > ].process_encoded(data.data) > File > "/usr/local/lib/python2.7/site-packages/apache_beam/runners/worker/bundle_processor.py", > line 143, in process_encoded > self.output(decoded_value) > File "apache_beam/runners/worker/operations.py", line 255, in > apache_beam.runners.worker.operations.Operation.output > def output(self, windowed_value, output_index=0): > File "apache_beam/runners/worker/operations.py", line 256, in > apache_beam.runners.worker.operations.Operation.output > cython.cast(Receiver, > self.receivers[output_index]).receive(windowed_value) > File "apache_beam/runners/worker/operations.py", line 143, in > apache_beam.runners.worker.operations.SingletonConsumerSet.receive > self.consumer.process(windowed_value) > File "apache_beam/runners/worker/operations.py", line 593, in > apache_beam.runners.worker.operations.DoOperation.process > with self.scoped_process_state: > File "apache_beam/runners/worker/operations.py", line 594, in > apache_beam.runners.worker.operations.DoOperation.process > delayed_application = self.dofn_receiver.receive(o) > File "apache_beam/runners/common.py", line 778, in > apache_beam.runners.common.DoFnRun > ner.receive > 
self.process(windowed_value) > File "apache_beam/runners/common.py", line 784, in > apache_beam.runners.common.DoFnRunner.process > self._reraise_augmented(exn) > File "apache_beam/runners/common.py", line 851, in > apache_beam.runners.common.DoFnRunner._reraise_augmented > raise_with_traceback(new_exn) > File "apache_beam/runners/common.py", line 782, in > apache_beam.runners.common.DoFnRunner.process > return self.do_fn_invoker.invoke_process(windowed_value) > File "apache_beam/runners/common.py", line 594, in > apache_beam.runners.common.PerWindowInvoker.invoke_process > self._invoke_process_per_window( > File "apache_beam/runners/common.py", line 666, in > apache_beam.runners.common.PerWindowInvoker._invoke_process_per_window > windowed_value, self.process_method(*args_for_process)) > File "/usr/local/lib/python2.7/site-packages/apache_beam/io/iobase.py", > line 1041, in process > self.writer = self.sink.open_writ
[jira] [Commented] (BEAM-2264) Re-use credential instead of generating a new one on each GCS call
[ https://issues.apache.org/jira/browse/BEAM-2264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16883456#comment-16883456 ] Udi Meiri commented on BEAM-2264: - In https://issues.apache.org/jira/browse/BEAM-3990 we found out that the storage client might not be thread safe. Perhaps, though, we can reuse the credentials instead of recreating them every time. > Re-use credential instead of generating a new one on each GCS call > --- > > Key: BEAM-2264 > URL: https://issues.apache.org/jira/browse/BEAM-2264 > Project: Beam > Issue Type: Improvement > Components: sdk-py-core >Reporter: Luke Cwik >Priority: Minor > Time Spent: 50m > Remaining Estimate: 0h > > We should cache the credential used within a Pipeline and re-use it instead > of generating a new one on each GCS call. When executing (against 2.0.0 RC2): > {code} > python -m apache_beam.examples.wordcount --input > "gs://dataflow-samples/shakespeare/*" --output local_counts > {code} > Note that we seemingly generate a new access token each time instead of when > a refresh is required. > {code} > super(GcsIO, cls).__new__(cls, storage_client)) > INFO:root:Starting the size estimation of the input > INFO:oauth2client.transport:Attempting refresh to obtain initial access_token > INFO:oauth2client.client:Refreshing access_token > INFO:root:Finished the size estimation of the input at 1 files. Estimation > took 0.286200046539 seconds > INFO:root:Running pipeline with DirectRunner. > INFO:root:Starting the size estimation of the input > INFO:oauth2client.transport:Attempting refresh to obtain initial access_token > INFO:oauth2client.client:Refreshing access_token > INFO:root:Finished the size estimation of the input at 43 files. 
Estimation > took 0.205624818802 seconds > INFO:oauth2client.transport:Attempting refresh to obtain initial access_token > INFO:oauth2client.client:Refreshing access_token > INFO:oauth2client.transport:Attempting refresh to obtain initial access_token > INFO:oauth2client.client:Refreshing access_token > INFO:oauth2client.transport:Attempting refresh to obtain initial access_token > INFO:oauth2client.client:Refreshing access_token > INFO:oauth2client.transport:Attempting refresh to obtain initial access_token > INFO:oauth2client.client:Refreshing access_token > INFO:oauth2client.transport:Attempting refresh to obtain initial access_token > ... many more times ... > {code} -- This message was sent by Atlassian JIRA (v7.6.14#76016)
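The re-use that BEAM-2264 asks for can be sketched as a process-wide memoized getter. This is illustrative only: `factory` stands in for whatever oauth2client flow actually obtains the credential, and the real Beam fix would live inside `GcsIO`, not in a free function.

```python
import threading

_credential_lock = threading.Lock()
_cached_credential = None


def get_credential(factory):
    """Return a process-wide cached credential, creating it only once.

    `factory` is a placeholder for the callable that actually mints a
    credential; it runs a single time instead of on every GCS call,
    which is what eliminates the repeated access_token refreshes in
    the log above.
    """
    global _cached_credential
    with _credential_lock:
        if _cached_credential is None:
            _cached_credential = factory()
        return _cached_credential
```

The lock makes the one-time initialization safe if several worker threads race to make their first GCS call; credential refresh on expiry would still be handled by the credential object itself.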
[jira] [Work logged] (BEAM-7546) Portable WordCount-on-Flink Precommit is flaky - temporary folder not found.
[ https://issues.apache.org/jira/browse/BEAM-7546?focusedWorklogId=275615&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-275615 ] ASF GitHub Bot logged work on BEAM-7546: Author: ASF GitHub Bot Created on: 12/Jul/19 01:39 Start Date: 12/Jul/19 01:39 Worklog Time Spent: 10m Work Description: angoenka commented on issue #9046: [BEAM-7546] Increasing environment cache to avoid chances of recreati… URL: https://github.com/apache/beam/pull/9046#issuecomment-510711883 R: @tvalentyn This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 275615) Time Spent: 20m (was: 10m) > Portable WordCount-on-Flink Precommit is flaky - temporary folder not found. > > > Key: BEAM-7546 > URL: https://issues.apache.org/jira/browse/BEAM-7546 > Project: Beam > Issue Type: Bug > Components: runner-flink >Reporter: Valentyn Tymofieiev >Assignee: Ankur Goenka >Priority: Major > Time Spent: 20m > Remaining Estimate: 0h > > On a few occasions I see this test fail due to a temp directory being missing. > Sample scan from > https://builds.apache.org/job/beam_PreCommit_Portable_Python_Cron/745/: > https://scans.gradle.com/s/3ra4xw4hqvlyw/console-log?task=:sdks:python:portableWordCountBatch > {noformat} > [grpc-default-executor-0] ERROR sdk_worker._execute - Error processing > instruction 8. 
Original traceback is > Traceback (most recent call last): > File > "/usr/local/lib/python2.7/site-packages/apache_beam/runners/worker/sdk_worker.py", > line 157, in _execute > response = task() > File > "/usr/local/lib/python2.7/site-packages/apache_beam/runners/worker/sdk_worker.py", > line 190, in > self._execute(lambda: worker.do_instruction(work), work) > File > "/usr/local/lib/python2.7/site-packages/apache_beam/runners/worker/sdk_worker.py", > line 342, in do_instruction > request.instruction_id) > File > "/usr/local/lib/python2.7/site-packages/apache_beam/runners/worker/sdk_worker.py", > line 368, in process_bundle > bundle_processor.process_bundle(instruction_id)) > File > "/usr/local/lib/python2.7/site-packages/apache_beam/runners/worker/bundle_processor.py", > line 589, in process_bundle > ].process_encoded(data.data) > File > "/usr/local/lib/python2.7/site-packages/apache_beam/runners/worker/bundle_processor.py", > line 143, in process_encoded > self.output(decoded_value) > File "apache_beam/runners/worker/operations.py", line 255, in > apache_beam.runners.worker.operations.Operation.output > def output(self, windowed_value, output_index=0): > File "apache_beam/runners/worker/operations.py", line 256, in > apache_beam.runners.worker.operations.Operation.output > cython.cast(Receiver, > self.receivers[output_index]).receive(windowed_value) > File "apache_beam/runners/worker/operations.py", line 143, in > apache_beam.runners.worker.operations.SingletonConsumerSet.receive > self.consumer.process(windowed_value) > File "apache_beam/runners/worker/operations.py", line 593, in > apache_beam.runners.worker.operations.DoOperation.process > with self.scoped_process_state: > File "apache_beam/runners/worker/operations.py", line 594, in > apache_beam.runners.worker.operations.DoOperation.process > delayed_application = self.dofn_receiver.receive(o) > File "apache_beam/runners/common.py", line 778, in > apache_beam.runners.common.DoFnRun > ner.receive > 
self.process(windowed_value) > File "apache_beam/runners/common.py", line 784, in > apache_beam.runners.common.DoFnRunner.process > self._reraise_augmented(exn) > File "apache_beam/runners/common.py", line 851, in > apache_beam.runners.common.DoFnRunner._reraise_augmented > raise_with_traceback(new_exn) > File "apache_beam/runners/common.py", line 782, in > apache_beam.runners.common.DoFnRunner.process > return self.do_fn_invoker.invoke_process(windowed_value) > File "apache_beam/runners/common.py", line 594, in > apache_beam.runners.common.PerWindowInvoker.invoke_process > self._invoke_process_per_window( > File "apache_beam/runners/common.py", line 666, in > apache_beam.runners.common.PerWindowInvoker._invoke_process_per_window > windowed_value, self.process_method(*args_for_process)) > File "/usr/local/lib/python2.7/site-packages/apache_beam/io/iobase.py", > line 1041, in process > self.writer = self.sink.open_wr
[jira] [Work logged] (BEAM-7546) Portable WordCount-on-Flink Precommit is flaky - temporary folder not found.
[ https://issues.apache.org/jira/browse/BEAM-7546?focusedWorklogId=275614&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-275614 ] ASF GitHub Bot logged work on BEAM-7546: Author: ASF GitHub Bot Created on: 12/Jul/19 01:38 Start Date: 12/Jul/19 01:38 Worklog Time Spent: 10m Work Description: angoenka commented on pull request #9046: [BEAM-7546] Increasing environment cache to avoid chances of recreati… URL: https://github.com/apache/beam/pull/9046 …ng the environment **Please** add a meaningful description for your change here Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily: - [ ] [**Choose reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and mention them in a comment (`R: @username`). - [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue. - [ ] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.pdf). 
[jira] [Closed] (BEAM-3990) Dataflow jobs fail with "KeyError: 'location'" when uploading to GCS
[ https://issues.apache.org/jira/browse/BEAM-3990?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Udi Meiri closed BEAM-3990. --- Resolution: Fixed Fix Version/s: Not applicable > Dataflow jobs fail with "KeyError: 'location'" when uploading to GCS > > > Key: BEAM-3990 > URL: https://issues.apache.org/jira/browse/BEAM-3990 > Project: Beam > Issue Type: Bug > Components: sdk-py-core >Reporter: Chamikara Jayalath >Priority: Major > Fix For: Not applicable > > Time Spent: 50m > Remaining Estimate: 0h > > Some Dataflow jobs are failing due to the following error (in worker logs). > > Error in _start_upload while inserting file > gs://cloud-ml-benchmark-output-us-central/df1-cloudml-benchmark-criteo-small-python-033010274088282-presubmit3/033010274088282/temp/df1-cloudml-benchmark-criteo-small-python-033010274088282-presubmit3.1522430898.446147/dax-tmp-2018-03-30_10_28_40-14595186994726940229-S241-1-dc87ef69274882bf/tmp-dc87ef6927488c5a-shard--try-308ae8b3268d12b2-endshard.avro: > Traceback (most recent call last): File > "/usr/local/lib/python2.7/dist-packages/apache_beam/io/gcp/gcsio.py", line > 559, in _start_upload self._client.objects.Insert(self._insert_request, > upload=self._upload) File > "/usr/local/lib/python2.7/dist-packages/apache_beam/io/gcp/internal/clients/storage/storage_v1_client.py", > line 971, in Insert download=download) File > "/usr/local/lib/python2.7/dist-packages/apitools/base/py/base_api.py", line > 706, in _RunMethod http_request, client=self.client) File > "/usr/local/lib/python2.7/dist-packages/apitools/base/py/transfer.py", line > 860, in InitializeUpload url = > http_response.info['location'] > KeyError: 'location' > > This seems to be due to [https://github.com/apache/beam/pull/4891]. Possibly > storage.StorageV1() cannot be shared across multiple requests without > additional fixes. 
-- This message was sent by Atlassian JIRA (v7.6.14#76016)
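Since BEAM-3990 suggests the shared `storage.StorageV1()` client cannot safely serve multiple requests, one common remedy is one client per thread via `threading.local`, while still sharing a single (expensive-to-create) credential. This is a sketch of the pattern only, not Beam's actual gcsio code; `client_factory` is a placeholder for whatever constructs the real storage client:

```python
import threading


class PerThreadClient(object):
    """Hand each thread its own client instance while sharing one
    credential across all of them. `client_factory` stands in for the
    real storage-client constructor."""

    def __init__(self, client_factory, credential):
        self._local = threading.local()
        self._factory = client_factory
        self._credential = credential

    def get(self):
        # threading.local attributes are per-thread, so the first call
        # on each thread builds a fresh client; later calls reuse it.
        if not hasattr(self._local, 'client'):
            self._local.client = self._factory(self._credential)
        return self._local.client
```

This keeps the credential caching win from BEAM-2264 while sidestepping the thread-safety hazard: no client object is ever touched by two threads.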
[jira] [Work logged] (BEAM-7689) Temporary directory for WriteOperation may not be unique in FileBasedSink
[ https://issues.apache.org/jira/browse/BEAM-7689?focusedWorklogId=275606&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-275606 ] ASF GitHub Bot logged work on BEAM-7689: Author: ASF GitHub Bot Created on: 12/Jul/19 01:21 Start Date: 12/Jul/19 01:21 Worklog Time Spent: 10m Work Description: ihji commented on issue #9039: [BEAM-7689] cherry-picking for 2.14.0 URL: https://github.com/apache/beam/pull/9039#issuecomment-510708727 Run Java_Examples_Dataflow PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 275606) Time Spent: 2h 40m (was: 2.5h) > Temporary directory for WriteOperation may not be unique in FileBasedSink > > > Key: BEAM-7689 > URL: https://issues.apache.org/jira/browse/BEAM-7689 > Project: Beam > Issue Type: Bug > Components: io-java-files >Reporter: Heejong Lee >Assignee: Heejong Lee >Priority: Blocker > Fix For: 2.7.1, 2.14.0 > > Time Spent: 2h 40m > Remaining Estimate: 0h > > The temporary directory for a WriteOperation in FileBasedSink is generated from a > second-granularity timestamp (yyyy-MM-dd_HH-mm-ss) and a unique increasing > index. That granularity is not enough to make temporary directories unique > across different jobs. When two jobs share the same temporary directory, one > job may fail to produce an output file because a temporary file it needs can > be deleted by the other job. -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Commented] (BEAM-7680) synthetic_pipeline_test.py flaky
[ https://issues.apache.org/jira/browse/BEAM-7680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16883424#comment-16883424 ] Chamikara Jayalath commented on BEAM-7680: -- Sorry, this is from code I wrote several years back and I have lost context. I think the comment was about the test being time-based in general (and hence there is a possibility of flakes). > synthetic_pipeline_test.py flaky > > > Key: BEAM-7680 > URL: https://issues.apache.org/jira/browse/BEAM-7680 > Project: Beam > Issue Type: Bug > Components: test-failures >Reporter: Udi Meiri >Assignee: Kasia Kucharczyk >Priority: Major > Time Spent: 50m > Remaining Estimate: 0h > > {code:java} > 11:51:43 FAIL: testSyntheticSDFStep > (apache_beam.testing.synthetic_pipeline_test.SyntheticPipelineTest) > 11:51:43 > -- > 11:51:43 Traceback (most recent call last): > 11:51:43 File > "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Cron/src/sdks/python/test-suites/tox/py35/build/srcs/sdks/python/apache_beam/testing/synthetic_pipeline_test.py", > line 82, in testSyntheticSDFStep > 11:51:43 self.assertTrue(0.5 <= elapsed <= 3, elapsed) > 11:51:43 AssertionError: False is not true : 3.659700632095337{code} > [https://builds.apache.org/job/beam_PreCommit_Python_Cron/1502/consoleFull] > > Two flaky TODOs: > [https://github.com/apache/beam/blob/b79f24ced1c8519c29443ea7109c59ad18be2ebe/sdks/python/apache_beam/testing/synthetic_pipeline_test.py#L69-L82] -- This message was sent by Atlassian JIRA (v7.6.14#76016)
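The failing assertion `0.5 <= elapsed <= 3` couples the test to wall-clock behavior on a shared CI machine. One common mitigation (a sketch of the general idea, not the actual fix adopted for this test) is to assert only the lower bound, which is the deterministic half of a sleep-based timing check:

```python
import time


def run_with_timing(fn):
    """Run `fn` and return the elapsed wall-clock seconds."""
    start = time.time()
    fn()
    return time.time() - start


def assert_at_least(elapsed, lower):
    # The lower bound is deterministic: a sleep-based step cannot
    # finish early. The upper bound is not: a loaded CI machine can
    # stretch it arbitrarily, which is exactly the 3.66s flake seen
    # in the log above.
    assert elapsed >= lower, elapsed


elapsed = run_with_timing(lambda: time.sleep(0.1))
assert_at_least(elapsed, 0.05)
```

Dropping the upper bound trades a small loss of coverage (a pathologically slow step passes) for immunity to machine-load flakes.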
[jira] [Work logged] (BEAM-7437) Integration Test for BQ streaming inserts for streaming pipelines
[ https://issues.apache.org/jira/browse/BEAM-7437?focusedWorklogId=275561&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-275561 ] ASF GitHub Bot logged work on BEAM-7437: Author: ASF GitHub Bot Created on: 12/Jul/19 00:11 Start Date: 12/Jul/19 00:11 Worklog Time Spent: 10m Work Description: udim commented on pull request #9044: [BEAM-7437] Raise RuntimeError for PY2 in BigqueryFullResultStreamingMatcher URL: https://github.com/apache/beam/pull/9044#discussion_r302787083 ## File path: sdks/python/apache_beam/io/gcp/tests/bigquery_matcher.py ## @@ -192,7 +193,10 @@ def _get_query_result(self, bigquery_client): if len(response) >= len(self.expected_data): return response time.sleep(1) -raise TimeoutError('Timeout exceeded for matcher.') +if six.PY3: Review comment: We've decided against using `six` in the Beam codebase. You can check for Python version using this code instead: ```suggestion if sys.version_info >= (3,): ``` This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 275561) Time Spent: 6h 40m (was: 6.5h) > Integration Test for BQ streaming inserts for streaming pipelines > - > > Key: BEAM-7437 > URL: https://issues.apache.org/jira/browse/BEAM-7437 > Project: Beam > Issue Type: Test > Components: io-python-gcp >Affects Versions: 2.12.0 >Reporter: Tanay Tummalapalli >Assignee: Tanay Tummalapalli >Priority: Minor > Labels: test > Time Spent: 6h 40m > Remaining Estimate: 0h > > Integration Test for BigQuery Sink using Streaming Inserts for streaming > pipelines. > Integration tests currently exist for batch pipelines, it can also be added > for streaming pipelines using TestStream. 
This will be a precursor to the failing integration test to be added for [BEAM-6611|https://issues.apache.org/jira/browse/BEAM-6611]. -- This message was sent by Atlassian JIRA (v7.6.14#76016)
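The review thread above suggests replacing the `six.PY3` check with `sys.version_info >= (3,)`, since `TimeoutError` is a built-in only on Python 3. A minimal self-contained sketch of the pattern under discussion (the function and parameter names here are hypothetical, not the actual `BigqueryFullResultStreamingMatcher` code):

```python
import sys
import time


def wait_for_result(poll, expected_len, timeout_s=5):
    """Poll until at least expected_len rows arrive or the deadline passes."""
    deadline = time.time() + timeout_s
    while time.time() < deadline:
        response = poll()
        if len(response) >= expected_len:
            return response
        time.sleep(0.1)
    if sys.version_info >= (3,):
        # TimeoutError is a built-in exception only on Python 3.
        raise TimeoutError('Timeout exceeded for matcher.')
    # Python 2 has no built-in TimeoutError; fall back to RuntimeError.
    raise RuntimeError('Timeout exceeded for matcher.')
```

Callers can catch `(TimeoutError, RuntimeError)` to stay portable across both interpreters.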
[jira] [Updated] (BEAM-7726) [Go SDK] State Backed Iterables
[ https://issues.apache.org/jira/browse/BEAM-7726?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Burke updated BEAM-7726: --- Status: Open (was: Triage Needed) > [Go SDK] State Backed Iterables > --- > > Key: BEAM-7726 > URL: https://issues.apache.org/jira/browse/BEAM-7726 > Project: Beam > Issue Type: Improvement > Components: sdk-go >Affects Versions: Not applicable >Reporter: Robert Burke >Assignee: Robert Burke >Priority: Major > Fix For: Not applicable > > > The Go SDK should support the State backed iterables protocol per the proto. > [https://github.com/apache/beam/blob/master/model/pipeline/src/main/proto/beam_runner_api.proto#L644] > > Primary case is for iterables after CoGBKs. -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Work logged] (BEAM-7729) Converting Nullable null values in BigQuery
[ https://issues.apache.org/jira/browse/BEAM-7729?focusedWorklogId=275554&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-275554 ] ASF GitHub Bot logged work on BEAM-7729: Author: ASF GitHub Bot Created on: 11/Jul/19 23:47 Start Date: 11/Jul/19 23:47 Worklog Time Spent: 10m Work Description: riazela commented on issue #9045: [BEAM-7729] Fixes the bug by checking the value first before parsing it. URL: https://github.com/apache/beam/pull/9045#issuecomment-510693206 R: @akedin This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 275554) Time Spent: 20m (was: 10m) > Converting Nullable null values in BigQuery > > > Key: BEAM-7729 > URL: https://issues.apache.org/jira/browse/BEAM-7729 > Project: Beam > Issue Type: Bug > Components: io-java-gcp >Reporter: Alireza Samadianzakaria >Assignee: Alireza Samadianzakaria >Priority: Major > Time Spent: 20m > Remaining Estimate: 0h > > In org.apache.beam.sdk.io.gcp.bigquery.BigQueryUtils#convertAvroFormat we do > not check if the value is null and we pass it to the other methods and if > that value is null, we will get a null pointer exception. -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Work logged] (BEAM-7729) Converting Nullable null values in BigQuery
[ https://issues.apache.org/jira/browse/BEAM-7729?focusedWorklogId=275553&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-275553 ] ASF GitHub Bot logged work on BEAM-7729: Author: ASF GitHub Bot Created on: 11/Jul/19 23:47 Start Date: 11/Jul/19 23:47 Worklog Time Spent: 10m Work Description: riazela commented on pull request #9045: [BEAM-7729] Fixes the bug by checking the value first before parsing it. URL: https://github.com/apache/beam/pull/9045 [BEAM-7729] Fixes the bug by checking the value first before parsing it. The approach is similar to parsing the text table rows. (org.apache.beam.sdk.extensions.sql.impl.schema.BeamTableUtils#autoCastField) Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily: - [ ] [**Choose reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and mention them in a comment (`R: @username`). - [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue. - [ ] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.pdf). 
Post-Commit Tests Status (on master branch) Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark --- | --- | --- | --- | --- | --- | --- | --- Go | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/) | --- | --- | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/) | --- | --- | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/) Java | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/) | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/) Python | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Python_Verify/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python_Verify/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Python3_Verify/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python3_Verify/lastCompletedBuild/) | --- | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Py_VR_Dataflow/lastCompletedBuild/badge/icon)](https://b
[jira] [Commented] (BEAM-7680) synthetic_pipeline_test.py flaky
[ https://issues.apache.org/jira/browse/BEAM-7680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16883404#comment-16883404 ] Pablo Estrada commented on BEAM-7680: - [~chamikara] you're mentioned in the TODOs, WDYT? > synthetic_pipeline_test.py flaky > > > Key: BEAM-7680 > URL: https://issues.apache.org/jira/browse/BEAM-7680 > Project: Beam > Issue Type: Bug > Components: test-failures >Reporter: Udi Meiri >Assignee: Kasia Kucharczyk >Priority: Major > Time Spent: 50m > Remaining Estimate: 0h > > {code:java} > 11:51:43 FAIL: testSyntheticSDFStep > (apache_beam.testing.synthetic_pipeline_test.SyntheticPipelineTest) > 11:51:43 > -- > 11:51:43 Traceback (most recent call last): > 11:51:43 File > "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Cron/src/sdks/python/test-suites/tox/py35/build/srcs/sdks/python/apache_beam/testing/synthetic_pipeline_test.py", > line 82, in testSyntheticSDFStep > 11:51:43 self.assertTrue(0.5 <= elapsed <= 3, elapsed) > 11:51:43 AssertionError: False is not true : 3.659700632095337{code} > [https://builds.apache.org/job/beam_PreCommit_Python_Cron/1502/consoleFull] > > Two flaky TODOs: > [https://github.com/apache/beam/blob/b79f24ced1c8519c29443ea7109c59ad18be2ebe/sdks/python/apache_beam/testing/synthetic_pipeline_test.py#L69-L82] -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Updated] (BEAM-7729) Converting Nullable null values in BigQuery
[ https://issues.apache.org/jira/browse/BEAM-7729?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alireza Samadianzakaria updated BEAM-7729: -- Description: In org.apache.beam.sdk.io.gcp.bigquery.BigQueryUtils#convertAvroFormat we do not check if the value is null and we pass it to the other methods and if that value is null, we will get a null pointer exception. (was: In org.apache.beam.sdk.io.gcp.bigquery.BigQueryUtils#convertAvroFormat we do not check if the value is null and we pass it to the other methods and if that value is null, we will get a null pointer exception. The fix will be simple, just check if the value is null and if the field is nullable, then return null in that case.) > Converting Nullable null values in BigQuery > > > Key: BEAM-7729 > URL: https://issues.apache.org/jira/browse/BEAM-7729 > Project: Beam > Issue Type: Bug > Components: io-java-gcp >Reporter: Alireza Samadianzakaria >Assignee: Alireza Samadianzakaria >Priority: Major > > In org.apache.beam.sdk.io.gcp.bigquery.BigQueryUtils#convertAvroFormat we do > not check if the value is null and we pass it to the other methods and if > that value is null, we will get a null pointer exception. -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Created] (BEAM-7729) Converting Nullable null values in BigQuery
Alireza Samadianzakaria created BEAM-7729: - Summary: Converting Nullable null values in BigQuery Key: BEAM-7729 URL: https://issues.apache.org/jira/browse/BEAM-7729 Project: Beam Issue Type: Bug Components: io-java-gcp Reporter: Alireza Samadianzakaria Assignee: Alireza Samadianzakaria In org.apache.beam.sdk.io.gcp.bigquery.BigQueryUtils#convertAvroFormat we do not check if the value is null and we pass it to the other methods and if that value is null, we will get a null pointer exception. The fix will be simple, just check if the value is null and if the field is nullable, then return null in that case. -- This message was sent by Atlassian JIRA (v7.6.14#76016)
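The fix described lives in Java (`BigQueryUtils#convertAvroFormat`), but the guard logic itself is language-independent. A hypothetical Python sketch of the same check (names are illustrative, not the Beam API):

```python
def convert_avro_value(value, field_nullable, convert):
    """Convert an Avro value, guarding against nulls before conversion.

    A null value in a nullable field converts to None instead of being
    passed to the converter (which would otherwise fail, analogous to
    the NullPointerException described in the bug).
    """
    if value is None:
        if field_nullable:
            return None
        raise ValueError('Received null for a non-nullable field')
    return convert(value)
```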
[jira] [Commented] (BEAM-7463) Bigquery IO ITs are flaky: incorrect checksum
[ https://issues.apache.org/jira/browse/BEAM-7463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16883398#comment-16883398 ] Pablo Estrada commented on BEAM-7463: - taking a look... > Bigquery IO ITs are flaky: incorrect checksum > - > > Key: BEAM-7463 > URL: https://issues.apache.org/jira/browse/BEAM-7463 > Project: Beam > Issue Type: Bug > Components: io-python-gcp >Reporter: Valentyn Tymofieiev >Assignee: Pablo Estrada >Priority: Major > Labels: currently-failing > Time Spent: 3h 40m > Remaining Estimate: 0h > > {noformat} > 15:03:38 FAIL: test_big_query_new_types > (apache_beam.io.gcp.big_query_query_to_table_it_test.BigQueryQueryToTableIT) > 15:03:38 > -- > 15:03:38 Traceback (most recent call last): > 15:03:38 File > "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python3_Verify/src/sdks/python/apache_beam/io/gcp/big_query_query_to_table_it_test.py", > line 211, in test_big_query_new_types > 15:03:38 big_query_query_to_table_pipeline.run_bq_pipeline(options) > 15:03:38 File > "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python3_Verify/src/sdks/python/apache_beam/io/gcp/big_query_query_to_table_pipeline.py", > line 82, in run_bq_pipeline > 15:03:38 result = p.run() > 15:03:38 File > "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python3_Verify/src/sdks/python/apache_beam/testing/test_pipeline.py", > line 107, in run > 15:03:38 else test_runner_api)) > 15:03:38 File > "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python3_Verify/src/sdks/python/apache_beam/pipeline.py", > line 406, in run > 15:03:38 self._options).run(False) > 15:03:38 File > "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python3_Verify/src/sdks/python/apache_beam/pipeline.py", > line 419, in run > 15:03:38 return self.runner.run_pipeline(self, self._options) > 15:03:38 File > "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python3_Verify/src/sdks/python/apache_beam/runners/direct/test_direct_runner.py", > 
line 51, in run_pipeline > 15:03:38 hc_assert_that(self.result, pickler.loads(on_success_matcher)) > 15:03:38 AssertionError: > 15:03:38 Expected: (Test pipeline expected terminated in state: DONE and > Expected checksum is 24de460c4d344a4b77ccc4cc1acb7b7ffc11a214) > 15:03:38 but: Expected checksum is > 24de460c4d344a4b77ccc4cc1acb7b7ffc11a214 Actual checksum is > da39a3ee5e6b4b0d3255bfef95601890afd80709 > {noformat} > [~Juta] could this be caused by changes to Bigquery matcher? > https://github.com/apache/beam/pull/8621/files#diff-f1ec7e3a3e7e2e5082ddb7043954c108R134 > > cc: [~pabloem] [~chamikara] [~apilloud] > A recent postcommit run has BQ failures in other tests as well: > https://builds.apache.org/job/beam_PostCommit_Python3_Verify/1000/consoleFull -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Resolved] (BEAM-7545) Row Count Estimation for CSV TextTable
[ https://issues.apache.org/jira/browse/BEAM-7545?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alireza Samadianzakaria resolved BEAM-7545. --- Resolution: Done Fix Version/s: Not applicable > Row Count Estimation for CSV TextTable > -- > > Key: BEAM-7545 > URL: https://issues.apache.org/jira/browse/BEAM-7545 > Project: Beam > Issue Type: New Feature > Components: dsl-sql >Reporter: Alireza Samadianzakaria >Assignee: Alireza Samadianzakaria >Priority: Major > Fix For: Not applicable > > Time Spent: 4h 10m > Remaining Estimate: 0h > > Implementing Row Count Estimation for CSV Tables by reading the first few > lines of the file and estimating the number of records based on the length of > these lines and the total length of the file. -- This message was sent by Atlassian JIRA (v7.6.14#76016)
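The estimation technique resolved above (read the first few lines, then extrapolate from the average line length and the total file size) can be sketched as follows. This is an illustrative approximation, not the Beam SQL implementation:

```python
import os


def estimate_row_count(path, sample_lines=10):
    """Estimate rows in a CSV by extrapolating from the first few lines."""
    total_bytes = os.path.getsize(path)
    sampled = []
    with open(path, 'rb') as f:
        for _ in range(sample_lines):
            line = f.readline()
            if not line:
                break
            sampled.append(len(line))
    if not sampled:
        return 0
    # Average bytes per sampled line, scaled up to the whole file.
    avg_line_bytes = sum(sampled) / float(len(sampled))
    return int(total_bytes / avg_line_bytes)
```

The estimate is exact for uniform rows and degrades gracefully for variable-length rows; it never reads more than the sampled prefix, which is the point for large files.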
[jira] [Created] (BEAM-7728) Support ParquetTable in SQL
Kai Jiang created BEAM-7728: --- Summary: Support ParquetTable in SQL Key: BEAM-7728 URL: https://issues.apache.org/jira/browse/BEAM-7728 Project: Beam Issue Type: New Feature Components: dsl-sql Reporter: Kai Jiang Assignee: Kai Jiang -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Commented] (BEAM-7463) Bigquery IO ITs are flaky: incorrect checksum
[ https://issues.apache.org/jira/browse/BEAM-7463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16883391#comment-16883391 ] Valentyn Tymofieiev commented on BEAM-7463: --- [~pabloem] could you please take a look into the BQ suites? Your expertise in the BQ connector may help us track this down faster. [~Juta], do you remember why we removed sorting in the Bigquery matcher? See: https://github.com/apache/beam/pull/8621/files#diff-f1ec7e3a3e7e2e5082ddb7043954c108R134. Does BQ guarantee some order of returned results? If not, that may very well cause the checksum errors, but there are also other errors with this test, like "table not found" and failures where the "but" field is empty; perhaps we should investigate those separately. Also, to echo some points from the conversation on https://github.com/apache/beam/pull/8751 for visibility: class attributes assigned in setup/teardown methods are not shared between executions of different test cases defined in the same test class, so the race of table cleanup is not obvious to me. 
> Bigquery IO ITs are flaky: incorrect checksum > - > > Key: BEAM-7463 > URL: https://issues.apache.org/jira/browse/BEAM-7463 > Project: Beam > Issue Type: Bug > Components: io-python-gcp >Reporter: Valentyn Tymofieiev >Assignee: Pablo Estrada >Priority: Major > Labels: currently-failing > Time Spent: 3h 40m > Remaining Estimate: 0h > > {noformat} > 15:03:38 FAIL: test_big_query_new_types > (apache_beam.io.gcp.big_query_query_to_table_it_test.BigQueryQueryToTableIT) > 15:03:38 > -- > 15:03:38 Traceback (most recent call last): > 15:03:38 File > "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python3_Verify/src/sdks/python/apache_beam/io/gcp/big_query_query_to_table_it_test.py", > line 211, in test_big_query_new_types > 15:03:38 big_query_query_to_table_pipeline.run_bq_pipeline(options) > 15:03:38 File > "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python3_Verify/src/sdks/python/apache_beam/io/gcp/big_query_query_to_table_pipeline.py", > line 82, in run_bq_pipeline > 15:03:38 result = p.run() > 15:03:38 File > "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python3_Verify/src/sdks/python/apache_beam/testing/test_pipeline.py", > line 107, in run > 15:03:38 else test_runner_api)) > 15:03:38 File > "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python3_Verify/src/sdks/python/apache_beam/pipeline.py", > line 406, in run > 15:03:38 self._options).run(False) > 15:03:38 File > "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python3_Verify/src/sdks/python/apache_beam/pipeline.py", > line 419, in run > 15:03:38 return self.runner.run_pipeline(self, self._options) > 15:03:38 File > "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python3_Verify/src/sdks/python/apache_beam/runners/direct/test_direct_runner.py", > line 51, in run_pipeline > 15:03:38 hc_assert_that(self.result, pickler.loads(on_success_matcher)) > 15:03:38 AssertionError: > 15:03:38 Expected: (Test pipeline expected terminated in state: DONE and > Expected checksum is 
24de460c4d344a4b77ccc4cc1acb7b7ffc11a214) > 15:03:38 but: Expected checksum is > 24de460c4d344a4b77ccc4cc1acb7b7ffc11a214 Actual checksum is > da39a3ee5e6b4b0d3255bfef95601890afd80709 > {noformat} > [~Juta] could this be caused by changes to Bigquery matcher? > https://github.com/apache/beam/pull/8621/files#diff-f1ec7e3a3e7e2e5082ddb7043954c108R134 > > cc: [~pabloem] [~chamikara] [~apilloud] > A recent postcommit run has BQ failures in other tests as well: > https://builds.apache.org/job/beam_PostCommit_Python3_Verify/1000/consoleFull -- This message was sent by Atlassian JIRA (v7.6.14#76016)
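One hypothesis in the thread above is that BigQuery does not guarantee result ordering, so a checksum computed over rows in returned order can flake. A hedged sketch of an order-insensitive checksum (sort a canonical representation of each row before hashing; the function name is illustrative, not the Beam matcher):

```python
import hashlib


def order_insensitive_checksum(rows):
    """SHA-1 over a sorted canonical form, so row order cannot affect it."""
    canonical = sorted(repr(tuple(row)) for row in rows)
    return hashlib.sha1('\n'.join(canonical).encode('utf-8')).hexdigest()
```

With this shape, two result sets containing the same rows in different orders hash identically, which is the property the matcher needs if the backend shuffles results.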
[jira] [Assigned] (BEAM-7463) Bigquery IO ITs are flaky: incorrect checksum
[ https://issues.apache.org/jira/browse/BEAM-7463?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Valentyn Tymofieiev reassigned BEAM-7463: - Assignee: Pablo Estrada > Bigquery IO ITs are flaky: incorrect checksum > - > > Key: BEAM-7463 > URL: https://issues.apache.org/jira/browse/BEAM-7463 > Project: Beam > Issue Type: Bug > Components: io-python-gcp >Reporter: Valentyn Tymofieiev >Assignee: Pablo Estrada >Priority: Major > Labels: currently-failing > Time Spent: 3h 40m > Remaining Estimate: 0h > > {noformat} > 15:03:38 FAIL: test_big_query_new_types > (apache_beam.io.gcp.big_query_query_to_table_it_test.BigQueryQueryToTableIT) > 15:03:38 > -- > 15:03:38 Traceback (most recent call last): > 15:03:38 File > "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python3_Verify/src/sdks/python/apache_beam/io/gcp/big_query_query_to_table_it_test.py", > line 211, in test_big_query_new_types > 15:03:38 big_query_query_to_table_pipeline.run_bq_pipeline(options) > 15:03:38 File > "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python3_Verify/src/sdks/python/apache_beam/io/gcp/big_query_query_to_table_pipeline.py", > line 82, in run_bq_pipeline > 15:03:38 result = p.run() > 15:03:38 File > "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python3_Verify/src/sdks/python/apache_beam/testing/test_pipeline.py", > line 107, in run > 15:03:38 else test_runner_api)) > 15:03:38 File > "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python3_Verify/src/sdks/python/apache_beam/pipeline.py", > line 406, in run > 15:03:38 self._options).run(False) > 15:03:38 File > "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python3_Verify/src/sdks/python/apache_beam/pipeline.py", > line 419, in run > 15:03:38 return self.runner.run_pipeline(self, self._options) > 15:03:38 File > "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python3_Verify/src/sdks/python/apache_beam/runners/direct/test_direct_runner.py", > line 51, in run_pipeline > 15:03:38 
hc_assert_that(self.result, pickler.loads(on_success_matcher)) > 15:03:38 AssertionError: > 15:03:38 Expected: (Test pipeline expected terminated in state: DONE and > Expected checksum is 24de460c4d344a4b77ccc4cc1acb7b7ffc11a214) > 15:03:38 but: Expected checksum is > 24de460c4d344a4b77ccc4cc1acb7b7ffc11a214 Actual checksum is > da39a3ee5e6b4b0d3255bfef95601890afd80709 > {noformat} > [~Juta] could this be caused by changes to Bigquery matcher? > https://github.com/apache/beam/pull/8621/files#diff-f1ec7e3a3e7e2e5082ddb7043954c108R134 > > cc: [~pabloem] [~chamikara] [~apilloud] > A recent postcommit run has BQ failures in other tests as well: > https://builds.apache.org/job/beam_PostCommit_Python3_Verify/1000/consoleFull -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Commented] (BEAM-7689) Temporary directory for WriteOperation may not be unique in FileBaseSink
[ https://issues.apache.org/jira/browse/BEAM-7689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16883375#comment-16883375 ] Anton Kedin commented on BEAM-7689: --- Fails in Dataflow examples test > Temporary directory for WriteOperation may not be unique in FileBaseSink > > > Key: BEAM-7689 > URL: https://issues.apache.org/jira/browse/BEAM-7689 > Project: Beam > Issue Type: Bug > Components: io-java-files >Reporter: Heejong Lee >Assignee: Heejong Lee >Priority: Blocker > Fix For: 2.7.1, 2.14.0 > > Time Spent: 2.5h > Remaining Estimate: 0h > > Temporary directory for WriteOperation in FileBasedSink is generated from a > second-granularity timestamp (-MM-dd_HH-mm-ss) and unique increasing > index. Such granularity is not enough to make temporary directories unique > between different jobs. When two jobs share the same temporary directory, > output file may not be produced in one job since the required temporary file > can be deleted from another job. -- This message was sent by Atlassian JIRA (v7.6.14#76016)
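The bug described above is that a second-granularity timestamp (yyyy-MM-dd_HH-mm-ss) plus an increasing index is not unique across concurrently launched jobs. A common remedy, sketched here in Python with hypothetical names (the actual fix is in Java's `FileBasedSink`), is to append a random component such as a UUID:

```python
import uuid
from datetime import datetime


def unique_temp_dir_name(prefix='.temp-beam'):
    """Build a temp-dir name that is unique even across concurrent jobs.

    The timestamp keeps names human-readable; the UUID suffix makes a
    collision between jobs started in the same second vanishingly unlikely.
    """
    stamp = datetime.now().strftime('%Y-%m-%d_%H-%M-%S')
    return '{}-{}-{}'.format(prefix, stamp, uuid.uuid4().hex)
```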
[jira] [Work logged] (BEAM-7437) Integration Test for BQ streaming inserts for streaming pipelines
[ https://issues.apache.org/jira/browse/BEAM-7437?focusedWorklogId=275535&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-275535 ] ASF GitHub Bot logged work on BEAM-7437: Author: ASF GitHub Bot Created on: 11/Jul/19 22:27 Start Date: 11/Jul/19 22:27 Worklog Time Spent: 10m Work Description: ttanay commented on issue #9044: [BEAM-7437] Raise RuntimeError for PY2 in BigqueryFullResultStreamingMatcher URL: https://github.com/apache/beam/pull/9044#issuecomment-510676748 Run Portable_Python PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 275535) Time Spent: 6.5h (was: 6h 20m) > Integration Test for BQ streaming inserts for streaming pipelines > - > > Key: BEAM-7437 > URL: https://issues.apache.org/jira/browse/BEAM-7437 > Project: Beam > Issue Type: Test > Components: io-python-gcp >Affects Versions: 2.12.0 >Reporter: Tanay Tummalapalli >Assignee: Tanay Tummalapalli >Priority: Minor > Labels: test > Time Spent: 6.5h > Remaining Estimate: 0h > > Integration Test for BigQuery Sink using Streaming Inserts for streaming > pipelines. > Integration tests currently exist for batch pipelines, it can also be added > for streaming pipelines using TestStream. This will be a precursor to the > failing integration test to be added for [BEAM-6611| > https://issues.apache.org/jira/browse/BEAM-6611]. -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Assigned] (BEAM-6783) byte[] breaks in BeamSQL codegen
[ https://issues.apache.org/jira/browse/BEAM-6783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rui Wang reassigned BEAM-6783: -- Assignee: (was: Andrew Pilloud) > byte[] breaks in BeamSQL codegen > > > Key: BEAM-6783 > URL: https://issues.apache.org/jira/browse/BEAM-6783 > Project: Beam > Issue Type: Bug > Components: dsl-sql >Reporter: Rui Wang >Priority: Major > > Calcite will call `byte[].toString` because BeamSQL codegen read byte[] from > Row to calcite (see: > https://github.com/apache/beam/blob/master/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rel/BeamCalcRel.java#L334). > -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Work logged] (BEAM-7437) Integration Test for BQ streaming inserts for streaming pipelines
[ https://issues.apache.org/jira/browse/BEAM-7437?focusedWorklogId=275534&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-275534 ] ASF GitHub Bot logged work on BEAM-7437: Author: ASF GitHub Bot Created on: 11/Jul/19 22:20 Start Date: 11/Jul/19 22:20 Worklog Time Spent: 10m Work Description: ttanay commented on issue #9044: [BEAM-7437] Raise RuntimeError for PY2 in BigqueryFullResultStreamingMatcher URL: https://github.com/apache/beam/pull/9044#issuecomment-510675038 Run Portable_Python PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 275534) Time Spent: 6h 20m (was: 6h 10m) > Integration Test for BQ streaming inserts for streaming pipelines > - > > Key: BEAM-7437 > URL: https://issues.apache.org/jira/browse/BEAM-7437 > Project: Beam > Issue Type: Test > Components: io-python-gcp >Affects Versions: 2.12.0 >Reporter: Tanay Tummalapalli >Assignee: Tanay Tummalapalli >Priority: Minor > Labels: test > Time Spent: 6h 20m > Remaining Estimate: 0h > > Integration Test for BigQuery Sink using Streaming Inserts for streaming > pipelines. > Integration tests currently exist for batch pipelines, it can also be added > for streaming pipelines using TestStream. This will be a precursor to the > failing integration test to be added for [BEAM-6611| > https://issues.apache.org/jira/browse/BEAM-6611]. -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Resolved] (BEAM-7548) test_approximate_unique_global_by_error is flaky
[ https://issues.apache.org/jira/browse/BEAM-7548?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hannah Jiang resolved BEAM-7548. Resolution: Fixed Fix Version/s: 2.14.0 > test_approximate_unique_global_by_error is flaky > > > Key: BEAM-7548 > URL: https://issues.apache.org/jira/browse/BEAM-7548 > Project: Beam > Issue Type: Bug > Components: sdk-py-core, test-failures >Reporter: Valentyn Tymofieiev >Assignee: Hannah Jiang >Priority: Major > Fix For: 2.14.0 > > Time Spent: 9h 10m > Remaining Estimate: 0h > > The error happened on Jenkins in Python 3.5 suite, which currently uses > Python 3.5.2 interpreter: > {noformat} > 11:57:47 > == > 11:57:47 ERROR: test_approximate_unique_global_by_error > (apache_beam.transforms.stats_test.ApproximateUniqueTest) > 11:57:47 > -- > 11:57:47 Traceback (most recent call last): > 11:57:47 File > "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py35/build/srcs/sdks/python/apache_beam/transforms/stats_test.py", > line 236, in test_approximate_unique_global_by_error > 11:57:47 pipeline.run() > 11:57:47 File > "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py35/build/srcs/sdks/python/apache_beam/testing/test_pipeline.py", > line 107, in run > 11:57:47 else test_runner_api)) > 11:57:47 File > "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py35/build/srcs/sdks/python/apache_beam/pipeline.py", > line 406, in run > 11:57:47 self._options).run(False) > 11:57:47 File > "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py35/build/srcs/sdks/python/apache_beam/pipeline.py", > line 419, in run > 11:57:47 return self.runner.run_pipeline(self, self._options) > 11:57:47 File > 
"/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py35/build/srcs/sdks/python/apache_beam/runners/direct/direct_runner.py", > line 128, in run_pipeline > 11:57:47 return runner.run_pipeline(pipeline, options) > 11:57:47 File > "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py35/build/srcs/sdks/python/apache_beam/runners/portability/fn_api_runner.py", > line 289, in run_pipeline > 11:57:47 default_environment=self._default_environment)) > 11:57:47 File > "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py35/build/srcs/sdks/python/apache_beam/runners/portability/fn_api_runner.py", > line 293, in run_via_runner_api > 11:57:47 return self.run_stages(*self.create_stages(pipeline_proto)) > 11:57:47 File > "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py35/build/srcs/sdks/python/apache_beam/runners/portability/fn_api_runner.py", > line 369, in run_stages > 11:57:47 stage_context.safe_coders) > 11:57:47 File > "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py35/build/srcs/sdks/python/apache_beam/runners/portability/fn_api_runner.py", > line 531, in run_stage > 11:57:47 data_input, data_output) > 11:57:47 File > "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py35/build/srcs/sdks/python/apache_beam/runners/portability/fn_api_runner.py", > line 1235, in process_bundle > 11:57:47 result_future = > self._controller.control_handler.push(process_bundle) > 11:57:47 File > "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py35/build/srcs/sdks/python/apache_beam/runners/portability/fn_api_runner.py", > line 851, in push > 11:57:47 response = self.worker.do_instruction(request) > 11:57:47 File > 
"/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py35/build/srcs/sdks/python/apache_beam/runners/worker/sdk_worker.py", > line 342, in do_instruction > 11:57:47 request.instruction_id) > 11:57:47 File > "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py35/build/srcs/sdks/python/apache_beam/runners/worker/sdk_worker.py", > line 368, in process_bundle > 11:57:47 bundle_processor.process_bundle(instruction_id)) > 11:57:47 File > "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py35/build/srcs/sdks/python/apache_beam/runners/worker/bundle_processor.py",
[jira] [Work logged] (BEAM-7437) Integration Test for BQ streaming inserts for streaming pipelines
[ https://issues.apache.org/jira/browse/BEAM-7437?focusedWorklogId=275527&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-275527 ] ASF GitHub Bot logged work on BEAM-7437: Author: ASF GitHub Bot Created on: 11/Jul/19 21:58 Start Date: 11/Jul/19 21:58 Worklog Time Spent: 10m Work Description: ttanay commented on pull request #9044: [BEAM-7437] Raise RuntimeError for PY2 in BigqueryFullResultStreamingMatcher URL: https://github.com/apache/beam/pull/9044 TimeoutError is not a built-in exception in Python 2. Raise RuntimeError in Python 2. The tests passed in the original PR[1], but the PreCommit started failing today because of this. [1] https://github.com/apache/beam/pull/8934 **Please** add a meaningful description for your change here Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily: - [ ] [**Choose reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and mention them in a comment (`R: @username`). - [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue. - [ ] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.pdf). 
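The substance of PR #9044 is that `TimeoutError` exists as a builtin only on Python 3, so the matcher's timeout must raise a different exception on Python 2. A minimal sketch of the pattern (function name and polling details are illustrative, not the exact Beam matcher code):

```python
import sys
import time


def wait_for_result(poll, timeout_secs=60):
    """Poll until `poll()` returns a non-None result or the deadline passes.

    TimeoutError is a builtin only on Python 3, so fall back to RuntimeError
    on Python 2 -- the substance of this PR's change.
    """
    deadline = time.time() + timeout_secs
    while time.time() < deadline:
        result = poll()
        if result is not None:
            return result
        time.sleep(1)
    if sys.version_info[0] >= 3:
        raise TimeoutError('Timeout exceeded for matcher.')
    raise RuntimeError('Timeout exceeded for matcher.')
```

The Beam code gates on `six.PY3` instead of `sys.version_info`, but the branch structure is the same.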
[jira] [Work logged] (BEAM-7689) Temporary directory for WriteOperation may not be unique in FileBasedSink
[ https://issues.apache.org/jira/browse/BEAM-7689?focusedWorklogId=275523&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-275523 ] ASF GitHub Bot logged work on BEAM-7689: Author: ASF GitHub Bot Created on: 11/Jul/19 21:47 Start Date: 11/Jul/19 21:47 Worklog Time Spent: 10m Work Description: ihji commented on issue #9039: [BEAM-7689] cherry-picking for 2.14.0 URL: https://github.com/apache/beam/pull/9039#issuecomment-510666736 Run Java_Examples_Dataflow PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 275523) Time Spent: 2.5h (was: 2h 20m) > Temporary directory for WriteOperation may not be unique in FileBasedSink > > > Key: BEAM-7689 > URL: https://issues.apache.org/jira/browse/BEAM-7689 > Project: Beam > Issue Type: Bug > Components: io-java-files >Reporter: Heejong Lee >Assignee: Heejong Lee >Priority: Blocker > Fix For: 2.7.1, 2.14.0 > > Time Spent: 2.5h > Remaining Estimate: 0h > > The temporary directory for a WriteOperation in FileBasedSink is generated from a > second-granularity timestamp (yyyy-MM-dd_HH-mm-ss) and a unique increasing > index. Such granularity is not enough to make temporary directories unique > between different jobs. When two jobs share the same temporary directory, > output files may not be produced in one job, since the required temporary files > can be deleted by another job. -- This message was sent by Atlassian JIRA (v7.6.14#76016)
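The bug above boils down to two jobs starting within the same wall-clock second and computing the same temporary path. A common remedy, sketched here under assumed names (`temp_dir_name` is illustrative, not the Beam API), is to append a random component so collisions become practically impossible:

```python
import uuid
from datetime import datetime


def temp_dir_name(base_dir):
    """Build a per-job temp directory name.

    A second-granularity timestamp alone collides when two jobs start within
    the same second; appending a random UUID guarantees uniqueness even then.
    """
    stamp = datetime.now().strftime('%Y-%m-%d_%H-%M-%S')
    return '%s/.temp-beam-%s-%s' % (base_dir, stamp, uuid.uuid4().hex)
```

Two calls in the same second now yield distinct directories, which is exactly the property the second-granularity scheme lacked.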
[jira] [Resolved] (BEAM-7727) Python RestrictionTracker should have a synchronization wrapper like Java
[ https://issues.apache.org/jira/browse/BEAM-7727?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Boyuan Zhang resolved BEAM-7727. Resolution: Duplicate Fix Version/s: Not applicable Duplicate of https://issues.apache.org/jira/browse/BEAM-7473. > Python RestrictionTracker should have a synchronization wrapper like Java > - > > Key: BEAM-7727 > URL: https://issues.apache.org/jira/browse/BEAM-7727 > Project: Beam > Issue Type: Improvement > Components: sdk-py-core >Reporter: Boyuan Zhang >Assignee: Boyuan Zhang >Priority: Major > Fix For: Not applicable > > > Currently, Python RestrictionTracker implementations must manage their own > synchronization, which is not easy for SDK users to implement. The Python SDK > should provide a wrapper like Java's to help manage synchronization: > https://github.com/apache/beam/pull/6467 -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Updated] (BEAM-7727) Python RestrictionTracker should have a synchronization wrapper like Java
[ https://issues.apache.org/jira/browse/BEAM-7727?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Boyuan Zhang updated BEAM-7727: --- Status: Open (was: Triage Needed) > Python RestrictionTracker should have a synchronization wrapper like Java > - > > Key: BEAM-7727 > URL: https://issues.apache.org/jira/browse/BEAM-7727 > Project: Beam > Issue Type: Improvement > Components: sdk-py-core >Reporter: Boyuan Zhang >Assignee: Boyuan Zhang >Priority: Major > > Currently, Python RestrictionTracker implementations must manage their own > synchronization, which is not easy for SDK users to implement. The Python SDK > should provide a wrapper like Java's to help manage synchronization: > https://github.com/apache/beam/pull/6467 -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Assigned] (BEAM-7727) Python RestrictionTracker should have a synchronization wrapper like Java
[ https://issues.apache.org/jira/browse/BEAM-7727?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Boyuan Zhang reassigned BEAM-7727: -- Assignee: Boyuan Zhang > Python RestrictionTracker should have a synchronization wrapper like Java > - > > Key: BEAM-7727 > URL: https://issues.apache.org/jira/browse/BEAM-7727 > Project: Beam > Issue Type: Improvement > Components: sdk-py-core >Reporter: Boyuan Zhang >Assignee: Boyuan Zhang >Priority: Major > > Currently, Python RestrictionTracker implementations must manage their own > synchronization, which is not easy for SDK users to implement. The Python SDK > should provide a wrapper like Java's to help manage synchronization: > https://github.com/apache/beam/pull/6467 -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Created] (BEAM-7727) Python RestrictionTracker should have a synchronization wrapper like Java
Boyuan Zhang created BEAM-7727: -- Summary: Python RestrictionTracker should have a synchronization wrapper like Java Key: BEAM-7727 URL: https://issues.apache.org/jira/browse/BEAM-7727 Project: Beam Issue Type: Improvement Components: sdk-py-core Reporter: Boyuan Zhang Currently, Python RestrictionTracker implementations must manage their own synchronization, which is not easy for SDK users to implement. The Python SDK should provide a wrapper like Java's to help manage synchronization: https://github.com/apache/beam/pull/6467 -- This message was sent by Atlassian JIRA (v7.6.14#76016)
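The Java precedent linked above wraps a user-supplied tracker so that all access is serialized behind a single lock, relieving tracker authors of synchronization concerns. A minimal Python sketch of that shape (class and method names are illustrative, mirroring the Java design rather than any shipped Beam class):

```python
import threading


class ThreadsafeRestrictionTracker(object):
    """Serialize all access to an underlying restriction tracker.

    The wrapped tracker can be written without any locking; every call is
    funneled through one reentrant lock held by this wrapper.
    """

    def __init__(self, underlying):
        self._underlying = underlying
        self._lock = threading.RLock()

    def try_claim(self, position):
        with self._lock:
            return self._underlying.try_claim(position)

    def current_restriction(self):
        with self._lock:
            return self._underlying.current_restriction()
```

In a real SDK the wrapper would cover every tracker method (`try_split`, `checkpoint`, etc.) the same way; two methods suffice to show the pattern.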
[jira] [Created] (BEAM-7726) [Go SDK] State Backed Iterables
Robert Burke created BEAM-7726: -- Summary: [Go SDK] State Backed Iterables Key: BEAM-7726 URL: https://issues.apache.org/jira/browse/BEAM-7726 Project: Beam Issue Type: Improvement Components: sdk-go Affects Versions: Not applicable Reporter: Robert Burke Assignee: Robert Burke Fix For: Not applicable The Go SDK should support the state-backed iterables protocol, per the proto: [https://github.com/apache/beam/blob/master/model/pipeline/src/main/proto/beam_runner_api.proto#L644] The primary use case is iterables after CoGBKs. -- This message was sent by Atlassian JIRA (v7.6.14#76016)
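The point of state-backed iterables is that very large grouped values (e.g. after a CoGBK) are not materialized up front; the SDK harness pages them lazily out of runner state using continuation tokens. A language-neutral sketch of that access pattern (`fetch_page` is an assumed callable standing in for the state API, taking a token and returning `(elements, next_token)` with `next_token` of `None` on the final page):

```python
def state_backed_iterable(fetch_page):
    """Yield elements lazily, one page of runner state at a time.

    Only one page of data is resident at once, which is what makes
    iterables larger than memory workable after a CoGBK.
    """
    token = None
    while True:
        elements, token = fetch_page(token)
        for element in elements:
            yield element
        if token is None:
            return
```

The Go SDK work tracked here implements this paging against the actual Fn API state protocol rather than a plain callable.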
[jira] [Resolved] (BEAM-6825) Improve pipeline construction time error messages in Go SDK.
[ https://issues.apache.org/jira/browse/BEAM-6825?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Oliveira resolved BEAM-6825. --- Resolution: Fixed Fix Version/s: Not applicable Forgot to update this bug, but this effort has been complete for about a month. > Improve pipeline construction time error messages in Go SDK. > > > Key: BEAM-6825 > URL: https://issues.apache.org/jira/browse/BEAM-6825 > Project: Beam > Issue Type: New Feature > Components: sdk-go >Reporter: Daniel Oliveira >Assignee: Daniel Oliveira >Priority: Minor > Fix For: Not applicable > > Time Spent: 1.5h > Remaining Estimate: 0h > > Many error messages for common pipeline construction mistakes are unclear and > unhelpful. They need to be improved to provide more context, especially for > newer users. This bug tracks these error message improvements. -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Assigned] (BEAM-7463) Bigquery IO ITs are flaky: incorrect checksum
[ https://issues.apache.org/jira/browse/BEAM-7463?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Valentyn Tymofieiev reassigned BEAM-7463: - Assignee: (was: Juta Staes) > Bigquery IO ITs are flaky: incorrect checksum > - > > Key: BEAM-7463 > URL: https://issues.apache.org/jira/browse/BEAM-7463 > Project: Beam > Issue Type: Bug > Components: io-python-gcp >Reporter: Valentyn Tymofieiev >Priority: Major > Labels: currently-failing > Time Spent: 3h 40m > Remaining Estimate: 0h > > {noformat} > 15:03:38 FAIL: test_big_query_new_types > (apache_beam.io.gcp.big_query_query_to_table_it_test.BigQueryQueryToTableIT) > 15:03:38 > -- > 15:03:38 Traceback (most recent call last): > 15:03:38 File > "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python3_Verify/src/sdks/python/apache_beam/io/gcp/big_query_query_to_table_it_test.py", > line 211, in test_big_query_new_types > 15:03:38 big_query_query_to_table_pipeline.run_bq_pipeline(options) > 15:03:38 File > "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python3_Verify/src/sdks/python/apache_beam/io/gcp/big_query_query_to_table_pipeline.py", > line 82, in run_bq_pipeline > 15:03:38 result = p.run() > 15:03:38 File > "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python3_Verify/src/sdks/python/apache_beam/testing/test_pipeline.py", > line 107, in run > 15:03:38 else test_runner_api)) > 15:03:38 File > "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python3_Verify/src/sdks/python/apache_beam/pipeline.py", > line 406, in run > 15:03:38 self._options).run(False) > 15:03:38 File > "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python3_Verify/src/sdks/python/apache_beam/pipeline.py", > line 419, in run > 15:03:38 return self.runner.run_pipeline(self, self._options) > 15:03:38 File > "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python3_Verify/src/sdks/python/apache_beam/runners/direct/test_direct_runner.py", > line 51, in run_pipeline > 15:03:38 
hc_assert_that(self.result, pickler.loads(on_success_matcher)) > 15:03:38 AssertionError: > 15:03:38 Expected: (Test pipeline expected terminated in state: DONE and > Expected checksum is 24de460c4d344a4b77ccc4cc1acb7b7ffc11a214) > 15:03:38 but: Expected checksum is > 24de460c4d344a4b77ccc4cc1acb7b7ffc11a214 Actual checksum is > da39a3ee5e6b4b0d3255bfef95601890afd80709 > {noformat} > [~Juta] could this be caused by changes to Bigquery matcher? > https://github.com/apache/beam/pull/8621/files#diff-f1ec7e3a3e7e2e5082ddb7043954c108R134 > > cc: [~pabloem] [~chamikara] [~apilloud] > A recent postcommit run has BQ failures in other tests as well: > https://builds.apache.org/job/beam_PostCommit_Python3_Verify/1000/consoleFull -- This message was sent by Atlassian JIRA (v7.6.14#76016)
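One diagnostic detail worth noting in the failure above: the reported actual checksum, `da39a3ee5e6b4b0d3255bfef95601890afd80709`, is the SHA-1 digest of zero bytes. That suggests the verifier hashed an empty result set, i.e. the output table contained no rows when checked, rather than rows with wrong values. This is easy to confirm:

```python
import hashlib

# SHA-1 of empty input -- the well-known "empty" digest.
empty_sha1 = hashlib.sha1(b'').hexdigest()
```

So the flakiness here looks like a timing/visibility problem (results not yet present when verified) rather than a data-correctness one; that framing may help whoever picks this up.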
[jira] [Work logged] (BEAM-7484) Throughput collection in BigQuery performance tests
[ https://issues.apache.org/jira/browse/BEAM-7484?focusedWorklogId=275515&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-275515 ] ASF GitHub Bot logged work on BEAM-7484: Author: ASF GitHub Bot Created on: 11/Jul/19 21:00 Start Date: 11/Jul/19 21:00 Worklog Time Spent: 10m Work Description: udim commented on pull request #8766: [BEAM-7484] Metrics collection in BigQuery perf tests URL: https://github.com/apache/beam/pull/8766#discussion_r302695476 ## File path: sdks/python/apache_beam/io/gcp/bigquery_read_perf_test.py ## @@ -126,9 +133,21 @@ def format_record(record): p.run().wait_until_finish() def test(self): +def extract_values(row): + """Extracts value from a row.""" + yield base64.b64decode(row.values()[0]) + self.result = (self.pipeline | 'Read from BigQuery' >> Read(BigQuerySource( dataset=self.input_dataset, table=self.input_table)) + | 'Measure bytes' >> ParDo(MeasureBytes( + self.metrics_namespace, extract_values)) + | 'Count messages' >> ParDo(CountMessages( + self.metrics_namespace)) + | 'Measure time: Start' >> ParDo(MeasureTime( + self.metrics_namespace)) + | 'Measure time: End' >> ParDo(MeasureTime( Review comment: I don't understand what the difference is between Start and End versions of MeasureTime. From what I understand, you're measuring throughput, like bytes per second and messages per second. What time values do you extract from these metrics? Total running time? Momentary running time? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 275515) Time Spent: 2h 10m (was: 2h) > Throughput collection in BigQuery performance tests > --- > > Key: BEAM-7484 > URL: https://issues.apache.org/jira/browse/BEAM-7484 > Project: Beam > Issue Type: New Feature > Components: testing >Reporter: Kamil Wasilewski >Assignee: Kamil Wasilewski >Priority: Major > Time Spent: 2h 10m > Remaining Estimate: 0h > > The goal is to collect bytes/time and messages/time metrics in BQ read and > write tests in Python SDK. -- This message was sent by Atlassian JIRA (v7.6.14#76016)
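The reviewer's question about the 'Start' and 'End' `MeasureTime` steps comes down to how a distributed throughput number is derived: each worker records timestamps, and the runtime used for throughput is the span from the earliest recorded start to the latest recorded end. A sketch of that computation (function and parameter names are illustrative, not the Beam metrics API):

```python
def throughput(total_bytes, start_times, end_times):
    """Compute bytes/second from per-worker timing samples.

    Runtime is the span between the earliest recorded start and the latest
    recorded end across all workers; throughput is total bytes over that span.
    """
    runtime = max(end_times) - min(start_times)
    return float(total_bytes) / runtime
```

Placing one timing step before the measured section and one after bounds the section from both sides, which is presumably why two `MeasureTime` instances appear in the pipeline.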
[jira] [Work logged] (BEAM-7424) Retry HTTP 429 errors from GCS w/ exponential backoff when reading data
[ https://issues.apache.org/jira/browse/BEAM-7424?focusedWorklogId=275505&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-275505 ] ASF GitHub Bot logged work on BEAM-7424: Author: ASF GitHub Bot Created on: 11/Jul/19 20:18 Start Date: 11/Jul/19 20:18 Worklog Time Spent: 10m Work Description: akedin commented on pull request #9014: [BEAM-7424] cherry-picking for 2.14.0 URL: https://github.com/apache/beam/pull/9014 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 275505) Time Spent: 5h 50m (was: 5h 40m) > Retry HTTP 429 errors from GCS w/ exponential backoff when reading data > --- > > Key: BEAM-7424 > URL: https://issues.apache.org/jira/browse/BEAM-7424 > Project: Beam > Issue Type: Bug > Components: io-java-gcp, io-python-gcp, sdk-py-core >Reporter: Chamikara Jayalath >Assignee: Heejong Lee >Priority: Blocker > Fix For: 2.14.0 > > Time Spent: 5h 50m > Remaining Estimate: 0h > > This has to be done for both Java and Python SDKs. > Seems like Java SDK already retries 429 errors w/o backoff (please verify): > [https://github.com/apache/beam/blob/master/sdks/java/extensions/google-cloud-platform-core/src/main/java/org/apache/beam/sdk/extensions/gcp/util/RetryHttpRequestInitializer.java#L185] -- This message was sent by Atlassian JIRA (v7.6.14#76016)
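The behavior this issue asks for, retrying rate-limit responses with exponential backoff rather than immediately, follows a standard shape. A hedged sketch (the exception class and helper are illustrative stand-ins for an HTTP 429 response, not Beam's `RetryHttpRequestInitializer`):

```python
import random
import time


class RateLimitError(Exception):
    """Stand-in for an HTTP 429 (Too Many Requests) response."""


def retry_with_backoff(fn, max_attempts=5, base_delay=0.01):
    """Call fn(), retrying rate-limit errors with exponential backoff.

    The delay doubles on each attempt, with random jitter added so that many
    clients backing off together do not retry in lockstep; the error is
    re-raised once the final attempt fails.
    """
    for attempt in range(max_attempts):
        try:
            return fn()
        except RateLimitError:
            if attempt == max_attempts - 1:
                raise
            time.sleep(base_delay * (2 ** attempt) + random.uniform(0, base_delay))
```

The Java note in the issue suggests retries already happen there but without the backoff; the sketch shows what adding the backoff changes.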
[jira] [Work logged] (BEAM-6611) A Python Sink for BigQuery with File Loads in Streaming
[ https://issues.apache.org/jira/browse/BEAM-6611?focusedWorklogId=275504&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-275504 ] ASF GitHub Bot logged work on BEAM-6611: Author: ASF GitHub Bot Created on: 11/Jul/19 20:16 Start Date: 11/Jul/19 20:16 Worklog Time Spent: 10m Work Description: ttanay commented on issue #8871: [BEAM-6611] BigQuery file loads in Streaming for Python SDK URL: https://github.com/apache/beam/pull/8871#issuecomment-510637699 Run Python PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 275504) Time Spent: 2h 20m (was: 2h 10m) > A Python Sink for BigQuery with File Loads in Streaming > --- > > Key: BEAM-6611 > URL: https://issues.apache.org/jira/browse/BEAM-6611 > Project: Beam > Issue Type: Improvement > Components: sdk-py-core >Reporter: Pablo Estrada >Assignee: Tanay Tummalapalli >Priority: Major > Labels: gsoc, gsoc2019, mentor > Time Spent: 2h 20m > Remaining Estimate: 0h > > The Java SDK supports a number of methods for writing data into BigQuery, > while the Python SDK supports the following: > - Streaming inserts for streaming pipelines [As seen in [bigquery.py and > BigQueryWriteFn|https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/gcp/bigquery.py#L649-L813]] > - File loads for batch pipelines [As implemented in [PR > 7655|https://github.com/apache/beam/pull/7655]] > Quick and dirty early design doc: https://s.apache.org/beam-bqfl-py-streaming > The Java SDK also supports File Loads for Streaming pipelines [see BatchLoads > application|https://github.com/apache/beam/blob/master/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryIO.java#L1709-L1776]. 
> File loads have the advantage of being much cheaper than streaming inserts > (although they also are slower for the records to show up in the table). -- This message was sent by Atlassian JIRA (v7.6.14#76016)
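The latency side of the cost/latency trade-off described above can be made concrete: with file loads in streaming mode, buffered rows only become visible at the next periodic load-job trigger, whereas streaming inserts surface each row almost immediately. A small sketch of that effect (`triggering_frequency_secs` here is an assumed illustrative parameter, echoing but not reproducing the real sink's option):

```python
def flush_schedule(arrival_secs, triggering_frequency_secs):
    """Map each row's arrival time to the load-job trigger that publishes it.

    Rows arriving anywhere inside a trigger window all wait until that
    window's end -- the visibility latency file loads trade for lower cost.
    """
    return [
        (int(t) // triggering_frequency_secs + 1) * triggering_frequency_secs
        for t in arrival_secs
    ]
```

A row arriving one second after a trigger fires waits nearly a full period, which is the "slower for the records to show up" behavior the issue mentions.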
[jira] [Resolved] (BEAM-7621) Selecting null row fields causes Null pointer Exception with BeamSql
[ https://issues.apache.org/jira/browse/BEAM-7621?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Pilloud resolved BEAM-7621. -- Resolution: Fixed > Selecting null row field's causes Null pointer Exception with BeamSql > - > > Key: BEAM-7621 > URL: https://issues.apache.org/jira/browse/BEAM-7621 > Project: Beam > Issue Type: Bug > Components: dsl-sql >Affects Versions: 2.14.0 >Reporter: Vishwas >Priority: Major > Fix For: 2.16.0 > > Time Spent: 3.5h > Remaining Estimate: 0h > > I have below schema of the row: > private static final Schema innerRowWithArraySchema = > Schema.builder() > .addStringField("string_field") > .addArrayField("array_field", FieldType.INT64) > .build(); > private static final Schema nullableNestedRowWithArraySchema = > Schema.builder() > > .addNullableField("field1",FieldType.row(innerRowWithArraySchema)) > .addNullableField("field2", > FieldType.array(FieldType.row(innerRowWithArraySchema))) > .build(); > > *// Create a row with null values* > Row nullRow = Row.nullRow(nullableNestedRowWithArraySchema); > > Now when we try to select nested row field's NPE is thrown: > .apply(SqlTransform.query("select PCOLLECTION.field1.string_field as > row_string_field, PCOLLECTION.field2[2].string_field as array_string_field > from PCOLLECTION")); > > Below is the exception thrown: > Caused by: java.lang.RuntimeException: CalcFn failed to evaluate: { > final org.apache.beam.sdk.values.Row current = > (org.apache.beam.sdk.values.Row) c.element(); > *final java.util.List inp1_ = > org.apache.beam.sdk.extensions.sql.impl.rel.BeamCalcRel.WrappedList.of(current.getRow(1));* > *final java.util.List inp2_ = > org.apache.beam.sdk.extensions.sql.impl.rel.BeamCalcRel.WrappedList.of(current.getArray(2));* > > c.output(org.apache.beam.sdk.values.Row.withSchema(outputSchema).addValue(inp1_ > == null ? 
(String) null : (String) > org.apache.beam.repackaged.beam_sdks_java_extensions_sql.org.apache.calcite.runtime.SqlFunctions.structAccess(inp1_, > 0, > "string_field")).addValue(org.apache.beam.repackaged.beam_sdks_java_extensions_sql.org.apache.calcite.runtime.SqlFunctions.arrayItemOptional(inp2_, > 2) == null ? (String) null : (String) > org.apache.beam.repackaged.beam_sdks_java_extensions_sql.org.apache.calcite.runtime.SqlFunctions.structAccess(org.apache.beam.repackaged.beam_sdks_java_extensions_sql.org.apache.calcite.runtime.SqlFunctions.arrayItemOptional(inp2_, > 2), 0, "string_field")).build()); > } > at > org.apache.beam.sdk.extensions.sql.impl.rel.BeamCalcRel$CalcFn.processElement(BeamCalcRel.java:253) > Caused by: java.lang.NullPointerException > at > org.apache.beam.sdk.extensions.sql.impl.rel.BeamCalcRel$WrappedList.of(BeamCalcRel.java:459) > at SC.eval0(Unknown Source) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > > In BeamCalcRel.class *field* method, null values are not handled for > composite types. > public Expression field(BlockBuilder list, int index, Type storageType) { > > } else if (fromType.getTypeName().isCompositeType() > || (fromType.getTypeName().isCollectionType() > && > fromType.getCollectionElementType().getTypeName().isCompositeType())) { > field = Expressions.call(WrappedList.class, "of", field); // > List.of() is passed with null values > } > ... > } > > -- This message was sent by Atlassian JIRA (v7.6.14#76016)
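The root cause identified above is that the generated `WrappedList.of(...)` call receives a null for an unset nullable composite field and dereferences it. The shape of the fix is a null guard before wrapping; a Python analogue, purely illustrative of the Java change rather than a copy of it:

```python
def wrapped_list_of(values):
    """Wrap a composite field's values in a list, tolerating null input.

    When a nullable row/array field is unset, the generated accessor yields
    None (null in Java); propagate it instead of raising, mirroring how the
    SQL layer then emits NULL for the selected field.
    """
    if values is None:
        return None
    return list(values)
```

With the guard in place, `SELECT`ing a field out of a null row yields NULL rather than crashing the `CalcFn`.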
[jira] [Updated] (BEAM-7621) Selecting null row fields causes Null pointer Exception with BeamSql
[ https://issues.apache.org/jira/browse/BEAM-7621?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Pilloud updated BEAM-7621: - Fix Version/s: 2.16.0 > Selecting null row field's causes Null pointer Exception with BeamSql > - > > Key: BEAM-7621 > URL: https://issues.apache.org/jira/browse/BEAM-7621 > Project: Beam > Issue Type: Bug > Components: dsl-sql >Affects Versions: 2.14.0 >Reporter: Vishwas >Priority: Major > Fix For: 2.16.0 > > Time Spent: 3.5h > Remaining Estimate: 0h > > I have below schema of the row: > private static final Schema innerRowWithArraySchema = > Schema.builder() > .addStringField("string_field") > .addArrayField("array_field", FieldType.INT64) > .build(); > private static final Schema nullableNestedRowWithArraySchema = > Schema.builder() > > .addNullableField("field1",FieldType.row(innerRowWithArraySchema)) > .addNullableField("field2", > FieldType.array(FieldType.row(innerRowWithArraySchema))) > .build(); > > *// Create a row with null values* > Row nullRow = Row.nullRow(nullableNestedRowWithArraySchema); > > Now when we try to select nested row field's NPE is thrown: > .apply(SqlTransform.query("select PCOLLECTION.field1.string_field as > row_string_field, PCOLLECTION.field2[2].string_field as array_string_field > from PCOLLECTION")); > > Below is the exception thrown: > Caused by: java.lang.RuntimeException: CalcFn failed to evaluate: { > final org.apache.beam.sdk.values.Row current = > (org.apache.beam.sdk.values.Row) c.element(); > *final java.util.List inp1_ = > org.apache.beam.sdk.extensions.sql.impl.rel.BeamCalcRel.WrappedList.of(current.getRow(1));* > *final java.util.List inp2_ = > org.apache.beam.sdk.extensions.sql.impl.rel.BeamCalcRel.WrappedList.of(current.getArray(2));* > > c.output(org.apache.beam.sdk.values.Row.withSchema(outputSchema).addValue(inp1_ > == null ? 
(String) null : (String) > org.apache.beam.repackaged.beam_sdks_java_extensions_sql.org.apache.calcite.runtime.SqlFunctions.structAccess(inp1_, > 0, > "string_field")).addValue(org.apache.beam.repackaged.beam_sdks_java_extensions_sql.org.apache.calcite.runtime.SqlFunctions.arrayItemOptional(inp2_, > 2) == null ? (String) null : (String) > org.apache.beam.repackaged.beam_sdks_java_extensions_sql.org.apache.calcite.runtime.SqlFunctions.structAccess(org.apache.beam.repackaged.beam_sdks_java_extensions_sql.org.apache.calcite.runtime.SqlFunctions.arrayItemOptional(inp2_, > 2), 0, "string_field")).build()); > } > at > org.apache.beam.sdk.extensions.sql.impl.rel.BeamCalcRel$CalcFn.processElement(BeamCalcRel.java:253) > Caused by: java.lang.NullPointerException > at > org.apache.beam.sdk.extensions.sql.impl.rel.BeamCalcRel$WrappedList.of(BeamCalcRel.java:459) > at SC.eval0(Unknown Source) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > > In BeamCalcRel.class *field* method, null values are not handled for > composite types. > public Expression field(BlockBuilder list, int index, Type storageType) { > > } else if (fromType.getTypeName().isCompositeType() > || (fromType.getTypeName().isCollectionType() > && > fromType.getCollectionElementType().getTypeName().isCompositeType())) { > field = Expressions.call(WrappedList.class, "of", field); // > List.of() is passed with null values > } > ... > } > > -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Work logged] (BEAM-7621) Selecting null row fields causes Null pointer Exception with BeamSql
[ https://issues.apache.org/jira/browse/BEAM-7621?focusedWorklogId=275497&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-275497 ] ASF GitHub Bot logged work on BEAM-7621: Author: ASF GitHub Bot Created on: 11/Jul/19 19:59 Start Date: 11/Jul/19 19:59 Worklog Time Spent: 10m Work Description: apilloud commented on issue #8930: [BEAM-7621] Null pointer exception when accessing null row fields in BeamSql URL: https://github.com/apache/beam/pull/8930#issuecomment-510632045 Thanks for fixing this bug! This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 275497) Time Spent: 3.5h (was: 3h 20m) > Selecting null row field's causes Null pointer Exception with BeamSql > - > > Key: BEAM-7621 > URL: https://issues.apache.org/jira/browse/BEAM-7621 > Project: Beam > Issue Type: Bug > Components: dsl-sql >Affects Versions: 2.14.0 >Reporter: Vishwas >Priority: Major > Fix For: 2.16.0 > > Time Spent: 3.5h > Remaining Estimate: 0h > > I have below schema of the row: > private static final Schema innerRowWithArraySchema = > Schema.builder() > .addStringField("string_field") > .addArrayField("array_field", FieldType.INT64) > .build(); > private static final Schema nullableNestedRowWithArraySchema = > Schema.builder() > > .addNullableField("field1",FieldType.row(innerRowWithArraySchema)) > .addNullableField("field2", > FieldType.array(FieldType.row(innerRowWithArraySchema))) > .build(); > > *// Create a row with null values* > Row nullRow = Row.nullRow(nullableNestedRowWithArraySchema); > > Now when we try to select nested row field's NPE is thrown: > .apply(SqlTransform.query("select PCOLLECTION.field1.string_field as > row_string_field, PCOLLECTION.field2[2].string_field as array_string_field > from PCOLLECTION")); > > 
Below is the exception thrown: > Caused by: java.lang.RuntimeException: CalcFn failed to evaluate: { > final org.apache.beam.sdk.values.Row current = > (org.apache.beam.sdk.values.Row) c.element(); > *final java.util.List inp1_ = > org.apache.beam.sdk.extensions.sql.impl.rel.BeamCalcRel.WrappedList.of(current.getRow(1));* > *final java.util.List inp2_ = > org.apache.beam.sdk.extensions.sql.impl.rel.BeamCalcRel.WrappedList.of(current.getArray(2));* > > c.output(org.apache.beam.sdk.values.Row.withSchema(outputSchema).addValue(inp1_ > == null ? (String) null : (String) > org.apache.beam.repackaged.beam_sdks_java_extensions_sql.org.apache.calcite.runtime.SqlFunctions.structAccess(inp1_, > 0, > "string_field")).addValue(org.apache.beam.repackaged.beam_sdks_java_extensions_sql.org.apache.calcite.runtime.SqlFunctions.arrayItemOptional(inp2_, > 2) == null ? (String) null : (String) > org.apache.beam.repackaged.beam_sdks_java_extensions_sql.org.apache.calcite.runtime.SqlFunctions.structAccess(org.apache.beam.repackaged.beam_sdks_java_extensions_sql.org.apache.calcite.runtime.SqlFunctions.arrayItemOptional(inp2_, > 2), 0, "string_field")).build()); > } > at > org.apache.beam.sdk.extensions.sql.impl.rel.BeamCalcRel$CalcFn.processElement(BeamCalcRel.java:253) > Caused by: java.lang.NullPointerException > at > org.apache.beam.sdk.extensions.sql.impl.rel.BeamCalcRel$WrappedList.of(BeamCalcRel.java:459) > at SC.eval0(Unknown Source) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > > In BeamCalcRel.class *field* method, null values are not handled for > composite types. > public Expression field(BlockBuilder list, int index, Type storageType) { > > } else if (fromType.getTypeName().isCompositeType() > || (fromType.getTypeName().isCollectionType() > && > fromType.getCollectionElementType().getTypeName().isCompositeType())) { > field = Expressions.call(WrappedList.class, "of", field); // > List.of() is passed with null values > } > ... 
> } > > -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Work logged] (BEAM-7621) Selecting null row fields causes Null pointer Exception with BeamSql
[ https://issues.apache.org/jira/browse/BEAM-7621?focusedWorklogId=275494&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-275494 ]
ASF GitHub Bot logged work on BEAM-7621:
Author: ASF GitHub Bot
Created on: 11/Jul/19 19:54
Start Date: 11/Jul/19 19:54
Worklog Time Spent: 10m
Work Description: apilloud commented on pull request #8930: [BEAM-7621] Null pointer exception when accessing null row fields in BeamSql
URL: https://github.com/apache/beam/pull/8930
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Issue Time Tracking
---
Worklog Id: (was: 275494)
Time Spent: 3h 20m (was: 3h 10m)

> Selecting null row field's causes Null pointer Exception with BeamSql
> ---------------------------------------------------------------------
>
> Key: BEAM-7621
> URL: https://issues.apache.org/jira/browse/BEAM-7621
> Project: Beam
> Issue Type: Bug
> Components: dsl-sql
> Affects Versions: 2.14.0
> Reporter: Vishwas
> Priority: Major
> Time Spent: 3h 20m
> Remaining Estimate: 0h
>
> I have the below row schemas:
>
>     private static final Schema innerRowWithArraySchema =
>         Schema.builder()
>             .addStringField("string_field")
>             .addArrayField("array_field", FieldType.INT64)
>             .build();
>
>     private static final Schema nullableNestedRowWithArraySchema =
>         Schema.builder()
>             .addNullableField("field1", FieldType.row(innerRowWithArraySchema))
>             .addNullableField("field2", FieldType.array(FieldType.row(innerRowWithArraySchema)))
>             .build();
>
>     *// Create a row with null values*
>     Row nullRow = Row.nullRow(nullableNestedRowWithArraySchema);
>
> Now when we try to select nested row fields, an NPE is thrown:
>
>     .apply(SqlTransform.query("select PCOLLECTION.field1.string_field as row_string_field, PCOLLECTION.field2[2].string_field as array_string_field from PCOLLECTION"));
>
> Below is the exception thrown:
>
> Caused by: java.lang.RuntimeException: CalcFn failed to evaluate: {
>   final org.apache.beam.sdk.values.Row current = (org.apache.beam.sdk.values.Row) c.element();
>   *final java.util.List inp1_ = org.apache.beam.sdk.extensions.sql.impl.rel.BeamCalcRel.WrappedList.of(current.getRow(1));*
>   *final java.util.List inp2_ = org.apache.beam.sdk.extensions.sql.impl.rel.BeamCalcRel.WrappedList.of(current.getArray(2));*
>   c.output(org.apache.beam.sdk.values.Row.withSchema(outputSchema).addValue(inp1_ == null ? (String) null : (String) org.apache.beam.repackaged.beam_sdks_java_extensions_sql.org.apache.calcite.runtime.SqlFunctions.structAccess(inp1_, 0, "string_field")).addValue(org.apache.beam.repackaged.beam_sdks_java_extensions_sql.org.apache.calcite.runtime.SqlFunctions.arrayItemOptional(inp2_, 2) == null ? (String) null : (String) org.apache.beam.repackaged.beam_sdks_java_extensions_sql.org.apache.calcite.runtime.SqlFunctions.structAccess(org.apache.beam.repackaged.beam_sdks_java_extensions_sql.org.apache.calcite.runtime.SqlFunctions.arrayItemOptional(inp2_, 2), 0, "string_field")).build());
> }
>   at org.apache.beam.sdk.extensions.sql.impl.rel.BeamCalcRel$CalcFn.processElement(BeamCalcRel.java:253)
> Caused by: java.lang.NullPointerException
>   at org.apache.beam.sdk.extensions.sql.impl.rel.BeamCalcRel$WrappedList.of(BeamCalcRel.java:459)
>   at SC.eval0(Unknown Source)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>
> In BeamCalcRel's *field* method, null values are not handled for composite types:
>
>     public Expression field(BlockBuilder list, int index, Type storageType) {
>       ...
>       } else if (fromType.getTypeName().isCompositeType()
>           || (fromType.getTypeName().isCollectionType()
>               && fromType.getCollectionElementType().getTypeName().isCompositeType())) {
>         field = Expressions.call(WrappedList.class, "of", field); // WrappedList.of() is called with null values
>       }
>       ...
>     }

-- This message was sent by Atlassian JIRA (v7.6.14#76016)
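The root cause above is that the generated code passes `current.getRow(1)` to `WrappedList.of(...)` before any null check, so a null nested row reaches `WrappedList.of` and triggers the NPE. A minimal Python sketch of the null-propagating access that composite fields need (the helper name and the dict-based row model are hypothetical, for illustration only; the real fix lives in the Java codegen of `BeamCalcRel`):

```python
def null_safe_get(row, *path):
    """Walk a nested record, returning None as soon as any level is null.

    Rows are modeled here as plain dicts; a null nested row is simply None.
    """
    current = row
    for field in path:
        if current is None:
            return None  # propagate SQL NULL instead of raising
        current = current[field]
    return current

# A row with all-null fields, like Row.nullRow(nullableNestedRowWithArraySchema):
null_row = {"field1": None, "field2": None}

# Naive access (null_row["field1"]["string_field"]) raises TypeError,
# the Python analogue of the reported NullPointerException; the guarded
# access instead yields NULL:
assert null_safe_get(null_row, "field1", "string_field") is None
```

The same guard applied before the `WrappedList.of` call — emit null when the wrapped value is null — mirrors the `nullOr` helper suggested later in this review thread.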
[jira] [Work logged] (BEAM-7548) test_approximate_unique_global_by_error is flaky
[ https://issues.apache.org/jira/browse/BEAM-7548?focusedWorklogId=275483&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-275483 ]
ASF GitHub Bot logged work on BEAM-7548:
Author: ASF GitHub Bot
Created on: 11/Jul/19 19:32
Start Date: 11/Jul/19 19:32
Worklog Time Spent: 10m
Work Description: Hannah-Jiang commented on issue #8959: [BEAM-7548] Cherry pick - fix flaky tests for ApproximateUnique
URL: https://github.com/apache/beam/pull/8959#issuecomment-510623257
Thank you Anton!
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Issue Time Tracking
---
Worklog Id: (was: 275483)
Time Spent: 9h 10m (was: 9h)

> test_approximate_unique_global_by_error is flaky
> ------------------------------------------------
>
> Key: BEAM-7548
> URL: https://issues.apache.org/jira/browse/BEAM-7548
> Project: Beam
> Issue Type: Bug
> Components: sdk-py-core, test-failures
> Reporter: Valentyn Tymofieiev
> Assignee: Hannah Jiang
> Priority: Major
> Time Spent: 9h 10m
> Remaining Estimate: 0h
>
> The error happened on Jenkins in the Python 3.5 suite, which currently uses a Python 3.5.2 interpreter:
> {noformat}
> 11:57:47 ======================================================================
> 11:57:47 ERROR: test_approximate_unique_global_by_error (apache_beam.transforms.stats_test.ApproximateUniqueTest)
> 11:57:47 ----------------------------------------------------------------------
> 11:57:47 Traceback (most recent call last):
> 11:57:47   File "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py35/build/srcs/sdks/python/apache_beam/transforms/stats_test.py", line 236, in test_approximate_unique_global_by_error
> 11:57:47     pipeline.run()
> 11:57:47   File "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py35/build/srcs/sdks/python/apache_beam/testing/test_pipeline.py", line 107, in run
> 11:57:47     else test_runner_api))
> 11:57:47   File "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py35/build/srcs/sdks/python/apache_beam/pipeline.py", line 406, in run
> 11:57:47     self._options).run(False)
> 11:57:47   File "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py35/build/srcs/sdks/python/apache_beam/pipeline.py", line 419, in run
> 11:57:47     return self.runner.run_pipeline(self, self._options)
> 11:57:47   File "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py35/build/srcs/sdks/python/apache_beam/runners/direct/direct_runner.py", line 128, in run_pipeline
> 11:57:47     return runner.run_pipeline(pipeline, options)
> 11:57:47   File "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py35/build/srcs/sdks/python/apache_beam/runners/portability/fn_api_runner.py", line 289, in run_pipeline
> 11:57:47     default_environment=self._default_environment))
> 11:57:47   File "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py35/build/srcs/sdks/python/apache_beam/runners/portability/fn_api_runner.py", line 293, in run_via_runner_api
> 11:57:47     return self.run_stages(*self.create_stages(pipeline_proto))
> 11:57:47   File "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py35/build/srcs/sdks/python/apache_beam/runners/portability/fn_api_runner.py", line 369, in run_stages
> 11:57:47     stage_context.safe_coders)
> 11:57:47   File "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py35/build/srcs/sdks/python/apache_beam/runners/portability/fn_api_runner.py", line 531, in run_stage
> 11:57:47     data_input, data_output)
> 11:57:47   File "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py35/build/srcs/sdks/python/apache_beam/runners/portability/fn_api_runner.py", line 1235, in process_bundle
> 11:57:47     result_future = self._controller.control_handler.push(process_bundle)
> 11:57:47   File "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py35/build/srcs/sdks/python/apache_beam/runners/portability/fn_api_runner.py", line 851, in push
> 11:57:47     response = self.w
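The assertion behind this flaky test is probabilistic: ApproximateUnique trades accuracy for space, with an estimation error that shrinks roughly as 2/sqrt(sample_size). A sketch of that relationship (the exact constants in Beam's implementation may differ; the function names are illustrative):

```python
import math

def error_bound(sample_size):
    """Approximate relative estimation error for a given sample size."""
    return 2.0 / math.sqrt(sample_size)

def required_sample_size(target_error):
    """Smallest sample size whose ~2/sqrt(n) bound meets target_error."""
    return math.ceil((2.0 / target_error) ** 2)

# Halving the target error quadruples the required sample size:
assert required_sample_size(0.02) == 4 * required_sample_size(0.04)
```

A test that asserts the observed error stays under such a bound will occasionally fail by chance alone, which is why the cherry-picked fix above reworks the flaky assertions.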
[jira] [Work logged] (BEAM-7621) Selecting null row field's causes Null pointer Exception with BeamSql
[ https://issues.apache.org/jira/browse/BEAM-7621?focusedWorklogId=275479&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-275479 ] ASF GitHub Bot logged work on BEAM-7621: Author: ASF GitHub Bot Created on: 11/Jul/19 19:26 Start Date: 11/Jul/19 19:26 Worklog Time Spent: 10m Work Description: apilloud commented on issue #8930: [BEAM-7621] Null pointer exception when accessing null row fields in BeamSql URL: https://github.com/apache/beam/pull/8930#issuecomment-510620634 run precommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 275479) Time Spent: 3h 10m (was: 3h) -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Work logged] (BEAM-7621) Selecting null row field's causes Null pointer Exception with BeamSql
[ https://issues.apache.org/jira/browse/BEAM-7621?focusedWorklogId=275477&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-275477 ] ASF GitHub Bot logged work on BEAM-7621: Author: ASF GitHub Bot Created on: 11/Jul/19 19:25 Start Date: 11/Jul/19 19:25 Worklog Time Spent: 10m Work Description: apilloud commented on issue #8930: [BEAM-7621] Null pointer exception when accessing null row fields in BeamSql URL: https://github.com/apache/beam/pull/8930#issuecomment-510621183 run java precommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 275477) Time Spent: 3h (was: 2h 50m) -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Work logged] (BEAM-7621) Selecting null row field's causes Null pointer Exception with BeamSql
[ https://issues.apache.org/jira/browse/BEAM-7621?focusedWorklogId=275476&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-275476 ] ASF GitHub Bot logged work on BEAM-7621: Author: ASF GitHub Bot Created on: 11/Jul/19 19:25 Start Date: 11/Jul/19 19:25 Worklog Time Spent: 10m Work Description: apilloud commented on issue #8930: [BEAM-7621] Null pointer exception when accessing null row fields in BeamSql URL: https://github.com/apache/beam/pull/8930#issuecomment-510621013 retest this please This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 275476) Time Spent: 2h 50m (was: 2h 40m) -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Work logged] (BEAM-7621) Selecting null row field's causes Null pointer Exception with BeamSql
[ https://issues.apache.org/jira/browse/BEAM-7621?focusedWorklogId=275474&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-275474 ]
ASF GitHub Bot logged work on BEAM-7621:
Author: ASF GitHub Bot
Created on: 11/Jul/19 19:23
Start Date: 11/Jul/19 19:23
Worklog Time Spent: 10m
Work Description: apilloud commented on pull request #8930: [BEAM-7621] Null pointer exception when accessing null row fields in BeamSql
URL: https://github.com/apache/beam/pull/8930#discussion_r302705234

## File path: sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rel/BeamCalcRel.java

## @@ -386,24 +386,44 @@ public Expression field(BlockBuilder list, int index, Type storageType) {
     }
     Expression field = Expressions.call(expression, getter, Expressions.constant(index));
     if (fromType.getTypeName().isLogicalType()) {
-      field = Expressions.call(field, "getMillis");
+      Expression millisField = Expressions.call(field, "getMillis");
       String logicalId = fromType.getLogicalType().getIdentifier();
       if (logicalId.equals(TimeType.IDENTIFIER)) {
-        field = Expressions.convert_(field, int.class);
+        field =
+            Expressions.condition(

Review comment: I'm suggesting you add one to this class.

```
private static Expression nullOr(Expression field, Expression ifNotNull) {
  return Expressions.condition(
      Expressions.equal(field, Expressions.constant(null)),
      Expressions.constant(null),
      Expressions.box(ifNotNull));
}
```

Then this becomes

```
nullOr(field, Expressions.convert_(millisField, int.class));
```

and the same pattern can be applied below.

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 275474) Time Spent: 2h 40m (was: 2.5h) -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Work logged] (BEAM-7424) Retry HTTP 429 errors from GCS w/ exponential backoff when reading data
[ https://issues.apache.org/jira/browse/BEAM-7424?focusedWorklogId=275472&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-275472 ] ASF GitHub Bot logged work on BEAM-7424: Author: ASF GitHub Bot Created on: 11/Jul/19 19:14 Start Date: 11/Jul/19 19:14 Worklog Time Spent: 10m Work Description: akedin commented on issue #9014: [BEAM-7424] cherry-picking for 2.14.0 URL: https://github.com/apache/beam/pull/9014#issuecomment-510617428 run python precommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 275472) Time Spent: 5h 40m (was: 5.5h) > Retry HTTP 429 errors from GCS w/ exponential backoff when reading data > --- > > Key: BEAM-7424 > URL: https://issues.apache.org/jira/browse/BEAM-7424 > Project: Beam > Issue Type: Bug > Components: io-java-gcp, io-python-gcp, sdk-py-core >Reporter: Chamikara Jayalath >Assignee: Heejong Lee >Priority: Blocker > Fix For: 2.14.0 > > Time Spent: 5h 40m > Remaining Estimate: 0h > > This has to be done for both Java and Python SDKs. > Seems like Java SDK already retries 429 errors w/o backoff (please verify): > [https://github.com/apache/beam/blob/master/sdks/java/extensions/google-cloud-platform-core/src/main/java/org/apache/beam/sdk/extensions/gcp/util/RetryHttpRequestInitializer.java#L185] -- This message was sent by Atlassian JIRA (v7.6.14#76016)
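HTTP 429 means the caller is being rate-limited, so each retry should wait longer than the last. A minimal sketch of capped exponential backoff (function names and delay constants are illustrative, not the actual values used by Beam's `RetryHttpRequestInitializer`):

```python
import time

def backoff_delays(initial=1.0, multiplier=2.0, max_delay=60.0, max_retries=6):
    """Yield the sleep interval to use before each retry attempt."""
    delay = initial
    for _ in range(max_retries):
        yield min(delay, max_delay)
        delay *= multiplier

def call_with_retry(request, retryable=(429,), **backoff_kwargs):
    """Call `request` (which returns an HTTP status), retrying on retryable statuses."""
    status = request()
    for delay in backoff_delays(**backoff_kwargs):
        if status not in retryable:
            return status
        time.sleep(delay)  # production code would add random jitter here
        status = request()
    return status

# Delays double up to the cap: 1, 2, 4, 8, 16, 32 seconds.
assert list(backoff_delays()) == [1.0, 2.0, 4.0, 8.0, 16.0, 32.0]
```

Retrying without backoff — the behavior the issue suspects in the Java SDK — would hammer GCS while it is already shedding load; the growing delay gives the service time to recover.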
[jira] [Work logged] (BEAM-7386) Add Utility BiTemporalStreamJoin
[ https://issues.apache.org/jira/browse/BEAM-7386?focusedWorklogId=275469&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-275469 ] ASF GitHub Bot logged work on BEAM-7386: Author: ASF GitHub Bot Created on: 11/Jul/19 19:08 Start Date: 11/Jul/19 19:08 Worklog Time Spent: 10m Work Description: amaliujia commented on issue #9032: [BEAM-7386] Bi-Temporal Join URL: https://github.com/apache/beam/pull/9032#issuecomment-510615471 Thank Kenn! Will take a look soon. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 275469) Time Spent: 1h 20m (was: 1h 10m) > Add Utility BiTemporalStreamJoin > > > Key: BEAM-7386 > URL: https://issues.apache.org/jira/browse/BEAM-7386 > Project: Beam > Issue Type: Improvement > Components: sdk-ideas >Affects Versions: 2.12.0 >Reporter: Reza ardeshir rokni >Assignee: Reza ardeshir rokni >Priority: Major > Time Spent: 1h 20m > Remaining Estimate: 0h > > Add utility class that enables a temporal join between two streams where > Stream A is matched to Stream B where > A.timestamp = (max(b.timestamp) where b.timestamp <= a.timestamp) > This will use the following overall flow: > KV(key, Timestamped) > | Window > | GBK > | Stateful DoFn > > -- This message was sent by Atlassian JIRA (v7.6.14#76016)
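The matching rule above — pair each element of Stream A with the latest element of Stream B at or before A's timestamp — is a floor lookup over B's sorted timestamps. A batch sketch of just that lookup (the proposed utility performs it incrementally, per key, inside the stateful DoFn after the GroupByKey; the function name is illustrative):

```python
import bisect

def bitemporal_match(a_timestamps, b_timestamps):
    """For each a, return (a, max b with b <= a), or (a, None) if no such b exists.

    b_timestamps must be sorted ascending.
    """
    matches = []
    for a in a_timestamps:
        i = bisect.bisect_right(b_timestamps, a)  # first index with b > a
        matches.append((a, b_timestamps[i - 1] if i > 0 else None))
    return matches

# Events at t=5 and t=10 match the B-elements at t=4 and t=9; the event
# at t=1 has no B-element at or before it:
assert bitemporal_match([5, 10, 1], [2, 4, 9]) == [(5, 4), (10, 9), (1, None)]
```

Using `bisect_right` (rather than `bisect_left`) makes the match inclusive, so a B-element with exactly A's timestamp is chosen, matching the `b.timestamp <= a.timestamp` condition.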
[jira] [Work logged] (BEAM-6611) A Python Sink for BigQuery with File Loads in Streaming
[ https://issues.apache.org/jira/browse/BEAM-6611?focusedWorklogId=275458&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-275458 ] ASF GitHub Bot logged work on BEAM-6611: Author: ASF GitHub Bot Created on: 11/Jul/19 18:34 Start Date: 11/Jul/19 18:34 Worklog Time Spent: 10m Work Description: ttanay commented on issue #8871: [BEAM-6611] BigQuery file loads in Streaming for Python SDK URL: https://github.com/apache/beam/pull/8871#issuecomment-510603552 Run Python PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 275458) Time Spent: 2h 10m (was: 2h) > A Python Sink for BigQuery with File Loads in Streaming > --- > > Key: BEAM-6611 > URL: https://issues.apache.org/jira/browse/BEAM-6611 > Project: Beam > Issue Type: Improvement > Components: sdk-py-core >Reporter: Pablo Estrada >Assignee: Tanay Tummalapalli >Priority: Major > Labels: gsoc, gsoc2019, mentor > Time Spent: 2h 10m > Remaining Estimate: 0h > > The Java SDK supports a bunch of methods for writing data into BigQuery, > while the Python SDK supports the following: > - Streaming inserts for streaming pipelines [As seen in [bigquery.py and > BigQueryWriteFn|https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/gcp/bigquery.py#L649-L813]] > - File loads for batch pipelines [As implemented in [PR > 7655|https://github.com/apache/beam/pull/7655]] > Quick and dirty early design doc: https://s.apache.org/beam-bqfl-py-streaming > The Java SDK also supports File Loads for Streaming pipelines [see BatchLoads > application|https://github.com/apache/beam/blob/master/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryIO.java#L1709-L1776].
> File loads have the advantage of being much cheaper than streaming inserts > (although they also are slower for the records to show up in the table). -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Work logged] (BEAM-7386) Add Utility BiTemporalStreamJoin
[ https://issues.apache.org/jira/browse/BEAM-7386?focusedWorklogId=275455&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-275455 ] ASF GitHub Bot logged work on BEAM-7386: Author: ASF GitHub Bot Created on: 11/Jul/19 18:26 Start Date: 11/Jul/19 18:26 Worklog Time Spent: 10m Work Description: kennknowles commented on issue #9032: [BEAM-7386] Bi-Temporal Join URL: https://github.com/apache/beam/pull/9032#issuecomment-510600426 I believe you are looking for @amaliujia This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 275455) Time Spent: 1h 10m (was: 1h) -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Work logged] (BEAM-7617) LTS Backport: urlopen calls could get stuck without a timeout
[ https://issues.apache.org/jira/browse/BEAM-7617?focusedWorklogId=275451&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-275451 ] ASF GitHub Bot logged work on BEAM-7617: Author: ASF GitHub Bot Created on: 11/Jul/19 18:19 Start Date: 11/Jul/19 18:19 Worklog Time Spent: 10m Work Description: kennknowles commented on issue #8927: [BEAM-7617] Cherry Pick #8925 to 2.7.1 release branch URL: https://github.com/apache/beam/pull/8927#issuecomment-510598027 retest this please This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 275451) Time Spent: 50m (was: 40m) > LTS Backport: urlopen calls could get stuck without a timeout > - > > Key: BEAM-7617 > URL: https://issues.apache.org/jira/browse/BEAM-7617 > Project: Beam > Issue Type: Bug > Components: sdk-py-core >Reporter: Ahmet Altay >Assignee: Ahmet Altay >Priority: Blocker > Fix For: 2.7.1 > > Time Spent: 50m > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Work logged] (BEAM-7484) Throughput collection in BigQuery performance tests
[ https://issues.apache.org/jira/browse/BEAM-7484?focusedWorklogId=275450&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-275450 ] ASF GitHub Bot logged work on BEAM-7484: Author: ASF GitHub Bot Created on: 11/Jul/19 18:17 Start Date: 11/Jul/19 18:17 Worklog Time Spent: 10m Work Description: udim commented on issue #8766: [BEAM-7484] Metrics collection in BigQuery perf tests URL: https://github.com/apache/beam/pull/8766#issuecomment-510597113 Yes, sorry for the delay This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 275450) Time Spent: 2h (was: 1h 50m) > Throughput collection in BigQuery performance tests > --- > > Key: BEAM-7484 > URL: https://issues.apache.org/jira/browse/BEAM-7484 > Project: Beam > Issue Type: New Feature > Components: testing >Reporter: Kamil Wasilewski >Assignee: Kamil Wasilewski >Priority: Major > Time Spent: 2h > Remaining Estimate: 0h > > The goal is to collect bytes/time and messages/time metrics in BQ read and > write tests in Python SDK. -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Closed] (BEAM-7723) github PR comment phrase triggering of jenkins jobs broken (ghprb plugin)
[ https://issues.apache.org/jira/browse/BEAM-7723?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Udi Meiri closed BEAM-7723. --- Resolution: Resolved Fix Version/s: Not applicable > github PR comment phrase triggering of jenkins jobs broken (ghprb plugin) > - > > Key: BEAM-7723 > URL: https://issues.apache.org/jira/browse/BEAM-7723 > Project: Beam > Issue Type: Bug > Components: testing >Reporter: Udi Meiri >Assignee: Udi Meiri >Priority: Major > Fix For: Not applicable > > > Apache Infra team has removed ghprb-plugin from Jenkins. -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Work logged] (BEAM-7548) test_approximate_unique_global_by_error is flaky
[ https://issues.apache.org/jira/browse/BEAM-7548?focusedWorklogId=275445&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-275445 ] ASF GitHub Bot logged work on BEAM-7548: Author: ASF GitHub Bot Created on: 11/Jul/19 18:09 Start Date: 11/Jul/19 18:09 Worklog Time Spent: 10m Work Description: akedin commented on issue #8959: [BEAM-7548] Cherry pick - fix flaky tests for ApproximateUnique URL: https://github.com/apache/beam/pull/8959#issuecomment-510593210 Python precommit passed: - https://builds.apache.org/job/beam_PreCommit_Python_Phrase/640/ - https://scans.gradle.com/s/4isyitp7hrkv4 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 275445) Time Spent: 9h (was: 8h 50m) > test_approximate_unique_global_by_error is flaky > > > Key: BEAM-7548 > URL: https://issues.apache.org/jira/browse/BEAM-7548 > Project: Beam > Issue Type: Bug > Components: sdk-py-core, test-failures >Reporter: Valentyn Tymofieiev >Assignee: Hannah Jiang >Priority: Major > Time Spent: 9h > Remaining Estimate: 0h > > The error happened on Jenkins in Python 3.5 suite, which currently uses > Python 3.5.2 interpreter: > {noformat} > 11:57:47 > == > 11:57:47 ERROR: test_approximate_unique_global_by_error > (apache_beam.transforms.stats_test.ApproximateUniqueTest) > 11:57:47 > -- > 11:57:47 Traceback (most recent call last): > 11:57:47 File > "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py35/build/srcs/sdks/python/apache_beam/transforms/stats_test.py", > line 236, in test_approximate_unique_global_by_error > 11:57:47 pipeline.run() > 11:57:47 File > 
"/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py35/build/srcs/sdks/python/apache_beam/testing/test_pipeline.py", > line 107, in run > 11:57:47 else test_runner_api)) > 11:57:47 File > "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py35/build/srcs/sdks/python/apache_beam/pipeline.py", > line 406, in run > 11:57:47 self._options).run(False) > 11:57:47 File > "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py35/build/srcs/sdks/python/apache_beam/pipeline.py", > line 419, in run > 11:57:47 return self.runner.run_pipeline(self, self._options) > 11:57:47 File > "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py35/build/srcs/sdks/python/apache_beam/runners/direct/direct_runner.py", > line 128, in run_pipeline > 11:57:47 return runner.run_pipeline(pipeline, options) > 11:57:47 File > "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py35/build/srcs/sdks/python/apache_beam/runners/portability/fn_api_runner.py", > line 289, in run_pipeline > 11:57:47 default_environment=self._default_environment)) > 11:57:47 File > "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py35/build/srcs/sdks/python/apache_beam/runners/portability/fn_api_runner.py", > line 293, in run_via_runner_api > 11:57:47 return self.run_stages(*self.create_stages(pipeline_proto)) > 11:57:47 File > "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py35/build/srcs/sdks/python/apache_beam/runners/portability/fn_api_runner.py", > line 369, in run_stages > 11:57:47 stage_context.safe_coders) > 11:57:47 File > 
"/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py35/build/srcs/sdks/python/apache_beam/runners/portability/fn_api_runner.py", > line 531, in run_stage > 11:57:47 data_input, data_output) > 11:57:47 File > "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py35/build/srcs/sdks/python/apache_beam/runners/portability/fn_api_runner.py", > line 1235, in process_bundle > 11:57:47 result_future = > self._controller.control_handler.push(process_bundle) > 11:57:47 File > "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py35/build/srcs
[jira] [Created] (BEAM-7725) migrate off of ghprb
Udi Meiri created BEAM-7725: --- Summary: migrate off of ghprb Key: BEAM-7725 URL: https://issues.apache.org/jira/browse/BEAM-7725 Project: Beam Issue Type: Bug Components: testing Reporter: Udi Meiri ghprb-plugin is going away and Beam needs to migrate off of it. Some background in https://issues.apache.org/jira/browse/BEAM-7723 (see the wiki page linked there) -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Comment Edited] (BEAM-7723) github PR comment phrase triggering of jenkins jobs broken (ghprb plugin)
[ https://issues.apache.org/jira/browse/BEAM-7723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16883231#comment-16883231 ] Udi Meiri edited comment on BEAM-7723 at 7/11/19 6:08 PM: -- Opened https://issues.apache.org/jira/browse/BEAM-7725 for migration. was (Author: udim): Opened https://issues.apache.org/jira/browse/BEAM-7723 for migration. > github PR comment phrase triggering of jenkins jobs broken (ghprb plugin) > - > > Key: BEAM-7723 > URL: https://issues.apache.org/jira/browse/BEAM-7723 > Project: Beam > Issue Type: Bug > Components: testing >Reporter: Udi Meiri >Assignee: Udi Meiri >Priority: Major > > Apache Infra team has removed ghprb-plugin from Jenkins. -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Work logged] (BEAM-7548) test_approximate_unique_global_by_error is flaky
[ https://issues.apache.org/jira/browse/BEAM-7548?focusedWorklogId=275444&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-275444 ] ASF GitHub Bot logged work on BEAM-7548: Author: ASF GitHub Bot Created on: 11/Jul/19 18:07 Start Date: 11/Jul/19 18:07 Worklog Time Spent: 10m Work Description: akedin commented on pull request #8959: [BEAM-7548] Cherry pick - fix flaky tests for ApproximateUnique URL: https://github.com/apache/beam/pull/8959 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 275444) Time Spent: 8h 50m (was: 8h 40m) > test_approximate_unique_global_by_error is flaky > > > Key: BEAM-7548 > URL: https://issues.apache.org/jira/browse/BEAM-7548 > Project: Beam > Issue Type: Bug > Components: sdk-py-core, test-failures >Reporter: Valentyn Tymofieiev >Assignee: Hannah Jiang >Priority: Major > Time Spent: 8h 50m > Remaining Estimate: 0h > > The error happened on Jenkins in Python 3.5 suite, which currently uses > Python 3.5.2 interpreter: > {noformat} > 11:57:47 > == > 11:57:47 ERROR: test_approximate_unique_global_by_error > (apache_beam.transforms.stats_test.ApproximateUniqueTest) > 11:57:47 > -- > 11:57:47 Traceback (most recent call last): > 11:57:47 File > "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py35/build/srcs/sdks/python/apache_beam/transforms/stats_test.py", > line 236, in test_approximate_unique_global_by_error > 11:57:47 pipeline.run() > 11:57:47 File > "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py35/build/srcs/sdks/python/apache_beam/testing/test_pipeline.py", > line 107, in run > 11:57:47 else test_runner_api)) > 11:57:47 File > 
"/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py35/build/srcs/sdks/python/apache_beam/pipeline.py", > line 406, in run > 11:57:47 self._options).run(False) > 11:57:47 File > "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py35/build/srcs/sdks/python/apache_beam/pipeline.py", > line 419, in run > 11:57:47 return self.runner.run_pipeline(self, self._options) > 11:57:47 File > "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py35/build/srcs/sdks/python/apache_beam/runners/direct/direct_runner.py", > line 128, in run_pipeline > 11:57:47 return runner.run_pipeline(pipeline, options) > 11:57:47 File > "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py35/build/srcs/sdks/python/apache_beam/runners/portability/fn_api_runner.py", > line 289, in run_pipeline > 11:57:47 default_environment=self._default_environment)) > 11:57:47 File > "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py35/build/srcs/sdks/python/apache_beam/runners/portability/fn_api_runner.py", > line 293, in run_via_runner_api > 11:57:47 return self.run_stages(*self.create_stages(pipeline_proto)) > 11:57:47 File > "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py35/build/srcs/sdks/python/apache_beam/runners/portability/fn_api_runner.py", > line 369, in run_stages > 11:57:47 stage_context.safe_coders) > 11:57:47 File > "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py35/build/srcs/sdks/python/apache_beam/runners/portability/fn_api_runner.py", > line 531, in run_stage > 11:57:47 data_input, data_output) > 11:57:47 File > 
"/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py35/build/srcs/sdks/python/apache_beam/runners/portability/fn_api_runner.py", > line 1235, in process_bundle > 11:57:47 result_future = > self._controller.control_handler.push(process_bundle) > 11:57:47 File > "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py35/build/srcs/sdks/python/apache_beam/runners/portability/fn_api_runner.py", > line 851, in push > 11:57:47 response = self.worker.do_instruction(request) > 11
[jira] [Commented] (BEAM-7723) github PR comment phrase triggering of jenkins jobs broken (ghprb plugin)
[ https://issues.apache.org/jira/browse/BEAM-7723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16883231#comment-16883231 ] Udi Meiri commented on BEAM-7723: - Opened https://issues.apache.org/jira/browse/BEAM-7723 for migration. > github PR comment phrase triggering of jenkins jobs broken (ghprb plugin) > - > > Key: BEAM-7723 > URL: https://issues.apache.org/jira/browse/BEAM-7723 > Project: Beam > Issue Type: Bug > Components: testing >Reporter: Udi Meiri >Assignee: Udi Meiri >Priority: Major > > Apache Infra team has removed ghprb-plugin from Jenkins. -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Updated] (BEAM-7725) Jenkins config: migrate off of ghprb-plugin
[ https://issues.apache.org/jira/browse/BEAM-7725?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Udi Meiri updated BEAM-7725: Summary: Jenkins config: migrate off of ghprb-plugin (was: migrate off of ghprb) > Jenkins config: migrate off of ghprb-plugin > --- > > Key: BEAM-7725 > URL: https://issues.apache.org/jira/browse/BEAM-7725 > Project: Beam > Issue Type: Bug > Components: testing >Reporter: Udi Meiri >Priority: Blocker > > ghprb-plugin is going away and Beam needs to migrate off of it. > Some background in https://issues.apache.org/jira/browse/BEAM-7723 (see the > wiki page linked there) -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Assigned] (BEAM-7723) github PR comment phrase triggering of jenkins jobs broken (ghprb plugin)
[ https://issues.apache.org/jira/browse/BEAM-7723?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Udi Meiri reassigned BEAM-7723: --- Assignee: Udi Meiri > github PR comment phrase triggering of jenkins jobs broken (ghprb plugin) > - > > Key: BEAM-7723 > URL: https://issues.apache.org/jira/browse/BEAM-7723 > Project: Beam > Issue Type: Bug > Components: testing >Reporter: Udi Meiri >Assignee: Udi Meiri >Priority: Major > > Apache Infra team has removed ghprb-plugin from Jenkins. -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Comment Edited] (BEAM-7723) github PR comment phrase triggering of jenkins jobs broken (ghprb plugin)
[ https://issues.apache.org/jira/browse/BEAM-7723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16883218#comment-16883218 ] Udi Meiri edited comment on BEAM-7723 at 7/11/19 5:49 PM: -- Chatted on Slack, they did something to fix it. https://the-asf.slack.com/archives/CBX4TSBQ8/p1562867180305200 Also they noted: {code} cool if you have any more issues, change the AdminList from 'asfbot' to 'asf-ci' {code} was (Author: udim): Chatted on Slack, they did something to fix it. https://the-asf.slack.com/archives/CBX4TSBQ8/p1562867180305200 > github PR comment phrase triggering of jenkins jobs broken (ghprb plugin) > - > > Key: BEAM-7723 > URL: https://issues.apache.org/jira/browse/BEAM-7723 > Project: Beam > Issue Type: Bug > Components: testing >Reporter: Udi Meiri >Priority: Major > > Apache Infra team has removed ghprb-plugin from Jenkins. -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Commented] (BEAM-7723) github PR comment phrase triggering of jenkins jobs broken (ghprb plugin)
[ https://issues.apache.org/jira/browse/BEAM-7723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16883218#comment-16883218 ] Udi Meiri commented on BEAM-7723: - Chatted on Slack, they did something to fix it. https://the-asf.slack.com/archives/CBX4TSBQ8/p1562867180305200 > github PR comment phrase triggering of jenkins jobs broken (ghprb plugin) > - > > Key: BEAM-7723 > URL: https://issues.apache.org/jira/browse/BEAM-7723 > Project: Beam > Issue Type: Bug > Components: testing >Reporter: Udi Meiri >Priority: Major > > Apache Infra team has removed ghprb-plugin from Jenkins. -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Work logged] (BEAM-7689) Temporary directory for WriteOperation may not be unique in FileBaseSink
[ https://issues.apache.org/jira/browse/BEAM-7689?focusedWorklogId=275434&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-275434 ] ASF GitHub Bot logged work on BEAM-7689: Author: ASF GitHub Bot Created on: 11/Jul/19 17:40 Start Date: 11/Jul/19 17:40 Worklog Time Spent: 10m Work Description: akedin commented on issue #9039: [BEAM-7689] cherry-picking for 2.14.0 URL: https://github.com/apache/beam/pull/9039#issuecomment-510583231 retest this please This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 275434) Time Spent: 2h 20m (was: 2h 10m) > Temporary directory for WriteOperation may not be unique in FileBaseSink > > > Key: BEAM-7689 > URL: https://issues.apache.org/jira/browse/BEAM-7689 > Project: Beam > Issue Type: Bug > Components: io-java-files >Reporter: Heejong Lee >Assignee: Heejong Lee >Priority: Blocker > Fix For: 2.7.1, 2.14.0 > > Time Spent: 2h 20m > Remaining Estimate: 0h > > Temporary directory for WriteOperation in FileBasedSink is generated from a > second-granularity timestamp (-MM-dd_HH-mm-ss) and unique increasing > index. Such granularity is not enough to make temporary directories unique > between different jobs. When two jobs share the same temporary directory, > output file may not be produced in one job since the required temporary file > can be deleted from another job. -- This message was sent by Atlassian JIRA (v7.6.14#76016)
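The collision described in BEAM-7689 above can be reproduced in a standalone sketch. This is an illustrative comparison, not FileBasedSink's actual naming code; the class and method names here are hypothetical:

```java
// Why a second-granularity timestamp plus a per-process counter can collide
// across jobs, and a UUID-based alternative. Two jobs started within the
// same second both begin their counter at 0, so they produce the same name.
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.UUID;

public class TempDirNames {
    // Collision-prone: formats the start time to the second and appends an
    // index that restarts at 0 in every job.
    static String timestampName(LocalDateTime now, int index) {
        return now.format(DateTimeFormatter.ofPattern("yyyy-MM-dd_HH-mm-ss"))
            + "_" + index;
    }

    // Collision-resistant: a random UUID is unique across jobs with
    // overwhelming probability.
    static String uuidName() {
        return ".temp-beam-" + UUID.randomUUID();
    }

    // Demonstrates the bug: two "jobs" sharing a start second and index
    // get identical temporary directory names.
    static boolean sameSecondCollides() {
        LocalDateTime t = LocalDateTime.of(2019, 7, 11, 18, 0, 0);
        return timestampName(t, 0).equals(timestampName(t, 0));
    }

    public static void main(String[] args) {
        System.out.println(sameSecondCollides());            // true
        System.out.println(uuidName().equals(uuidName()));   // false
    }
}
```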
[jira] [Work logged] (BEAM-7616) urlopen calls could get stuck without a timeout
[ https://issues.apache.org/jira/browse/BEAM-7616?focusedWorklogId=275430&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-275430 ] ASF GitHub Bot logged work on BEAM-7616: Author: ASF GitHub Bot Created on: 11/Jul/19 17:28 Start Date: 11/Jul/19 17:28 Worklog Time Spent: 10m Work Description: akedin commented on pull request #8926: [BEAM-7616] Cherry Pick #8925 to release branch URL: https://github.com/apache/beam/pull/8926 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 275430) Time Spent: 2h (was: 1h 50m) > urlopen calls could get stuck without a timeout > --- > > Key: BEAM-7616 > URL: https://issues.apache.org/jira/browse/BEAM-7616 > Project: Beam > Issue Type: Bug > Components: sdk-py-core >Reporter: Ahmet Altay >Assignee: Ahmet Altay >Priority: Blocker > Fix For: 2.14.0 > > Time Spent: 2h > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Work logged] (BEAM-7616) urlopen calls could get stuck without a timeout
[ https://issues.apache.org/jira/browse/BEAM-7616?focusedWorklogId=275429&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-275429 ] ASF GitHub Bot logged work on BEAM-7616: Author: ASF GitHub Bot Created on: 11/Jul/19 17:28 Start Date: 11/Jul/19 17:28 Worklog Time Spent: 10m Work Description: akedin commented on issue #8926: [BEAM-7616] Cherry Pick #8925 to release branch URL: https://github.com/apache/beam/pull/8926#issuecomment-510579110 Python precommit succeeded: - https://builds.apache.org/job/beam_PreCommit_Python_Commit/7391/ - https://scans.gradle.com/s/7pdbjhd6y4yd2 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 275429) Time Spent: 1h 50m (was: 1h 40m) > urlopen calls could get stuck without a timeout > --- > > Key: BEAM-7616 > URL: https://issues.apache.org/jira/browse/BEAM-7616 > Project: Beam > Issue Type: Bug > Components: sdk-py-core >Reporter: Ahmet Altay >Assignee: Ahmet Altay >Priority: Blocker > Fix For: 2.14.0 > > Time Spent: 1h 50m > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Created] (BEAM-7724) Codegen should cast(null) to a type to match exact function signature
Rui Wang created BEAM-7724: -- Summary: Codegen should cast(null) to a type to match exact function signature Key: BEAM-7724 URL: https://issues.apache.org/jira/browse/BEAM-7724 Project: Beam Issue Type: Improvement Components: dsl-sql Reporter: Rui Wang If there are two function signatures for the same function name and the input parameter is null, Janino will throw an exception due to ambiguity: A(String) A(Integer) Janino does not know how to match A(null). -- This message was sent by Atlassian JIRA (v7.6.14#76016)
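The ambiguity described in BEAM-7724 can be demonstrated in plain Java. OverloadDemo and its methods are hypothetical; Janino rejects the uncast call at codegen time for essentially the same reason javac rejects it at compile time:

```java
// Why A(null) is ambiguous between A(String) and A(Integer), and how an
// explicit cast — the cast(null) the issue asks codegen to emit — picks
// one overload.
public class OverloadDemo {
    static String a(String s)  { return "A(String)";  }
    static String a(Integer i) { return "A(Integer)"; }

    // a(null) does not compile: both overloads are applicable and neither
    // parameter type (String, Integer) is more specific than the other.
    static String callWithNull() {
        return a((String) null); // the cast resolves the call to a(String)
    }

    public static void main(String[] args) {
        System.out.println(callWithNull()); // prints "A(String)"
    }
}
```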
[jira] [Commented] (BEAM-7723) github PR comment phrase triggering of jenkins jobs broken (ghprb plugin)
[ https://issues.apache.org/jira/browse/BEAM-7723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16883183#comment-16883183 ] Udi Meiri commented on BEAM-7723: - Looks like we rely on several environment variables from GHPRB: ghprbPullId ghprbPullAuthorLogin ghprbSourceBranch Wiki documentation that may be useful: https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=89065396 > github PR comment phrase triggering of jenkins jobs broken (ghprb plugin) > - > > Key: BEAM-7723 > URL: https://issues.apache.org/jira/browse/BEAM-7723 > Project: Beam > Issue Type: Bug > Components: testing >Reporter: Udi Meiri >Priority: Major > > Apache Infra team has removed ghprb-plugin from Jenkins. -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Updated] (BEAM-7723) github PR comment phrase triggering of jenkins jobs broken (ghprb plugin)
[ https://issues.apache.org/jira/browse/BEAM-7723?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Udi Meiri updated BEAM-7723: Summary: github PR comment phrase triggering of jenkins jobs broken (ghprb plugin) (was: github PR comment phrase triggering of jenkins jobs broken) > github PR comment phrase triggering of jenkins jobs broken (ghprb plugin) > - > > Key: BEAM-7723 > URL: https://issues.apache.org/jira/browse/BEAM-7723 > Project: Beam > Issue Type: Bug > Components: testing >Reporter: Udi Meiri >Priority: Major > > Apache Infra team has removed ghprb-plugin from Jenkins. -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Updated] (BEAM-7685) Join Reordering
[ https://issues.apache.org/jira/browse/BEAM-7685?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rui Wang updated BEAM-7685: --- Issue Type: New Feature (was: Bug) > Join Reordering > --- > > Key: BEAM-7685 > URL: https://issues.apache.org/jira/browse/BEAM-7685 > Project: Beam > Issue Type: New Feature > Components: dsl-sql >Reporter: Alireza Samadianzakaria >Assignee: Alireza Samadianzakaria >Priority: Major > > The query parser does not reorder joins based on their costs. The fix is simple: > we need to include the rules related to reordering joins, such as > JoinCommuteRule. However, reordering joins may produce plans that have Cross > Join or other unsupported join types. We should either rewrite those > rules to account for that, or return infinite cost for those join types so > that they will not be selected. -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Created] (BEAM-7723) github PR comment phrase triggering of jenkins jobs broken
Udi Meiri created BEAM-7723: --- Summary: github PR comment phrase triggering of jenkins jobs broken Key: BEAM-7723 URL: https://issues.apache.org/jira/browse/BEAM-7723 Project: Beam Issue Type: Bug Components: testing Reporter: Udi Meiri Apache Infra team has removed ghprb-plugin from Jenkins. -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Work logged] (BEAM-7666) Pipe memory thrashing signal to Dataflow
[ https://issues.apache.org/jira/browse/BEAM-7666?focusedWorklogId=275406&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-275406 ] ASF GitHub Bot logged work on BEAM-7666: Author: ASF GitHub Bot Created on: 11/Jul/19 16:48 Start Date: 11/Jul/19 16:48 Worklog Time Spent: 10m Work Description: dustin12 commented on issue #8971: [BEAM-7666] Throttle piping URL: https://github.com/apache/beam/pull/8971#issuecomment-510565665 @pabloem ping now that everyone is back from the long weekend :) This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 275406) Time Spent: 10m Remaining Estimate: 0h > Pipe memory thrashing signal to Dataflow > > > Key: BEAM-7666 > URL: https://issues.apache.org/jira/browse/BEAM-7666 > Project: Beam > Issue Type: Improvement > Components: runner-dataflow >Reporter: Dustin Rhodes >Assignee: Dustin Rhodes >Priority: Critical > Time Spent: 10m > Remaining Estimate: 0h > > For autoscaling we would like to know if the user worker is spending too much > time garbage collecting. Pipe this signal through counters to DF. -- This message was sent by Atlassian JIRA (v7.6.14#76016)
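A raw version of the GC-time signal mentioned in BEAM-7666 can be read from the JVM's management beans. This sketch only shows the measurement side; the counter-piping to Dataflow that the issue actually covers is omitted:

```java
// Total time the JVM has spent in garbage collection, read via JMX — the
// kind of raw signal that could be reported through a counter so the
// autoscaler can detect memory thrashing.
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

public class GcTimeSignal {
    // Milliseconds all collectors have spent in GC since JVM start.
    // getCollectionTime() returns -1 when a collector does not report
    // elapsed time, so those entries are skipped.
    static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc :
                ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime();
            if (t > 0) {
                total += t;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(totalGcMillis() >= 0); // true
    }
}
```

Sampling this value periodically and dividing the delta by wall-clock time yields a "fraction of time in GC" ratio.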
[jira] [Assigned] (BEAM-7198) Rename ToStringCoder into ToBytesCoder
[ https://issues.apache.org/jira/browse/BEAM-7198?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kenneth Knowles reassigned BEAM-7198: - Assignee: Francesco Perera > Rename ToStringCoder into ToBytesCoder > -- > > Key: BEAM-7198 > URL: https://issues.apache.org/jira/browse/BEAM-7198 > Project: Beam > Issue Type: Sub-task > Components: sdk-py-core >Reporter: Valentyn Tymofieiev >Assignee: Francesco Perera >Priority: Minor > > The name of the ToStringCoder class [1] is confusing, since the output of > encode() on Python 3 will be bytes. On Python 2 the output is also bytes, > since bytes and string are synonyms on Py2. > ToBytesCoder would be a better name for this class. > Note that this class is not listed in the coders that constitute Public APIs [2], > so we can treat this as an internal change. As a courtesy to users who happened > to reference a non-public coder in their pipelines we can keep the old class > name as an alias, e.g. ToStringCoder = ToBytesCoder to avoid friction, but > clean up the Beam codebase to use the new name. > [1] > https://github.com/apache/beam/blob/ef4b2ef7e5fa2fb87e1491df82d2797947f51be9/sdks/python/apache_beam/coders/coders.py#L344 > [2] > https://github.com/apache/beam/blob/ef4b2ef7e5fa2fb87e1491df82d2797947f51be9/sdks/python/apache_beam/coders/coders.py#L20 > cc: [~yoshiki.obata] [~chamikara] -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Work logged] (BEAM-7621) Selecting null row field's causes Null pointer Exception with BeamSql
[ https://issues.apache.org/jira/browse/BEAM-7621?focusedWorklogId=275391&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-275391 ] ASF GitHub Bot logged work on BEAM-7621: Author: ASF GitHub Bot Created on: 11/Jul/19 16:11 Start Date: 11/Jul/19 16:11 Worklog Time Spent: 10m Work Description: bmv126 commented on pull request #8930: [BEAM-7621] Null pointer exception when accessing null row fields in BeamSql URL: https://github.com/apache/beam/pull/8930#discussion_r302628537 ## File path: sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rel/BeamCalcRel.java ## @@ -386,24 +386,44 @@ public Expression field(BlockBuilder list, int index, Type storageType) { } Expression field = Expressions.call(expression, getter, Expressions.constant(index)); if (fromType.getTypeName().isLogicalType()) { -field = Expressions.call(field, "getMillis"); +Expression millisField = Expressions.call(field, "getMillis"); String logicalId = fromType.getLogicalType().getIdentifier(); if (logicalId.equals(TimeType.IDENTIFIER)) { - field = Expressions.convert_(field, int.class); + field = + Expressions.condition( Review comment: Are you suggesting the helper function can be added in calcite git ? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 275391) Time Spent: 2.5h (was: 2h 20m) > Selecting null row field's causes Null pointer Exception with BeamSql > - > > Key: BEAM-7621 > URL: https://issues.apache.org/jira/browse/BEAM-7621 > Project: Beam > Issue Type: Bug > Components: dsl-sql >Affects Versions: 2.14.0 >Reporter: Vishwas >Priority: Major > Time Spent: 2.5h > Remaining Estimate: 0h > > I have below schema of the row: > private static final Schema innerRowWithArraySchema = > Schema.builder() > .addStringField("string_field") > .addArrayField("array_field", FieldType.INT64) > .build(); > private static final Schema nullableNestedRowWithArraySchema = > Schema.builder() > > .addNullableField("field1",FieldType.row(innerRowWithArraySchema)) > .addNullableField("field2", > FieldType.array(FieldType.row(innerRowWithArraySchema))) > .build(); > > *// Create a row with null values* > Row nullRow = Row.nullRow(nullableNestedRowWithArraySchema); > > Now when we try to select nested row field's NPE is thrown: > .apply(SqlTransform.query("select PCOLLECTION.field1.string_field as > row_string_field, PCOLLECTION.field2[2].string_field as array_string_field > from PCOLLECTION")); > > Below is the exception thrown: > Caused by: java.lang.RuntimeException: CalcFn failed to evaluate: { > final org.apache.beam.sdk.values.Row current = > (org.apache.beam.sdk.values.Row) c.element(); > *final java.util.List inp1_ = > org.apache.beam.sdk.extensions.sql.impl.rel.BeamCalcRel.WrappedList.of(current.getRow(1));* > *final java.util.List inp2_ = > org.apache.beam.sdk.extensions.sql.impl.rel.BeamCalcRel.WrappedList.of(current.getArray(2));* > > c.output(org.apache.beam.sdk.values.Row.withSchema(outputSchema).addValue(inp1_ > == null ? 
(String) null : (String) > org.apache.beam.repackaged.beam_sdks_java_extensions_sql.org.apache.calcite.runtime.SqlFunctions.structAccess(inp1_, > 0, > "string_field")).addValue(org.apache.beam.repackaged.beam_sdks_java_extensions_sql.org.apache.calcite.runtime.SqlFunctions.arrayItemOptional(inp2_, > 2) == null ? (String) null : (String) > org.apache.beam.repackaged.beam_sdks_java_extensions_sql.org.apache.calcite.runtime.SqlFunctions.structAccess(org.apache.beam.repackaged.beam_sdks_java_extensions_sql.org.apache.calcite.runtime.SqlFunctions.arrayItemOptional(inp2_, > 2), 0, "string_field")).build()); > } > at > org.apache.beam.sdk.extensions.sql.impl.rel.BeamCalcRel$CalcFn.processElement(BeamCalcRel.java:253) > Caused by: java.lang.NullPointerException > at > org.apache.beam.sdk.extensions.sql.impl.rel.BeamCalcRel$WrappedList.of(BeamCalcRel.java:459) > at SC.eval0(Unknown Source) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > > In BeamCalcRel.class *field* method, null values are not handled for > composite types. > public Expression field(BlockBuilder list, int index, Type storageType
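The null guard discussed in the BEAM-7621 review above boils down to the following runtime behavior, written here as plain Java rather than generated through Calcite's Expressions API. DateTimeLike is a hypothetical stand-in for the row field's type, not a Beam class:

```java
// What the generated expression evaluates to once the guard is in place:
// return null when the row field is null, otherwise extract the millis.
// Before the fix, the codegen emitted the bare field.getMillis() call,
// which throws NullPointerException on a null field.
public class NullGuardedMillis {
    // Hypothetical stand-in for the datetime value stored in a row field.
    static class DateTimeLike {
        final long millis;
        DateTimeLike(long millis) { this.millis = millis; }
        long getMillis() { return millis; }
    }

    // Runtime equivalent of:
    //   condition(equal(field, constant(null)), constant(null),
    //             call(field, "getMillis"))
    static Long safeGetMillis(DateTimeLike field) {
        return field == null ? null : Long.valueOf(field.getMillis());
    }

    public static void main(String[] args) {
        System.out.println(safeGetMillis(null));                  // null
        System.out.println(safeGetMillis(new DateTimeLike(42L))); // 42
    }
}
```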
[jira] [Work logged] (BEAM-7621) Selecting null row field's causes Null pointer Exception with BeamSql
[ https://issues.apache.org/jira/browse/BEAM-7621?focusedWorklogId=275386&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-275386 ] ASF GitHub Bot logged work on BEAM-7621: Author: ASF GitHub Bot Created on: 11/Jul/19 16:07 Start Date: 11/Jul/19 16:07 Worklog Time Spent: 10m Work Description: bmv126 commented on pull request #8930: [BEAM-7621] Null pointer exception when accessing null row fields in BeamSql URL: https://github.com/apache/beam/pull/8930#discussion_r302626701 ## File path: sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rel/BeamCalcRel.java ## @@ -386,24 +386,44 @@ public Expression field(BlockBuilder list, int index, Type storageType) { } Expression field = Expressions.call(expression, getter, Expressions.constant(index)); if (fromType.getTypeName().isLogicalType()) { -field = Expressions.call(field, "getMillis"); +Expression millisField = Expressions.call(field, "getMillis"); String logicalId = fromType.getLogicalType().getIdentifier(); if (logicalId.equals(TimeType.IDENTIFIER)) { - field = Expressions.convert_(field, int.class); + field = + Expressions.condition( + Expressions.equal(field, Expressions.constant(null)), + Expressions.constant(null), + Expressions.call( Review comment: Without adding the autoboxing part, the expression was getting generated as below: current.getDateTime(1) == null ? (Long) null : current.getDateTime(1).getMillis() If "current.getDateTime(1) == null" evaluates to true, above expression was causing Null pointer exception as "null" is getting autounboxed to long. So I had to add the autoboxing part. Now I have changed it to Expressions.box as per your suggestion. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 275386) Time Spent: 2h 10m (was: 2h) > Selecting null row field's causes Null pointer Exception with BeamSql > - > > Key: BEAM-7621 > URL: https://issues.apache.org/jira/browse/BEAM-7621 > Project: Beam > Issue Type: Bug > Components: dsl-sql >Affects Versions: 2.14.0 >Reporter: Vishwas >Priority: Major > Time Spent: 2h 10m > Remaining Estimate: 0h > > I have below schema of the row: > private static final Schema innerRowWithArraySchema = > Schema.builder() > .addStringField("string_field") > .addArrayField("array_field", FieldType.INT64) > .build(); > private static final Schema nullableNestedRowWithArraySchema = > Schema.builder() > > .addNullableField("field1",FieldType.row(innerRowWithArraySchema)) > .addNullableField("field2", > FieldType.array(FieldType.row(innerRowWithArraySchema))) > .build(); > > *// Create a row with null values* > Row nullRow = Row.nullRow(nullableNestedRowWithArraySchema); > > Now when we try to select nested row field's NPE is thrown: > .apply(SqlTransform.query("select PCOLLECTION.field1.string_field as > row_string_field, PCOLLECTION.field2[2].string_field as array_string_field > from PCOLLECTION")); > > Below is the exception thrown: > Caused by: java.lang.RuntimeException: CalcFn failed to evaluate: { > final org.apache.beam.sdk.values.Row current = > (org.apache.beam.sdk.values.Row) c.element(); > *final java.util.List inp1_ = > org.apache.beam.sdk.extensions.sql.impl.rel.BeamCalcRel.WrappedList.of(current.getRow(1));* > *final java.util.List inp2_ = > org.apache.beam.sdk.extensions.sql.impl.rel.BeamCalcRel.WrappedList.of(current.getArray(2));* > > c.output(org.apache.beam.sdk.values.Row.withSchema(outputSchema).addValue(inp1_ > == null ? 
(String) null : (String) > org.apache.beam.repackaged.beam_sdks_java_extensions_sql.org.apache.calcite.runtime.SqlFunctions.structAccess(inp1_, > 0, > "string_field")).addValue(org.apache.beam.repackaged.beam_sdks_java_extensions_sql.org.apache.calcite.runtime.SqlFunctions.arrayItemOptional(inp2_, > 2) == null ? (String) null : (String) > org.apache.beam.repackaged.beam_sdks_java_extensions_sql.org.apache.calcite.runtime.SqlFunctions.structAccess(org.apache.beam.repackaged.beam_sdks_java_extensions_sql.org.apache.calcite.runtime.SqlFunctions.arrayItemOptional(inp2_, > 2), 0, "string_field")).build()); > } >
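The auto-unboxing pitfall discussed in the review comment above can be reproduced in plain Java. This is a minimal standalone sketch, not Beam or Calcite code, and the method names are illustrative: a conditional expression with one `Long` branch and one `long` branch is promoted to the primitive `long`, so a `(Long) null` branch is unboxed and throws.

```java
// Minimal sketch of the conditional-expression auto-unboxing NPE
// described above. Plain Java, not Beam code; names are illustrative.
import java.util.Date;

public class UnboxNpeDemo {

    // Boxing the non-null branch (what Expressions.box achieves in the
    // generated code) keeps the whole conditional at type Long, so the
    // null branch is returned as-is instead of being unboxed.
    static Long safeMillis(Date d) {
        return d == null ? null : Long.valueOf(d.getTime());
    }

    // Mirrors the broken generated expression:
    //   d == null ? (Long) null : d.getTime()
    // One branch is Long and the other is long, so the result type of the
    // conditional is the primitive long; (Long) null is therefore unboxed
    // and throws NullPointerException when d is null.
    static long brokenMillis(Date d) {
        return d == null ? (Long) null : d.getTime();
    }

    public static void main(String[] args) {
        System.out.println(safeMillis(null)); // null
        try {
            brokenMillis(null);
            System.out.println("no exception");
        } catch (NullPointerException e) {
            System.out.println("NullPointerException");
        }
    }
}
```

The same rule applies to any generated ternary whose branches mix a boxed wrapper with a primitive, which is why boxing the non-null branch fixes the generated code.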
[jira] [Work logged] (BEAM-7621) Selecting null row field's causes Null pointer Exception with BeamSql
[ https://issues.apache.org/jira/browse/BEAM-7621?focusedWorklogId=275387&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-275387 ] ASF GitHub Bot logged work on BEAM-7621: Author: ASF GitHub Bot Created on: 11/Jul/19 16:07 Start Date: 11/Jul/19 16:07 Worklog Time Spent: 10m Work Description: bmv126 commented on pull request #8930: [BEAM-7621] Null pointer exception when accessing null row fields in BeamSql URL: https://github.com/apache/beam/pull/8930#discussion_r302626780 ## File path: sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rel/BeamCalcRel.java ## @@ -386,24 +386,44 @@ public Expression field(BlockBuilder list, int index, Type storageType) { } Expression field = Expressions.call(expression, getter, Expressions.constant(index)); if (fromType.getTypeName().isLogicalType()) { -field = Expressions.call(field, "getMillis"); +Expression millisField = Expressions.call(field, "getMillis"); String logicalId = fromType.getLogicalType().getIdentifier(); if (logicalId.equals(TimeType.IDENTIFIER)) { - field = Expressions.convert_(field, int.class); + field = + Expressions.condition( + Expressions.equal(field, Expressions.constant(null)), + Expressions.constant(null), + Expressions.call( + Integer.class, "valueOf", Expressions.convert_(millisField, int.class))); } else if (logicalId.equals(DateType.IDENTIFIER)) { field = - Expressions.convert_( - Expressions.modulo(field, Expressions.constant(MILLIS_PER_DAY)), int.class); + Expressions.condition( + Expressions.equal(field, Expressions.constant(null)), + Expressions.constant(null), + Expressions.call( + Integer.class, + "valueOf", + Expressions.convert_( + Expressions.modulo(millisField, Expressions.constant(MILLIS_PER_DAY)), Review comment: I have fixed it now, by making it a division operation. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 275387) Time Spent: 2h 20m (was: 2h 10m) > Selecting null row field's causes Null pointer Exception with BeamSql > - > > Key: BEAM-7621 > URL: https://issues.apache.org/jira/browse/BEAM-7621 > Project: Beam > Issue Type: Bug > Components: dsl-sql >Affects Versions: 2.14.0 >Reporter: Vishwas >Priority: Major > Time Spent: 2h 20m > Remaining Estimate: 0h > > I have below schema of the row: > private static final Schema innerRowWithArraySchema = > Schema.builder() > .addStringField("string_field") > .addArrayField("array_field", FieldType.INT64) > .build(); > private static final Schema nullableNestedRowWithArraySchema = > Schema.builder() > > .addNullableField("field1",FieldType.row(innerRowWithArraySchema)) > .addNullableField("field2", > FieldType.array(FieldType.row(innerRowWithArraySchema))) > .build(); > > *// Create a row with null values* > Row nullRow = Row.nullRow(nullableNestedRowWithArraySchema); > > Now when we try to select nested row field's NPE is thrown: > .apply(SqlTransform.query("select PCOLLECTION.field1.string_field as > row_string_field, PCOLLECTION.field2[2].string_field as array_string_field > from PCOLLECTION")); > > Below is the exception thrown: > Caused by: java.lang.RuntimeException: CalcFn failed to evaluate: { > final org.apache.beam.sdk.values.Row current = > (org.apache.beam.sdk.values.Row) c.element(); > *final java.util.List inp1_ = > org.apache.beam.sdk.extensions.sql.impl.rel.BeamCalcRel.WrappedList.of(current.getRow(1));* > *final java.util.List inp2_ = > org.apache.beam.sdk.extensions.sql.impl.rel.BeamCalcRel.WrappedList.of(current.getArray(2));* > > c.output(org.apache.beam.sdk.values.Row.withSchema(outputSchema).addValue(inp1_ > == null ? 
(String) null : (String) > org.apache.beam.repackaged.beam_sdks_java_extensions_sql.org.apache.calcite.runtime.SqlFunctions.structAccess(inp1_, > 0, > "string_field")).addValue(org.apache.beam.repackaged.beam_sdks_java_extensions_sql.org.apache.calcite.runtime.SqlFunctions.arrayI
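The modulo-to-division fix mentioned in the comment above comes down to how an epoch-millis timestamp decomposes: dividing by the number of millis per day yields the DATE component (days since the epoch), while the remainder yields the TIME component (millis since midnight). A standalone sketch with illustrative method names, valid for non-negative timestamps (negative ones would need floor semantics):

```java
// Sketch of the division-vs-modulo distinction behind the fix above.
// Plain Java; MILLIS_PER_DAY has the same value as the constant
// referenced in the BeamCalcRel diff.
public class EpochDayDemo {
    static final long MILLIS_PER_DAY = 24L * 60 * 60 * 1000; // 86_400_000

    // DATE component: division discards the time of day.
    static int toEpochDay(long epochMillis) {
        return (int) (epochMillis / MILLIS_PER_DAY);
    }

    // TIME component: modulo keeps only the time of day.
    static int toMillisOfDay(long epochMillis) {
        return (int) (epochMillis % MILLIS_PER_DAY);
    }

    public static void main(String[] args) {
        // 1970-01-03 01:00:00 UTC
        long ts = 2 * MILLIS_PER_DAY + 3_600_000L;
        System.out.println(toEpochDay(ts));    // 2  (the DATE value)
        System.out.println(toMillisOfDay(ts)); // 3600000  (the TIME value)
    }
}
```

Using modulo where division was needed would have produced a time-of-day value where a day count was expected, which is the bug the division change addresses.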
[jira] [Work logged] (BEAM-7616) urlopen calls could get stuck without a timeout
[ https://issues.apache.org/jira/browse/BEAM-7616?focusedWorklogId=275381&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-275381 ] ASF GitHub Bot logged work on BEAM-7616: Author: ASF GitHub Bot Created on: 11/Jul/19 15:55 Start Date: 11/Jul/19 15:55 Worklog Time Spent: 10m Work Description: akedin commented on issue #8926: [BEAM-7616] Cherry Pick #8925 to release branch URL: https://github.com/apache/beam/pull/8926#issuecomment-510545445 running the precommit manually: https://builds.apache.org/job/beam_PreCommit_Python_Commit/7389/ This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 275381) Time Spent: 1h 40m (was: 1.5h) > urlopen calls could get stuck without a timeout > --- > > Key: BEAM-7616 > URL: https://issues.apache.org/jira/browse/BEAM-7616 > Project: Beam > Issue Type: Bug > Components: sdk-py-core >Reporter: Ahmet Altay >Assignee: Ahmet Altay >Priority: Blocker > Fix For: 2.14.0 > > Time Spent: 1h 40m > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Work logged] (BEAM-7603) Support for ValueProvider-given GCS Location for WriteToBigQuery w File Loads
[ https://issues.apache.org/jira/browse/BEAM-7603?focusedWorklogId=275379&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-275379 ] ASF GitHub Bot logged work on BEAM-7603: Author: ASF GitHub Bot Created on: 11/Jul/19 15:49 Start Date: 11/Jul/19 15:49 Worklog Time Spent: 10m Work Description: pabloem commented on issue #8908: [BEAM-7603] Support for ValueProvider-given GCS Location for WriteToBigQuery w File Loads URL: https://github.com/apache/beam/pull/8908#issuecomment-510542915 Thank you Anton! This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 275379) Time Spent: 4h 50m (was: 4h 40m) > Support for ValueProvider-given GCS Location for WriteToBigQuery w File Loads > - > > Key: BEAM-7603 > URL: https://issues.apache.org/jira/browse/BEAM-7603 > Project: Beam > Issue Type: Improvement > Components: io-python-gcp >Reporter: Pablo Estrada >Assignee: Pablo Estrada >Priority: Blocker > Fix For: 2.14.0 > > Time Spent: 4h 50m > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Work logged] (BEAM-7603) Support for ValueProvider-given GCS Location for WriteToBigQuery w File Loads
[ https://issues.apache.org/jira/browse/BEAM-7603?focusedWorklogId=275378&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-275378 ] ASF GitHub Bot logged work on BEAM-7603: Author: ASF GitHub Bot Created on: 11/Jul/19 15:47 Start Date: 11/Jul/19 15:47 Worklog Time Spent: 10m Work Description: akedin commented on pull request #8908: [BEAM-7603] Support for ValueProvider-given GCS Location for WriteToBigQuery w File Loads URL: https://github.com/apache/beam/pull/8908 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 275378) Time Spent: 4h 40m (was: 4.5h) > Support for ValueProvider-given GCS Location for WriteToBigQuery w File Loads > - > > Key: BEAM-7603 > URL: https://issues.apache.org/jira/browse/BEAM-7603 > Project: Beam > Issue Type: Improvement > Components: io-python-gcp >Reporter: Pablo Estrada >Assignee: Pablo Estrada >Priority: Blocker > Fix For: 2.14.0 > > Time Spent: 4h 40m > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Work logged] (BEAM-7603) Support for ValueProvider-given GCS Location for WriteToBigQuery w File Loads
[ https://issues.apache.org/jira/browse/BEAM-7603?focusedWorklogId=275376&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-275376 ] ASF GitHub Bot logged work on BEAM-7603: Author: ASF GitHub Bot Created on: 11/Jul/19 15:45 Start Date: 11/Jul/19 15:45 Worklog Time Spent: 10m Work Description: akedin commented on issue #8908: [BEAM-7603] Support for ValueProvider-given GCS Location for WriteToBigQuery w File Loads URL: https://github.com/apache/beam/pull/8908#issuecomment-510541222 Re-ran the python precommit manually, all seems green: https://scans.gradle.com/s/3s7bzzralmq6w , merging This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 275376) Time Spent: 4.5h (was: 4h 20m) > Support for ValueProvider-given GCS Location for WriteToBigQuery w File Loads > - > > Key: BEAM-7603 > URL: https://issues.apache.org/jira/browse/BEAM-7603 > Project: Beam > Issue Type: Improvement > Components: io-python-gcp >Reporter: Pablo Estrada >Assignee: Pablo Estrada >Priority: Blocker > Fix For: 2.14.0 > > Time Spent: 4.5h > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Work logged] (BEAM-7722) Simplify running of Beam Python on Flink
[ https://issues.apache.org/jira/browse/BEAM-7722?focusedWorklogId=275370&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-275370 ] ASF GitHub Bot logged work on BEAM-7722: Author: ASF GitHub Bot Created on: 11/Jul/19 15:22 Start Date: 11/Jul/19 15:22 Worklog Time Spent: 10m Work Description: robertwb commented on pull request #9043: [BEAM-7722] Add a Python FlinkRunner that fetches and uses released artifacts. URL: https://github.com/apache/beam/pull/9043 Also refactored the job_server module to be more re-usable and cleaned up the logic in PortableRunner. 
[jira] [Created] (BEAM-7722) Simplify running of Beam Python on Flink
Robert Bradshaw created BEAM-7722: - Summary: Simplify running of Beam Python on Flink Key: BEAM-7722 URL: https://issues.apache.org/jira/browse/BEAM-7722 Project: Beam Issue Type: Test Components: sdk-py-core Reporter: Robert Bradshaw Currently this requires building and running several processes. We should be able to automate most of this away. -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Commented] (BEAM-7718) PubsubIO to use gRPC API instead of JSON REST API
[ https://issues.apache.org/jira/browse/BEAM-7718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16883084#comment-16883084 ] Steve Niemitz commented on BEAM-7718: - Additionally, the gRPC implementation seems efficient enough that it will very quickly hit the Pub/Sub subscriber quota with any non-trivial number of workers. I'm not really sure why; using the windmill-native version, I can easily read from this subscription at a reasonable rate with plenty of quota left. I wasn't able to do the same with the JSON version, but I think that's just because it isn't efficient enough to get enough throughput on a worker. > PubsubIO to use gRPC API instead of JSON REST API > - > > Key: BEAM-7718 > URL: https://issues.apache.org/jira/browse/BEAM-7718 > Project: Beam > Issue Type: New Feature > Components: io-java-gcp >Reporter: Steve Niemitz >Priority: Major > > {quote}The default implementation uses the HTTP REST API, which seems to be > much less performant than the gRPC implementation. Is there a reason that > the gRPC implementation is essentially unavailable from the public API? > PubsubIO.Read.withClientFactory is package private. I worked around this by > making it public and rebuilding{quote} -- This message was sent by Atlassian JIRA (v7.6.14#76016)