[jira] [Updated] (BEAM-13857) Add expansion service startup to Go integration test flags.
[ https://issues.apache.org/jira/browse/BEAM-13857?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Oliveira updated BEAM-13857: --- Fix Version/s: Not applicable Resolution: Fixed Status: Resolved (was: Open) > Add expansion service startup to Go integration test flags. > --- > > Key: BEAM-13857 > URL: https://issues.apache.org/jira/browse/BEAM-13857 > Project: Beam > Issue Type: Bug > Components: sdk-go >Reporter: Ritesh Ghorse >Assignee: Daniel Oliveira >Priority: P2 > Fix For: Not applicable > > Time Spent: 5h 50m > Remaining Estimate: 0h > > Currently a separate debezium io expansion address flag needs to be passed to > the runner when running cross-language debezium IO pipelines from Go SDK. > Find a way to do this in a better way so that we could have it started along > with java io expansion service while spinning up the test without bulking > :sdks:java:io:expansion-service. > In particular, needing to add a flag per expansion service jar to our > integration tests will eventually become quite cluttered, so we may wish to > settle on some kind of KV map flag approach instead to reduce copypasta code > overhead. > Edit: Decided on going with the KV map flag approach within the Go SDK > instead of in a bash script, and moving expansion service startup into the > codebase as well. -- This message was sent by Atlassian Jira (v8.20.1#820001)
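A minimal sketch of the KV map flag idea described in BEAM-13857, assuming a hypothetical `expansion_addrs` flag that takes comma-separated label:address pairs. The type name, the flag name, and the pair format are illustrative assumptions, not the Go SDK's actual integration test flags.

```go
package main

import (
	"flag"
	"fmt"
	"sort"
	"strings"
)

// expansionAddrs is a hypothetical flag type that maps an expansion
// service label (for example "debezium") to the address it listens on.
type expansionAddrs map[string]string

// String renders the map as sorted label:address pairs (part of flag.Value).
func (m expansionAddrs) String() string {
	pairs := make([]string, 0, len(m))
	for label, addr := range m {
		pairs = append(pairs, label+":"+addr)
	}
	sort.Strings(pairs)
	return strings.Join(pairs, ",")
}

// Set parses "label:address,label:address" (part of flag.Value).
// Only the first colon splits a pair, so host:port addresses stay intact.
func (m expansionAddrs) Set(s string) error {
	for _, pair := range strings.Split(s, ",") {
		parts := strings.SplitN(pair, ":", 2)
		if len(parts) != 2 || parts[0] == "" || parts[1] == "" {
			return fmt.Errorf("invalid pair %q, want label:address", pair)
		}
		m[parts[0]] = parts[1]
	}
	return nil
}

func main() {
	addrs := expansionAddrs{}
	fs := flag.NewFlagSet("integration", flag.ContinueOnError)
	// One flag covers every expansion service instead of one flag per jar.
	fs.Var(addrs, "expansion_addrs", "comma-separated label:address pairs (hypothetical)")
	args := []string{"--expansion_addrs=debezium:localhost:8097,io:localhost:8098"}
	if err := fs.Parse(args); err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Println("debezium ->", addrs["debezium"])
	fmt.Println("io ->", addrs["io"])
}
```

With a single map-valued flag, adding another expansion service to the tests means adding a label rather than declaring a new flag.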
[jira] [Updated] (BEAM-14296) PostCommit Java VR Dataflow V2 Streaming failing (:release:go-licenses:java:dockerRun)
[ https://issues.apache.org/jira/browse/BEAM-14296?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Oliveira updated BEAM-14296: --- Fix Version/s: 2.38.0 Resolution: Fixed Status: Resolved (was: Open) > PostCommit Java VR Dataflow V2 Streaming failing > (:release:go-licenses:java:dockerRun) > -- > > Key: BEAM-14296 > URL: https://issues.apache.org/jira/browse/BEAM-14296 > Project: Beam > Issue Type: Bug > Components: test-failures >Reporter: Emily Ye >Assignee: Daniel Oliveira >Priority: P2 > Labels: currently-failing > Fix For: 2.38.0 > > Time Spent: 0.5h > Remaining Estimate: 0h > > Links: > * First failure (no changes that seem to have triggered this): > [https://ci-beam.apache.org/job/beam_PostCommit_Java_VR_Dataflow_V2_Streaming/1987/] > Segfault in :release:go-licenses:java:dockerRun, seems to be related to > github.com/spf13/cobra v1.3.0 (see go.mod) and > [https://github.com/google/go-licenses/issues/125|https://github.com/google/go-licenses/issues/125.] > Will attempt to fix by pinning the version of go-licenses > > (Add any investigation notes so far) > > _After you've filled out the above details, please [assign the issue to an > individual|https://beam.apache.org/contribute/postcommits-guides/index.html#find_specialist]. > Assignee should [treat test failures as > high-priority|https://beam.apache.org/contribute/postcommits-policies/#assigned-failing-test], > helping to fix the issue or find a more appropriate owner. See [Apache Beam > Post-Commit > Policies|https://beam.apache.org/contribute/postcommits-policies]._ -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Commented] (BEAM-14282) Fix exceptions swallowed in several Python I/O connectors
[ https://issues.apache.org/jira/browse/BEAM-14282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17521399#comment-17521399 ] Daniel Oliveira commented on BEAM-14282: This has been cherry-picked into 2.38.0, so assuming there's no further work it should be safe to resolve. > Fix exceptions swallowed in several Python I/O connectors > - > > Key: BEAM-14282 > URL: https://issues.apache.org/jira/browse/BEAM-14282 > Project: Beam > Issue Type: Bug > Components: io-py-gcp >Affects Versions: 2.32.0, 2.33.0, 2.34.0, 2.35.0, 2.36.0, 2.37.0 >Reporter: Chamikara Madhusanka Jayalath >Assignee: Chamikara Madhusanka Jayalath >Priority: P1 > Fix For: 2.38.0 > > Time Spent: 3h > Remaining Estimate: 0h > > Seems like we do not re-throw errors after reporting metrics at following > locations. > https://github.com/apache/beam/blob/8e217ea0d1f383ef5033ef507b14d01edf9c67e6/sdks/python/apache_beam/io/gcp/datastore/v1new/datastoreio.py#L303 > https://github.com/apache/beam/blob/70d9e2a08cc32192790cd9c98ffa15a756877a73/sdks/python/apache_beam/io/gcp/gcsio.py#L644 > Not re-raising these errors could result in data correctness issues for > downstream consumers. -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Updated] (BEAM-13519) Java precommit flaky (timing out)
[ https://issues.apache.org/jira/browse/BEAM-13519?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Oliveira updated BEAM-13519: --- Fix Version/s: (was: 2.38.0) > Java precommit flaky (timing out) > - > > Key: BEAM-13519 > URL: https://issues.apache.org/jira/browse/BEAM-13519 > Project: Beam > Issue Type: Bug > Components: test-failures >Reporter: Kyle Weaver >Assignee: Kiley Sok >Priority: P1 > Labels: flake > Time Spent: 2h 40m > Remaining Estimate: 0h > > Java precommits are sometimes timing out with no clear cause. Gradle will log > a bunch of routine build tasks, and then Jenkins will abort the job much > later. There are no logs to indicate what happened. It is not even clear > which task or tasks, if any, was the culprit, since many tasks are run in > parallel. > 01:53:28 > Task :sdks:java:testing:nexmark:build > 01:53:28 > Task :sdks:java:testing:nexmark:buildDependents > 01:53:28 > Task :sdks:java:extensions:sql:zetasql:buildDependents > 01:53:28 > Task :sdks:java:io:google-cloud-platform:buildDependents > 01:53:28 > Task :sdks:java:extensions:sql:buildDependents > 01:53:28 > Task :sdks:java:io:kafka:buildDependents > 01:53:28 > Task :sdks:java:extensions:join-library:buildDependents > 01:53:28 > Task :sdks:java:io:synthetic:buildDependents > 01:53:28 > Task :sdks:java:io:mongodb:buildDependents > 01:53:28 > Task :sdks:java:io:thrift:buildDependents > 01:53:28 > Task :sdks:java:testing:test-utils:buildDependents > 01:53:28 > Task :sdks:java:expansion-service:buildDependents > 01:53:28 > Task :sdks:java:extensions:arrow:buildDependents > 01:53:28 > Task :sdks:java:extensions:protobuf:buildDependents > 01:53:28 > Task :sdks:java:io:common:buildDependents > 01:53:28 > Task :runners:direct-java:buildDependents > 01:53:28 > Task :runners:local-java:buildDependents > 01:53:28 Build timed out (after 120 minutes). Marking the build as aborted. 
> https://ci-beam.apache.org/job/beam_PreCommit_Java_cron/4874/ -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Updated] (BEAM-13519) Java precommit flaky (timing out)
[ https://issues.apache.org/jira/browse/BEAM-13519?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Oliveira updated BEAM-13519: --- Fix Version/s: 2.38.0 Resolution: Fixed Status: Resolved (was: Open) > Java precommit flaky (timing out) > - > > Key: BEAM-13519 > URL: https://issues.apache.org/jira/browse/BEAM-13519 > Project: Beam > Issue Type: Bug > Components: test-failures >Reporter: Kyle Weaver >Assignee: Kiley Sok >Priority: P1 > Labels: flake > Fix For: 2.38.0 > > Time Spent: 2h 40m > Remaining Estimate: 0h > > Java precommits are sometimes timing out with no clear cause. Gradle will log > a bunch of routine build tasks, and then Jenkins will abort the job much > later. There are no logs to indicate what happened. It is not even clear > which task or tasks, if any, was the culprit, since many tasks are run in > parallel. > 01:53:28 > Task :sdks:java:testing:nexmark:build > 01:53:28 > Task :sdks:java:testing:nexmark:buildDependents > 01:53:28 > Task :sdks:java:extensions:sql:zetasql:buildDependents > 01:53:28 > Task :sdks:java:io:google-cloud-platform:buildDependents > 01:53:28 > Task :sdks:java:extensions:sql:buildDependents > 01:53:28 > Task :sdks:java:io:kafka:buildDependents > 01:53:28 > Task :sdks:java:extensions:join-library:buildDependents > 01:53:28 > Task :sdks:java:io:synthetic:buildDependents > 01:53:28 > Task :sdks:java:io:mongodb:buildDependents > 01:53:28 > Task :sdks:java:io:thrift:buildDependents > 01:53:28 > Task :sdks:java:testing:test-utils:buildDependents > 01:53:28 > Task :sdks:java:expansion-service:buildDependents > 01:53:28 > Task :sdks:java:extensions:arrow:buildDependents > 01:53:28 > Task :sdks:java:extensions:protobuf:buildDependents > 01:53:28 > Task :sdks:java:io:common:buildDependents > 01:53:28 > Task :runners:direct-java:buildDependents > 01:53:28 > Task :runners:local-java:buildDependents > 01:53:28 Build timed out (after 120 minutes). Marking the build as aborted. 
> https://ci-beam.apache.org/job/beam_PreCommit_Java_cron/4874/ -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Updated] (BEAM-14252) beam_PostCommit_Java_DataflowV1 failing with a variety of flakes and errors
[ https://issues.apache.org/jira/browse/BEAM-14252?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Oliveira updated BEAM-14252: --- Fix Version/s: (was: 2.38.0) > beam_PostCommit_Java_DataflowV1 failing with a variety of flakes and errors > --- > > Key: BEAM-14252 > URL: https://issues.apache.org/jira/browse/BEAM-14252 > Project: Beam > Issue Type: Bug > Components: runner-dataflow, test-failures >Reporter: Daniel Oliveira >Assignee: Emily Ye >Priority: P1 > > Test Suite: https://ci-beam.apache.org/job/beam_PostCommit_Java_DataflowV1 > This is a catch-all bug for the various failures affecting this test. It > seems to have gone under the radar for a while, so it's likely that multiple > different failures have built up over time. Individual failures should be > linked as sub-tasks. > Looking at the [build > trend|https://ci-beam.apache.org/job/beam_PostCommit_Java_DataflowV1/buildTimeTrend], > this seems to have started around > https://ci-beam.apache.org/job/beam_PostCommit_Java_DataflowV1/1386/, on > March 18, but even then it started with only 2 failures. Meanwhile recent > builds are around 35-45 failures, and it varies implying some of the failures > are flakes. -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Commented] (BEAM-14253) pubsublite.ReadWriteIT failing in beam_PostCommit_Java_DataflowV1
[ https://issues.apache.org/jira/browse/BEAM-14253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17517738#comment-17517738 ] Daniel Oliveira commented on BEAM-14253: I added an error I saw in the Dataflow logs. > pubsublite.ReadWriteIT failing in beam_PostCommit_Java_DataflowV1 > - > > Key: BEAM-14253 > URL: https://issues.apache.org/jira/browse/BEAM-14253 > Project: Beam > Issue Type: Sub-task > Components: io-java-gcp, test-failures >Reporter: Daniel Oliveira >Assignee: Daniel Collins >Priority: P1 > > Example: > https://ci-beam.apache.org/job/beam_PostCommit_Java_DataflowV1/1455/testReport/junit/org.apache.beam.sdk.io.gcp.pubsublite/ReadWriteIT/testReadWrite/ > {noformat} > java.lang.AssertionError: Did not receive signal on > projects/apache-beam-testing/subscriptions/result-subscription--586739339276181574 > in 300s > {noformat} > Dataflow logs show this, might be related: > {noformat} > Error message from worker: java.lang.IllegalArgumentException > > org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkArgument(Preconditions.java:127) > > org.apache.beam.sdk.io.gcp.pubsublite.internal.SubscriptionPartitionLoader$GeneratorFn.getInitialWatermarkEstimatorState(SubscriptionPartitionLoader.java:76) > {noformat} -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Updated] (BEAM-14253) pubsublite.ReadWriteIT failing in beam_PostCommit_Java_DataflowV1
[ https://issues.apache.org/jira/browse/BEAM-14253?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Oliveira updated BEAM-14253: --- Description: Example: https://ci-beam.apache.org/job/beam_PostCommit_Java_DataflowV1/1455/testReport/junit/org.apache.beam.sdk.io.gcp.pubsublite/ReadWriteIT/testReadWrite/ {noformat} java.lang.AssertionError: Did not receive signal on projects/apache-beam-testing/subscriptions/result-subscription--586739339276181574 in 300s {noformat} Dataflow logs show this, might be related: {noformat} Error message from worker: java.lang.IllegalArgumentException org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkArgument(Preconditions.java:127) org.apache.beam.sdk.io.gcp.pubsublite.internal.SubscriptionPartitionLoader$GeneratorFn.getInitialWatermarkEstimatorState(SubscriptionPartitionLoader.java:76) {noformat} was: Example: https://ci-beam.apache.org/job/beam_PostCommit_Java_DataflowV1/1455/testReport/junit/org.apache.beam.sdk.io.gcp.pubsublite/ReadWriteIT/testReadWrite/ {noformat} java.lang.AssertionError: Did not receive signal on projects/apache-beam-testing/subscriptions/result-subscription--586739339276181574 in 300s {noformat} > pubsublite.ReadWriteIT failing in beam_PostCommit_Java_DataflowV1 > - > > Key: BEAM-14253 > URL: https://issues.apache.org/jira/browse/BEAM-14253 > Project: Beam > Issue Type: Sub-task > Components: io-java-gcp, test-failures >Reporter: Daniel Oliveira >Assignee: Daniel Collins >Priority: P1 > > Example: > https://ci-beam.apache.org/job/beam_PostCommit_Java_DataflowV1/1455/testReport/junit/org.apache.beam.sdk.io.gcp.pubsublite/ReadWriteIT/testReadWrite/ > {noformat} > java.lang.AssertionError: Did not receive signal on > projects/apache-beam-testing/subscriptions/result-subscription--586739339276181574 > in 300s > {noformat} > Dataflow logs show this, might be related: > {noformat} > Error message from worker: java.lang.IllegalArgumentException > > 
org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkArgument(Preconditions.java:127) > > org.apache.beam.sdk.io.gcp.pubsublite.internal.SubscriptionPartitionLoader$GeneratorFn.getInitialWatermarkEstimatorState(SubscriptionPartitionLoader.java:76) > {noformat} -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Created] (BEAM-14263) beam_PostCommit_Java_DataflowV2, testBigQueryStorageWrite30MProto failing consistently
Daniel Oliveira created BEAM-14263: -- Summary: beam_PostCommit_Java_DataflowV2, testBigQueryStorageWrite30MProto failing consistently Key: BEAM-14263 URL: https://issues.apache.org/jira/browse/BEAM-14263 Project: Beam Issue Type: Bug Components: test-failures Reporter: Daniel Oliveira Assignee: Chamikara Madhusanka Jayalath testBigQueryStorageWrite30MProto seems to have been failing since being originally introduced. First failure I found: https://ci-beam.apache.org/job/beam_PostCommit_Java_DataflowV2/1391/, which includes when this PR was merged: https://github.com/apache/beam/pull/17038 I don't see any explicit reason this might be failing either in the console logs or Dataflow logs. Here's the console logs: {noformat} java.lang.RuntimeException: Workflow failed. Causes: S06:WriteToBQ/StorageApiLoads/GroupIntoBatches/ParDo(GroupIntoBatches)/ParMultiDo(GroupIntoBatches)/Reshard/Read+WriteToBQ/StorageApiLoads/GroupIntoBatches/ParDo(GroupIntoBatches)/ParMultiDo(GroupIntoBatches)/Reshard/UnreifyWindow+WriteToBQ/StorageApiLoads/GroupIntoBatches/ParDo(GroupIntoBatches)/ParMultiDo(GroupIntoBatches)+WriteToBQ/StorageApiLoads/StorageApiWriteSharded/Write Records/ParMultiDo(WriteRecords)/Reshard/Reify+WriteToBQ/StorageApiLoads/StorageApiWriteSharded/Write Records/ParMultiDo(WriteRecords)/Reshard/Write failed., The job failed because a work item has failed 4 times. Look in previous log entries for the cause of each one of the 4 failures. For more information, see https://cloud.google.com/dataflow/docs/guides/common-errors. The work item was attempted on these workers: testpipeline-jenkins-0319-03190548-sfjp-harness-263n Root cause: The worker lost contact with the service., {noformat} -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Commented] (BEAM-13519) Java precommit flaky (timing out)
[ https://issues.apache.org/jira/browse/BEAM-13519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17517703#comment-17517703 ] Daniel Oliveira commented on BEAM-13519: I see there was an actual race condition causing this and not just an error with the test, so it looks important enough to cherry-pick. Any bug able to cause consistent failures in a Precommit like this is definitely release-blocking. > Java precommit flaky (timing out) > - > > Key: BEAM-13519 > URL: https://issues.apache.org/jira/browse/BEAM-13519 > Project: Beam > Issue Type: Bug > Components: test-failures >Reporter: Kyle Weaver >Assignee: Kiley Sok >Priority: P1 > Labels: flake > Fix For: 2.38.0 > > Time Spent: 2h 10m > Remaining Estimate: 0h > > Java precommits are sometimes timing out with no clear cause. Gradle will log > a bunch of routine build tasks, and then Jenkins will abort the job much > later. There are no logs to indicate what happened. It is not even clear > which task or tasks, if any, was the culprit, since many tasks are run in > parallel. 
> 01:53:28 > Task :sdks:java:testing:nexmark:build > 01:53:28 > Task :sdks:java:testing:nexmark:buildDependents > 01:53:28 > Task :sdks:java:extensions:sql:zetasql:buildDependents > 01:53:28 > Task :sdks:java:io:google-cloud-platform:buildDependents > 01:53:28 > Task :sdks:java:extensions:sql:buildDependents > 01:53:28 > Task :sdks:java:io:kafka:buildDependents > 01:53:28 > Task :sdks:java:extensions:join-library:buildDependents > 01:53:28 > Task :sdks:java:io:synthetic:buildDependents > 01:53:28 > Task :sdks:java:io:mongodb:buildDependents > 01:53:28 > Task :sdks:java:io:thrift:buildDependents > 01:53:28 > Task :sdks:java:testing:test-utils:buildDependents > 01:53:28 > Task :sdks:java:expansion-service:buildDependents > 01:53:28 > Task :sdks:java:extensions:arrow:buildDependents > 01:53:28 > Task :sdks:java:extensions:protobuf:buildDependents > 01:53:28 > Task :sdks:java:io:common:buildDependents > 01:53:28 > Task :runners:direct-java:buildDependents > 01:53:28 > Task :runners:local-java:buildDependents > 01:53:28 Build timed out (after 120 minutes). Marking the build as aborted. > https://ci-beam.apache.org/job/beam_PreCommit_Java_cron/4874/ -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Updated] (BEAM-13519) Java precommit flaky (timing out)
[ https://issues.apache.org/jira/browse/BEAM-13519?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Oliveira updated BEAM-13519: --- Fix Version/s: 2.38.0 > Java precommit flaky (timing out) > - > > Key: BEAM-13519 > URL: https://issues.apache.org/jira/browse/BEAM-13519 > Project: Beam > Issue Type: Bug > Components: test-failures >Reporter: Kyle Weaver >Assignee: Kiley Sok >Priority: P1 > Labels: flake > Fix For: 2.38.0 > > Time Spent: 2h 10m > Remaining Estimate: 0h > > Java precommits are sometimes timing out with no clear cause. Gradle will log > a bunch of routine build tasks, and then Jenkins will abort the job much > later. There are no logs to indicate what happened. It is not even clear > which task or tasks, if any, was the culprit, since many tasks are run in > parallel. > 01:53:28 > Task :sdks:java:testing:nexmark:build > 01:53:28 > Task :sdks:java:testing:nexmark:buildDependents > 01:53:28 > Task :sdks:java:extensions:sql:zetasql:buildDependents > 01:53:28 > Task :sdks:java:io:google-cloud-platform:buildDependents > 01:53:28 > Task :sdks:java:extensions:sql:buildDependents > 01:53:28 > Task :sdks:java:io:kafka:buildDependents > 01:53:28 > Task :sdks:java:extensions:join-library:buildDependents > 01:53:28 > Task :sdks:java:io:synthetic:buildDependents > 01:53:28 > Task :sdks:java:io:mongodb:buildDependents > 01:53:28 > Task :sdks:java:io:thrift:buildDependents > 01:53:28 > Task :sdks:java:testing:test-utils:buildDependents > 01:53:28 > Task :sdks:java:expansion-service:buildDependents > 01:53:28 > Task :sdks:java:extensions:arrow:buildDependents > 01:53:28 > Task :sdks:java:extensions:protobuf:buildDependents > 01:53:28 > Task :sdks:java:io:common:buildDependents > 01:53:28 > Task :runners:direct-java:buildDependents > 01:53:28 > Task :runners:local-java:buildDependents > 01:53:28 Build timed out (after 120 minutes). Marking the build as aborted. 
> https://ci-beam.apache.org/job/beam_PreCommit_Java_cron/4874/ -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Commented] (BEAM-13950) PVR_Spark2_Streaming perma-red
[ https://issues.apache.org/jira/browse/BEAM-13950?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17517698#comment-17517698 ] Daniel Oliveira commented on BEAM-13950: Just noting that this is still affecting releases with 2.38.0. Not marking it as release-blocking since the reasoning above still holds. > PVR_Spark2_Streaming perma-red > -- > > Key: BEAM-13950 > URL: https://issues.apache.org/jira/browse/BEAM-13950 > Project: Beam > Issue Type: Bug > Components: runner-spark, test-failures >Affects Versions: 2.37.0, 2.38.0 >Reporter: Brian Hulette >Priority: P1 > > https://ci-beam.apache.org/job/beam_PostCommit_Java_PVR_Spark2_Streaming has > been failing a variable number of tests for a while. > Last successful run was Dec 28, 2021 > (https://ci-beam.apache.org/job/beam_PostCommit_Java_PVR_Spark2_Streaming/1021/), > which was approximately coincident with gradle 7 changes. -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Updated] (BEAM-13950) PVR_Spark2_Streaming perma-red
[ https://issues.apache.org/jira/browse/BEAM-13950?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Oliveira updated BEAM-13950: --- Affects Version/s: 2.38.0 > PVR_Spark2_Streaming perma-red > -- > > Key: BEAM-13950 > URL: https://issues.apache.org/jira/browse/BEAM-13950 > Project: Beam > Issue Type: Bug > Components: runner-spark, test-failures >Affects Versions: 2.37.0, 2.38.0 >Reporter: Brian Hulette >Priority: P1 > > https://ci-beam.apache.org/job/beam_PostCommit_Java_PVR_Spark2_Streaming has > been failing a variable number of tests for a while. > Last successful run was Dec 28, 2021 > (https://ci-beam.apache.org/job/beam_PostCommit_Java_PVR_Spark2_Streaming/1021/), > which was approximately coincident with gradle 7 changes. -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Comment Edited] (BEAM-14254) beam_PostCommit_Java_PVR_Flink_Streaming failing due to new AfterSynchronizedProcessingTime test
[ https://issues.apache.org/jira/browse/BEAM-14254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17517224#comment-17517224 ] Daniel Oliveira edited comment on BEAM-14254 at 4/5/22 5:35 AM: Update: Looks like the same test is causing issues with Samza: https://ci-beam.apache.org/job/beam_PostCommit_Java_PVR_Samza/1249/testReport/org.apache.beam.sdk.transforms/GroupByKeyTest$BasicTests/testAfterProcessingTimeContinuationTriggerUsingState/ was (Author: danoliveira): Update: Looks like same is happening in Samza: https://ci-beam.apache.org/job/beam_PostCommit_Java_PVR_Samza/1249/testReport/org.apache.beam.sdk.transforms/GroupByKeyTest$BasicTests/testAfterProcessingTimeContinuationTriggerUsingState/ > beam_PostCommit_Java_PVR_Flink_Streaming failing due to new > AfterSynchronizedProcessingTime test > > > Key: BEAM-14254 > URL: https://issues.apache.org/jira/browse/BEAM-14254 > Project: Beam > Issue Type: Bug > Components: runner-flink, test-failures >Affects Versions: 2.38.0 >Reporter: Daniel Oliveira >Assignee: Ankur Goenka >Priority: P2 > > Test: https://ci-beam.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/ > Failure example: > https://ci-beam.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/7644/ > This seems to be technically a flake, but highly flaky (looks like ~90% of > runs are failing and this is the only error in sight). I can't pinpoint where > it started since even the oldest failure shows this, but it's likely that it > was failing since the test was introduced because the test also caused > failures in Dataflow (https://issues.apache.org/jira/browse/BEAM-13952). 
> Error message: > {noformat} > java.lang.RuntimeException: The Runner experienced the following error during > execution: > java.lang.RuntimeException: Error received from SDK harness for instruction > 10: org.apache.beam.sdk.util.UserCodeException: java.lang.AssertionError: > Second Triggered sum/Values/Values/Map/ParMultiDo(Anonymous).output: > Expected: iterable with items [<42>] in any order > but: no item matches: <42> in [] > {noformat} -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Commented] (BEAM-14254) beam_PostCommit_Java_PVR_Flink_Streaming failing due to new AfterSynchronizedProcessingTime test
[ https://issues.apache.org/jira/browse/BEAM-14254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17517224#comment-17517224 ] Daniel Oliveira commented on BEAM-14254: Update: Looks like the same is happening in Samza: https://ci-beam.apache.org/job/beam_PostCommit_Java_PVR_Samza/1249/testReport/org.apache.beam.sdk.transforms/GroupByKeyTest$BasicTests/testAfterProcessingTimeContinuationTriggerUsingState/ > beam_PostCommit_Java_PVR_Flink_Streaming failing due to new > AfterSynchronizedProcessingTime test > > > Key: BEAM-14254 > URL: https://issues.apache.org/jira/browse/BEAM-14254 > Project: Beam > Issue Type: Bug > Components: runner-flink, test-failures >Affects Versions: 2.38.0 >Reporter: Daniel Oliveira >Assignee: Ankur Goenka >Priority: P2 > > Test: https://ci-beam.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/ > Failure example: > https://ci-beam.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/7644/ > This seems to be technically a flake, but highly flaky (looks like ~90% of > runs are failing and this is the only error in sight). I can't pinpoint where > it started since even the oldest failure shows this, but it's likely that it > was failing since the test was introduced because the test also caused > failures in Dataflow (https://issues.apache.org/jira/browse/BEAM-13952). > Error message: > {noformat} > java.lang.RuntimeException: The Runner experienced the following error during > execution: > java.lang.RuntimeException: Error received from SDK harness for instruction > 10: org.apache.beam.sdk.util.UserCodeException: java.lang.AssertionError: > Second Triggered sum/Values/Values/Map/ParMultiDo(Anonymous).output: > Expected: iterable with items [<42>] in any order > but: no item matches: <42> in [] > {noformat} -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Commented] (BEAM-14254) beam_PostCommit_Java_PVR_Flink_Streaming failing due to new AfterSynchronizedProcessingTime test
[ https://issues.apache.org/jira/browse/BEAM-14254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17517221#comment-17517221 ] Daniel Oliveira commented on BEAM-14254: Ankur, I'm assigning this to you since you handled this similar issue: https://issues.apache.org/jira/browse/BEAM-13961, but feel free to hand it off elsewhere; I'm just not sure who's an appropriate owner. > beam_PostCommit_Java_PVR_Flink_Streaming failing due to new > AfterSynchronizedProcessingTime test > > > Key: BEAM-14254 > URL: https://issues.apache.org/jira/browse/BEAM-14254 > Project: Beam > Issue Type: Bug > Components: runner-flink, test-failures >Affects Versions: 2.38.0 >Reporter: Daniel Oliveira >Assignee: Ankur Goenka >Priority: P2 > > Test: https://ci-beam.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/ > Failure example: > https://ci-beam.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/7644/ > This seems to be technically a flake, but highly flaky (looks like ~90% of > runs are failing and this is the only error in sight). I can't pinpoint where > it started since even the oldest failure shows this, but it's likely that it > was failing since the test was introduced because the test also caused > failures in Dataflow (https://issues.apache.org/jira/browse/BEAM-13952). > Error message: > {noformat} > java.lang.RuntimeException: The Runner experienced the following error during > execution: > java.lang.RuntimeException: Error received from SDK harness for instruction > 10: org.apache.beam.sdk.util.UserCodeException: java.lang.AssertionError: > Second Triggered sum/Values/Values/Map/ParMultiDo(Anonymous).output: > Expected: iterable with items [<42>] in any order > but: no item matches: <42> in [] > {noformat} -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Created] (BEAM-14254) beam_PostCommit_Java_PVR_Flink_Streaming failing due to new AfterSynchronizedProcessingTime test
Daniel Oliveira created BEAM-14254: -- Summary: beam_PostCommit_Java_PVR_Flink_Streaming failing due to new AfterSynchronizedProcessingTime test Key: BEAM-14254 URL: https://issues.apache.org/jira/browse/BEAM-14254 Project: Beam Issue Type: Bug Components: runner-flink, test-failures Affects Versions: 2.38.0 Reporter: Daniel Oliveira Assignee: Ankur Goenka Test: https://ci-beam.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/ Failure example: https://ci-beam.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/7644/ This seems to be technically a flake, but highly flaky (looks like ~90% of runs are failing and this is the only error in sight). I can't pinpoint where it started since even the oldest failure shows this, but it's likely that it was failing since the test was introduced because the test also caused failures in Dataflow (https://issues.apache.org/jira/browse/BEAM-13952). Error message: {noformat} java.lang.RuntimeException: The Runner experienced the following error during execution: java.lang.RuntimeException: Error received from SDK harness for instruction 10: org.apache.beam.sdk.util.UserCodeException: java.lang.AssertionError: Second Triggered sum/Values/Values/Map/ParMultiDo(Anonymous).output: Expected: iterable with items [<42>] in any order but: no item matches: <42> in [] {noformat} -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Updated] (BEAM-14222) Test failure org.apache.beam.sdk.io.gcp.spanner.SpannerReadIT.testReadAllRecordsInDb
[ https://issues.apache.org/jira/browse/BEAM-14222?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Oliveira updated BEAM-14222: --- Parent: BEAM-14252 Issue Type: Sub-task (was: Test) > Test failure > org.apache.beam.sdk.io.gcp.spanner.SpannerReadIT.testReadAllRecordsInDb > > > Key: BEAM-14222 > URL: https://issues.apache.org/jira/browse/BEAM-14222 > Project: Beam > Issue Type: Sub-task > Components: test-failures >Reporter: Kiley Sok >Assignee: Bingye Li >Priority: P2 > Time Spent: 1h > Remaining Estimate: 0h > > java.lang.AssertionError: Count PG rows/Flatten.PCollections.out: > Expected: <5L> > but: was <0L> > https://ci-beam.apache.org/job/beam_PostCommit_Java/8806/testReport/junit/org.apache.beam.sdk.io.gcp.spanner/SpannerReadIT/testReadAllRecordsInDb/ -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Updated] (BEAM-14192) ITs run on Dataflow v1 fails with org/apache/commons/logging/LogFactory has been compiled by a more recent version of the Java Runtime
[ https://issues.apache.org/jira/browse/BEAM-14192?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Oliveira updated BEAM-14192: --- Parent: BEAM-14252 Issue Type: Sub-task (was: Bug) > ITs run on Dataflow v1 fails with org/apache/commons/logging/LogFactory has > been compiled by a more recent version of the Java Runtime > -- > > Key: BEAM-14192 > URL: https://issues.apache.org/jira/browse/BEAM-14192 > Project: Beam > Issue Type: Sub-task > Components: runner-dataflow, test-failures >Reporter: Luke Cwik >Assignee: Kiley Sok >Priority: P2 > Fix For: 2.38.0 > > Time Spent: 1h 20m > Remaining Estimate: 0h > > The Dataflow ITs fails with a class version mismatch. I believe the Dataflow > v1 container that is being tested was built with the wrong JDK version. > Jenkins: > https://ci-beam.apache.org/job/beam_PostCommit_Java_DataflowV1/1427/#showFailuresLink > Example Failure: > {noformat} > java.lang.RuntimeException: java.lang.RuntimeException: > org.apache.beam.sdk.util.UserCodeException: > java.lang.UnsupportedClassVersionError: org/apache/commons/logging/LogFactory > has been compiled by a more recent version of the Java Runtime (class file > version 55.0), this version of the Java Runtime only recognizes class file > versions up to 52.0 > at > org.apache.beam.runners.dataflow.worker.GroupAlsoByWindowsParDoFn$1.output(GroupAlsoByWindowsParDoFn.java:187) > at > org.apache.beam.runners.dataflow.worker.GroupAlsoByWindowFnRunner$1.outputWindowedValue(GroupAlsoByWindowFnRunner.java:108) > at > org.apache.beam.runners.dataflow.worker.util.BatchGroupAlsoByWindowReshuffleFn.processElement(BatchGroupAlsoByWindowReshuffleFn.java:56) > at > org.apache.beam.runners.dataflow.worker.util.BatchGroupAlsoByWindowReshuffleFn.processElement(BatchGroupAlsoByWindowReshuffleFn.java:39) > {noformat} -- This message was sent by Atlassian Jira (v8.20.1#820001)
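For readers decoding the UnsupportedClassVersionError in BEAM-14192: class file major version 55 corresponds to Java 11 and 52 to Java 8, so the LogFactory class was built for a newer Java than the worker was running. A small sketch of that mapping follows; the helper name `javaRelease` is illustrative and not part of Beam.

```go
package main

import "fmt"

// javaRelease converts a class file major version into the Java release
// that introduced it. From Java 5 (major version 49) onward the release
// number is the major version minus 44.
func javaRelease(major int) (int, error) {
	if major < 49 {
		return 0, fmt.Errorf("major version %d predates Java 5", major)
	}
	return major - 44, nil
}

func main() {
	// The two versions reported in the error message above.
	for _, major := range []int{55, 52} {
		release, err := javaRelease(major)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("class file version %d.0 is Java %d\n", major, release)
	}
}
```

This is consistent with the reporter's suspicion that the Dataflow v1 container under test was built with the wrong JDK version.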
[jira] [Commented] (BEAM-14253) pubsublite.ReadWriteIT failing in beam_PostCommit_Java_DataflowV1
[ https://issues.apache.org/jira/browse/BEAM-14253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17517174#comment-17517174 ] Daniel Oliveira commented on BEAM-14253: Brian can you check if this is the same failure as https://issues.apache.org/jira/browse/BEAM-13025? The message is the same but I assume the root cause could be different. > pubsublite.ReadWriteIT failing in beam_PostCommit_Java_DataflowV1 > - > > Key: BEAM-14253 > URL: https://issues.apache.org/jira/browse/BEAM-14253 > Project: Beam > Issue Type: Sub-task > Components: io-java-gcp, test-failures >Reporter: Daniel Oliveira >Assignee: Brian Hulette >Priority: P1 > > Example: > https://ci-beam.apache.org/job/beam_PostCommit_Java_DataflowV1/1455/testReport/junit/org.apache.beam.sdk.io.gcp.pubsublite/ReadWriteIT/testReadWrite/ > {noformat} > java.lang.AssertionError: Did not receive signal on > projects/apache-beam-testing/subscriptions/result-subscription--586739339276181574 > in 300s > {noformat} -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Created] (BEAM-14253) pubsublite.ReadWriteIT failing in beam_PostCommit_Java_DataflowV1
Daniel Oliveira created BEAM-14253: -- Summary: pubsublite.ReadWriteIT failing in beam_PostCommit_Java_DataflowV1 Key: BEAM-14253 URL: https://issues.apache.org/jira/browse/BEAM-14253 Project: Beam Issue Type: Sub-task Components: io-java-gcp, test-failures Reporter: Daniel Oliveira Example: https://ci-beam.apache.org/job/beam_PostCommit_Java_DataflowV1/1455/testReport/junit/org.apache.beam.sdk.io.gcp.pubsublite/ReadWriteIT/testReadWrite/ {noformat} java.lang.AssertionError: Did not receive signal on projects/apache-beam-testing/subscriptions/result-subscription--586739339276181574 in 300s {noformat} -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Assigned] (BEAM-14253) pubsublite.ReadWriteIT failing in beam_PostCommit_Java_DataflowV1
[ https://issues.apache.org/jira/browse/BEAM-14253?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Oliveira reassigned BEAM-14253: -- Assignee: Brian Hulette > pubsublite.ReadWriteIT failing in beam_PostCommit_Java_DataflowV1 > - > > Key: BEAM-14253 > URL: https://issues.apache.org/jira/browse/BEAM-14253 > Project: Beam > Issue Type: Sub-task > Components: io-java-gcp, test-failures >Reporter: Daniel Oliveira >Assignee: Brian Hulette >Priority: P1 > > Example: > https://ci-beam.apache.org/job/beam_PostCommit_Java_DataflowV1/1455/testReport/junit/org.apache.beam.sdk.io.gcp.pubsublite/ReadWriteIT/testReadWrite/ > {noformat} > java.lang.AssertionError: Did not receive signal on > projects/apache-beam-testing/subscriptions/result-subscription--586739339276181574 > in 300s > {noformat} -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Assigned] (BEAM-13025) pubsublite.ReadWriteIT flaky in beam_PostCommit_Java_DataflowV2
[ https://issues.apache.org/jira/browse/BEAM-13025?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Oliveira reassigned BEAM-13025: -- Assignee: Brian Hulette > pubsublite.ReadWriteIT flaky in beam_PostCommit_Java_DataflowV2 > - > > Key: BEAM-13025 > URL: https://issues.apache.org/jira/browse/BEAM-13025 > Project: Beam > Issue Type: Bug > Components: test-failures >Reporter: Kyle Weaver >Assignee: Brian Hulette >Priority: P1 > Labels: currently-failing, flake > Time Spent: 10m > Remaining Estimate: 0h > > [https://ci-beam.apache.org/job/beam_PostCommit_Java_DataflowV2/758/testReport/org.apache.beam.sdk.io.gcp.pubsublite/ReadWriteIT/testReadWrite/] > java.lang.AssertionError: Did not receive signal on > projects/apache-beam-testing/subscriptions/result-subscription--5335365384640437489 > in 300s -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Commented] (BEAM-14192) ITs run on Dataflow v1 fails with org/apache/commons/logging/LogFactory has been compiled by a more recent version of the Java Runtime
[ https://issues.apache.org/jira/browse/BEAM-14192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17517171#comment-17517171 ] Daniel Oliveira commented on BEAM-14192: I'm marking this as affecting 2.38.0, but it really depends on whether this is being caused by an SDK-side change. It looks like it's possibly entirely Dataflow-side, in which case this is probably not release-blocking. > ITs run on Dataflow v1 fails with org/apache/commons/logging/LogFactory has > been compiled by a more recent version of the Java Runtime > -- > > Key: BEAM-14192 > URL: https://issues.apache.org/jira/browse/BEAM-14192 > Project: Beam > Issue Type: Bug > Components: runner-dataflow, test-failures >Reporter: Luke Cwik >Assignee: Kiley Sok >Priority: P2 > Fix For: 2.38.0 > > Time Spent: 1h 20m > Remaining Estimate: 0h > > The Dataflow ITs fail with a class version mismatch. I believe the Dataflow > v1 container that is being tested was built with the wrong JDK version.
> Jenkins: > https://ci-beam.apache.org/job/beam_PostCommit_Java_DataflowV1/1427/#showFailuresLink > Example Failure: > {noformat} > java.lang.RuntimeException: java.lang.RuntimeException: > org.apache.beam.sdk.util.UserCodeException: > java.lang.UnsupportedClassVersionError: org/apache/commons/logging/LogFactory > has been compiled by a more recent version of the Java Runtime (class file > version 55.0), this version of the Java Runtime only recognizes class file > versions up to 52.0 > at > org.apache.beam.runners.dataflow.worker.GroupAlsoByWindowsParDoFn$1.output(GroupAlsoByWindowsParDoFn.java:187) > at > org.apache.beam.runners.dataflow.worker.GroupAlsoByWindowFnRunner$1.outputWindowedValue(GroupAlsoByWindowFnRunner.java:108) > at > org.apache.beam.runners.dataflow.worker.util.BatchGroupAlsoByWindowReshuffleFn.processElement(BatchGroupAlsoByWindowReshuffleFn.java:56) > at > org.apache.beam.runners.dataflow.worker.util.BatchGroupAlsoByWindowReshuffleFn.processElement(BatchGroupAlsoByWindowReshuffleFn.java:39) > {noformat} -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Updated] (BEAM-14192) ITs run on Dataflow v1 fails with org/apache/commons/logging/LogFactory has been compiled by a more recent version of the Java Runtime
[ https://issues.apache.org/jira/browse/BEAM-14192?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Oliveira updated BEAM-14192: --- Fix Version/s: 2.38.0 > ITs run on Dataflow v1 fails with org/apache/commons/logging/LogFactory has > been compiled by a more recent version of the Java Runtime > -- > > Key: BEAM-14192 > URL: https://issues.apache.org/jira/browse/BEAM-14192 > Project: Beam > Issue Type: Bug > Components: runner-dataflow, test-failures >Reporter: Luke Cwik >Assignee: Kiley Sok >Priority: P2 > Fix For: 2.38.0 > > Time Spent: 1h 20m > Remaining Estimate: 0h > > The Dataflow ITs fail with a class version mismatch. I believe the Dataflow > v1 container that is being tested was built with the wrong JDK version. > Jenkins: > https://ci-beam.apache.org/job/beam_PostCommit_Java_DataflowV1/1427/#showFailuresLink > Example Failure: > {noformat} > java.lang.RuntimeException: java.lang.RuntimeException: > org.apache.beam.sdk.util.UserCodeException: > java.lang.UnsupportedClassVersionError: org/apache/commons/logging/LogFactory > has been compiled by a more recent version of the Java Runtime (class file > version 55.0), this version of the Java Runtime only recognizes class file > versions up to 52.0 > at > org.apache.beam.runners.dataflow.worker.GroupAlsoByWindowsParDoFn$1.output(GroupAlsoByWindowsParDoFn.java:187) > at > org.apache.beam.runners.dataflow.worker.GroupAlsoByWindowFnRunner$1.outputWindowedValue(GroupAlsoByWindowFnRunner.java:108) > at > org.apache.beam.runners.dataflow.worker.util.BatchGroupAlsoByWindowReshuffleFn.processElement(BatchGroupAlsoByWindowReshuffleFn.java:56) > at > org.apache.beam.runners.dataflow.worker.util.BatchGroupAlsoByWindowReshuffleFn.processElement(BatchGroupAlsoByWindowReshuffleFn.java:39) > {noformat} -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Created] (BEAM-14252) beam_PostCommit_Java_DataflowV1 failing with a variety of flakes and errors
Daniel Oliveira created BEAM-14252: -- Summary: beam_PostCommit_Java_DataflowV1 failing with a variety of flakes and errors Key: BEAM-14252 URL: https://issues.apache.org/jira/browse/BEAM-14252 Project: Beam Issue Type: Bug Components: runner-dataflow, test-failures Reporter: Daniel Oliveira Fix For: 2.38.0 Test Suite: https://ci-beam.apache.org/job/beam_PostCommit_Java_DataflowV1 This is a catch-all bug for the various failures affecting this test. It seems to have gone under the radar for a while, so it's likely that multiple different failures have built up over time. Individual failures should be linked as sub-tasks. Looking at the [build trend|https://ci-beam.apache.org/job/beam_PostCommit_Java_DataflowV1/buildTimeTrend], this seems to have started around https://ci-beam.apache.org/job/beam_PostCommit_Java_DataflowV1/1386/, on March 18, but even then it started with only 2 failures. Meanwhile, recent builds have around 35-45 failures each, and the count varies from build to build, implying that some of the failures are flakes. -- This message was sent by Atlassian Jira (v8.20.1#820001)
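The reasoning in the message above (a failure count that varies between builds suggests flakes) can be made concrete by comparing the sets of failing tests across builds: tests that fail in every build are likely persistent breakages, while tests that fail only in some builds are flake candidates. The sketch below assumes the failing test names per build have already been collected by some other means; the function name and input shape are hypothetical.

```python
from typing import Dict, Set, Tuple


def split_failures(failures_by_build: Dict[int, Set[str]]) -> Tuple[Set[str], Set[str]]:
    """Split failing tests into (persistent, intermittent) across builds.

    failures_by_build maps a build number to the set of test names that
    failed in that build.
    """
    if not failures_by_build:
        return set(), set()
    failure_sets = list(failures_by_build.values())
    # Tests failing in every build: likely real, persistent breakages.
    persistent = set.intersection(*failure_sets)
    # Tests failing in at least one build but not all: flake candidates.
    seen = set.union(*failure_sets)
    return persistent, seen - persistent
```

A test that started failing partway through the sampled builds would also land in the intermittent set, so the result is only a starting point for triage into sub-tasks.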
[jira] [Updated] (BEAM-14179) MonitoringInfoMetricName null value guard uncovering additional issues
[ https://issues.apache.org/jira/browse/BEAM-14179?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Oliveira updated BEAM-14179: --- Resolution: Fixed Status: Resolved (was: Open) > MonitoringInfoMetricName null value guard uncovering additional issues > -- > > Key: BEAM-14179 > URL: https://issues.apache.org/jira/browse/BEAM-14179 > Project: Beam > Issue Type: Bug > Components: io-java-gcp, sdk-java-harness >Reporter: Luke Cwik >Assignee: Daniel Oliveira >Priority: P2 > Fix For: 2.38.0 > > Time Spent: 1h > Remaining Estimate: 0h > > Additional integration testing > (//cloud/dataflow/testing/integration/sdk:V1ReadIT_testE2EV1Read) caught that > https://github.com/apache/beam/pull/17094 causes a regression: > The test failed with: > {noformat} > Caused by: java.lang.NullPointerException: null value in entry: > DATASTORE_NAMESPACE=null > at > org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.CollectPreconditions.checkEntryNotNull(CollectPreconditions.java:32) > at > org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.RegularImmutableMap.fromEntryArray(RegularImmutableMap.java:100) > at > org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.RegularImmutableMap.fromEntries(RegularImmutableMap.java:74) > at > org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableMap.copyOf(ImmutableMap.java:464) > at > org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableMap.copyOf(ImmutableMap.java:437) > at > org.apache.beam.runners.core.metrics.MonitoringInfoMetricName.<init>(MonitoringInfoMetricName.java:46) > at > org.apache.beam.runners.core.metrics.MonitoringInfoMetricName.named(MonitoringInfoMetricName.java:93) > at > org.apache.beam.runners.core.metrics.ServiceCallMetric.call(ServiceCallMetric.java:82) > at > org.apache.beam.sdk.io.gcp.datastore.DatastoreV1$Read$ReadFn.runQueryWithRetries(DatastoreV1.java:927) > at > org.apache.beam.sdk.io.gcp.datastore.DatastoreV1$Read$ReadFn.processElement(DatastoreV1.java:965) > {noformat} -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Commented] (BEAM-14179) MonitoringInfoMetricName null value guard uncovering additional issues
[ https://issues.apache.org/jira/browse/BEAM-14179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17516139#comment-17516139 ] Daniel Oliveira commented on BEAM-14179: I cherry-picked in a fix, mostly because the PR that revealed this isn't the root cause; the bug would still exist even if I rolled back, it just wouldn't be revealed by our tests. Plus, the cherry-picked PR is a very small fix and unlikely to cause many issues. > MonitoringInfoMetricName null value guard uncovering additional issues > -- > > Key: BEAM-14179 > URL: https://issues.apache.org/jira/browse/BEAM-14179 > Project: Beam > Issue Type: Bug > Components: io-java-gcp, sdk-java-harness >Reporter: Luke Cwik >Assignee: Daniel Oliveira >Priority: P2 > Fix For: 2.38.0 > > Time Spent: 1h > Remaining Estimate: 0h > > Additional integration testing > (//cloud/dataflow/testing/integration/sdk:V1ReadIT_testE2EV1Read) caught that > https://github.com/apache/beam/pull/17094 causes a regression: > The test failed with: > {noformat} > Caused by: java.lang.NullPointerException: null value in entry: > DATASTORE_NAMESPACE=null > at > org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.CollectPreconditions.checkEntryNotNull(CollectPreconditions.java:32) > at > org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.RegularImmutableMap.fromEntryArray(RegularImmutableMap.java:100) > at > org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.RegularImmutableMap.fromEntries(RegularImmutableMap.java:74) > at > org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableMap.copyOf(ImmutableMap.java:464) > at > org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableMap.copyOf(ImmutableMap.java:437) > at > org.apache.beam.runners.core.metrics.MonitoringInfoMetricName.<init>(MonitoringInfoMetricName.java:46) > at > org.apache.beam.runners.core.metrics.MonitoringInfoMetricName.named(MonitoringInfoMetricName.java:93) > at > org.apache.beam.runners.core.metrics.ServiceCallMetric.call(ServiceCallMetric.java:82) > at > org.apache.beam.sdk.io.gcp.datastore.DatastoreV1$Read$ReadFn.runQueryWithRetries(DatastoreV1.java:927) > at > org.apache.beam.sdk.io.gcp.datastore.DatastoreV1$Read$ReadFn.processElement(DatastoreV1.java:965) > {noformat} -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Commented] (BEAM-14116) Fix Pub/Sub Lite IO and SDF performance issues with shuffles
[ https://issues.apache.org/jira/browse/BEAM-14116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17516134#comment-17516134 ] Daniel Oliveira commented on BEAM-14116: Rolled back #17004 on the release branch to resolve this as a release-blocker. This should probably still be addressed on master though so I'll leave the bug open. > Fix Pub/Sub Lite IO and SDF performance issues with shuffles > > > Key: BEAM-14116 > URL: https://issues.apache.org/jira/browse/BEAM-14116 > Project: Beam > Issue Type: Task > Components: io-java-gcp, runner-dataflow >Reporter: Daniel Collins >Assignee: Daniel Collins >Priority: P2 > Fix For: 2.38.0 > > Time Spent: 7.5h > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Updated] (BEAM-14116) Fix Pub/Sub Lite IO and SDF performance issues with shuffles
[ https://issues.apache.org/jira/browse/BEAM-14116?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Oliveira updated BEAM-14116: --- Fix Version/s: (was: 2.38.0) > Fix Pub/Sub Lite IO and SDF performance issues with shuffles > > > Key: BEAM-14116 > URL: https://issues.apache.org/jira/browse/BEAM-14116 > Project: Beam > Issue Type: Task > Components: io-java-gcp, runner-dataflow >Reporter: Daniel Collins >Assignee: Daniel Collins >Priority: P2 > Time Spent: 7.5h > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Updated] (BEAM-14185) [SpannerIO.readChangeStreams] Drop metadata tables at the end of the job
[ https://issues.apache.org/jira/browse/BEAM-14185?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Oliveira updated BEAM-14185: --- Resolution: Fixed Status: Resolved (was: Open) > [SpannerIO.readChangeStreams] Drop metadata tables at the end of the job > > > Key: BEAM-14185 > URL: https://issues.apache.org/jira/browse/BEAM-14185 > Project: Beam > Issue Type: New Feature > Components: io-java-gcp >Affects Versions: 2.37.0 >Reporter: Thiago Nunes >Assignee: Thiago Nunes >Priority: P2 > Fix For: 2.38.0 > > Time Spent: 2h 50m > Remaining Estimate: 0h > > The SpannerIO.readChangeStreams Connector uses metadata tables to keep track > of its internal state during execution. At the moment, these metadata tables > linger after the execution, meaning that users will have to drop them > manually. > In this change, we would like to drop them automatically once the job > finishes. This should only occur after all partitions have been processed > successfully and marked as finished. -- This message was sent by Atlassian Jira (v8.20.1#820001)
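The cleanup condition described above (drop the metadata tables only after all partitions have been processed and marked as finished) can be sketched as a simple guard. This is an illustration under assumptions, not the connector's code: the state name `FINISHED`, the function name, and the mapping of partition token to state are all hypothetical, and treating an empty mapping as "do not drop" is a conservative choice made here.

```python
from typing import Mapping

FINISHED = "FINISHED"  # hypothetical state name, used only for illustration


def can_drop_metadata_tables(partition_states: Mapping[str, str]) -> bool:
    """Return True only when at least one partition exists and all are finished."""
    # An empty mapping returns False so that tables are never dropped
    # before any partition has been recorded.
    return bool(partition_states) and all(
        state == FINISHED for state in partition_states.values()
    )
```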
[jira] [Updated] (BEAM-14194) [SpannerIO.readChangeStream] Throw error when autoscaling algorithm is not NONE
[ https://issues.apache.org/jira/browse/BEAM-14194?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Oliveira updated BEAM-14194: --- Resolution: Fixed Status: Resolved (was: Open) > [SpannerIO.readChangeStream] Throw error when autoscaling algorithm is not > NONE > --- > > Key: BEAM-14194 > URL: https://issues.apache.org/jira/browse/BEAM-14194 > Project: Beam > Issue Type: New Feature > Components: io-java-gcp, runner-dataflow >Affects Versions: 2.37.0 >Reporter: Thiago Nunes >Assignee: Thiago Nunes >Priority: P2 > Fix For: 2.38.0 > > Time Spent: 2h 20m > Remaining Estimate: 0h > > SpannerIO.readChangeStreams does not currently support the autoscaling > feature. In order to avoid customer confusion, we decided to error out if an > algorithm other than NONE is specified. -- This message was sent by Atlassian Jira (v8.20.1#820001)
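The fail-fast check described above could look roughly like the following sketch. It is not the connector's implementation (which lives in the Java SDK); the function name is hypothetical, and comparing case-insensitively is an assumption made for this example.

```python
def check_autoscaling_algorithm(algorithm: str) -> None:
    """Raise if an autoscaling algorithm other than NONE was requested."""
    # Assumption: the comparison ignores case and surrounding whitespace.
    if algorithm.strip().upper() != "NONE":
        raise ValueError(
            "SpannerIO.readChangeStreams does not support autoscaling; "
            f"got autoscaling algorithm {algorithm!r}, expected NONE"
        )
```

Raising at pipeline construction time surfaces the unsupported configuration before any work is launched, which matches the stated goal of avoiding customer confusion.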
[jira] [Updated] (BEAM-14181) BQ: Storage API Sink reuses closed connections
[ https://issues.apache.org/jira/browse/BEAM-14181?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Oliveira updated BEAM-14181: --- Resolution: Fixed Status: Resolved (was: Open) > BQ: Storage API Sink reuses closed connections > -- > > Key: BEAM-14181 > URL: https://issues.apache.org/jira/browse/BEAM-14181 > Project: Beam > Issue Type: Bug > Components: io-java-gcp >Reporter: Ahmet Altay >Assignee: Reuven Lax >Priority: P1 > Fix For: 2.38.0 > > Time Spent: 2.5h > Remaining Estimate: 0h > > Creating a jira so that it can be considered whether it is release blocking > or not. > Related change: https://github.com/apache/beam/pull/17187 > This causes the BigQuery sink to sometimes get fully stuck and never recover, > and the pipeline grinds to a halt. The regression was likely introduced in > the last Beam release. -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Updated] (BEAM-14171) CoGroupByKey loses values with large groups on Dataflow v1
[ https://issues.apache.org/jira/browse/BEAM-14171?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Oliveira updated BEAM-14171: --- Resolution: Fixed Status: Resolved (was: Triage Needed) > CoGroupByKey loses values with large groups on Dataflow v1 > -- > > Key: BEAM-14171 > URL: https://issues.apache.org/jira/browse/BEAM-14171 > Project: Beam > Issue Type: Bug > Components: runner-dataflow, sdk-java-core >Affects Versions: 2.36.0, 2.37.0 >Reporter: Niel Markwick >Assignee: Robert Bradshaw >Priority: P1 > Fix For: 2.38.0 > > Time Spent: 2h > Remaining Estimate: 0h > > CoGroupByKey can lose elements - replacing them with null values when a group > is large (>10,000 elements). > > This only occurs in dataflow v1, not dataflow-v2 runner > Possibly related to BEAM-13541. > > https://lists.apache.org/thread/5y56kbgm3q0m1byzf7186rrkomrcfldm > > > -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Updated] (BEAM-14201) Provide a courtesy notification for users who may set a deprecated prebuild_sdk_container_base_image option.
[ https://issues.apache.org/jira/browse/BEAM-14201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Oliveira updated BEAM-14201: --- Resolution: Fixed Status: Resolved (was: Open) > Provide a courtesy notification for users who may set a deprecated > prebuild_sdk_container_base_image option. > - > > Key: BEAM-14201 > URL: https://issues.apache.org/jira/browse/BEAM-14201 > Project: Beam > Issue Type: Bug > Components: sdk-py-core >Reporter: Valentyn Tymofieiev >Assignee: Valentyn Tymofieiev >Priority: P2 > Fix For: 2.38.0 > > Time Spent: 50m > Remaining Estimate: 0h > > Since --prebuild_sdk_container_base_image was removed in the 2.38.0 SDK in favor > of --sdk_container_image, which can be used for the same purpose, it would be > nice to provide a courtesy message that tells users of this option how to properly > switch if the deprecated option is used in isolation. > > -- This message was sent by Atlassian Jira (v8.20.1#820001)
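As an illustration of the courtesy notification described above, the sketch below detects the case where the deprecated flag is passed without its replacement and returns a message pointing at `--sdk_container_image`. It uses plain `argparse` and is not the Beam SDK's actual option handling; the function name and the message wording are made up for this example.

```python
import argparse
from typing import List, Optional

DEPRECATED_FLAG = "--prebuild_sdk_container_base_image"
REPLACEMENT_FLAG = "--sdk_container_image"


def deprecation_notice(argv: List[str]) -> Optional[str]:
    """Return a courtesy message if only the deprecated flag is set."""
    parser = argparse.ArgumentParser(add_help=False, allow_abbrev=False)
    parser.add_argument(DEPRECATED_FLAG, default=None)
    parser.add_argument(REPLACEMENT_FLAG, default=None)
    # parse_known_args ignores unrelated pipeline options instead of failing.
    known, _unknown = parser.parse_known_args(argv)
    used_in_isolation = (
        known.prebuild_sdk_container_base_image is not None
        and known.sdk_container_image is None
    )
    if used_in_isolation:
        return (
            f"{DEPRECATED_FLAG} is no longer supported; "
            f"please use {REPLACEMENT_FLAG} instead."
        )
    return None
```

For example, `deprecation_notice(["--prebuild_sdk_container_base_image=my/image"])` returns the message, while passing both flags or only the replacement returns `None`.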
[jira] [Commented] (BEAM-14201) Provide a courtesy notification for users who may set a deprecated prebuild_sdk_container_base_image option.
[ https://issues.apache.org/jira/browse/BEAM-14201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17515071#comment-17515071 ] Daniel Oliveira commented on BEAM-14201: Whoops, I missed this yesterday, but the fix is all there and tested and this would definitely affect users. > Provide a courtesy notification for users who may set a deprecated > prebuild_sdk_container_base_image option. > - > > Key: BEAM-14201 > URL: https://issues.apache.org/jira/browse/BEAM-14201 > Project: Beam > Issue Type: Bug > Components: sdk-py-core >Reporter: Valentyn Tymofieiev >Assignee: Valentyn Tymofieiev >Priority: P2 > Fix For: 2.38.0 > > Time Spent: 40m > Remaining Estimate: 0h > > Since --prebuild_sdk_container_base_image was removed in the 2.38.0 SDK in favor > of --sdk_container_image, which can be used for the same purpose, it would be > nice to provide a courtesy message that tells users of this option how to properly > switch if the deprecated option is used in isolation. > > -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Updated] (BEAM-8218) Implement Apache PulsarIO
[ https://issues.apache.org/jira/browse/BEAM-8218?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Oliveira updated BEAM-8218: -- Fix Version/s: (was: 2.38.0) > Implement Apache PulsarIO > - > > Key: BEAM-8218 > URL: https://issues.apache.org/jira/browse/BEAM-8218 > Project: Beam > Issue Type: Task > Components: io-ideas >Reporter: Alex Van Boxel >Assignee: Marco Robles >Priority: P3 > Time Spent: 17h 50m > Remaining Estimate: 0h > > Apache Pulsar is starting to gain popularity. Having a native Beam PulsarIO > could be beneficial. > [https://pulsar.apache.org/|https://pulsar.apache.org/en/] -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Updated] (BEAM-14216) Multiple XVR Suites having similar flakes simultaneously
[ https://issues.apache.org/jira/browse/BEAM-14216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Oliveira updated BEAM-14216: --- Description: Didn't have time to look very closely into the root cause, but in taking a look at flaky cross-language tests I noticed a pattern of different suites on different runners flaking at the same time. The specific ones that I've noticed so far are: Samza: https://ci-beam.apache.org/job/beam_PostCommit_XVR_Samza/ Spark: https://ci-beam.apache.org/job/beam_PostCommit_XVR_Spark/ Dataflow: https://ci-beam.apache.org/job/beam_PostCommit_XVR_PythonUsingJava_Dataflow/ Example flake (Mar 29, 12 PM): https://ci-beam.apache.org/job/beam_PostCommit_XVR_Samza/993/ https://ci-beam.apache.org/job/beam_PostCommit_XVR_Spark/3530/ https://ci-beam.apache.org/job/beam_PostCommit_XVR_PythonUsingJava_Dataflow/242/ Example flake 2 (Mar 30, 6 PM): https://ci-beam.apache.org/job/beam_PostCommit_XVR_Samza/998/ https://ci-beam.apache.org/job/beam_PostCommit_XVR_Spark/3535/ https://ci-beam.apache.org/job/beam_PostCommit_XVR_PythonUsingJava_Dataflow/247/ was: Didn't have time to look very closely into this. The test seems to be flaky and from the example failures I looked at there are potentially multiple different failures (or the same failure appearing as different error messages). Example failure: https://ci-beam.apache.org/job/beam_PostCommit_XVR_Samza/994/ > Multiple XVR Suites having similar flakes simultaneously > > > Key: BEAM-14216 > URL: https://issues.apache.org/jira/browse/BEAM-14216 > Project: Beam > Issue Type: Bug > Components: cross-language, test-failures >Reporter: Daniel Oliveira >Priority: P2 > Labels: flake > > Didn't have time to look very closely into the root cause, but in taking a > look at flaky cross-language tests I noticed a pattern of different suites on > different runners flaking at the same time. 
The specific ones that I've > noticed so far are: > Samza: https://ci-beam.apache.org/job/beam_PostCommit_XVR_Samza/ > Spark: https://ci-beam.apache.org/job/beam_PostCommit_XVR_Spark/ > Dataflow: > https://ci-beam.apache.org/job/beam_PostCommit_XVR_PythonUsingJava_Dataflow/ > Example flake (Mar 29, 12 PM): > https://ci-beam.apache.org/job/beam_PostCommit_XVR_Samza/993/ > https://ci-beam.apache.org/job/beam_PostCommit_XVR_Spark/3530/ > https://ci-beam.apache.org/job/beam_PostCommit_XVR_PythonUsingJava_Dataflow/242/ > Example flake 2 (Mar 30, 6 PM): > https://ci-beam.apache.org/job/beam_PostCommit_XVR_Samza/998/ > https://ci-beam.apache.org/job/beam_PostCommit_XVR_Spark/3535/ > https://ci-beam.apache.org/job/beam_PostCommit_XVR_PythonUsingJava_Dataflow/247/ -- This message was sent by Atlassian Jira (v8.20.1#820001)
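The pattern described above (different suites flaking at the same time) can be checked mechanically by bucketing failure timestamps and looking for buckets in which more than one suite failed. This is a rough sketch: the input shape, the function name, and the one-hour window are assumptions, and fixed buckets can split two failures that fall just either side of a bucket boundary.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone
from typing import Dict, Iterable, List, Set, Tuple


def simultaneous_failures(
    failures: Iterable[Tuple[str, datetime]],
    window: timedelta = timedelta(hours=1),
) -> List[Set[str]]:
    """Group (suite, failure time) pairs into fixed time buckets.

    Returns, in time order, the sets of suites for every bucket in which at
    least two distinct suites failed.
    """
    buckets: Dict[int, Set[str]] = defaultdict(set)
    for suite, when in failures:
        if when.tzinfo is None:
            # Assumption: naive timestamps are already in UTC.
            when = when.replace(tzinfo=timezone.utc)
        bucket = int(when.timestamp() // window.total_seconds())
        buckets[bucket].add(suite)
    return [suites for _, suites in sorted(buckets.items()) if len(suites) >= 2]
```

Buckets containing several suites point toward a shared cause, such as common infrastructure or a common dependency, rather than independent per-runner flakes.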
[jira] [Updated] (BEAM-14216) Multiple XVR Suites having similar flakes simultaneously
[ https://issues.apache.org/jira/browse/BEAM-14216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Oliveira updated BEAM-14216: --- Summary: Multiple XVR Suites having similar flakes simultaneously (was: beam_PostCommit_XVR_Samza is flaky) > Multiple XVR Suites having similar flakes simultaneously > > > Key: BEAM-14216 > URL: https://issues.apache.org/jira/browse/BEAM-14216 > Project: Beam > Issue Type: Bug > Components: cross-language, test-failures >Reporter: Daniel Oliveira >Priority: P2 > Labels: flake > > Didn't have time to look very closely into this. The test seems to be flaky > and from the example failures I looked at there are potentially multiple > different failures (or the same failure appearing as different error > messages). > Example failure: https://ci-beam.apache.org/job/beam_PostCommit_XVR_Samza/994/ -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Created] (BEAM-14216) beam_PostCommit_XVR_Samza is flaky
Daniel Oliveira created BEAM-14216: -- Summary: beam_PostCommit_XVR_Samza is flaky Key: BEAM-14216 URL: https://issues.apache.org/jira/browse/BEAM-14216 Project: Beam Issue Type: Bug Components: cross-language, test-failures Reporter: Daniel Oliveira Didn't have time to look very closely into this. The test seems to be flaky and from the example failures I looked at there are potentially multiple different failures (or the same failure appearing as different error messages). Example failure: https://ci-beam.apache.org/job/beam_PostCommit_XVR_Samza/994/ -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Commented] (BEAM-13830) XVR Direct/Spark/Flink tests are timing out
[ https://issues.apache.org/jira/browse/BEAM-13830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17515020#comment-17515020 ] Daniel Oliveira commented on BEAM-13830: This was resolved a while ago. > XVR Direct/Spark/Flink tests are timing out > --- > > Key: BEAM-13830 > URL: https://issues.apache.org/jira/browse/BEAM-13830 > Project: Beam > Issue Type: Bug > Components: cross-language, test-failures >Reporter: Chamikara Madhusanka Jayalath >Priority: P1 > Fix For: 2.36.0 > > Time Spent: 5.5h > Remaining Estimate: 0h > > https://ci-beam.apache.org/job/beam_PostCommit_XVR_Spark/ > https://ci-beam.apache.org/job/beam_PostCommit_XVR_Direct/ > https://ci-beam.apache.org/job/beam_PostCommit_XVR_Flink/ > Seems like these tests are running a set of > "org.apache.beam.sdk.extensions.schemaio.expansion" tests [1] that it did not > run before [2]. > I see that https://github.com/apache/beam/pull/16705 did some Gradle changes > related to SchemaIO and also in the set of PRs mentioned in the first failure > so possibly related. > [1] https://ci-beam.apache.org/job/beam_PostCommit_XVR_Direct/2260/testReport/ > [2] > https://ci-beam.apache.org/job/beam_PostCommit_XVR_Direct/2259/testReport/ -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Updated] (BEAM-13830) XVR Direct/Spark/Flink tests are timing out
[ https://issues.apache.org/jira/browse/BEAM-13830?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Oliveira updated BEAM-13830: --- Resolution: Fixed Status: Resolved (was: Open) > XVR Direct/Spark/Flink tests are timing out > --- > > Key: BEAM-13830 > URL: https://issues.apache.org/jira/browse/BEAM-13830 > Project: Beam > Issue Type: Bug > Components: cross-language, test-failures >Reporter: Chamikara Madhusanka Jayalath >Priority: P1 > Fix For: 2.36.0 > > Time Spent: 5.5h > Remaining Estimate: 0h > > https://ci-beam.apache.org/job/beam_PostCommit_XVR_Spark/ > https://ci-beam.apache.org/job/beam_PostCommit_XVR_Direct/ > https://ci-beam.apache.org/job/beam_PostCommit_XVR_Flink/ > Seems like these tests are running a set of > "org.apache.beam.sdk.extensions.schemaio.expansion" tests [1] that it did not > run before [2]. > I see that https://github.com/apache/beam/pull/16705 did some Gradle changes > related to SchemaIO and also in the set of PRs mentioned in the first failure > so possibly related. > [1] https://ci-beam.apache.org/job/beam_PostCommit_XVR_Direct/2260/testReport/ > [2] > https://ci-beam.apache.org/job/beam_PostCommit_XVR_Direct/2259/testReport/ -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Created] (BEAM-14214) beam_PostCommit_XVR_GoUsingJava_Dataflow fails on some test transforms
Daniel Oliveira created BEAM-14214: -- Summary: beam_PostCommit_XVR_GoUsingJava_Dataflow fails on some test transforms Key: BEAM-14214 URL: https://issues.apache.org/jira/browse/BEAM-14214 Project: Beam Issue Type: Bug Components: cross-language, sdk-go, test-failures Reporter: Daniel Oliveira Example failure: https://ci-beam.apache.org/job/beam_PostCommit_XVR_GoUsingJava_Dataflow/7/ I couldn't find accurate details about why the tests are failing, but TestXLang_Prefix, TestXLang_Multi, and TestXLang_Partition are failing while running for some reason. Investigating the Dataflow logs, we can see SDK harnesses are failing to connect for some reason. For example: {noformat} "getPodContainerStatuses for pod "df-go-testxlang-multi-03300551-62xv-harness-3msv_default(a7f1d8dfb2c3d2b4e80f5d92c1728787)" failed: rpc error: code = Unknown desc = Error: No such container: bea0d9bde42bf890f6fe1d4f589932471037a5948fb9588d01a06425cd14c177" {noformat} However I haven't been able to find any further details showing why the harness fails, and the tests keep running beyond that for a while with other errors that are also pretty inscrutable. -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Updated] (BEAM-12815) Flink Go XVR tests fail on TestXLang_Multi: Insufficient number of network buffers
[ https://issues.apache.org/jira/browse/BEAM-12815?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Oliveira updated BEAM-12815: --- Labels: (was: test) > Flink Go XVR tests fail on TestXLang_Multi: Insufficient number of network > buffers > -- > > Key: BEAM-12815 > URL: https://issues.apache.org/jira/browse/BEAM-12815 > Project: Beam > Issue Type: Bug > Components: cross-language, sdk-go >Reporter: Daniel Oliveira >Assignee: Danny McCormick >Priority: P3 > Fix For: Not applicable > > > When running the cross-language test suites () Flink fails on TestXLang_Multi > with the following error: > {noformat} > 19:29:14 2021/08/27 02:29:14 (): java.io.IOException: Insufficient number of > network buffers: required 17, but only 16 available. The total number of > network buffers is currently set to 2048 of 32768 bytes each. You can > increase this number by setting the configuration keys > 'taskmanager.memory.network.fraction', 'taskmanager.memory.network.min', and > 'taskmanager.memory.network.max'. > 19:29:14 2021/08/27 02:29:14 Job state: FAILED > 19:29:14 --- FAIL: TestXLang_Multi (6.26s){noformat} > This doesn't seem to be a parallelism problem (go test is run with "-p 1" as > expected) and is only happening on this specific test. -- This message was sent by Atlassian Jira (v8.20.1#820001)
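The numbers in the Flink error above can be related with some quick arithmetic: 2048 buffers of 32768 bytes each come to 64 MiB of network memory in total. The sketch below only restates that arithmetic; the helper names are hypothetical, and it assumes the buffer count is simply the network memory divided by the buffer size, as the wording of the error message suggests.

```python
def network_memory_bytes(num_buffers: int, buffer_size: int) -> int:
    """Total network memory implied by a buffer count and buffer size."""
    return num_buffers * buffer_size


def buffers_for(memory_bytes: int, buffer_size: int) -> int:
    """Number of whole buffers that fit into the given network memory."""
    return memory_bytes // buffer_size


# Values taken from the error message: 2048 buffers of 32768 bytes each.
TOTAL_BYTES = network_memory_bytes(2048, 32768)
```

Note that the failure reports 16 buffers available when 17 were required, so the shortage is in free buffers at that moment rather than in the configured total. Raising the network memory through the configuration keys named in the error increases the total pool.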
[jira] [Updated] (BEAM-12815) Flink Go XVR tests fail on TestXLang_Multi: Insufficient number of network buffers
[ https://issues.apache.org/jira/browse/BEAM-12815?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Oliveira updated BEAM-12815: --- Component/s: test-failures > Flink Go XVR tests fail on TestXLang_Multi: Insufficient number of network > buffers > -- > > Key: BEAM-12815 > URL: https://issues.apache.org/jira/browse/BEAM-12815 > Project: Beam > Issue Type: Bug > Components: cross-language, sdk-go, test-failures >Reporter: Daniel Oliveira >Assignee: Danny McCormick >Priority: P3 > Fix For: Not applicable > > > When running the cross-language test suites () Flink fails on TestXLang_Multi > with the following error: > {noformat} > 19:29:14 2021/08/27 02:29:14 (): java.io.IOException: Insufficient number of > network buffers: required 17, but only 16 available. The total number of > network buffers is currently set to 2048 of 32768 bytes each. You can > increase this number by setting the configuration keys > 'taskmanager.memory.network.fraction', 'taskmanager.memory.network.min', and > 'taskmanager.memory.network.max'. > 19:29:14 2021/08/27 02:29:14 Job state: FAILED > 19:29:14 --- FAIL: TestXLang_Multi (6.26s){noformat} > This doesn't seem to be a parallelism problem (go test is run with "-p 1" as > expected) and is only happening on this specific test. -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Updated] (BEAM-12815) Flink Go XVR tests fail on TestXLang_Multi: Insufficient number of network buffers
[ https://issues.apache.org/jira/browse/BEAM-12815?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Oliveira updated BEAM-12815: --- Labels: test (was: ) > Flink Go XVR tests fail on TestXLang_Multi: Insufficient number of network > buffers > -- > > Key: BEAM-12815 > URL: https://issues.apache.org/jira/browse/BEAM-12815 > Project: Beam > Issue Type: Bug > Components: cross-language, sdk-go >Reporter: Daniel Oliveira >Assignee: Danny McCormick >Priority: P3 > Labels: test > Fix For: Not applicable > > > When running the cross-language test suites () Flink fails on TestXLang_Multi > with the following error: > {noformat} > 19:29:14 2021/08/27 02:29:14 (): java.io.IOException: Insufficient number of > network buffers: required 17, but only 16 available. The total number of > network buffers is currently set to 2048 of 32768 bytes each. You can > increase this number by setting the configuration keys > 'taskmanager.memory.network.fraction', 'taskmanager.memory.network.min', and > 'taskmanager.memory.network.max'. > 19:29:14 2021/08/27 02:29:14 Job state: FAILED > 19:29:14 --- FAIL: TestXLang_Multi (6.26s){noformat} > This doesn't seem to be a parallelism problem (go test is run with "-p 1" as > expected) and is only happening on this specific test. -- This message was sent by Atlassian Jira (v8.20.1#820001)
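For scale, the figures quoted in the error above can be checked with a few lines of Python. This is only an arithmetic sketch: the function name is made up for illustration and nothing here calls Flink or reads its configuration.

```python
def network_pool_mib(num_buffers: int, buffer_bytes: int) -> float:
    """Total size of the network buffer pool in MiB."""
    return num_buffers * buffer_bytes / (1024 * 1024)


# Figures taken from the error message: 2048 buffers of 32768 bytes each,
# 17 buffers required but only 16 available.
pool_mib = network_pool_mib(2048, 32768)
missing_buffers = 17 - 16

print(f"pool size: {pool_mib} MiB, short by {missing_buffers} buffer(s)")
```

The configured pool totals 64 MiB, and the failing request was short by a single buffer.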
[jira] [Comment Edited] (BEAM-14163) Performance Regressions in streaming python ParDo and GBK Load Tests
[ https://issues.apache.org/jira/browse/BEAM-14163?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17514403#comment-17514403 ] Daniel Oliveira edited comment on BEAM-14163 at 3/30/22, 2:42 AM: -- Cherry-pick is in and the graphs linked show metrics returning to their previous values on master, so I think it's safe to mark this resolved. Thanks everyone who helped investigate! was (Author: danoliveira): Cherry-pick is in and the graphs linked show metrics returning to their previous values on master, so I think it's safe to mark this resolved. > Performance Regressions in streaming python ParDo and GBK Load Tests > > > Key: BEAM-14163 > URL: https://issues.apache.org/jira/browse/BEAM-14163 > Project: Beam > Issue Type: Bug > Components: community-metrics, sdk-py-core >Affects Versions: 2.38.0 >Reporter: Daniel Oliveira >Assignee: Robert Bradshaw >Priority: P1 > Fix For: 2.38.0 > > Time Spent: 1h 40m > Remaining Estimate: 0h > > As specified in the [Beam Release > Guide|https://beam.apache.org/contribute/release-guide/#4-investigate-performance-regressions], > I'm investigating performance regressions. The following load test metrics > show a clear and persistent performance regression starting > around March 17 and affecting version 2.38.0. > ParDo Load Tests: > http://metrics.beam.apache.org/d/MOi-kf3Zk/pardo-load-tests?orgId=1&var-processingType=streaming&var-sdk=python > GBK Load Tests: > http://metrics.beam.apache.org/d/UYZ-oJ3Zk/gbk-load-tests?orgId=1&var-processingType=streaming&var-sdk=python&from=now-30d&to=now -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Commented] (BEAM-14163) Performance Regressions in streaming python ParDo and GBK Load Tests
[ https://issues.apache.org/jira/browse/BEAM-14163?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17514403#comment-17514403 ] Daniel Oliveira commented on BEAM-14163: Cherry-pick is in and the graphs linked show metrics returning to their previous values on master, so I think it's safe to mark this resolved. > Performance Regressions in streaming python ParDo and GBK Load Tests > > > Key: BEAM-14163 > URL: https://issues.apache.org/jira/browse/BEAM-14163 > Project: Beam > Issue Type: Bug > Components: community-metrics, sdk-py-core >Affects Versions: 2.38.0 >Reporter: Daniel Oliveira >Assignee: Robert Bradshaw >Priority: P1 > Fix For: 2.38.0 > > Time Spent: 1h 40m > Remaining Estimate: 0h > > As specified in the [Beam Release > Guide|https://beam.apache.org/contribute/release-guide/#4-investigate-performance-regressions], > I'm investigating performance regressions. The following load test metrics > show a clear and persistent performance regression starting > around March 17 and affecting version 2.38.0. > ParDo Load Tests: > http://metrics.beam.apache.org/d/MOi-kf3Zk/pardo-load-tests?orgId=1&var-processingType=streaming&var-sdk=python > GBK Load Tests: > http://metrics.beam.apache.org/d/UYZ-oJ3Zk/gbk-load-tests?orgId=1&var-processingType=streaming&var-sdk=python&from=now-30d&to=now -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Updated] (BEAM-14163) Performance Regressions in streaming python ParDo and GBK Load Tests
[ https://issues.apache.org/jira/browse/BEAM-14163?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Oliveira updated BEAM-14163: --- Resolution: Fixed Status: Resolved (was: Open) > Performance Regressions in streaming python ParDo and GBK Load Tests > > > Key: BEAM-14163 > URL: https://issues.apache.org/jira/browse/BEAM-14163 > Project: Beam > Issue Type: Bug > Components: community-metrics, sdk-py-core >Affects Versions: 2.38.0 >Reporter: Daniel Oliveira >Assignee: Robert Bradshaw >Priority: P1 > Fix For: 2.38.0 > > Time Spent: 1h 40m > Remaining Estimate: 0h > > As specified in the [Beam Release > Guide|https://beam.apache.org/contribute/release-guide/#4-investigate-performance-regressions], > I'm investigating performance regressions. The following load test metrics > show a clear and persistent performance regression starting > around March 17 and affecting version 2.38.0. > ParDo Load Tests: > http://metrics.beam.apache.org/d/MOi-kf3Zk/pardo-load-tests?orgId=1&var-processingType=streaming&var-sdk=python > GBK Load Tests: > http://metrics.beam.apache.org/d/UYZ-oJ3Zk/gbk-load-tests?orgId=1&var-processingType=streaming&var-sdk=python&from=now-30d&to=now -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Commented] (BEAM-14194) [SpannerIO.readChangeStream] Throw error when autoscaling algorithm is not NONE
[ https://issues.apache.org/jira/browse/BEAM-14194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17514388#comment-17514388 ] Daniel Oliveira commented on BEAM-14194: (I'm copy-pasting my comment from the other release-blocker because it is completely applicable here) I can fit this cherry-pick in since it's all ready and won't delay the release. While it technically doesn't fit the requirements for a release-blocker (since it isn't a "significant regression or loss of functionality"), it's in a weird spot due to being a new feature. So sure, technically none of this can be a regression since it's all a new feature of this release. But this is still a known issue with noticeable user impact and the cherry-pick isn't delaying anything, so I think it's worth getting in. > [SpannerIO.readChangeStream] Throw error when autoscaling algorithm is not > NONE > --- > > Key: BEAM-14194 > URL: https://issues.apache.org/jira/browse/BEAM-14194 > Project: Beam > Issue Type: New Feature > Components: io-java-gcp, runner-dataflow >Affects Versions: 2.37.0 >Reporter: Thiago Nunes >Assignee: Thiago Nunes >Priority: P2 > Fix For: 2.38.0 > > Time Spent: 1.5h > Remaining Estimate: 0h > > SpannerIO.readChangeStreams does not currently support the autoscaling > feature. In order to avoid customer confusion, we decided to error out if an > algorithm different than NONE is specified. -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Commented] (BEAM-14185) [SpannerIO.readChangeStreams] Drop metadata tables at the end of the job
[ https://issues.apache.org/jira/browse/BEAM-14185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17514378#comment-17514378 ] Daniel Oliveira commented on BEAM-14185: I can fit this cherry-pick in since it's all ready and won't delay the release. While it technically doesn't fit the requirements for a release-blocker (since it isn't a "significant regression or loss of functionality"), it's in a weird spot due to being a new feature. So sure, technically none of this can be a regression since it's all a new feature of this release. But this is still a known issue with noticeable user impact and the cherry-pick isn't delaying anything, so I think it's worth getting in. > [SpannerIO.readChangeStreams] Drop metadata tables at the end of the job > > > Key: BEAM-14185 > URL: https://issues.apache.org/jira/browse/BEAM-14185 > Project: Beam > Issue Type: New Feature > Components: io-java-gcp >Affects Versions: 2.37.0 >Reporter: Thiago Nunes >Assignee: Thiago Nunes >Priority: P2 > Fix For: 2.38.0 > > Time Spent: 1h 20m > Remaining Estimate: 0h > > The SpannerIO.readChangeStreams Connector uses metadata tables to keep track > of its internal state during execution. At the moment, these metadata tables > linger after the execution, meaning that users will have to drop them > manually. > In this change, we would like to drop them automatically once the job > finishes. This should only occur after all partitions have been processed > successfully and marked as finished. -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Commented] (BEAM-13519) Java precommit flaky (timing out)
[ https://issues.apache.org/jira/browse/BEAM-13519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17514377#comment-17514377 ] Daniel Oliveira commented on BEAM-13519: I should probably add that this has been making reviewing PRs annoying since it makes the Java precommit very flaky, and since it takes ~3 hours to run, deflaking is a huge time sink. > Java precommit flaky (timing out) > - > > Key: BEAM-13519 > URL: https://issues.apache.org/jira/browse/BEAM-13519 > Project: Beam > Issue Type: Bug > Components: test-failures >Reporter: Kyle Weaver >Priority: P1 > Labels: flake > > Java precommits are sometimes timing out with no clear cause. Gradle will log > a bunch of routine build tasks, and then Jenkins will abort the job much > later. There are no logs to indicate what happened. It is not even clear > which task or tasks, if any, was the culprit, since many tasks are run in > parallel. > 01:53:28 > Task :sdks:java:testing:nexmark:build > 01:53:28 > Task :sdks:java:testing:nexmark:buildDependents > 01:53:28 > Task :sdks:java:extensions:sql:zetasql:buildDependents > 01:53:28 > Task :sdks:java:io:google-cloud-platform:buildDependents > 01:53:28 > Task :sdks:java:extensions:sql:buildDependents > 01:53:28 > Task :sdks:java:io:kafka:buildDependents > 01:53:28 > Task :sdks:java:extensions:join-library:buildDependents > 01:53:28 > Task :sdks:java:io:synthetic:buildDependents > 01:53:28 > Task :sdks:java:io:mongodb:buildDependents > 01:53:28 > Task :sdks:java:io:thrift:buildDependents > 01:53:28 > Task :sdks:java:testing:test-utils:buildDependents > 01:53:28 > Task :sdks:java:expansion-service:buildDependents > 01:53:28 > Task :sdks:java:extensions:arrow:buildDependents > 01:53:28 > Task :sdks:java:extensions:protobuf:buildDependents > 01:53:28 > Task :sdks:java:io:common:buildDependents > 01:53:28 > Task :runners:direct-java:buildDependents > 01:53:28 > Task :runners:local-java:buildDependents > 01:53:28 Build timed out (after 120 minutes). Marking the build as aborted. > https://ci-beam.apache.org/job/beam_PreCommit_Java_cron/4874/ -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Commented] (BEAM-13519) Java precommit flaky (timing out)
[ https://issues.apache.org/jira/browse/BEAM-13519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17514374#comment-17514374 ] Daniel Oliveira commented on BEAM-13519: This is resurfacing now, despite the Precommit timeout getting extended to 180 minutes. Check out https://ci-beam.apache.org/job/beam_PreCommit_Java_Cron/5270/ for example: {noformat} 12:00:48 > Task :sdks:java:core:validateShadedJarDoesntLeakNonProjectClasses 12:00:48 > Task :sdks:java:core:check 12:00:48 > Task :sdks:java:core:build 12:00:48 > Task :sdks:java:core:buildNeeded 14:16:44 Build timed out (after 180 minutes). Marking the build as aborted. 14:16:44 Build was aborted {noformat} > Java precommit flaky (timing out) > - > > Key: BEAM-13519 > URL: https://issues.apache.org/jira/browse/BEAM-13519 > Project: Beam > Issue Type: Bug > Components: test-failures >Reporter: Kyle Weaver >Priority: P1 > Labels: flake > > Java precommits are sometimes timing out with no clear cause. Gradle will log > a bunch of routine build tasks, and then Jenkins will abort the job much > later. There are no logs to indicate what happened. It is not even clear > which task or tasks, if any, was the culprit, since many tasks are run in > parallel. 
> 01:53:28 > Task :sdks:java:testing:nexmark:build > 01:53:28 > Task :sdks:java:testing:nexmark:buildDependents > 01:53:28 > Task :sdks:java:extensions:sql:zetasql:buildDependents > 01:53:28 > Task :sdks:java:io:google-cloud-platform:buildDependents > 01:53:28 > Task :sdks:java:extensions:sql:buildDependents > 01:53:28 > Task :sdks:java:io:kafka:buildDependents > 01:53:28 > Task :sdks:java:extensions:join-library:buildDependents > 01:53:28 > Task :sdks:java:io:synthetic:buildDependents > 01:53:28 > Task :sdks:java:io:mongodb:buildDependents > 01:53:28 > Task :sdks:java:io:thrift:buildDependents > 01:53:28 > Task :sdks:java:testing:test-utils:buildDependents > 01:53:28 > Task :sdks:java:expansion-service:buildDependents > 01:53:28 > Task :sdks:java:extensions:arrow:buildDependents > 01:53:28 > Task :sdks:java:extensions:protobuf:buildDependents > 01:53:28 > Task :sdks:java:io:common:buildDependents > 01:53:28 > Task :runners:direct-java:buildDependents > 01:53:28 > Task :runners:local-java:buildDependents > 01:53:28 Build timed out (after 120 minutes). Marking the build as aborted. > https://ci-beam.apache.org/job/beam_PreCommit_Java_cron/4874/ -- This message was sent by Atlassian Jira (v8.20.1#820001)
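To put a number on the silent stretch in the 180-minute run quoted in the comment above, the two relevant timestamps can be subtracted directly. This is a minimal sketch: the helper name is hypothetical, and it assumes both timestamps fall on the same day.

```python
from datetime import datetime, timedelta


def silent_gap(last_task_time: str, abort_time: str) -> timedelta:
    """Time between the last logged Gradle task and the Jenkins abort."""
    fmt = "%H:%M:%S"
    return datetime.strptime(abort_time, fmt) - datetime.strptime(last_task_time, fmt)


# Timestamps copied from the log excerpt: the last task line and the abort line.
gap = silent_gap("12:00:48", "14:16:44")
print(gap)
```

In that run, the job produced no log output for well over two hours before the timeout fired, which matches the description that there are no logs indicating what happened.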
[jira] [Commented] (BEAM-8218) Implement Apache PulsarIO
[ https://issues.apache.org/jira/browse/BEAM-8218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17514308#comment-17514308 ] Daniel Oliveira commented on BEAM-8218: --- Note that this is marked as release-blocking for 2.38.0 due to the release version being set. Not sure if that was intentional, but this doesn't look like a release-blocking bug. Can we remove this from the list of release-blockers? > Implement Apache PulsarIO > - > > Key: BEAM-8218 > URL: https://issues.apache.org/jira/browse/BEAM-8218 > Project: Beam > Issue Type: Task > Components: io-ideas >Reporter: Alex Van Boxel >Assignee: Marco Robles >Priority: P3 > Fix For: 2.38.0 > > Time Spent: 17h 50m > Remaining Estimate: 0h > > Apache Pulsar is starting to gain popularity. Having a native Beam PulsarIO > could be beneficial. > [https://pulsar.apache.org/|https://pulsar.apache.org/en/] -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Commented] (BEAM-14177) GroupByKey iteration caching broken for portable runners like Dataflow runner v2
[ https://issues.apache.org/jira/browse/BEAM-14177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17514305#comment-17514305 ] Daniel Oliveira commented on BEAM-14177: Since this has been cherry-picked into the release branch, can we mark this as resolved? > GroupByKey iteration caching broken for portable runners like Dataflow runner > v2 > > > Key: BEAM-14177 > URL: https://issues.apache.org/jira/browse/BEAM-14177 > Project: Beam > Issue Type: Bug > Components: sdk-java-harness >Reporter: Luke Cwik >Assignee: Luke Cwik >Priority: P2 > Fix For: 2.38.0 > > Time Spent: 2h 10m > Remaining Estimate: 0h > > The wrong cache key is being used as it has not been namespaced to the state > key. > This was previously being done within StateFetchingIterators but > https://github.com/apache/beam/pull/17121 changed that to use a single shared > key. > The fix is to subcache the cache before passing it into > StateFetchingIterators restoring the prior behavior. -- This message was sent by Atlassian Jira (v8.20.1#820001)
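The namespacing idea in the description can be illustrated with a toy example. This is a sketch only: `SubCache` and the key names are hypothetical and are not the Beam Java harness cache API. It shows why a single shared key collides across state keys, and how a namespaced view over the same backing store avoids that.

```python
class SubCache:
    """View over a shared dict that prefixes every key with a namespace."""

    def __init__(self, backing: dict, namespace: str):
        self._backing = backing
        self._namespace = namespace

    def put(self, key, value):
        self._backing[(self._namespace, key)] = value

    def get(self, key):
        return self._backing.get((self._namespace, key))


# Without namespacing, two state keys writing to one shared key overwrite each other.
shared = {}
shared["iterable"] = ["values for state key A"]
shared["iterable"] = ["values for state key B"]
collided = shared["iterable"]

# With a sub-cache per state key, the same logical key stays separate.
backing = {}
cache_a = SubCache(backing, "state-key-A")
cache_b = SubCache(backing, "state-key-B")
cache_a.put("iterable", ["values for state key A"])
cache_b.put("iterable", ["values for state key B"])

print(collided, cache_a.get("iterable"), cache_b.get("iterable"))
```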
[jira] [Comment Edited] (BEAM-14181) BQ: Storage API Sink reuses closed connections
[ https://issues.apache.org/jira/browse/BEAM-14181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17514304#comment-17514304 ] Daniel Oliveira edited comment on BEAM-14181 at 3/29/22, 7:38 PM: -- Cherry-picking #17187 to release-2.38.0: https://github.com/apache/beam/pull/17208 was (Author: danoliveira): Cherry-picking #17187 to release-2.38.0 > BQ: Storage API Sink reuses closed connections > -- > > Key: BEAM-14181 > URL: https://issues.apache.org/jira/browse/BEAM-14181 > Project: Beam > Issue Type: Bug > Components: io-java-gcp >Reporter: Ahmet Altay >Assignee: Reuven Lax >Priority: P1 > Fix For: 2.38.0 > > Time Spent: 1h 50m > Remaining Estimate: 0h > > Creating a jira so that it can be considered whether it is release blocking > or not. > Related change: https://github.com/apache/beam/pull/17187 > This causes the BigQuery sink to sometimes get fully stuck and never recover, > and the pipeline grinds to a halt. Likely the regression was introduced in > the last Beam release -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Commented] (BEAM-14181) BQ: Storage API Sink reuses closed connections
[ https://issues.apache.org/jira/browse/BEAM-14181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17514304#comment-17514304 ] Daniel Oliveira commented on BEAM-14181: Cherry-picking #17187 to release-2.38.0 > BQ: Storage API Sink reuses closed connections > -- > > Key: BEAM-14181 > URL: https://issues.apache.org/jira/browse/BEAM-14181 > Project: Beam > Issue Type: Bug > Components: io-java-gcp >Reporter: Ahmet Altay >Assignee: Reuven Lax >Priority: P1 > Fix For: 2.38.0 > > Time Spent: 1h 50m > Remaining Estimate: 0h > > Creating a jira so that it can be considered whether it is release blocking > or not. > Related change: https://github.com/apache/beam/pull/17187 > This causes the BigQuery sink to sometimes get fully stuck and never recover, > and the pipeline grinds to a halt. Likely the regression was introduced in > the last Beam release -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Updated] (BEAM-14181) BQ: Storage API Sink reuses closed connections
[ https://issues.apache.org/jira/browse/BEAM-14181?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Oliveira updated BEAM-14181: --- Description: Creating a jira so that it can be considered whether it is release blocking or not. Related change: https://github.com/apache/beam/pull/17187 This causes the BigQuery sink to sometimes get fully stuck and never recover, and the pipeline grinds to a halt. Likely the regression was introduced in the last Beam release was: Creating a jira so that it can be considered whether it is release blocking or not. Related change: https://github.com/apache/beam/pull/17187 > BQ: Storage API Sink reuses closed connections > -- > > Key: BEAM-14181 > URL: https://issues.apache.org/jira/browse/BEAM-14181 > Project: Beam > Issue Type: Bug > Components: io-java-gcp >Reporter: Ahmet Altay >Assignee: Reuven Lax >Priority: P1 > Fix For: 2.38.0 > > Time Spent: 1h 20m > Remaining Estimate: 0h > > Creating a jira so that it can be considered whether it is release blocking > or not. > Related change: https://github.com/apache/beam/pull/17187 > This causes the BigQuery sink to sometimes get fully stuck and never recover, > and the pipeline grinds to a halt. Likely the regression was introduced in > the last Beam release -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Commented] (BEAM-14116) Fix Pub/Sub Lite IO and SDF performance issues with shuffles
[ https://issues.apache.org/jira/browse/BEAM-14116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17513762#comment-17513762 ] Daniel Oliveira commented on BEAM-14116: Just a heads up that I'll be timeboxing a fix for this in 2.38.0 based on the progress of https://issues.apache.org/jira/browse/BEAM-14163. That is, if that gets fixed and there isn't a fix for this yet, I'll roll back these PRs on the release branch instead of delaying the RC further. > Fix Pub/Sub Lite IO and SDF performance issues with shuffles > > > Key: BEAM-14116 > URL: https://issues.apache.org/jira/browse/BEAM-14116 > Project: Beam > Issue Type: Task > Components: io-java-gcp, runner-dataflow >Reporter: Daniel Collins >Assignee: Daniel Collins >Priority: P2 > Fix For: 2.38.0 > > Time Spent: 5h 20m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Commented] (BEAM-14179) MonitoringInfoMetricName null value guard uncovering additional issues
[ https://issues.apache.org/jira/browse/BEAM-14179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17512580#comment-17512580 ] Daniel Oliveira commented on BEAM-14179: Regarding this bug's release blocker status, I'm currently leaning towards just rolling back the culprit PR on the release branch. However I'm currently waiting on some other cherrypicks before I can make RC1, so if a fix is available before then I'll cherry-pick it in. > MonitoringInfoMetricName null value guard uncovering additional issues > -- > > Key: BEAM-14179 > URL: https://issues.apache.org/jira/browse/BEAM-14179 > Project: Beam > Issue Type: Bug > Components: io-java-gcp, sdk-java-harness >Reporter: Luke Cwik >Assignee: Daniel Oliveira >Priority: P2 > Fix For: 2.38.0 > > > Additional integration testing > (//cloud/dataflow/testing/integration/sdk:V1ReadIT_testE2EV1Read) caught that > https://github.com/apache/beam/pull/17094 causes a regression: > The test failed with: > {noformat} > Caused by: java.lang.NullPointerException: null value in entry: > DATASTORE_NAMESPACE=null > at > org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.CollectPreconditions.checkEntryNotNull(CollectPreconditions.java:32) > at > org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.RegularImmutableMap.fromEntryArray(RegularImmutableMap.java:100) > at > org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.RegularImmutableMap.fromEntries(RegularImmutableMap.java:74) > at > org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableMap.copyOf(ImmutableMap.java:464) > at > org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableMap.copyOf(ImmutableMap.java:437) > at > org.apache.beam.runners.core.metrics.MonitoringInfoMetricName.(MonitoringInfoMetricName.java:46) > at > org.apache.beam.runners.core.metrics.MonitoringInfoMetricName.named(MonitoringInfoMetricName.java:93) > at > 
org.apache.beam.runners.core.metrics.ServiceCallMetric.call(ServiceCallMetric.java:82) > at > org.apache.beam.sdk.io.gcp.datastore.DatastoreV1$Read$ReadFn.runQueryWithRetries(DatastoreV1.java:927) > at > org.apache.beam.sdk.io.gcp.datastore.DatastoreV1$Read$ReadFn.processElement(DatastoreV1.java:965) > {noformat} -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Commented] (BEAM-14064) ElasticSearchIO#Write buffering and outputting across windows
[ https://issues.apache.org/jira/browse/BEAM-14064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17512139#comment-17512139 ] Daniel Oliveira commented on BEAM-14064: That sounds reasonable. I'd agree that if this is now silently dropping elements, that's what I would call a new regression. If that's all that remains to get in a PR and cherry-pick it into the release branch, then it sounds reasonable, we can just focus on getting an answer to your question on watermark behavior ASAP. > ElasticSearchIO#Write buffering and outputting across windows > - > > Key: BEAM-14064 > URL: https://issues.apache.org/jira/browse/BEAM-14064 > Project: Beam > Issue Type: Bug > Components: io-java-elasticsearch >Affects Versions: 2.35.0, 2.36.0, 2.37.0 >Reporter: Luke Cwik >Assignee: Evan Galpin >Priority: P1 > Fix For: 2.38.0 > > Time Spent: 5.5h > Remaining Estimate: 0h > > Source: https://lists.apache.org/thread/mtwtno2o88lx3zl12jlz7o5w1lcgm2db > Bug PR: https://github.com/apache/beam/pull/15381 > ElasticsearchIO is collecting results from elements in window X and then > trying to output them in window Y when flushing the batch. This exposed a bug > where elements that were being buffered were being output as part of a > different window than what the window that produced them was. > This became visible because validation was added recently to ensure that when > the pipeline is processing elements in window X that output with a timestamp > is valid for window X. Note that this validation only occurs in > *@ProcessElement* since output is associated with the current window with the > input element that is being processed. > It is ok to do this in *@FinishBundle* since there is no existing windowing > context and when you output that element is assigned to an appropriate window. 
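The lower-bound part of the timestamp validation described above can be sketched as follows. This is an illustration under stated assumptions, not Beam's implementation: the function name is hypothetical, the upper bound mentioned in the worker error is omitted, and the two timestamps are the ones from the worker error quoted further down in this issue.

```python
from datetime import datetime, timedelta, timezone


def check_output_timestamp(output_ts, input_ts, allowed_skew=timedelta(0)):
    """Reject outputs older than the current input minus the allowed skew."""
    earliest_allowed = input_ts - allowed_skew
    if output_ts < earliest_allowed:
        raise ValueError(
            f"Cannot output with timestamp {output_ts.isoformat()}; "
            f"earliest allowed is {earliest_allowed.isoformat()}"
        )
    return output_ts


# A buffered element from an earlier window, flushed while a later input is processed.
input_ts = datetime(2022, 3, 7, 10, 43, 43, 562000, tzinfo=timezone.utc)
buffered_ts = datetime(2022, 3, 7, 10, 43, 38, 640000, tzinfo=timezone.utc)
lag = input_ts - buffered_ts

try:
    check_output_timestamp(buffered_ts, input_ts)
    outcome = "accepted"
except ValueError as err:
    outcome = f"rejected: {err}"

print(lag, outcome)
```

With zero allowed skew, any buffered element older than the element currently being processed is rejected, which is the situation a batch flushed from inside *@ProcessElement* runs into.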
> *Further Context* > We’ve bisected it to being introduced in 2.35.0, and I’m reasonably certain > it’s this PR https://github.com/apache/beam/pull/15381 > Our scenario is pretty trivial, we read off Pubsub and write to Elastic in a > streaming job, the config for the source and sink is respectively > {noformat} > pipeline.apply( > PubsubIO.readStrings().fromSubscription(subscription) > ).apply(ParseJsons.of(OurObject::class.java)) > .setCoder(KryoCoder.of()) > {noformat} > and > {noformat} > ElasticsearchIO.write() > .withUseStatefulBatches(true) > .withMaxParallelRequestsPerWindow(1) > .withMaxBufferingDuration(Duration.standardSeconds(30)) > // 5 bytes **> KiB **> MiB, so 5 MiB > .withMaxBatchSizeBytes(5L * 1024 * 1024) > // # of docs > .withMaxBatchSize(1000) > .withConnectionConfiguration( > ElasticsearchIO.ConnectionConfiguration.create( > arrayOf(host), > "fubar", > "_doc" > ).withConnectTimeout(5000) > .withSocketTimeout(3) > ) > .withRetryConfiguration( > ElasticsearchIO.RetryConfiguration.create( > 10, > // the duration is wall clock, against the connection and > socket timeouts specified > // above. I.e., 10 x 30s is gonna be more than 3 minutes, > so if we're getting > // 10 socket timeouts in a row, this would ignore the > "10" part and terminate > // after 6. The idea is that in a mixed failure mode, > you'd get different timeouts > // of different durations, and on average 10 x fails < 4m. > // That said, 4m is arbitrary, so adjust as and when > needed. 
> Duration.standardMinutes(4) > ) > ) > .withIdFn { f: JsonNode -> f["id"].asText() } > .withIndexFn { f: JsonNode -> f["schema_name"].asText() } > .withIsDeleteFn { f: JsonNode -> f["_action"].asText("noop") == > "delete" } > {noformat} > We recently tried upgrading 2.33 to 2.36 and immediately hit a bug in the > consumer, due to alleged time skew, specifically > {noformat} > 2022-03-07 10:48:37.886 GMTError message from worker: > java.lang.IllegalArgumentException: Cannot output with timestamp > 2022-03-07T10:43:38.640Z. Output timestamps must be no earlier than the > timestamp of the > current input (2022-03-07T10:43:43.562Z) minus the allowed skew (0 > milliseconds) and no later than 294247-01-10T04:00:54.775Z. See the > DoFn#getAllowedTimestampSkew() Javadoc > for details on changing the allowed skew. > org.apache.beam.runners.dataflow.worker.repackaged.org.apache.beam.runners.cor
[jira] [Assigned] (BEAM-14163) Performance Regressions in streaming python ParDo and GBK Load Tests
[ https://issues.apache.org/jira/browse/BEAM-14163?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Oliveira reassigned BEAM-14163: -- Assignee: Valentyn Tymofieiev (was: Daniel Oliveira) > Performance Regressions in streaming python ParDo and GBK Load Tests > > > Key: BEAM-14163 > URL: https://issues.apache.org/jira/browse/BEAM-14163 > Project: Beam > Issue Type: Bug > Components: community-metrics, sdk-py-core >Affects Versions: 2.38.0 >Reporter: Daniel Oliveira >Assignee: Valentyn Tymofieiev >Priority: P0 > Fix For: 2.38.0 > > > As specified in the [Beam Release > Guide|https://beam.apache.org/contribute/release-guide/#4-investigate-performance-regressions], > I'm investigating performance regressions. The following load test metrics > show a clear and persistent performance regression starting > around March 17 and affecting version 2.38.0. > ParDo Load Tests: > http://metrics.beam.apache.org/d/MOi-kf3Zk/pardo-load-tests?orgId=1&var-processingType=streaming&var-sdk=python > GBK Load Tests: > http://metrics.beam.apache.org/d/UYZ-oJ3Zk/gbk-load-tests?orgId=1&var-processingType=streaming&var-sdk=python&from=now-30d&to=now -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Commented] (BEAM-14171) CoGroupByKey loses values with large groups on Dataflow v1
[ https://issues.apache.org/jira/browse/BEAM-14171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17512130#comment-17512130 ] Daniel Oliveira commented on BEAM-14171: How does this issue look in terms of timeline to getting a fix in and regarding blocking the release? I see this issue recently got reported and a PR just went up, but not sure if that PR addresses the root cause or is tangentially related. It looks like this affected the previous two versions, which by precedent is a reason for us _not_ to block since this wouldn't be a new regression. But it also sounds like a pretty major problem if it affects all CoGroupByKeys, which I think might be worth making an exception _if_ we think a fix can be implemented in a timely manner. Like within the next three workdays as a rough target. Alternatively, if there are simple workarounds/mitigations for this, then I think we can just list it as a known issue and describe how users can mitigate it. > CoGroupByKey loses values with large groups on Dataflow v1 > -- > > Key: BEAM-14171 > URL: https://issues.apache.org/jira/browse/BEAM-14171 > Project: Beam > Issue Type: Bug > Components: runner-dataflow, sdk-java-core >Affects Versions: 2.36.0, 2.37.0 >Reporter: Niel Markwick >Assignee: Robert Bradshaw >Priority: P1 > Fix For: 2.38.0 > > Time Spent: 20m > Remaining Estimate: 0h > > CoGroupByKey can lose elements - replacing them with null values when a group > is large (>10,000 elements). > > This only occurs in dataflow v1, not dataflow-v2 runner > Possibly related to BEAM-13541. > > https://lists.apache.org/thread/5y56kbgm3q0m1byzf7186rrkomrcfldm > > > -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Commented] (BEAM-14153) Reshuffled Row Coder PCollection used direct to Side Input breaks Dataflow & PyPortable
[ https://issues.apache.org/jira/browse/BEAM-14153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17512127#comment-17512127 ] Daniel Oliveira commented on BEAM-14153: I'm looking at release-blocking bugs and trying to see if any can be safely removed from the list of release blockers. What's the status on this? It does sound like a regression, which indicates that it's a release blocker, but how soon is a fix incoming? If a fix isn't coming soon, is it a major regression or something that can easily be worked around? The trigger sounds pretty specific, since it needs to be a reshuffled PCollection, so maybe we can just provide workaround instructions along with marking this as a known issue? > Reshuffled Row Coder PCollection used direct to Side Input breaks Dataflow & > PyPortable > --- > > Key: BEAM-14153 > URL: https://issues.apache.org/jira/browse/BEAM-14153 > Project: Beam > Issue Type: Bug > Components: sdk-go >Affects Versions: 2.37.0 >Reporter: Robert Burke >Assignee: Robert Burke >Priority: P2 > Fix For: 2.38.0 > > > Since First class Iterable side inputs were implemented, passing a reshuffled > PCollection directly to a Side Input will cause a coder mismatch between > encoding the reshuffle and decoding it on Dataflow and on Python Portable. In > particular, the Row values will be encoded without a Length Prefix, but the > decoder is then asked to decode them with a length prefix, which wasn't included. > This is similar to the issue in BEAM-12438 which has been hacked around. > In this instance it's likely more resilient to always length prefix Row > encoded types, and make it explicit in the pipeline proto. This should avoid > issues with runners having odd behaviors WRT row coders at this time, while > not preventing them from introspecting row encoded values should they choose. > This may also allow us to avoid the hack for BEAM-12438, though that is > something to be verified independently. 
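The coder mismatch described in BEAM-14153 can be illustrated with a minimal sketch. It is not Beam code: the function names are hypothetical, and it assumes a fixed 4-byte big-endian length prefix for simplicity, where real coders typically use a variable-length integer. It only shows why bytes written without a prefix cannot be read back by a decoder that expects one.

```python
import struct


def encode_plain(payload: bytes) -> bytes:
    """Hypothetical encoder that writes the row bytes with no length prefix."""
    return payload


def encode_length_prefixed(payload: bytes) -> bytes:
    """Hypothetical encoder that writes a 4-byte length followed by the row bytes."""
    return struct.pack(">I", len(payload)) + payload


def decode_length_prefixed(data: bytes) -> bytes:
    """Hypothetical decoder that expects the 4-byte length prefix to be present."""
    if len(data) < 4:
        raise ValueError("input too short to contain a length prefix")
    (length,) = struct.unpack(">I", data[:4])
    body = data[4:4 + length]
    if len(body) != length:
        # The first payload bytes were misread as a length.
        raise ValueError(f"expected {length} bytes, found {len(body)}")
    return body
```

Always writing the prefix, as the issue suggests, makes the round trip safe regardless of which side decodes; `decode_length_prefixed(encode_plain(...))` fails because the first payload bytes are interpreted as a length.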
[jira] [Commented] (BEAM-14064) ElasticSearchIO#Write buffering and outputting across windows
[ https://issues.apache.org/jira/browse/BEAM-14064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17512106#comment-17512106 ] Daniel Oliveira commented on BEAM-14064: Hi, I'm managing the 2.38.0 release. I see this is set to release-blocking, but I'm not sure if it's actually worth blocking the release. While it does fit our "significant regression or loss of functionality" requirement [from the website|https://beam.apache.org/contribute/release-blocking/], we've set a precedent of blocking for new regressions and not blocking for known existing issues. Basically, since this issue has been present for the past 3 releases, it's not a new regression and not something we want to block a release on. The cat's already out of the bag, so to speak. In addition to all that, the scope of this seems limited to ElasticsearchIO. If it had broad impact it might be worth making an exception, but not as it stands. Evan, can we take this off the release blocker list and just get it in for 2.39.0 instead? Do you have an argument for keeping it as a release blocker? > ElasticSearchIO#Write buffering and outputting across windows > - > > Key: BEAM-14064 > URL: https://issues.apache.org/jira/browse/BEAM-14064 > Project: Beam > Issue Type: Bug > Components: io-java-elasticsearch >Affects Versions: 2.35.0, 2.36.0, 2.37.0 >Reporter: Luke Cwik >Assignee: Evan Galpin >Priority: P1 > Fix For: 2.38.0 > > Time Spent: 5.5h > Remaining Estimate: 0h > > Source: https://lists.apache.org/thread/mtwtno2o88lx3zl12jlz7o5w1lcgm2db > Bug PR: https://github.com/apache/beam/pull/15381 > ElasticsearchIO is collecting results from elements in window X and then > trying to output them in window Y when flushing the batch. This exposed a bug > where buffered elements were being output as part of a > different window than the one that produced them. 
> This became visible because validation was added recently to ensure that when > the pipeline is processing elements in window X that output with a timestamp > is valid for window X. Note that this validation only occurs in > *@ProcessElement* since output is associated with the current window with the > input element that is being processed. > It is ok to do this in *@FinishBundle* since there is no existing windowing > context and when you output that element is assigned to an appropriate window. > *Further Context* > We’ve bisected it to being introduced in 2.35.0, and I’m reasonably certain > it’s this PR https://github.com/apache/beam/pull/15381 > Our scenario is pretty trivial, we read off Pubsub and write to Elastic in a > streaming job, the config for the source and sink is respectively > {noformat} > pipeline.apply( > PubsubIO.readStrings().fromSubscription(subscription) > ).apply(ParseJsons.of(OurObject::class.java)) > .setCoder(KryoCoder.of()) > {noformat} > and > {noformat} > ElasticsearchIO.write() > .withUseStatefulBatches(true) > .withMaxParallelRequestsPerWindow(1) > .withMaxBufferingDuration(Duration.standardSeconds(30)) > // 5 bytes **> KiB **> MiB, so 5 MiB > .withMaxBatchSizeBytes(5L * 1024 * 1024) > // # of docs > .withMaxBatchSize(1000) > .withConnectionConfiguration( > ElasticsearchIO.ConnectionConfiguration.create( > arrayOf(host), > "fubar", > "_doc" > ).withConnectTimeout(5000) > .withSocketTimeout(3) > ) > .withRetryConfiguration( > ElasticsearchIO.RetryConfiguration.create( > 10, > // the duration is wall clock, against the connection and > socket timeouts specified > // above. I.e., 10 x 30s is gonna be more than 3 minutes, > so if we're getting > // 10 socket timeouts in a row, this would ignore the > "10" part and terminate > // after 6. The idea is that in a mixed failure mode, > you'd get different timeouts > // of different durations, and on average 10 x fails < 4m. > // That said, 4m is arbitrary, so adjust as and when > needed. 
> Duration.standardMinutes(4) > ) > ) > .withIdFn { f: JsonNode -> f["id"].asText() } > .withIndexFn { f: JsonNode -> f["schema_name"].asText() } > .withIsDeleteFn { f: JsonNode -> f["_action"].asText("noop") == > "delete" } > {noformat} > We recently tried upgrading 2.33 to 2.36 and immediately hit a bug in the > consum
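The windowing problem reported in BEAM-14064 can be sketched outside of Beam. The snippet below is an illustration only, not ElasticsearchIO's implementation: it assumes fixed windows of a hypothetical 60-second size and integer timestamps, and shows the idea of grouping buffered elements by the window that produced them so that a flush never emits an element from window X as part of window Y.

```python
from collections import defaultdict

WINDOW_SIZE_SECONDS = 60  # assumed fixed-window size, for illustration only


def window_start(timestamp: int, size: int = WINDOW_SIZE_SECONDS) -> int:
    """Return the start of the fixed window that contains the timestamp."""
    return timestamp - (timestamp % size)


def flush_by_window(buffered):
    """Group buffered (timestamp, doc) pairs by their originating window.

    Each batch in the result contains only documents from a single window,
    so every document can be output with its own window's context.
    """
    batches = defaultdict(list)
    for timestamp, doc in buffered:
        batches[window_start(timestamp)].append(doc)
    return dict(batches)
```

With this grouping, elements at timestamps 5 and 59 land in the batch for the window starting at 0, while an element at timestamp 61 lands in the batch for the window starting at 60.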
[jira] [Commented] (BEAM-14129) Fix issues with Pub/Sub Lite IO at high volumes
[ https://issues.apache.org/jira/browse/BEAM-14129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17512088#comment-17512088 ] Daniel Oliveira commented on BEAM-14129: Currently this Jira is marked as a release-blocker (due to the fix version being set to an upcoming release, and the issue not being resolved). This issue doesn't look like it's "a significant regression or loss of functionality" ([see this page|https://beam.apache.org/contribute/release-blocking/]). Can we unmark it as release-blocking by clearing the "Fix Version" field until after it's resolved? > Fix issues with Pub/Sub Lite IO at high volumes > --- > > Key: BEAM-14129 > URL: https://issues.apache.org/jira/browse/BEAM-14129 > Project: Beam > Issue Type: Task > Components: io-java-gcp >Reporter: Daniel Collins >Assignee: Daniel Collins >Priority: P1 > Fix For: 2.38.0 > > Time Spent: 19h 50m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Commented] (BEAM-13695) Provide more accurate size estimates for cache objects in Java 17
[ https://issues.apache.org/jira/browse/BEAM-13695?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17512085#comment-17512085 ] Daniel Oliveira commented on BEAM-13695: Currently this Jira is marked as a release-blocker (due to the fix version being set to an upcoming release, and the issue not being resolved). This issue doesn't look like it's "a significant regression or loss of functionality" ([see this page|https://beam.apache.org/contribute/release-blocking/]). Can we unmark it as release-blocking? > Provide more accurate size estimates for cache objects in Java 17 > - > > Key: BEAM-13695 > URL: https://issues.apache.org/jira/browse/BEAM-13695 > Project: Beam > Issue Type: Improvement > Components: sdk-java-harness >Reporter: Kiley Sok >Assignee: Kiley Sok >Priority: P2 > Fix For: 2.38.0 > > Time Spent: 3h 50m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Created] (BEAM-14163) Performance Regressions in streaming python ParDo and GBK Load Tests
Daniel Oliveira created BEAM-14163: -- Summary: Performance Regressions in streaming python ParDo and GBK Load Tests Key: BEAM-14163 URL: https://issues.apache.org/jira/browse/BEAM-14163 Project: Beam Issue Type: Bug Components: community-metrics, sdk-py-core Affects Versions: 2.38.0 Reporter: Daniel Oliveira Fix For: 2.38.0 As specified in the [Beam Release Guide|https://beam.apache.org/contribute/release-guide/#4-investigate-performance-regressions], I'm investigating performance regressions. The following load test metrics show a clear and persistent performance regression starting around March 17 and affecting version 2.38.0. ParDo Load Tests: http://metrics.beam.apache.org/d/MOi-kf3Zk/pardo-load-tests?orgId=1&var-processingType=streaming&var-sdk=python GBK Load Tests: http://metrics.beam.apache.org/d/UYZ-oJ3Zk/gbk-load-tests?orgId=1&var-processingType=streaming&var-sdk=python&from=now-30d&to=now -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Commented] (BEAM-14122) Python portable precommit broken: 'get_installed_distributions'
[ https://issues.apache.org/jira/browse/BEAM-14122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17509110#comment-17509110 ] Daniel Oliveira commented on BEAM-14122: I'm running into this on all the XLang Dataflow test suites too (which is blocking me since I'm trying to add a new one for Go). > Python portable precommit broken: 'get_installed_distributions' > --- > > Key: BEAM-14122 > URL: https://issues.apache.org/jira/browse/BEAM-14122 > Project: Beam > Issue Type: Bug > Components: test-failures >Reporter: Kyle Weaver >Priority: P1 > Labels: currently-failing > > Successfully installed PTable-0.9.2 pip-licenses-2.3.0 > WARNING: Running pip as the 'root' user can result in broken permissions and > conflicting behaviour with the system package manager. It is recommended to > use a virtual environment instead: https://pip.pypa.io/warnings/venv > Traceback (most recent call last): > File "/usr/local/lib/python3.9/site-packages/piplicenses.py", line 40, in > > from pip._internal.utils.misc import get_installed_distributions > ImportError: cannot import name 'get_installed_distributions' from > 'pip._internal.utils.misc' > (/usr/local/lib/python3.9/site-packages/pip/_internal/utils/misc.py) > During handling of the above exception, another exception occurred: > Traceback (most recent call last): > File "/usr/local/bin/pip-licenses", line 5, in > from piplicenses import main > File "/usr/local/lib/python3.9/site-packages/piplicenses.py", line 42, in > > from pip import get_installed_distributions > ImportError: cannot import name 'get_installed_distributions' from 'pip' > (/usr/local/lib/python3.9/site-packages/pip/__init__.py) > Traceback (most recent call last): > File "/tmp/license_scripts/pull_licenses_py.py", line 166, in > dependencies = run_pip_licenses() > File "/tmp/license_scripts/pull_licenses_py.py", line 49, in > run_pip_licenses > dependencies = run_bash_command(command) > File 
"/tmp/license_scripts/pull_licenses_py.py", line 44, in > run_bash_command > return subprocess.check_output(command.split()).decode('utf-8') > File "/usr/local/lib/python3.9/subprocess.py", line 424, in check_output > return run(*popenargs, stdout=PIPE, timeout=timeout, check=True, > File "/usr/local/lib/python3.9/subprocess.py", line 528, in run > raise CalledProcessError(retcode, process.args, > subprocess.CalledProcessError: Command '['pip-licenses', > '--with-license-file', '--with-urls', '--from=mixed', '--ignore', > 'apache-beam', '--format=json']' returned non-zero exit status 1. > The command '/bin/sh -c if [ "$pull_licenses" = "true" ] ; then pip > install 'pip-licenses<3.0.0' pyyaml tenacity && python > /tmp/license_scripts/pull_licenses_py.py ; fi' returned a non-zero code: 1 > > Task :sdks:python:container:py39:docker FAILED > https://ci-beam.apache.org/job/beam_PreCommit_Portable_Python_Cron/4748 -- This message was sent by Atlassian Jira (v8.20.1#820001)
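The traceback in BEAM-14122 shows pip-licenses importing `get_installed_distributions` from pip's private modules, a name that newer pip versions no longer provide. As a hedged illustration (not the fix adopted for this issue, and not a replacement for pip-licenses), installed packages can be listed through the standard library's `importlib.metadata` instead of pip internals. The function name and the "UNKNOWN" placeholder below are assumptions made for this sketch.

```python
from importlib import metadata


def installed_distributions():
    """List installed packages using only the public standard-library API.

    Returns a list of dicts with the package name, version, and the License
    metadata field, sorted by name. Missing fields become "UNKNOWN".
    """
    result = []
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        if not name:
            # Skip broken installs that carry no usable metadata.
            continue
        result.append({
            "name": name,
            "version": dist.metadata.get("Version", "UNKNOWN"),
            "license": dist.metadata.get("License", "UNKNOWN"),
        })
    return sorted(result, key=lambda entry: entry["name"].lower())
```

Relying on a public API avoids breakage of the kind shown above, where a tool depended on a private function that was later removed.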
[jira] [Commented] (BEAM-14017) beam_PreCommit_CommunityMetrics_Cron is failing.
[ https://issues.apache.org/jira/browse/BEAM-14017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17503924#comment-17503924 ] Daniel Oliveira commented on BEAM-14017: Also relevant: The failing Gradle task runs from here, where the scripts are: https://github.com/apache/beam/tree/master/.test-infra/metrics And the port that the connection is timing out on, 443, is the standard HTTPS port and is hardcoded in a few places in that directory https://github.com/apache/beam/search?l=Python&q=%22443%22 > beam_PreCommit_CommunityMetrics_Cron is failing. > > > Key: BEAM-14017 > URL: https://issues.apache.org/jira/browse/BEAM-14017 > Project: Beam > Issue Type: Bug > Components: testing >Reporter: Valentyn Tymofieiev >Assignee: Daniel Oliveira >Priority: P1 > > https://ci-beam.apache.org/job/beam_PreCommit_CommunityMetrics_Cron/4805/console > 10:14:48 > Task :beam-test-infra-metrics:validateConfiguration > 10:14:48 W0228 18:14:48.092605 389274 helpers.go:549] --dry-run=true is > deprecated (boolean value) and can be replaced with --dry-run=client. > 10:15:20 Unable to connect to the server: dial tcp 104.154.102.21:443: i/o > timeout (Client.Timeout exceeded while awaiting headers) > 10:15:20 > 10:15:20 > Task :beam-test-infra-metrics:validateConfiguration FAILED > 10:15:20 > 10:15:20 FAILURE: Build failed with an exception. > 10:15:20 > 10:15:20 * What went wrong: > 10:15:20 Execution failed for task > ':beam-test-infra-metrics:validateConfiguration'. -- This message was sent by Atlassian Jira (v8.20.1#820001)
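The log in BEAM-14017 reports an i/o timeout while dialing port 443. A quick way to separate a network reachability problem from a tooling problem is a plain TCP connection attempt with a timeout. The helper below is a minimal sketch using only the standard library; the function name is hypothetical and it is not part of the Beam test infrastructure.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout.

    Timeouts, refused connections, and unreachable hosts all surface as
    OSError subclasses and are reported as False.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

For example, calling `can_connect("104.154.102.21", 443)` from the CI worker would indicate whether the address from the log is reachable at all; a False result points toward the environment rather than the Gradle task.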
[jira] [Assigned] (BEAM-14017) beam_PreCommit_CommunityMetrics_Cron is failing.
[ https://issues.apache.org/jira/browse/BEAM-14017?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Oliveira reassigned BEAM-14017: -- Assignee: Daniel Oliveira (was: Heejong Lee) > beam_PreCommit_CommunityMetrics_Cron is failing. > > > Key: BEAM-14017 > URL: https://issues.apache.org/jira/browse/BEAM-14017 > Project: Beam > Issue Type: Bug > Components: testing >Reporter: Valentyn Tymofieiev >Assignee: Daniel Oliveira >Priority: P1 > > https://ci-beam.apache.org/job/beam_PreCommit_CommunityMetrics_Cron/4805/console > 10:14:48 > Task :beam-test-infra-metrics:validateConfiguration > 10:14:48 W0228 18:14:48.092605 389274 helpers.go:549] --dry-run=true is > deprecated (boolean value) and can be replaced with --dry-run=client. > 10:15:20 Unable to connect to the server: dial tcp 104.154.102.21:443: i/o > timeout (Client.Timeout exceeded while awaiting headers) > 10:15:20 > 10:15:20 > Task :beam-test-infra-metrics:validateConfiguration FAILED > 10:15:20 > 10:15:20 FAILURE: Build failed with an exception. > 10:15:20 > 10:15:20 * What went wrong: > 10:15:20 Execution failed for task > ':beam-test-infra-metrics:validateConfiguration'. -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Commented] (BEAM-14017) beam_PreCommit_CommunityMetrics_Cron is failing.
[ https://issues.apache.org/jira/browse/BEAM-14017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17503923#comment-17503923 ] Daniel Oliveira commented on BEAM-14017: I spent some time looking into this and couldn't figure out anything really useful. The best lead I have is that the error that's appearing is probably due to being unable to connect to a Kubernetes cluster, and that the error first started appearing Feb. 25 sometime between 4 PM PST and 10 PM PST. It seems likely that this is due to a change in our GCP environment but I haven't been able to find what it could be. Last good run: https://ci-beam.apache.org/job/beam_PreCommit_CommunityMetrics_Cron/4794/ First bad run: https://ci-beam.apache.org/job/beam_PreCommit_CommunityMetrics_Cron/4795/ > beam_PreCommit_CommunityMetrics_Cron is failing. > > > Key: BEAM-14017 > URL: https://issues.apache.org/jira/browse/BEAM-14017 > Project: Beam > Issue Type: Bug > Components: testing >Reporter: Valentyn Tymofieiev >Assignee: Heejong Lee >Priority: P1 > > https://ci-beam.apache.org/job/beam_PreCommit_CommunityMetrics_Cron/4805/console > 10:14:48 > Task :beam-test-infra-metrics:validateConfiguration > 10:14:48 W0228 18:14:48.092605 389274 helpers.go:549] --dry-run=true is > deprecated (boolean value) and can be replaced with --dry-run=client. > 10:15:20 Unable to connect to the server: dial tcp 104.154.102.21:443: i/o > timeout (Client.Timeout exceeded while awaiting headers) > 10:15:20 > 10:15:20 > Task :beam-test-infra-metrics:validateConfiguration FAILED > 10:15:20 > 10:15:20 FAILURE: Build failed with an exception. > 10:15:20 > 10:15:20 * What went wrong: > 10:15:20 Execution failed for task > ':beam-test-infra-metrics:validateConfiguration'. -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Updated] (BEAM-13857) Add expansion service startup to Go integration test flags.
[ https://issues.apache.org/jira/browse/BEAM-13857?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Oliveira updated BEAM-13857: --- Description: Currently a separate debezium io expansion address flag needs to be passed to the runner when running cross-language debezium IO pipelines from Go SDK. Find a way to do this in a better way so that we could have it started along with java io expansion service while spinning up the test without bulking :sdks:java:io:expansion-service. In particular, needing to add a flag per expansion service jar to our integration tests will eventually become quite cluttered, so we may wish to settle on some kind of KV map flag approach instead to reduce copypasta code overhead. Edit: Decided on going with the KV map flag approach within the Go SDK instead of in a bash script, and moving expansion service startup into the codebase as well. was: Currently a separate debezium io expansion address flag needs to be passed to the runner when running cross-language debezium IO pipelines from Go SDK. Find a way to do this in a better way so that we could have it started along with java io expansion service while spinning up the test without bulking :sdks:java:io:expansion-service. In particular, needing to add a flag per expansion service jar to our integration tests will eventually become quite cluttered, so we may wish to settle on some kind of KV map flag approach instead to reduce copypasta code overhead. > Add expansion service startup to Go integration test flags. > --- > > Key: BEAM-13857 > URL: https://issues.apache.org/jira/browse/BEAM-13857 > Project: Beam > Issue Type: Bug > Components: sdk-go >Reporter: Ritesh Ghorse >Assignee: Daniel Oliveira >Priority: P2 > > Currently a separate debezium io expansion address flag needs to be passed to > the runner when running cross-language debezium IO pipelines from Go SDK. 
> Find a way to do this in a better way so that we could have it started along > with java io expansion service while spinning up the test without bulking > :sdks:java:io:expansion-service. > In particular, needing to add a flag per expansion service jar to our > integration tests will eventually become quite cluttered, so we may wish to > settle on some kind of KV map flag approach instead to reduce copypasta code > overhead. > Edit: Decided on going with the KV map flag approach within the Go SDK > instead of in a bash script, and moving expansion service startup into the > codebase as well. -- This message was sent by Atlassian Jira (v8.20.1#820001)
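The "KV map flag" idea from BEAM-13857 can be sketched as a single repeatable flag whose values are label:value pairs, instead of one dedicated flag per expansion service. The snippet below is an illustration in Python for brevity; the actual change lives in the Go SDK, and the flag name `--expansion_addr`, the labels, and the ports used here are assumptions, not the real interface.

```python
import argparse


def parse_kv(entry: str):
    """Split 'label:value' on the first colon, so values may contain colons."""
    label, sep, value = entry.partition(":")
    if not sep or not label or not value:
        raise argparse.ArgumentTypeError(
            f"expected label:value, got {entry!r}")
    return label, value


def parse_expansion_addrs(argv):
    """Collect every occurrence of the (hypothetical) flag into one dict."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--expansion_addr", action="append",
                        type=parse_kv, default=[])
    args = parser.parse_args(argv)
    return dict(args.expansion_addr)
```

Adding another expansion service then only requires passing one more `--expansion_addr=label:host:port` value, with no new flag definition in the test code.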
[jira] [Updated] (BEAM-13857) Add expansion service startup to Go integration test flags.
[ https://issues.apache.org/jira/browse/BEAM-13857?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Oliveira updated BEAM-13857: --- Summary: Add expansion service startup to Go integration test flags. (was: DebeziumIO expansion address flag in Go SDK) > Add expansion service startup to Go integration test flags. > --- > > Key: BEAM-13857 > URL: https://issues.apache.org/jira/browse/BEAM-13857 > Project: Beam > Issue Type: Bug > Components: sdk-go >Reporter: Ritesh Ghorse >Assignee: Daniel Oliveira >Priority: P2 > > Currently a separate debezium io expansion address flag needs to be passed to > the runner when running cross-language debezium IO pipelines from Go SDK. > Find a way to do this in a better way so that we could have it started along > with java io expansion service while spinning up the test without bulking > :sdks:java:io:expansion-service. > In particular, needing to add a flag per expansion service jar to our > integration tests will eventually become quite cluttered, so we may wish to > settle on some kind of KV map flag approach instead to reduce copypasta code > overhead. -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Assigned] (BEAM-13857) DebeziumIO expansion address flag in Go SDK
[ https://issues.apache.org/jira/browse/BEAM-13857?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Oliveira reassigned BEAM-13857: -- Assignee: Daniel Oliveira > DebeziumIO expansion address flag in Go SDK > --- > > Key: BEAM-13857 > URL: https://issues.apache.org/jira/browse/BEAM-13857 > Project: Beam > Issue Type: Bug > Components: sdk-go >Reporter: Ritesh Ghorse >Assignee: Daniel Oliveira >Priority: P2 > > Currently a separate debezium io expansion address flag needs to be passed to > the runner when running cross-language debezium IO pipelines from Go SDK. > Find a way to do this in a better way so that we could have it started along > with java io expansion service while spinning up the test without bulking > :sdks:java:io:expansion-service. > In particular, needing to add a flag per expansion service jar to our > integration tests will eventually become quite cluttered, so we may wish to > settle on some kind of KV map flag approach instead to reduce copypasta code > overhead. -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Updated] (BEAM-13321) [Cross-Language] Externalize a minimal Implementation of Java's BigQuery IO
[ https://issues.apache.org/jira/browse/BEAM-13321?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Oliveira updated BEAM-13321: --- Fix Version/s: 2.37.0 Resolution: Fixed Status: Resolved (was: In Progress) > [Cross-Language] Externalize a minimal Implementation of Java's BigQuery IO > --- > > Key: BEAM-13321 > URL: https://issues.apache.org/jira/browse/BEAM-13321 > Project: Beam > Issue Type: New Feature > Components: cross-language, io-java-gcp >Reporter: Daniel Oliveira >Assignee: Daniel Oliveira >Priority: P2 > Fix For: 2.37.0 > > Time Spent: 8h 50m > Remaining Estimate: 0h > > This is described in detail in this design doc: > [https://s.apache.org/beam-bigquery-externalization] > The short version of this task is to have a minimum viable implementation of > BigQuery IO available for cross-language usage via SchemaIO. -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Updated] (BEAM-13732) [Cross-Language] Implement Go SDK wrapper for xlang BigQuery IO
[ https://issues.apache.org/jira/browse/BEAM-13732?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Oliveira updated BEAM-13732: --- Fix Version/s: 2.37.0 Resolution: Fixed Status: Resolved (was: In Progress) > [Cross-Language] Implement Go SDK wrapper for xlang BigQuery IO > --- > > Key: BEAM-13732 > URL: https://issues.apache.org/jira/browse/BEAM-13732 > Project: Beam > Issue Type: New Feature > Components: cross-language, sdk-go >Reporter: Daniel Oliveira >Assignee: Daniel Oliveira >Priority: P2 > Fix For: 2.37.0 > > Time Spent: 6h 10m > Remaining Estimate: 0h > > Title says it all. -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Work started] (BEAM-13806) [Cross-Language] Jenkins integration test for Go SDK BigQuery IO.
[ https://issues.apache.org/jira/browse/BEAM-13806?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on BEAM-13806 started by Daniel Oliveira. -- > [Cross-Language] Jenkins integration test for Go SDK BigQuery IO. > - > > Key: BEAM-13806 > URL: https://issues.apache.org/jira/browse/BEAM-13806 > Project: Beam > Issue Type: New Feature > Components: cross-language, io-go-gcp, sdk-go >Reporter: Daniel Oliveira >Assignee: Daniel Oliveira >Priority: P2 > Time Spent: 1h 20m > Remaining Estimate: 0h > > Title says it all. Add an integration test for cross-language BigQuery IO > that runs on Jenkins. -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Created] (BEAM-13806) [Cross-Language] Jenkins integration test for Go SDK BigQuery IO.
Daniel Oliveira created BEAM-13806: -- Summary: [Cross-Language] Jenkins integration test for Go SDK BigQuery IO. Key: BEAM-13806 URL: https://issues.apache.org/jira/browse/BEAM-13806 Project: Beam Issue Type: New Feature Components: cross-language, io-go-gcp, sdk-go Reporter: Daniel Oliveira Title says it all. Add an integration test for cross-language BigQuery IO that runs on Jenkins. -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Commented] (BEAM-13732) [Cross-Language] Implement Go SDK wrapper for xlang BigQuery IO
[ https://issues.apache.org/jira/browse/BEAM-13732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17481378#comment-17481378 ] Daniel Oliveira commented on BEAM-13732: Requires BEAM-13321 to be implemented. > [Cross-Language] Implement Go SDK wrapper for xlang BigQuery IO > --- > > Key: BEAM-13732 > URL: https://issues.apache.org/jira/browse/BEAM-13732 > Project: Beam > Issue Type: New Feature > Components: cross-language, sdk-go >Reporter: Daniel Oliveira >Assignee: Daniel Oliveira >Priority: P2 > > Title says it all. -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Work started] (BEAM-13321) [Cross-Language] Externalize a minimal Implementation of Java's BigQuery IO
[ https://issues.apache.org/jira/browse/BEAM-13321?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on BEAM-13321 started by Daniel Oliveira. -- > [Cross-Language] Externalize a minimal Implementation of Java's BigQuery IO > --- > > Key: BEAM-13321 > URL: https://issues.apache.org/jira/browse/BEAM-13321 > Project: Beam > Issue Type: New Feature > Components: cross-language, io-java-gcp >Reporter: Daniel Oliveira >Assignee: Daniel Oliveira >Priority: P2 > Time Spent: 4h 10m > Remaining Estimate: 0h > > This is described in detail in this design doc: > [https://s.apache.org/beam-bigquery-externalization] > The short version of this task is to have a minimum viable implementation of > BigQuery IO available for cross-language usage via SchemaIO. -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Created] (BEAM-13732) [Cross-Language] Implement Go SDK wrapper for xlang BigQuery IO
Daniel Oliveira created BEAM-13732: -- Summary: [Cross-Language] Implement Go SDK wrapper for xlang BigQuery IO Key: BEAM-13732 URL: https://issues.apache.org/jira/browse/BEAM-13732 Project: Beam Issue Type: New Feature Components: cross-language, sdk-go Reporter: Daniel Oliveira Assignee: Daniel Oliveira Title says it all. -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Work started] (BEAM-13732) [Cross-Language] Implement Go SDK wrapper for xlang BigQuery IO
[ https://issues.apache.org/jira/browse/BEAM-13732?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on BEAM-13732 started by Daniel Oliveira. -- > [Cross-Language] Implement Go SDK wrapper for xlang BigQuery IO > --- > > Key: BEAM-13732 > URL: https://issues.apache.org/jira/browse/BEAM-13732 > Project: Beam > Issue Type: New Feature > Components: cross-language, sdk-go >Reporter: Daniel Oliveira >Assignee: Daniel Oliveira >Priority: P2 > > Title says it all. -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Commented] (BEAM-13321) [Cross-Language] Externalize a minimal Implementation of Java's BigQuery IO
[ https://issues.apache.org/jira/browse/BEAM-13321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17474236#comment-17474236 ] Daniel Oliveira commented on BEAM-13321: It doesn't, there's still some additional PRs coming up. > [Cross-Language] Externalize a minimal Implementation of Java's BigQuery IO > --- > > Key: BEAM-13321 > URL: https://issues.apache.org/jira/browse/BEAM-13321 > Project: Beam > Issue Type: New Feature > Components: cross-language, io-java-gcp >Reporter: Daniel Oliveira >Assignee: Daniel Oliveira >Priority: P2 > Time Spent: 2h 50m > Remaining Estimate: 0h > > This is described in detail in this design doc: > [https://s.apache.org/beam-bigquery-externalization] > The short version of this task is to have a minimum viable implementation of > BigQuery IO available for cross-language usage via SchemaIO. -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Created] (BEAM-13618) Java BigQuery IO: DirectRead does not work with Beam Schema support.
Daniel Oliveira created BEAM-13618: -- Summary: Java BigQuery IO: DirectRead does not work with Beam Schema support. Key: BEAM-13618 URL: https://issues.apache.org/jira/browse/BEAM-13618 Project: Beam Issue Type: Bug Components: io-java-gcp Affects Versions: 2.35.0 Reporter: Daniel Oliveira Currently in BigQueryIO, Reads with Beam Schema support (for example using [readTableRowsWithSchema|https://github.com/apache/beam/blob/v2.35.0/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryIO.java#L553]) don't actually have Schema support if using DirectRead as a read method. This appears to be because the expansion logic for DirectReads takes [a different path|https://github.com/apache/beam/blob/v2.35.0/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryIO.java#L1060] that doesn't include any considerations for beam schemas ([example of the code handling Beam schemas in the default path|https://github.com/apache/beam/blob/v2.35.0/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryIO.java#L1204]). Part of the reason for this is likely that the current approach to Beam Schema support is to get a description of the BQ table's schema and then convert it to a Beam schema. However, with DirectRead specific columns can be excluded while reading, meaning that the Beam schema needed doesn't actually convert directly to the table's schema, it would need to be constructed based on the specific fields selected for the read. (As a side note, this is currently not documented anywhere, leading me to believe this is an oversight or potential bug. I will add some documentation indicating that schema support currently does not work with DirectRead.) -- This message was sent by Atlassian Jira (v8.20.1#820001)
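The schema mismatch described in BEAM-13618 can be sketched as a projection step: when only some columns are selected for a read, the schema has to be built from the selected fields, not from the whole table. The snippet below is an illustration only and does not use the BigQueryIO API; the (name, type) tuple representation, the function name, and the choice to keep the table's column order are all assumptions.

```python
def project_schema(table_schema, selected_fields=None):
    """Derive the schema for a read that selects a subset of columns.

    table_schema is a list of (name, type) pairs. With no selection the full
    schema is returned; otherwise only the selected columns are kept, in the
    table's original order (an assumption of this sketch).
    """
    if not selected_fields:
        return list(table_schema)
    known = {name for name, _ in table_schema}
    missing = [name for name in selected_fields if name not in known]
    if missing:
        raise KeyError(f"fields not in table schema: {missing}")
    wanted = set(selected_fields)
    return [(name, kind) for name, kind in table_schema if name in wanted]
```

Converting the full table schema directly would describe columns that the read never returns, which is the gap the issue points out for DirectRead.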
[jira] [Assigned] (BEAM-13456) beam_PostCommit_Java consistently timing out.
[ https://issues.apache.org/jira/browse/BEAM-13456?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Oliveira reassigned BEAM-13456: -- Assignee: Kenneth Knowles > beam_PostCommit_Java consistently timing out. > - > > Key: BEAM-13456 > URL: https://issues.apache.org/jira/browse/BEAM-13456 > Project: Beam > Issue Type: Bug > Components: test-failures >Affects Versions: 2.36.0 >Reporter: Daniel Oliveira >Assignee: Kenneth Knowles >Priority: P1 > > This seems to have first appeared with build #8367: > [https://ci-beam.apache.org/job/beam_PostCommit_Java/8367/] > Frustratingly, no build scans pop up when the test fails this way, and no > error messages appear except for the timeout. It may be easiest to attempt to > determine which commit introduced the error. > The previous successful test is at commit > b52762bf150cacceb0fdeb1f0dc85cbea6e6f39c > The first failing test is at commit 06a5e67332aae53ea90dedb4ef6421c2a7d65035 -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Updated] (BEAM-13456) beam_PostCommit_Java consistently timing out.
[ https://issues.apache.org/jira/browse/BEAM-13456?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Oliveira updated BEAM-13456: --- Description: This seems to have first appeared with build #8367: [https://ci-beam.apache.org/job/beam_PostCommit_Java/8367/] Frustratingly, no build scans pop up when the test fails this way, and no error messages appear except for the timeout. It may be easiest to attempt to determine which commit introduced the error. The previous successful test is at commit b52762bf150cacceb0fdeb1f0dc85cbea6e6f39c The first failing test is at commit 06a5e67332aae53ea90dedb4ef6421c2a7d65035 was: This seems to have first appeared with build #8367: [https://ci-beam.apache.org/job/beam_PostCommit_Java/8367/] Frustratingly, no build scans pop up when the test fails this way, and no error messages appear except for the timeout. It may be easiest to attempt to determine which CL introduced the error. > beam_PostCommit_Java consistently timing out. > - > > Key: BEAM-13456 > URL: https://issues.apache.org/jira/browse/BEAM-13456 > Project: Beam > Issue Type: Bug > Components: test-failures >Affects Versions: 2.36.0 >Reporter: Daniel Oliveira >Priority: P1 > > This seems to have first appeared with build #8367: > [https://ci-beam.apache.org/job/beam_PostCommit_Java/8367/] > Frustratingly, no build scans pop up when the test fails this way, and no > error messages appear except for the timeout. It may be easiest to attempt to > determine which commit introduced the error. > The previous successful test is at commit > b52762bf150cacceb0fdeb1f0dc85cbea6e6f39c > The first failing test is at commit 06a5e67332aae53ea90dedb4ef6421c2a7d65035 -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Commented] (BEAM-13456) beam_PostCommit_Java consistently timing out.
[ https://issues.apache.org/jira/browse/BEAM-13456?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17458912#comment-17458912 ] Daniel Oliveira commented on BEAM-13456: As a side note, the beam_PostCommit_Java_ValidatesRunner_ULR test appears to be suffering from the same issue starting at nearly the same time, so it's probably related: [https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_ULR/] > beam_PostCommit_Java consistently timing out. > - > > Key: BEAM-13456 > URL: https://issues.apache.org/jira/browse/BEAM-13456 > Project: Beam > Issue Type: Bug > Components: test-failures >Affects Versions: 2.36.0 >Reporter: Daniel Oliveira >Priority: P1 > > This seems to have first appeared with build #8367: > [https://ci-beam.apache.org/job/beam_PostCommit_Java/8367/] > Frustratingly, no build scans pop up when the test fails this way, and no > error messages appear except for the timeout. It may be easiest to attempt to > determine which CL introduced the error. -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Created] (BEAM-13456) beam_PostCommit_Java consistently timing out.
Daniel Oliveira created BEAM-13456: -- Summary: beam_PostCommit_Java consistently timing out. Key: BEAM-13456 URL: https://issues.apache.org/jira/browse/BEAM-13456 Project: Beam Issue Type: Bug Components: test-failures Affects Versions: 2.36.0 Reporter: Daniel Oliveira This seems to have first appeared with build #8367: [https://ci-beam.apache.org/job/beam_PostCommit_Java/8367/] Frustratingly, no build scans pop up when the test fails this way, and no error messages appear except for the timeout. It may be easiest to attempt to determine which CL introduced the error. -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Commented] (BEAM-13433) beam_PostCommit_Python37 failing, potentially due to apache_beam.ml.gcp.cloud_dlp_it_test.CloudDLPIT
[ https://issues.apache.org/jira/browse/BEAM-13433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17456879#comment-17456879 ] Daniel Oliveira commented on BEAM-13433: Assigning to you Pablo as a general python IO dev, since I can't find any specific owner for the test. > beam_PostCommit_Python37 failing, potentially due to > apache_beam.ml.gcp.cloud_dlp_it_test.CloudDLPIT > > > Key: BEAM-13433 > URL: https://issues.apache.org/jira/browse/BEAM-13433 > Project: Beam > Issue Type: Bug > Components: test-failures >Affects Versions: 2.36.0 >Reporter: Daniel Oliveira >Assignee: Pablo Estrada >Priority: P2 > > It's difficult for me to test for sure, because each run seems to show > slightly different errors, and sometimes the errors don't even show at all. > To track this down, you need to check the gradle build scan for the test, > because the raw logs are too long to find the appropriate error. > This is one that shows an error: > [https://ci-beam.apache.org/job/beam_PostCommit_Python37/4617/] > As far as I can tell, this is the error, apparently happening due to > [https://github.com/apache/beam/blob/master/sdks/python/apache_beam/ml/gcp/cloud_dlp_it_test.py] > {noformat} > Traceback (most recent call last): > File "apache_beam/runners/common.py", line 1198, in > apache_beam.runners.common.DoFnRunner.process > File "apache_beam/runners/common.py", line 536, in > apache_beam.runners.common.SimpleInvoker.invoke_process > File "apache_beam/runners/common.py", line 1334, in > apache_beam.runners.common._OutputProcessor.process_outputs > File > "/usr/local/lib/python3.7/site-packages/apache_beam/ml/gcp/cloud_dlp.py", > line 199, in process > item={"value": element}, **self.params) > TypeError: deidentify_content() got an unexpected keyword argument 'item' > During handling of the above exception, another exception occurred: > Traceback (most recent call last): > File 
"/usr/local/lib/python3.7/site-packages/dataflow_worker/batchworker.py", > line 644, in do_work > work_executor.execute() > File "/usr/local/lib/python3.7/site-packages/dataflow_worker/executor.py", > line 208, in execute > op.start() > File "dataflow_worker/native_operations.py", line 38, in > dataflow_worker.native_operations.NativeReadOperation.start > File "dataflow_worker/native_operations.py", line 39, in > dataflow_worker.native_operations.NativeReadOperation.start > File "dataflow_worker/native_operations.py", line 44, in > dataflow_worker.native_operations.NativeReadOperation.start > File "dataflow_worker/native_operations.py", line 54, in > dataflow_worker.native_operations.NativeReadOperation.start > File "apache_beam/runners/worker/operations.py", line 348, in > apache_beam.runners.worker.operations.Operation.output > File "apache_beam/runners/worker/operations.py", line 215, in > apache_beam.runners.worker.operations.SingletonConsumerSet.receive > File "apache_beam/runners/worker/operations.py", line 707, in > apache_beam.runners.worker.operations.DoOperation.process > File "apache_beam/runners/worker/operations.py", line 708, in > apache_beam.runners.worker.operations.DoOperation.process > File "apache_beam/runners/common.py", line 1200, in > apache_beam.runners.common.DoFnRunner.process > File "apache_beam/runners/common.py", line 1281, in > apache_beam.runners.common.DoFnRunner._reraise_augmented > File "apache_beam/runners/common.py", line 1198, in > apache_beam.runners.common.DoFnRunner.process > File "apache_beam/runners/common.py", line 536, in > apache_beam.runners.common.SimpleInvoker.invoke_process > File "apache_beam/runners/common.py", line 1334, in > apache_beam.runners.common._OutputProcessor.process_outputs > File > "/usr/local/lib/python3.7/site-packages/apache_beam/ml/gcp/cloud_dlp.py", > line 199, in process > item={"value": element}, **self.params) > TypeError: deidentify_content() got an unexpected keyword argument 'item' > [while 
running 'MaskDetectedDetails/ParDo(_DeidentifyFn)']{noformat} -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Updated] (BEAM-13433) beam_PostCommit_Python37 failing, potentially due to apache_beam.ml.gcp.cloud_dlp_it_test.CloudDLPIT
[ https://issues.apache.org/jira/browse/BEAM-13433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Oliveira updated BEAM-13433: --- Description: It's difficult for me to test for sure, because each run seems to show slightly different errors, and sometimes the errors don't even show at all. To track this down, you need to check the gradle build scan for the test, because the raw logs are too long to find the appropriate error. This is one that shows an error: [https://ci-beam.apache.org/job/beam_PostCommit_Python37/4617/] As far as I can tell, this is the error, apparently happening due to [https://github.com/apache/beam/blob/master/sdks/python/apache_beam/ml/gcp/cloud_dlp_it_test.py] {noformat} Traceback (most recent call last): File "apache_beam/runners/common.py", line 1198, in apache_beam.runners.common.DoFnRunner.process File "apache_beam/runners/common.py", line 536, in apache_beam.runners.common.SimpleInvoker.invoke_process File "apache_beam/runners/common.py", line 1334, in apache_beam.runners.common._OutputProcessor.process_outputs File "/usr/local/lib/python3.7/site-packages/apache_beam/ml/gcp/cloud_dlp.py", line 199, in process item={"value": element}, **self.params) TypeError: deidentify_content() got an unexpected keyword argument 'item' During handling of the above exception, another exception occurred: Traceback (most recent call last): File "/usr/local/lib/python3.7/site-packages/dataflow_worker/batchworker.py", line 644, in do_work work_executor.execute() File "/usr/local/lib/python3.7/site-packages/dataflow_worker/executor.py", line 208, in execute op.start() File "dataflow_worker/native_operations.py", line 38, in dataflow_worker.native_operations.NativeReadOperation.start File "dataflow_worker/native_operations.py", line 39, in dataflow_worker.native_operations.NativeReadOperation.start File "dataflow_worker/native_operations.py", line 44, in dataflow_worker.native_operations.NativeReadOperation.start File 
"dataflow_worker/native_operations.py", line 54, in dataflow_worker.native_operations.NativeReadOperation.start File "apache_beam/runners/worker/operations.py", line 348, in apache_beam.runners.worker.operations.Operation.output File "apache_beam/runners/worker/operations.py", line 215, in apache_beam.runners.worker.operations.SingletonConsumerSet.receive File "apache_beam/runners/worker/operations.py", line 707, in apache_beam.runners.worker.operations.DoOperation.process File "apache_beam/runners/worker/operations.py", line 708, in apache_beam.runners.worker.operations.DoOperation.process File "apache_beam/runners/common.py", line 1200, in apache_beam.runners.common.DoFnRunner.process File "apache_beam/runners/common.py", line 1281, in apache_beam.runners.common.DoFnRunner._reraise_augmented File "apache_beam/runners/common.py", line 1198, in apache_beam.runners.common.DoFnRunner.process File "apache_beam/runners/common.py", line 536, in apache_beam.runners.common.SimpleInvoker.invoke_process File "apache_beam/runners/common.py", line 1334, in apache_beam.runners.common._OutputProcessor.process_outputs File "/usr/local/lib/python3.7/site-packages/apache_beam/ml/gcp/cloud_dlp.py", line 199, in process item={"value": element}, **self.params) TypeError: deidentify_content() got an unexpected keyword argument 'item' [while running 'MaskDetectedDetails/ParDo(_DeidentifyFn)']{noformat} was: It's difficult for me to test for sure, because each run seems to show slightly different errors, and sometimes the errors don't even show at all. To track this down, you need to check the gradle build scan for the test, because the raw logs are too long to find the appropriate error. 
This is one that shows an error: [https://ci-beam.apache.org/job/beam_PostCommit_Python37/4617/] As far as I can tell, this is the error, apparently happening due to [https://github.com/apache/beam/blob/master/sdks/python/apache_beam/ml/gcp/cloud_dlp_it_test.py] {noformat} Traceback (most recent call last): File "apache_beam/runners/common.py", line 1198, in apache_beam.runners.common.DoFnRunner.process File "apache_beam/runners/common.py", line 536, in apache_beam.runners.common.SimpleInvoker.invoke_process File "apache_beam/runners/common.py", line 1334, in apache_beam.runners.common._OutputProcessor.process_outputs File "/usr/local/lib/python3.7/site-packages/apache_beam/ml/gcp/cloud_dlp.py", line 199, in process item={"value": element}, **self.params)
[jira] [Created] (BEAM-13433) beam_PostCommit_Python37 failing, potentially due to apache_beam.ml.gcp.cloud_dlp_it_test.CloudDLPIT
Daniel Oliveira created BEAM-13433: -- Summary: beam_PostCommit_Python37 failing, potentially due to apache_beam.ml.gcp.cloud_dlp_it_test.CloudDLPIT Key: BEAM-13433 URL: https://issues.apache.org/jira/browse/BEAM-13433 Project: Beam Issue Type: Bug Components: test-failures Affects Versions: 2.36.0 Reporter: Daniel Oliveira Assignee: Pablo Estrada It's difficult for me to test for sure, because each run seems to show slightly different errors, and sometimes the errors don't even show at all. To track this down, you need to check the gradle build scan for the test, because the raw logs are too long to find the appropriate error. This is one that shows an error: [https://ci-beam.apache.org/job/beam_PostCommit_Python37/4617/] As far as I can tell, this is the error, apparently happening due to [https://github.com/apache/beam/blob/master/sdks/python/apache_beam/ml/gcp/cloud_dlp_it_test.py] {noformat} Traceback (most recent call last): File "apache_beam/runners/common.py", line 1198, in apache_beam.runners.common.DoFnRunner.process File "apache_beam/runners/common.py", line 536, in apache_beam.runners.common.SimpleInvoker.invoke_process File "apache_beam/runners/common.py", line 1334, in apache_beam.runners.common._OutputProcessor.process_outputs File "/usr/local/lib/python3.7/site-packages/apache_beam/ml/gcp/cloud_dlp.py", line 199, in process item={"value": element}, **self.params) TypeError: deidentify_content() got an unexpected keyword argument 'item' During handling of the above exception, another exception occurred: Traceback (most recent call last): File "/usr/local/lib/python3.7/site-packages/dataflow_worker/batchworker.py", line 644, in do_work work_executor.execute() File "/usr/local/lib/python3.7/site-packages/dataflow_worker/executor.py", line 208, in execute op.start() File "dataflow_worker/native_operations.py", line 38, in dataflow_worker.native_operations.NativeReadOperation.start File "dataflow_worker/native_operations.py", line 39, in 
dataflow_worker.native_operations.NativeReadOperation.start File "dataflow_worker/native_operations.py", line 44, in dataflow_worker.native_operations.NativeReadOperation.start File "dataflow_worker/native_operations.py", line 54, in dataflow_worker.native_operations.NativeReadOperation.start File "apache_beam/runners/worker/operations.py", line 348, in apache_beam.runners.worker.operations.Operation.output File "apache_beam/runners/worker/operations.py", line 215, in apache_beam.runners.worker.operations.SingletonConsumerSet.receive File "apache_beam/runners/worker/operations.py", line 707, in apache_beam.runners.worker.operations.DoOperation.process File "apache_beam/runners/worker/operations.py", line 708, in apache_beam.runners.worker.operations.DoOperation.process File "apache_beam/runners/common.py", line 1200, in apache_beam.runners.common.DoFnRunner.process File "apache_beam/runners/common.py", line 1281, in apach
[jira] [Created] (BEAM-13420) [Go SDK] Decoding a schema row doesn't respect field names from struct tags
Daniel Oliveira created BEAM-13420: -- Summary: [Go SDK] Decoding a schema row doesn't respect field names from struct tags Key: BEAM-13420 URL: https://issues.apache.org/jira/browse/BEAM-13420 Project: Beam Issue Type: Bug Components: sdk-go Reporter: Daniel Oliveira Will attempt to create a reproducible code snippet soon. For now, the basics of the bug are that when an element encoded as a Row via schemas gets decoded to a Go struct, it doesn't respect struct tags somewhere in the conversion process. More specifically, if I have an external transform that outputs a Row, and I take that as input to a native Go transform that accepts some Go struct "Foo", it will fail if the field names are different, even if the struct tags on Foo match the external row's field names. Example error message: the following error is caused by the WordCount and CorpusDate fields in the native struct not matching the field names "word_count" and "corpus_date" from the raw Row. The row gets decoded into a struct with the field names Word_count and Corpus_date, ignoring the struct tags of the struct it's attempting to match: {noformat} panic: reflect: Call using struct { Word string "beam:\"word\""; Word_count int64 "beam:\"word_count\""; Corpus string "beam:\"corpus\""; Corpus_date int64 "beam:\"corpus_date\"" } as type struct { Word string "beam:\"word\""; WordCount int64 "beam:\"word_count\""; Corpus string "beam:\"corpus\""; CorpusDate int64 "beam:\"corpus_date\"" } Full error: while executing Process for Plan[process-bundle-descriptor-291]: 2: DataSink[S[ptransform-289@localhost:12371]] Coder:W;coder-315>!GWC 3: PCollection[pcollection-304] Out:[2] 4: ParDo[beam.addFixedKeyFn] Out:[2] 5: PCollection[pcollection-300] Out:[4] 6: ParDo[main.main.func1] Out:[5] 1: DataSource[S[ptransform-288@localhost:12371], 0] Coder:W;coder-310>!GWC Out:6 caused by: panic: reflect: Call using struct { Word string "beam:\"word\""; Word_count int64 "beam:\"word_count\""; Corpus string
"beam:\"corpus\""; Corpus_date int64 "beam:\"corpus_date\"" } as type struct { Word string "beam:\"word\""; WordCount int64 "beam:\"word_count\""; Corpus string "beam:\"corpus\""; CorpusDate int64 "beam:\"corpus_date\"" } goroutine 39 [running]: runtime/debug.Stack() /usr/lib/google-golang/src/runtime/debug/stack.go:24 +0x65 github.com/apache/beam/sdks/v2/go/pkg/beam/core/runtime/exec.callNoPanic.func1() /usr/local/google/home/danoliveira/repos/beam/sdks/go/pkg/beam/core/runtime/exec/util.go:58 +0xa5 panic({0xe8f160, 0xc0003149e0}) /usr/lib/google-golang/src/runtime/panic.go:1038 +0x215 reflect.Value.call({0xc0003267e0, 0xc0003b3e10, 0x5be732}, {0x102e0e5, 0x4}, {0xcfe3d8, 0x1, 0x679031}) /usr/lib/google-golang/src/reflect/value.go:411 +0x1965 reflect.Value.Call({0xc0003267e0, 0xc0003b3e10, 0x1}, {0xcfe3d8, 0x1, 0x1}) /usr/lib/google-golang/src/reflect/value.go:339 +0xc5 github.com/apache/beam/sdks/v2/go/pkg/beam/core/util/reflectx.(*reflectFunc).Call(0xc000426060, {0xc0003326f0, 0x0, 0xf5ef00}) /usr/local/google/home/danoliveira/repos/beam/sdks/go/pkg/beam/core/util/reflectx/call.go:87 +0x59 github.com/apache/beam/sdks/v2/go/pkg/beam/core/runtime/exec.(*invoker).initCall.func33({0x18e7a90, 0x1, 0x1}, 0xffdf3b645a1cac09) /usr/local/google/home/danoliveira/repos/beam/sdks/go/pkg/beam/core/runtime/exec/fn_arity.go:229 +0x7b github.com/apache/beam/sdks/v2/go/pkg/beam/core/runtime/exec.(*invoker).Invoke(0xc000477860, {0x1168a50, 0xc00033cdc0}, {0x18e7a90, 0x0, 0x1}, 0x203000, 0xc00015d660, {0x1934830, 0x0, ...}) /usr/local/google/home/danoliveira/repos/beam/sdks/go/pkg/beam/core/runtime/exec/fn.go:186 +0x7a2 github.com/apache/beam/sdks/v2/go/pkg/beam/core/runtime/exec.(*ParDo).invokeProcessFn(0xc0004420e0, {0x1168a50, 0xc00033cdc0}, {0x18e7a90, 0x1, 0x1}, 0x30, 0x1904be0) /usr/local/google/home/danoliveira/repos/beam/sdks/go/pkg/beam/core/runtime/exec/pardo.go:316 +0x146 
github.com/apache/beam/sdks/v2/go/pkg/beam/core/runtime/exec.(*ParDo).processSingleWindow(0xc0004420e0, 0xc00015d660) /usr/local/google/home/danoliveira/repos/beam/sdks/go/pkg/beam/core/runtime/exec/pardo.go:166 +0x4b github.com/apache/beam/sdks/v2/go/pkg/beam/core/runtime/exec.(*ParDo).processMainInput(0xc000348fc0, 0x1168a50) /usr/local/google/home/danoliveira/repos/beam/sdks/go/pkg/beam/core/runtime/exec/pardo.go:146 +0x9c github.com/apache/beam/sdks/v2/go/pkg/beam/core/runtime/exec.(*ParDo).ProcessElement(0xc0004420e0, {0x114d8a0, 0xc000426300}, 0xc0001f2540, {0x0, 0x0, 0x0}) /usr/local/google/home/danoliveira/repos/beam/sdks/go/pkg/beam/core/runtime/exec/pardo.go:132 +0x1a5 github.com/apache/beam/sdks/v2/go/pkg/beam/core/runtime/exec.(*DataSource).Process(0xcec8c0, {0x1168a50, 0xc00033cd00}) /u
[jira] [Created] (BEAM-13419) Add Go integration test errors when forgetting ptest.Main/beam.Init
Daniel Oliveira created BEAM-13419: -- Summary: Add Go integration test errors when forgetting ptest.Main/beam.Init Key: BEAM-13419 URL: https://issues.apache.org/jira/browse/BEAM-13419 Project: Beam Issue Type: Improvement Components: sdk-go Reporter: Daniel Oliveira Assignee: Daniel Oliveira Currently when someone writes an integration test and forgets to put ptest.Main into TestMain (or their own code calling beam.Init), then the SDK harness runs the tests as unit tests and ends up passing them because ptest.Run and beam.Run seem to just instantly pass without a problem when beam.Init hasn't been called. The end result is that SDK harnesses in this setup just instantly pass all the tests and then close without any error messages. This code path should have an error added so that if beam.Init hasn't been run when ptest.Run executes, then it fails with an error. -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Created] (BEAM-13321) [Cross-Language] Externalize a minimal Implementation of Java's BigQuery IO
Daniel Oliveira created BEAM-13321: -- Summary: [Cross-Language] Externalize a minimal Implementation of Java's BigQuery IO Key: BEAM-13321 URL: https://issues.apache.org/jira/browse/BEAM-13321 Project: Beam Issue Type: New Feature Components: cross-language, io-java-gcp Reporter: Daniel Oliveira Assignee: Daniel Oliveira This is described in detail in this design doc: [https://s.apache.org/beam-bigquery-externalization] The short version of this task is to have a minimum viable implementation of BigQuery IO available for cross-language usage via SchemaIO. -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Updated] (BEAM-12862) Implement Go SDK-side initialization of Java/Python expansion services.
[ https://issues.apache.org/jira/browse/BEAM-12862?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Oliveira updated BEAM-12862: --- Status: Open (was: Triage Needed) > Implement Go SDK-side initialization of Java/Python expansion services. > --- > > Key: BEAM-12862 > URL: https://issues.apache.org/jira/browse/BEAM-12862 > Project: Beam > Issue Type: New Feature > Components: cross-language, sdk-go >Reporter: Daniel Oliveira >Priority: P2 > > This feature allows users to run cross-language transforms without manually > starting an expansion service beforehand. If no expansion service is running, > the SDK will default to starting up a predetermined expansion service. > This behavior already exists in Java and Python, which might be a useful > reference point. > Note: It may be preferable to implement this after cross-language override > registration is implemented. That feature will allow registering alternate > behavior for expanding cross-language transforms, which is a good place to > slot in a default behavior for when no expansion address is provided. -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Assigned] (BEAM-12862) Implement Go SDK-side initialization of Java/Python expansion services.
[ https://issues.apache.org/jira/browse/BEAM-12862?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Oliveira reassigned BEAM-12862: -- Assignee: Jack McCluskey > Implement Go SDK-side initialization of Java/Python expansion services. > --- > > Key: BEAM-12862 > URL: https://issues.apache.org/jira/browse/BEAM-12862 > Project: Beam > Issue Type: New Feature > Components: cross-language, sdk-go >Reporter: Daniel Oliveira >Assignee: Jack McCluskey >Priority: P2 > > This feature allows users to run cross-language transforms without manually > starting an expansion service beforehand. If no expansion service is running, > the SDK will default to starting up a predetermined expansion service. > This behavior already exists in Java and Python, which might be a useful > reference point. > Note: It may be preferable to implement this after cross-language override > registration is implemented. That feature will allow registering alternate > behavior for expanding cross-language transforms, which is a good place to > slot in a default behavior for when no expansion address is provided. -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Created] (BEAM-13215) Portable OSS runners do not support GCP credentials for GCP IOs.
Daniel Oliveira created BEAM-13215: -- Summary: Portable OSS runners do not support GCP credentials for GCP IOs. Key: BEAM-13215 URL: https://issues.apache.org/jira/browse/BEAM-13215 Project: Beam Issue Type: Bug Components: io-go-gcp, io-java-gcp, io-py-gcp, java-fn-execution Reporter: Daniel Oliveira The situation here is that when a pipeline is run on a portable runner using a GCP IO, and uses docker for the SDK Harness environment, the SDK Harness does not have the user's GCP credentials available and the pipeline fails. There are apparently [pipeline options for setting credentials|https://github.com/apache/beam/blob/v2.33.0/sdks/java/extensions/google-cloud-platform-core/src/main/java/org/apache/beam/sdk/extensions/gcp/options/GcpOptions.java#L170], but as far as I can tell they are either meant only for non-portable pipelines, or only for the Dataflow runner. The tricky part of implementing this is that credentials for GCP are not straightforward, and having them available for something like the Application Default Credentials API involves copying over multiple files or environment variables. The following article provides a lot of context for the difficulties involved: [https://medium.com/datamindedbe/application-default-credentials-477879e31cb5] Possible solutions. Note these are mostly untested: # Perform some volume-mounting when calling the "docker run" command to mount directories containing credentials. Preferably this can be set via some sort of pipeline option. (This could potentially also be used to provide directories for docker containers to write output files to with TextIO or FileIO.) See the article above for an example. ** This solution may not work with runners on remote endpoints though. The directory mounted must be on the same machine as the docker container to work properly, which may not be possible in some cases with remote runners. # Require custom containers with appropriate credentials provided. 
This is more robust than the solution above, but less user-friendly, and would require a good amount of documentation to be available. ** This could be used in conjunction with the solution above, and might be a good way of supporting GCP credentials on remote runners. Custom containers can store any valid credentials of the user's choice (for example, service account credentials for a production service) and then be run on any machine. -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Commented] (BEAM-13037) Update Golang version on Jenkins worker images.
[ https://issues.apache.org/jira/browse/BEAM-13037?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17430737#comment-17430737 ] Daniel Oliveira commented on BEAM-13037: This seems to be some kind of PATH issue. When I SSH into the workers, /snap/bin/go is there and available and I can run the go tests manually from my instance just fine. But when the Jenkins workers try to use go, it's not available, and creating a symlink at /usr/bin/go that points to /snap/bin/go seems to fix the issue. I suspect the Jenkins agents either have some hardcoded path in their configuration that doesn't include the snap/bin directory, or it's cached in some way and it's trying to find the go binary in /usr/bin/go after it moved. As a workaround for now I'm creating a symlink on all the VMs, but we should try to fix the root cause. > Update Golang version on Jenkins worker images. > --- > > Key: BEAM-13037 > URL: https://issues.apache.org/jira/browse/BEAM-13037 > Project: Beam > Issue Type: Task > Components: sdk-go, testing >Reporter: Daniel Oliveira >Assignee: Daniel Oliveira >Priority: P2 > Fix For: Not applicable > > > Update the version of the `go` command on our Jenkins VMs from 1.12.X to > 1.16.X, to match as closely as possible the current version specified for Go > in our BeamModulePlugin (1.16.5). > Follows the instructions on Confluence here: > https://cwiki.apache.org/confluence/display/BEAM/Jenkins+Tips#JenkinsTips-HowtoinstallandupgradesoftwareonJenkinsworkers -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (BEAM-13037) Update Golang version on Jenkins worker images.
[ https://issues.apache.org/jira/browse/BEAM-13037?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Oliveira updated BEAM-13037: --- Fix Version/s: Not applicable Resolution: Fixed Status: Resolved (was: Open) > Update Golang version on Jenkins worker images. > --- > > Key: BEAM-13037 > URL: https://issues.apache.org/jira/browse/BEAM-13037 > Project: Beam > Issue Type: Task > Components: sdk-go, testing >Reporter: Daniel Oliveira >Assignee: Daniel Oliveira >Priority: P2 > Fix For: Not applicable > > > Update the version of the `go` command on our Jenkins VMs from 1.12.X to > 1.16.X, to match as closely as possible the current version specified for Go > in our BeamModulePlugin (1.16.5). > Follows the instructions on Confluence here: > https://cwiki.apache.org/confluence/display/BEAM/Jenkins+Tips#JenkinsTips-HowtoinstallandupgradesoftwareonJenkinsworkers -- This message was sent by Atlassian Jira (v8.3.4#803005)