Flaky test issue report
This is your daily summary of Beam's current flaky tests. These are P1 issues because they have a major negative impact on the community and make it hard to determine the quality of the software.

BEAM-12200: SamzaStoreStateInternalsTest is flaky (https://issues.apache.org/jira/browse/BEAM-12200)
BEAM-12163: Python GHA PreCommits flake with grpc.FutureTimeoutError on SDK harness startup (https://issues.apache.org/jira/browse/BEAM-12163)
BEAM-12061: beam_PostCommit_SQL failing on KafkaTableProviderIT.testFakeNested (https://issues.apache.org/jira/browse/BEAM-12061)
BEAM-12020: :sdks:java:container:java8:docker failing missing licenses (https://issues.apache.org/jira/browse/BEAM-12020)
BEAM-12019: apache_beam.runners.portability.flink_runner_test.FlinkRunnerTestOptimized.test_flink_metrics is flaky (https://issues.apache.org/jira/browse/BEAM-12019)
BEAM-11792: Python precommit failed (flaked?) installing package (https://issues.apache.org/jira/browse/BEAM-11792)
BEAM-11733: [beam_PostCommit_Java] [testFhirIO_Import|export] flaky (https://issues.apache.org/jira/browse/BEAM-11733)
BEAM-11666: apache_beam.runners.interactive.recording_manager_test.RecordingManagerTest.test_basic_execution is flaky (https://issues.apache.org/jira/browse/BEAM-11666)
BEAM-11662: elasticsearch tests failing (https://issues.apache.org/jira/browse/BEAM-11662)
BEAM-11661: hdfsIntegrationTest flake: network not found (py38 postcommit) (https://issues.apache.org/jira/browse/BEAM-11661)
BEAM-11646: beam_PostCommit_XVR_Spark failing (https://issues.apache.org/jira/browse/BEAM-11646)
BEAM-11645: beam_PostCommit_XVR_Flink failing (https://issues.apache.org/jira/browse/BEAM-11645)
BEAM-11541: testTeardownCalledAfterExceptionInProcessElement flakes on direct runner. (https://issues.apache.org/jira/browse/BEAM-11541)
BEAM-11540: Linter sometimes flakes on apache_beam.dataframe.frames_test (https://issues.apache.org/jira/browse/BEAM-11540)
BEAM-11493: Spark test failure: org.apache.beam.sdk.transforms.GroupByKeyTest$WindowTests.testGroupByKeyAndWindows (https://issues.apache.org/jira/browse/BEAM-11493)
BEAM-11492: Spark test failure: org.apache.beam.sdk.transforms.GroupByKeyTest$WindowTests.testGroupByKeyMergingWindows (https://issues.apache.org/jira/browse/BEAM-11492)
BEAM-11491: Spark test failure: org.apache.beam.sdk.transforms.GroupByKeyTest$WindowTests.testGroupByKeyMultipleWindows (https://issues.apache.org/jira/browse/BEAM-11491)
BEAM-11490: Spark test failure: org.apache.beam.sdk.transforms.ReifyTimestampsTest.inValuesSucceeds (https://issues.apache.org/jira/browse/BEAM-11490)
BEAM-11489: Spark test failure: org.apache.beam.sdk.metrics.MetricsTest$AttemptedMetricTests.testAttemptedDistributionMetrics (https://issues.apache.org/jira/browse/BEAM-11489)
BEAM-11488: Spark test failure: org.apache.beam.sdk.metrics.MetricsTest$AttemptedMetricTests.testAttemptedCounterMetrics (https://issues.apache.org/jira/browse/BEAM-11488)
BEAM-11487: Spark test failure: org.apache.beam.sdk.transforms.WithTimestampsTest.withTimestampsShouldApplyTimestamps (https://issues.apache.org/jira/browse/BEAM-11487)
BEAM-11486: Spark test failure: org.apache.beam.sdk.testing.PAssertTest.testSerializablePredicate (https://issues.apache.org/jira/browse/BEAM-11486)
BEAM-11485: Spark test failure: org.apache.beam.sdk.transforms.CombineFnsTest.testComposedCombineNullValues (https://issues.apache.org/jira/browse/BEAM-11485)
BEAM-11484: Spark test failure: org.apache.beam.runners.core.metrics.MetricsPusherTest.pushesUserMetrics (https://issues.apache.org/jira/browse/BEAM-11484)
BEAM-11483: Spark portable streaming PostCommit Test Improvements (https://issues.apache.org/jira/browse/BEAM-11483)
BEAM-10995: Java + Universal Local Runner: WindowingTest.testWindowPreservation fails (https://issues.apache.org/jira/browse/BEAM-10995)
BEAM-10987: stager_test.py::StagerTest::test_with_main_session flaky on windows py3.6,3.7 (https://issues.apache.org/jira/browse/BEAM-10987)
BEAM-10968: flaky test: org.apache.beam.sdk.metrics.MetricsTest$AttemptedMetricTests.testAttemptedDistributionMetrics (https://issues.apache.org/jira/browse/BEAM-10968)
BEAM-10955: Flink Java Runner test flake: Could not find Flink job (https://issues.apache.org/jira/browse/BEAM-10955)
BEAM-10923: Python requirements installation in docker container is flaky (https://issues.apache.org/jira/browse/BEAM-10923)
BEAM-10899: test_FhirIO_exportFhirResourcesGcs flake with OOM (https://issues.apache.org/jira/browse/BEAM-10899)
BEAM-10866: PortableRunnerTestWithSubprocesses.test_register_finalizations flaky on macOS (https://issues.apache.org/jira/browse/BEAM-10866)
BEAM-10763: Spotless flake (NullPointerException) (https://issues.apache.org/jira/browse/BEAM-10763)
BEAM-10590: BigQueryQueryToTableIT flaky: test_big_query_new_types (https
P1 issues report
This is your daily summary of Beam's current P1 issues, not including flaky tests. See https://beam.apache.org/contribute/jira-priorities/#p1-critical for the meaning and expectations around P1 issues.

BEAM-1: Dataflow side input translation "Unknown producer for value" (https://issues.apache.org/jira/browse/BEAM-1)
BEAM-12205: Dataflow pipelines broken NoSuchMethodError DoFnInvoker.invokeSetup() (https://issues.apache.org/jira/browse/BEAM-12205)
BEAM-12195: Flink Runner 1.11 uses old Scala-Version (https://issues.apache.org/jira/browse/BEAM-12195)
BEAM-11959: Python Beam SDK Harness hangs when installing pip packages (https://issues.apache.org/jira/browse/BEAM-11959)
BEAM-11906: No trigger early repeatedly for session windows (https://issues.apache.org/jira/browse/BEAM-11906)
BEAM-11875: XmlIO.Read does not handle XML encoding per spec (https://issues.apache.org/jira/browse/BEAM-11875)
BEAM-11828: JmsIO is not acknowledging messages correctly (https://issues.apache.org/jira/browse/BEAM-11828)
BEAM-11755: Cross-language consistency (RequiresStableInputs) is quietly broken (at least on portable flink runner) (https://issues.apache.org/jira/browse/BEAM-11755)
BEAM-11578: `dataflow_metrics` (python) fails with TypeError (when int overflowing?) (https://issues.apache.org/jira/browse/BEAM-11578)
BEAM-11576: Go ValidatesRunner failure: TestFlattenDup on Dataflow Runner (https://issues.apache.org/jira/browse/BEAM-11576)
BEAM-11434: Expose Spanner admin/batch clients in Spanner Accessor (https://issues.apache.org/jira/browse/BEAM-11434)
BEAM-11227: Upgrade beam-vendor-grpc-1_26_0-0.3 to fix CVE-2020-27216 (https://issues.apache.org/jira/browse/BEAM-11227)
BEAM-11148: Kafka commitOffsetsInFinalize OOM on Flink (https://issues.apache.org/jira/browse/BEAM-11148)
BEAM-11017: Timer with dataflow runner can be set multiple times (dataflow runner) (https://issues.apache.org/jira/browse/BEAM-11017)
BEAM-10861: Adds URNs and payloads to PubSub transforms (https://issues.apache.org/jira/browse/BEAM-10861)
BEAM-10617: python CombineGlobally().with_fanout() cause duplicate combine results for sliding windows (https://issues.apache.org/jira/browse/BEAM-10617)
BEAM-10569: SpannerIO tests don't actually assert anything. (https://issues.apache.org/jira/browse/BEAM-10569)
BEAM-10288: Quickstart documents are out of date (https://issues.apache.org/jira/browse/BEAM-10288)
BEAM-10244: Populate requirements cache fails on poetry-based packages (https://issues.apache.org/jira/browse/BEAM-10244)
BEAM-10100: FileIO writeDynamic with AvroIO.sink not writing all data (https://issues.apache.org/jira/browse/BEAM-10100)
BEAM-9564: Remove insecure ssl options from MongoDBIO (https://issues.apache.org/jira/browse/BEAM-9564)
BEAM-9455: Environment-sensitive provisioning for Dataflow (https://issues.apache.org/jira/browse/BEAM-9455)
BEAM-9293: Python direct runner doesn't emit empty pane when it should (https://issues.apache.org/jira/browse/BEAM-9293)
BEAM-8986: SortValues may not work correct for numerical types (https://issues.apache.org/jira/browse/BEAM-8986)
BEAM-8985: SortValues should fail if SecondaryKey coder is not deterministic (https://issues.apache.org/jira/browse/BEAM-8985)
BEAM-8407: [SQL] Some Hive tests throw NullPointerException, but get marked as passing (Direct Runner) (https://issues.apache.org/jira/browse/BEAM-8407)
BEAM-7717: PubsubIO watermark tracking hovers near start of epoch (https://issues.apache.org/jira/browse/BEAM-7717)
BEAM-7716: PubsubIO returns empty message bodies for all messages read (https://issues.apache.org/jira/browse/BEAM-7716)
BEAM-7195: BigQuery - 404 errors for 'table not found' when using dynamic destinations - sometimes, new table fails to get created (https://issues.apache.org/jira/browse/BEAM-7195)
BEAM-6839: User reports protobuf ClassChangeError running against 2.6.0 or above (https://issues.apache.org/jira/browse/BEAM-6839)
BEAM-6466: KafkaIO doesn't commit offsets while being used as bounded source (https://issues.apache.org/jira/browse/BEAM-6466)
Re: [VOTE] Release 2.29.0, release candidate #1
I did an additional round of making sure the human-readable quickstart instructions also succeed. Kenn On Thu, Apr 22, 2021 at 6:47 PM Ahmet Altay wrote: > +1 (binding) > > I ran some python quick start examples. Most validations in the sheet were > already done :) Thank you all! > > On Thu, Apr 22, 2021 at 9:15 AM Kyle Weaver wrote: > >> +1 (non-) >> >> Ran Python wordcount on Flink and Spark. >> >> On Wed, Apr 21, 2021 at 5:20 PM Brian Hulette >> wrote: >> >>> +1 (non-binding) >>> >>> I ran a python pipeline exercising the DataFrame API, and another >>> exercising SQLTransform in Python, both on Dataflow. >>> >>> On Wed, Apr 21, 2021 at 12:55 PM Kenneth Knowles >>> wrote: >>> Since the artifacts were changed about 26 hours ago, I intend to leave this vote open until 46 hours from now. Specifically, around noon my time (US Pacific) on Friday I will close the vote and finalize the release, if no problems are discovered. Kenn On Wed, Apr 21, 2021 at 12:52 PM Kenneth Knowles wrote: > +1 (binding) > > I ran the script at > https://beam.apache.org/contribute/release-guide/#run-validations-using-run_rc_validationsh > except for the part that requires a GitHub PR, since Cham already did that > part. > > Kenn > > On Wed, Apr 21, 2021 at 12:11 PM Valentyn Tymofieiev < > valen...@google.com> wrote: > >> +1, verified that my previous findings are fixed. >> >> On Wed, Apr 21, 2021 at 8:17 AM Chamikara Jayalath < >> chamik...@google.com> wrote: >> >>> +1 (binding) >>> >>> Ran some Python scenarios and updated the spreadsheet. >>> >>> Thanks, >>> Cham >>> >>> On Tue, Apr 20, 2021 at 3:39 PM Kenneth Knowles >>> wrote: >>> On Tue, Apr 20, 2021 at 3:24 PM Robert Bradshaw < rober...@google.com> wrote: > The artifacts and signatures look good to me. +1 (binding) > > (The release branch still has the .dev name, maybe you didn't > push? > https://github.com/apache/beam/blob/release-2.29.0/sdks/python/apache_beam/version.py > ) > Good point. 
I'll highlight that I finally implemented the branching changes from https://lists.apache.org/thread.html/205472bdaf3c2c5876533750d417c19b0d1078131a3dc04916082ce8%40%3Cdev.beam.apache.org%3E The new guide with diagram is here: https://beam.apache.org/contribute/release-guide/#tag-a-chosen-commit-for-the-rc TL;DR: - the release branch continues to be dev/SNAPSHOT for 2.29.0 while the main branch is now dev/SNAPSHOT for 2.30.0 - the RC tag v2.29.0-RC1 no longer lies on the release branch. It is a single tagged commit that removes the dev/SNAPSHOT suffix Kenn > On Tue, Apr 20, 2021 at 10:36 AM Kenneth Knowles > wrote: > >> Please take another look. >> >> - I re-ran the RC creation script so the source release and >> wheels are new and built from the RC tag. I confirmed the source zip >> and >> wheels have version 2.29.0 (not .dev or -SNAPSHOT). >> - I fixed and rebuilt Dataflow worker container images from >> exactly the RC commit, added dataclasses, with internal changes to >> get the >> version to match. >> - I confirmed that the staged jars already have version 2.29.0 >> (not -SNAPSHOT). >> - I confirmed with `diff -r -q` that the source tarball matches >> the RC tag (minus the .git* files and directories and gradlew) >> >> Kenn >> >> On Mon, Apr 19, 2021 at 9:19 PM Kenneth Knowles >> wrote: >> >>> At this point, the release train has just about come around to >>> 2.30.0 which will pick up that change. I don't think it makes sense >>> to >>> cherry-pick anything more into 2.29.0 unless it is nonfunctional. >>> As it is, >>> I think we have a good commit and just need to build the expected >>> artifacts. Since it isn't all the artifacts, I was planning on just >>> overwriting the RC1 artifacts in question and re-verify. I could >>> also roll >>> a new RC2 from the same commit fairly easily. >>> >>> Kenn >>> >>> On Mon, Apr 19, 2021 at 8:57 PM Reuven Lax >>> wrote: >>> Any chance we could include https://github.com/apache/beam/pull/14548? 
On Mon, Apr 19, 2021 at 8:54 PM Kenneth Knowles < k...@apache.org> wrote: > To clarify: I am running and fixing the release scripts on the > `master` branch. They work from fresh clones of the RC tag so > this s
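The thread above mentions confirming with `diff -r -q` that the source tarball matches the RC tag, minus the .git* files and directories and gradlew. As a rough illustration only, a similar comparison could be sketched in Python. The function name `tree_differences` and the `IGNORED_PATTERNS` list are hypothetical and are not part of the Beam release scripts; the sketch assumes both trees have already been unpacked or checked out into local directories.

```python
import filecmp
import fnmatch
import os

# Hypothetical ignore list mirroring "minus the .git* files and
# directories and gradlew" from the thread; applied at every level.
IGNORED_PATTERNS = [".git*", "gradlew"]


def _ignored(name):
    """True if a file or directory name matches an ignored pattern."""
    return any(fnmatch.fnmatch(name, p) for p in IGNORED_PATTERNS)


def tree_differences(left, right):
    """Return sorted relative paths that exist on one side only,
    differ in type (file vs. directory), or differ in content."""
    differences = []

    def walk(rel):
        left_dir = os.path.join(left, rel)
        right_dir = os.path.join(right, rel)
        left_names = {n for n in os.listdir(left_dir) if not _ignored(n)}
        right_names = {n for n in os.listdir(right_dir) if not _ignored(n)}
        # Entries present in only one of the two trees.
        for name in sorted(left_names ^ right_names):
            differences.append(os.path.join(rel, name))
        # Entries present in both trees: recurse or compare bytes.
        for name in sorted(left_names & right_names):
            l_path = os.path.join(left_dir, name)
            r_path = os.path.join(right_dir, name)
            rel_path = os.path.join(rel, name)
            l_is_dir, r_is_dir = os.path.isdir(l_path), os.path.isdir(r_path)
            if l_is_dir and r_is_dir:
                walk(rel_path)
            elif l_is_dir != r_is_dir:
                differences.append(rel_path)
            elif not filecmp.cmp(l_path, r_path, shallow=False):
                differences.append(rel_path)

    walk("")
    return sorted(differences)
```

An empty result would correspond to `diff -r -q` printing nothing. Content is compared byte for byte (`shallow=False`) so that matching timestamps or sizes alone are not taken as proof of equality.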
Re: [VOTE] Release 2.29.0, release candidate #1
+1 (non-binding)

Thanks for tirelessly working on improving the Python client :).

This is a friendly visit from Apache Airflow. I've just tested 2.29.0rc1 in our "apache.beam" provider's tests and they are all green.

Just to give a bit of context: we are eagerly waiting for the 2.29.0rc1 release, as it will unblock a few things for us. Most notably, relaxing the PyArrow dependency will help us add Python 3.9 support to Apache Airflow (it's long overdue, and pyarrow < 3.0.0 coming from Apache Beam was one of the last blockers).

Also, FYI: I am happy to be a bit more involved with some (possible) future dependency improvements for Beam. We struggled a bit with pip 21, which has a hard time with some of the dependency conflicts. We've managed to work around it for the moment (https://github.com/apache/airflow/pull/15513), but we are looking forward to improving this (especially moving all Google Python clients to > 2).

On 2021/04/23 01:46:51, Ahmet Altay wrote:
> +1 (binding)
>
> I ran some python quick start examples. Most validations in the sheet were
> already done :) Thank you all!
>
> On Thu, Apr 22, 2021 at 9:15 AM Kyle Weaver wrote:
>
> > +1 (non-)
> >
> > Ran Python wordcount on Flink and Spark.
> >
> > On Wed, Apr 21, 2021 at 5:20 PM Brian Hulette wrote:
> >
> >> +1 (non-binding)
> >>
> >> I ran a python pipeline exercising the DataFrame API, and another
> >> exercising SQLTransform in Python, both on Dataflow.
> >>
> >> On Wed, Apr 21, 2021 at 12:55 PM Kenneth Knowles wrote:
> >>
> >>> Since the artifacts were changed about 26 hours ago, I intend to leave
> >>> this vote open until 46 hours from now. Specifically, around noon my time
> >>> (US Pacific) on Friday I will close the vote and finalize the release, if
> >>> no problems are discovered.
> >>> > >>> Kenn > >>> > >>> On Wed, Apr 21, 2021 at 12:52 PM Kenneth Knowles > >>> wrote: > >>> > +1 (binding) > > I ran the script at > https://beam.apache.org/contribute/release-guide/#run-validations-using-run_rc_validationsh > except for the part that requires a GitHub PR, since Cham already did > that > part. > > Kenn > > On Wed, Apr 21, 2021 at 12:11 PM Valentyn Tymofieiev < > valen...@google.com> wrote: > > > +1, verified that my previous findings are fixed. > > > > On Wed, Apr 21, 2021 at 8:17 AM Chamikara Jayalath < > > chamik...@google.com> wrote: > > > >> +1 (binding) > >> > >> Ran some Python scenarios and updated the spreadsheet. > >> > >> Thanks, > >> Cham > >> > >> On Tue, Apr 20, 2021 at 3:39 PM Kenneth Knowles > >> wrote: > >> > >>> > >>> > >>> On Tue, Apr 20, 2021 at 3:24 PM Robert Bradshaw > >>> wrote: > >>> > The artifacts and signatures look good to me. +1 (binding) > > (The release branch still has the .dev name, maybe you didn't push? > https://github.com/apache/beam/blob/release-2.29.0/sdks/python/apache_beam/version.py > ) > > >>> > >>> Good point. I'll highlight that I finally implemented the branching > >>> changes from > >>> https://lists.apache.org/thread.html/205472bdaf3c2c5876533750d417c19b0d1078131a3dc04916082ce8%40%3Cdev.beam.apache.org%3E > >>> > >>> The new guide with diagram is here: > >>> https://beam.apache.org/contribute/release-guide/#tag-a-chosen-commit-for-the-rc > >>> > >>> TL;DR: > >>> - the release branch continues to be dev/SNAPSHOT for 2.29.0 while > >>> the main branch is now dev/SNAPSHOT for 2.30.0 > >>> - the RC tag v2.29.0-RC1 no longer lies on the release branch. It > >>> is a single tagged commit that removes the dev/SNAPSHOT suffix > >>> > >>> Kenn > >>> > >>> > On Tue, Apr 20, 2021 at 10:36 AM Kenneth Knowles > wrote: > > > Please take another look. > > > > - I re-ran the RC creation script so the source release and > > wheels are new and built from the RC tag. 
I confirmed the source > > zip and > > wheels have version 2.29.0 (not .dev or -SNAPSHOT). > > - I fixed and rebuilt Dataflow worker container images from > > exactly the RC commit, added dataclasses, with internal changes to > > get the > > version to match. > > - I confirmed that the staged jars already have version 2.29.0 > > (not -SNAPSHOT). > > - I confirmed with `diff -r -q` that the source tarball matches > > the RC tag (minus the .git* files and directories and gradlew) > > > > Kenn > > > > On Mon, Apr 19, 2021 at 9:19 PM Kenneth Knowles > > wrote: > > > >> At this point, the release train has just about come around to > >> 2.30.0 which will pick up that change. I don't think it makes > >> sense to > >>>
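The quoted checklist above includes confirming that the source zip, wheels and staged jars carry version 2.29.0 rather than a .dev or -SNAPSHOT version. A minimal sketch of such a check over a list of artifact file names follows. The function name `suspicious_artifacts` and the sample file names in the usage note are made up for illustration and do not come from the release scripts.

```python
import re


def suspicious_artifacts(names, expected_version):
    """Return the names that carry a .dev / -SNAPSHOT suffix after the
    expected version, or that do not contain the expected version at all."""
    escaped = re.escape(expected_version)
    # The expected version immediately followed by a development marker.
    dev_marker = re.compile(escaped + r"(\.dev\d*|-SNAPSHOT)")
    # The expected version as a standalone token, not embedded in a
    # longer number such as 12.29.0 or 2.29.01.
    exact = re.compile(r"(?<![\d.])" + escaped + r"(?!\d)")
    flagged = []
    for name in names:
        if dev_marker.search(name) or not exact.search(name):
            flagged.append(name)
    return flagged
```

For example, given the hypothetical names `beam-sdks-java-core-2.29.0.jar`, `apache_beam-2.29.0.dev0-cp38-none-any.whl` and `beam-runners-core-java-2.29.0-SNAPSHOT.jar`, calling `suspicious_artifacts(names, "2.29.0")` would flag the second and third and leave the first alone.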