First, one thing I learned: a ticket linked to a failure in one branch in Butler doesn't carry over to the other branches. So always search for an existing ticket first.
New failures from Build Lead Week 7:

I created a Jira filter for finding the tickets I created:
https://issues.apache.org/jira/issues/?jql=project%20%3D%20CASSANDRA%20AND%20issuetype%20%3D%20Bug%20AND%20component%20in%20(%22Test%2Fdtest%2Fjava%22%2C%20%22Test%2Fdtest%2Fpython%22%2C%20%22Test%2Ffuzz%22%2C%20%22Test%2Funit%22)%20AND%20created%20%3E%3D%20-7d%20AND%20reporter%20in%20(xgerman42)

*** CASSANDRA-18257<https://issues.apache.org/jira/browse/CASSANDRA-18257> - Test Failures: org.apache.cassandra.net.ProxyHandlerConnectionsTest.testExpireSome
  - linked in 4.0, 4.1, trunk
*** CASSANDRA-18253<https://issues.apache.org/jira/browse/CASSANDRA-18253> - Test Failures: dtest repair_tests.repair_test.TestRepair.test_simple_sequential_repair
  - linked in 4.0, trunk
*** CASSANDRA-18246<https://issues.apache.org/jira/browse/CASSANDRA-18246> - Test Failures: org.apache.cassandra.cql3.validation.operations.TTLTest.testCapNoWarnExpirationOverflowPolicy
  - linked in 3.11
*** CASSANDRA-18245<https://issues.apache.org/jira/browse/CASSANDRA-18245> - Test Failures: org.apache.cassandra.db.compaction.CompactionsTest.testDontPurgeAccidentally
  - linked in 3.11

________________________________
From: Dan Jatnieks <d...@datastax.com>
Sent: Friday, February 10, 2023 2:42 PM
To: dev@cassandra.apache.org <dev@cassandra.apache.org>; Claude Warren, Jr <claude.war...@aiven.io>
Subject: [EXTERNAL] Re: Cassandra CI Status 2023-01-07

New Failures from Build Lead Week 6:

*** CASSANDRA-18021 - Flaky org.apache.cassandra.distributed.test.ReprepareTestOldBehaviour#testReprepareMixedVersionWithoutReset
  - This existing ticket has been linked in Butler to new failures on 3.11
*** CASSANDRA-17608 - Fix testMetricsWithRebuildAndStreamingToTwoNodes
  - Re-opened as intermittent failure occurred in build 1445 on trunk

Several new failures had only a single occurrence; no new tickets were opened during this time.

On Fri, Feb 10, 2023 at 12:44 AM Claude Warren, Jr via dev <dev@cassandra.apache.org> wrote:

New Failures from Build Lead Week 5:

*** CASSANDRA-18198 - "AttributeError: module 'py' has no attribute 'io'" reported in multiple tests
  - reported in 4.1, 3.11, and 3.0
  - identified as a possible class loader issue associated with CASSANDRA-18150
*** CASSANDRA-18191 - Native Transport SSL tests failing
  - TestNativeTransportSSL.test_connect_to_ssl and TestNativeTransportSSL.test_connect_to_ssl (novnode)
  - TestNativeTransportSSL.test_connect_to_ssl_optional and TestNativeTransportSSL.test_connect_to_ssl_optional (novnode)

On Mon, Jan 23, 2023 at 10:10 PM Caleb Rackliffe <calebrackli...@gmail.com> wrote:

New failures from Build Lead Week 4:

*** CASSANDRA-18188 - Test failure in upgrade_tests.cql_tests.cls.test_limit_ranges
  - trunk
  - AttributeError: module 'py' has no attribute 'io'
*** CASSANDRA-18189 - Test failure in cqlsh_tests.test_cqlsh_copy.TestCqlshCopy.test_bulk_round_trip_with_timeouts
  - 4.0
  - assert 100000 == 94764
  - other failures are currently open in this test class but, at least superficially, with different errors (see CASSANDRA-17322, CASSANDRA-18162)

Timeouts continue to manifest in many places.

On Sun, Jan 15, 2023 at 6:02 AM Mick Semb Wever <m...@apache.org> wrote:

*** The Butler (Build Lead)

The introduction of Butler and the Build Lead was a wonderful improvement to our CI efforts. It has brought a lot of hygiene in listing out flakies as they happened. Note that this has in turn increased the burden of getting our major releases out, but that should be seen as a one-off cost.

New Failures from Build Lead Week 3:

*** CASSANDRA-18156 – repair_tests.deprecated_repair_test.TestDeprecatedRepairNotifications.test_deprecated_repair_error_notification
  - AssertionError: Node logs don't have an error message for the failed repair
  - hard regression
  - 3.0, 3.11
*** CASSANDRA-18164 – CASTest
  - Message serializedSize(12) does not match what was written with serialize(out, 12) for verb PAXOS2_COMMIT_AND_PREPARE_RSP
  - serializer class org.apache.cassandra.net.Message$Serializer; expected 1077, actual 1079
  - 4.1, trunk
*** CASSANDRA-18158 – org.apache.cassandra.distributed.upgrade.MixedModeReadTest.mixedModeReadColumnSubsetDigestCheck
  - Cannot achieve consistency level ALL
  - 3.11, trunk
*** CASSANDRA-18159 – repair_tests.repair_test.TestRepair.test_*dc_repair
  - AssertionError: null in MemtablePool$SubPool.released(MemtablePool.java:193)
  - 3.11, 4.0, 4.1, trunk
*** CASSANDRA-18160 – cdc_test.TestCDC.test_insertion_and_commitlog_behavior_after_reaching_cdc_total_space
  - Found orphaned index file in after CDC state not in former
  - 4.1, trunk
*** CASSANDRA-18161 – org.apache.cassandra.transport.CQLConnectionTest.handleCorruptionOfLargeMessageFrame
  - AssertionFailedError in CQLConnectionTest.testFrameCorruption(CQLConnectionTest.java:491)
  - 4.0, 4.1, trunk
*** CASSANDRA-18162 – cqlsh_tests.test_cqlsh_copy.TestCqlshCopy.test_bulk_round_trip_non_prepared_statements
  - Inet address 127.0.0.3:7000 is not available: [Errno 98] Address already in use
  - 3.0, 3.11, 4.0, 4.1, trunk
*** CASSANDRA-18163 – transient_replication_test.TestTransientReplicationRepairLegacyStreaming.test_speculative_write_repair_cycle
  - AssertionError Incoming stream entireSSTable
  - 4.0, 4.1, trunk

While writing these up, some thoughts…
- While Butler reports failures against multiple branches, there's no feedback/sync that the ticket needs its fixVersions updated when failures happen in other branches after the ticket is created.
- In 4.0 onwards, a majority of the failures are timeouts (>900s), reinforcing that the current main problem we are facing in ci-cassandra.a.o is saturation/infra