Greetings everyone!

Let's check in on 4.1 and see how we're doing: https://butler.cassandra.apache.org/#/

We had 4 failures on our last run. We've gone back and forth a bit on the CASTest failure, a test introduced back in CASSANDRA-12126 and @Ignore'd; however, it showed some legitimate failures that should be addressed by Paxos V2. If anyone from the discussion (or someone with familiarity with the area) has the cycles to take assignee on the test failure ticket (CASSANDRA-17461) and responsibility for driving it to resolution, that would help clarify our efforts there: https://issues.apache.org/jira/browse/CASSANDRA-17461

Along with that, we saw failures in TopPartitionsTest.testServiceTopPartitionsSingleTable (cdc) and TestBootstrap.test_simultaneous_bootstrap (offheap). Given both are specific configurations of tests that ran successfully to completion in other configurations, there's a reasonable chance they're flaky, be it from the logic of the test or the CI environment in which they're executing. Neither failure appears to have an active JIRA ticket associated with it in butler or on the kanban board (https://issues.apache.org/jira/secure/RapidBoard.jspa?rapidView=496&quickFilter=2252), so we could use a volunteer here both to create those tickets and to drive them.

We're close enough that we're ready to revisit how we want to treat the requirement for no flaky failures before we cut beta (https://cwiki.apache.org/confluence/display/CASSANDRA/Release+Lifecycle, "No flaky tests - All tests (Unit Tests and DTests) should pass consistently"). After seeing a couple of releases with this requirement (4.0 and now 4.1), I'm inclined to agree with the comment from Dinesh that we should revise this requirement formally if we're going to effectively release with flaky tests anyway; best to be honest with ourselves and acknowledge it's not proving to be a forcing function for changing behavior. If this email doesn't see much traction on this topic, I'll hit up the dev list with a DISCUSS thread on it.

The kanban for 4.1 blockers shows us 13 tickets: https://issues.apache.org/jira/secure/RapidBoard.jspa?rapidView=484&quickFilter=2455. Most of them are assigned and many are in progress; however, we have 3 unassigned if anyone wants to pick those up: https://issues.apache.org/jira/secure/RapidBoard.jspa?rapidView=484&quickFilter=2455&quickFilter=2160

[New Contributors Getting Started]
One of the three issues on the 4.1 blocker list or either of the 2 failing tests listed above would be a great area to focus your attention!
Nuts and bolts / env / etc:
Here's an explanation of various types of contribution: https://cassandra.apache.org/_/community.html#how-to-contribute
An overview of the C* architecture: https://cassandra.apache.org/doc/latest/cassandra/architecture/overview.html
And here's our getting started contributing guide: https://cassandra.apache.org/_/development/index.html

We hang out in #cassandra-dev on https://the-asf.slack.com, and you can ping the @cassandra_mentors alias to reach 13 of us who have volunteered to mentor new contributors on the project. Looking forward to seeing you there.

[Dev list Digest]
https://lists.apache.org/list?dev@cassandra.apache.org:lte=2w:

The challenge of our eclectic usage of NULL strikes again with CEP-15. Avi opened up a ticket about this: https://issues.apache.org/jira/browse/CASSANDRA-17762. Caleb's working on the CQL support for multi-partition transactions on https://issues.apache.org/jira/browse/CASSANDRA-17719, where the general sentiment seems to be "let's go with a SQL-congruent syntax".

Discussion about the potential benefits and downsides of a multi-threaded flushing CommitLog continues: https://lists.apache.org/thread/5j8ljtpdw3g0gyrx6m31gh1gjdkztclg. As this project is quite complex and has very different performance characteristics over time (initially in-memory only vs. long-term flushed to disk maintaining LSM trees), benchmarking features like this has proven difficult. Anyone with a perspective on the costs and benefits, or who's interested in balancing that complexity vs. functionality, please feel free to chime in.

An interesting question about inclusivity or exclusivity of token ranges and API consistency came up thanks to https://issues.apache.org/jira/browse/CASSANDRA-17575: https://lists.apache.org/thread/4tm626ffnqlvt4cbmopdfpd8w6fpqscd.
This link doesn't capture the entire thread for some reason; the most clarifying observation to me comes from Jeremiah about the current usage of tokens in the tool:
"Reading the responses here and taking a step back, I think the current behavior of nodetool compact is probably the correct behavior. The main use case I can see for using nodetool compact is someone wants to take some sstable and compact it with all the overlapping sstables"

And last but not least, Claude Warren is looking for a reviewer on https://issues.apache.org/jira/browse/CASSANDRA-14218. Looks like Dinesh was flagged on that as reviewer a while ago.

[CI Trends]
https://butler.cassandra.apache.org/#/

The last three weeks show us ticking up, but the reason is not too surprising:
3.0: 10 -> 14
3.11: 15 -> 17
4.0: 1 -> 6
4.1: 5 -> 4
trunk: 5 -> 7

On the 3.0-4.0 branches, this looks to be due to TestRepair failing (https://issues.apache.org/jira/browse/CASSANDRA-17701 and https://issues.apache.org/jira/browse/CASSANDRA-17702). Neither of those tickets has an assignee yet, so if anyone has the cycles or context to look into them, that'd be great. 4.1 failures are slowly but surely contracting.

[Release progress]
https://issues.apache.org/jira/secure/RapidBoard.jspa?rapidView=484&quickFilter=2175

4.1 beta: We closed out 8 issues in the past couple of weeks: some test fixes, restarting on gossip-only nodes (CASSANDRA-17752), adding validation that the new config params are structured as we expect in 4.1 for JMX (CASSANDRA-17738), and cleaning up a straightforward doubling of the writePreparedStatement call in CASSANDRA-17764.

4.1 rc: Test fix (CASSANDRA-17769)

It's been a pretty quiet week on our older branches.

So to sum it up:
- CASTest failures blocking 4.1: https://issues.apache.org/jira/browse/CASSANDRA-17461, needs assignee
- Regression on some TestRepair: https://issues.apache.org/jira/browse/CASSANDRA-17701 and https://issues.apache.org/jira/browse/CASSANDRA-17702, needs assignee
- We should discuss whether we want to cut 4.1 w/known flaky tests in ASF CI or if we need to introduce more formal metrics around what "having no flakes" means (3, 5, 10 clean runs? Something else?)

Thanks as always everyone; see you on slack.

~Josh