Re: Cassandra project status update 2022-08-03

Ekaterina Dimitrova Wed, 03 Aug 2022 12:17:26 -0700

Re: 17738 - the ticket covered any new properties which are not of the new
types. Its purpose was to guarantee that there is no disconnect between the
Settings Virtual Table after startup and the JMX setters/getters. (In one
of its “brother” tickets, the issues we found have existed since 4.0.) I
bring it up because JMX updates to configuration parameters need to update
the original Config parameters if we want the Settings Virtual Table to be
properly updated after startup, and thus cut the confusion for users. This
is actually a goal for this VT, stated in both our docs and the original
ticket. I am raising the point again because, while we still have both the
VT and JMX, we need to be sure we provide consistent information to our
users. I will also put a note in the Config docs to stress this and remind
people.
When we add the update option for the Settings Virtual Table in the next
version, we will probably need to think of a better way to keep this in
sync, or even start deprecating JMX, but for now this is what we have in
place and we need to maintain it.
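
To illustrate the kind of disconnect described above, here is a minimal,
language-neutral sketch in Python. It is not Cassandra code: `Config`,
`SettingsView`, `DriftingService`, `ConsistentService` and the
`example_throughput` setting are all hypothetical names chosen for the
example. It only shows why a setter has to write back to the shared config
object if a read-only view over that object is to stay accurate.

```python
from dataclasses import dataclass


@dataclass
class Config:
    """Hypothetical single source of truth for settings."""
    example_throughput: int = 64


class SettingsView:
    """Hypothetical read-only view, playing the role of a settings table."""

    def __init__(self, config: Config):
        self._config = config

    def get(self, name: str):
        # Always read through to the Config object; nothing is cached here.
        return getattr(self._config, name)


class DriftingService:
    """Setter that changes only a private copy, so the view goes stale."""

    def __init__(self, config: Config):
        self._throughput = config.example_throughput

    def set_throughput(self, value: int):
        self._throughput = value  # Config never learns about the change


class ConsistentService:
    """Setter that writes back to Config, so the view stays in sync."""

    def __init__(self, config: Config):
        self._config = config

    def set_throughput(self, value: int):
        self._config.example_throughput = value

    def get_throughput(self) -> int:
        return self._config.example_throughput


config = Config()
view = SettingsView(config)

# The private-copy setter leaves the view at the startup value.
DriftingService(config).set_throughput(128)
stale = view.get("example_throughput")

# The write-back setter makes the new value visible through the view.
ConsistentService(config).set_throughput(128)
fresh = view.get("example_throughput")
```

In this sketch `stale` keeps the startup value while `fresh` reflects the
update, which is the consistency property the ticket is meant to protect.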


Thank you Josh for the report, it is always valuable!

About flaky tests - in my personal opinion it is more about which
outstanding flaky tests we have than how many. We can have 3 which surface
legit bugs, or we can have 10 that show only timeouts due to environmental
issues. These days I see Circle CI green all the time, which is really
promising, as many of our legit bugs were discovered there. With that said,
I guess we can just review on a regular basis what exactly the latest
flakes are, rather than the numbers, which also change quickly up and down
with the first change in the infra.

On Wed, 3 Aug 2022 at 13:17, Josh McKenzie <jmcken...@apache.org> wrote:

> Greetings everyone! Let's check in on 4.1, see how we're doing:
>
> https://butler.cassandra.apache.org/#/
> We had 4 failures on our last run. We've gone back and forth a bit on
> the CASTest failure: the test was introduced back in CASSANDRA-12126 as
> @Ignore'd, but it showed some legitimate failures that should be addressed
> by Paxos V2. If anyone from the discussion (or someone with familiarity
> with the area) has the cycles to take assignee on the test failure ticket
> (17461) and responsibility for driving it to resolution, that would help
> clarify our efforts there. (
> https://issues.apache.org/jira/browse/CASSANDRA-17461)
>
> Along with that, we saw failures in
> TopPartitionsTest.testServiceTopPartitionsSingleTable (cdc) and
> TestBootstrap.test_simultaneous_bootstrap (offheap). Given both are
> specific configurations of tests that ran successfully to completion in
> other configurations, there's a reasonable chance they're flaky, be it from
> the logic of the test or the CI environment in which they're executing.
> Neither test appears to have an active JIRA ticket associated with it in
> butler or in the kanban board (
> https://issues.apache.org/jira/secure/RapidBoard.jspa?rapidView=496&quickFilter=2252)
> so we could use a volunteer here to both create those tickets and to drive
> them.
>
> We're close enough that we're ready to again visit how we want to treat
> the requirement for no flaky failures before we cut beta (
> https://cwiki.apache.org/confluence/display/CASSANDRA/Release+Lifecycle,
> "No flaky tests - All tests (Unit Tests and DTests) should pass
> consistently"). After seeing a couple releases with this requirement (4.0
> and now 4.1), I'm inclined to agree with the comment from Dinesh that we
> should revise this requirement formally if we're going to effectively
> release with flaky tests anyway; best to be honest with ourselves and
> acknowledge it's not proving to be a forcing function for changing
> behavior. If this email doesn't see much traction on this topic I'll hit up
> the dev list with a DISCUSS thread on it.
>
> The kanban for 4.1 blockers shows us 13 tickets:
> https://issues.apache.org/jira/secure/RapidBoard.jspa?rapidView=484&quickFilter=2455.
> Most of them are assigned and many in progress, however we have 3
> unassigned if anyone wants to pick those up:
> https://issues.apache.org/jira/secure/RapidBoard.jspa?rapidView=484&quickFilter=2455&quickFilter=2160
>
>
> [New Contributors Getting Started]
> One of the three issues on the 4.1 blocker list or either of the 2 failing
> tests listed above would be a great area to focus your attention on!
>
> Nuts and bolts / env / etc: here's an explanation of various types of
> contribution:
> https://cassandra.apache.org/_/community.html#how-to-contribute
> An overview of the C* architecture:
> https://cassandra.apache.org/doc/latest/cassandra/architecture/overview.html
> And here's our getting started contributing guide:
> https://cassandra.apache.org/_/development/index.html
> We hang out in #cassandra-dev on https://the-asf.slack.com, and you can
> ping the @cassandra_mentors alias to reach 13 of us who have volunteered to
> mentor new contributors on the project. Looking forward to seeing you there.
>
>
> [Dev list Digest]
> https://lists.apache.org/list?dev@cassandra.apache.org:lte=2w:
>
> The challenge of our eclectic usage of NULL strikes again with CEP-15. Avi
> opened up a ticket about this with
> https://issues.apache.org/jira/browse/CASSANDRA-17762. Caleb's working on
> the CQL support for multi-partition transactions on
> https://issues.apache.org/jira/browse/CASSANDRA-17719 where the general
> sentiment seems to be "let's go with a SQL-congruent syntax".
>
> Discussion about the potential benefits and downsides of a multi-threaded
> flushing CommitLog continues:
> https://lists.apache.org/thread/5j8ljtpdw3g0gyrx6m31gh1gjdkztclg. As this
> project is quite complex and has very different performance characteristics
> over time (in-memory initially only vs. long-term flushed to disk
> maintaining LSM trees), benchmarking features like this has proven
> difficult. Anyone with a perspective on the cost/benefits or who's
> interested in balancing that complexity vs. functionality feel free to
> chime in.
>
> An interesting question about inclusivity or exclusivity of token ranges
> and API consistency came up thanks to
> https://issues.apache.org/jira/browse/CASSANDRA-17575.
> https://lists.apache.org/thread/4tm626ffnqlvt4cbmopdfpd8w6fpqscd. This
> link doesn't capture the entire thread for some reason; the most clarifying
> observation to me comes from Jeremiah about the current usage of tokens in
> the tool: "Reading the responses here and taking a step back, I think the
> current behavior of nodetool compact is probably the correct behavior. The
> main use case I can see for using nodetool compact is someone wants to take
> some sstable and compact it with all the overlapping sstables"
>
> And last but not least, Claude Warren is looking for a reviewer on
> https://issues.apache.org/jira/browse/CASSANDRA-14218. Looks like Dinesh
> was flagged on that as reviewer a while ago.
>
> [CI Trends]
> https://butler.cassandra.apache.org/#/
>
> The last three weeks show failures ticking up on most branches, but the
> reason is not too surprising:
>
> 3.0: 10 -> 14
> 3.11: 15 -> 17
> 4.0: 1 -> 6
> 4.1: 5 -> 4
> trunk: 5 -> 7
>
> On the 3.0-4.0 branches, this looks to be due to TestRepair failing (
> https://issues.apache.org/jira/browse/CASSANDRA-17701 and
> https://issues.apache.org/jira/browse/CASSANDRA-17702). Neither of those
> tickets has an assignee yet, so if anyone has the cycles or context to look
> into them that'd be great.
>
> 4.1 failures are slowly but surely contracting.
>
>
> [Release progress]
>
> https://issues.apache.org/jira/secure/RapidBoard.jspa?rapidView=484&quickFilter=2175
>
> 4.1 beta:
> We closed out 8 issues in the past couple of weeks. Some test fixes,
> restarting on gossip only nodes (CASSANDRA-17752), adding validation that
> the new config params are structured as we expect in 4.1 for JMX
> (CASSANDRA-17738), and cleaning up a straightforward doubling of the
> writePreparedStatement call in CASSANDRA-17764.
>
> 4.1 rc:
> Test fix (CASSANDRA-17769)
>
> Been a pretty quiet week on our older branches.
>
> So to sum it up:
> - CASTest failures blocking 4.1:
> https://issues.apache.org/jira/browse/CASSANDRA-17461, needs assignee
> - Regression on some TestRepair:
> https://issues.apache.org/jira/browse/CASSANDRA-17701 and
> https://issues.apache.org/jira/browse/CASSANDRA-17702, needs assignee
> - We should discuss whether we want to cut 4.1 w/known flaky tests in ASF
> CI or if we need to introduce more formal metrics around what "having no
> flakes" means (3, 5, 10 clean runs? Something else?)
>
> Thanks as always everyone; see you on slack.
>
>
> ~Josh
>
