I confirm we're almost done with CASSANDRA-15580 (Repair QA testing).
Nightly runs are scheduled already and I'm tuning some knobs to get them
nicely stable:
https://app.circleci.com/pipelines/github/riptano/cassandra-rtest?branch=trunk
Andres also created an in-jvm dtest for the mixed cluster repair test that
is under review.

Scope-wise, I'd suggest we keep the repair tests repo separate for now and
work on integrating it into the Cassandra codebase post-4.0. It could just as
well stay a separate repo altogether, depending on the consensus
here.
I also had to reduce our ambitions on node density (from 100GB down to
20GB) due to how long the tests are taking already (almost 3 hours with
Full/Incremental/Subrange running in parallel, so that's roughly 6 hours of
AWS instance time per run when things go nicely). It's possible that the
test backup has too much entropy, but that may be a good thing, as I'd rather
test smaller datasets with a lot of entropy than big ones with little.
It has already allowed us to uncover CASSANDRA-16406
<https://issues.apache.org/jira/browse/CASSANDRA-16406>.

I'll update the tickets to reflect the current state.

On Thu, Jan 21, 2021 at 23:23, Scott Andreas <sc...@paradoxica.net>
wrote:

> Thanks Benjamin!
>
> I propose we de-scope 15538 as the ticket does not currently have a clear
> definition of done. Unless others disagree, we can remove the fix version
> via lazy consensus in a couple days. That leaves us with a well-defined set
> of tickets that are making progress.
>
> Re: the next question:
> "Do you have a timeframe in mind for releasing 4.0 GA? Assuming that there
> is no sudden burst in the number of issues."
>
> This is a great question for all on the list. Please consider what follows
> as my interpretation of our current status relative to the project's
> Release Lifecycle doc (and all "we/you" pronouns collective):
> https://cwiki.apache.org/confluence/display/CASSANDRA/Release+Lifecycle
>
> We're currently meeting all criteria for the Beta phase except "No flaky
> tests" and a small number of known bugs (e.g., 16307, 16078). The good news
> is we have the tickets in both categories identified (discussed earlier in
> this thread), and they don't appear to be a large amount of work -
> potentially with the exception of CASSANDRA-16078: Performance regression
> for queries accessing multiple rows. The ticket reports a 39% perf
> regression for queries fetching multiple rows in a partition via IN clauses
> – a major regression that should block release until understood/fixed.
> Caleb's working on this now.
>
> Once those issues and the validation epics that are now in review are
> wrapped (which look like a few weeks' work if contributors can jump on the
> flaky test tickets), we'll have met our criteria for graduating beta.
>
> The definition of an RC release is that any SHA we cut an RC build from
> may legitimately be the SHA declared "Apache Cassandra 4.0.0." This is
> where it gets real. When the project declares a build "RC," we're staking
> our collective credibility on it and recommend that users upgrade to a
> build that received this designation.
>
> I feel very good about where 4.0 is at. We've all surfaced and resolved a
> large number of important issues. We've enhanced the project's testing
> infrastructure to broaden the surface covered, which reduces the
> probability of unknown unknowns. And we've collectively developed
> toolchains for large-scale verification, including of existing live
> clusters via diff.
>
> After beta’s complete, the next chasm to cross seems like our own
> collective willingness to deploy and operate Cassandra 4.0 clusters in
> production. Once we're at RC, willing to do so, and to recommend users do
> the same, I think we'll have hit our definition of done.
>
> As we wrap up the remaining beta issues and flaky tests, now's a good time
> for that RC gut check. If there's a remaining issue that would prevent you
> from running trunk in a prod environment, please file it and raise
> attention - it'll help us finish polishing the release. And if there isn't
> - deploy it!
>
> We still need to finish the remaining bugs in scope and get tests reliably
> green. But it feels good to be this close.
>
> – Scott
>
> ________________________________________
> From: Benjamin Lerer <benjamin.le...@datastax.com>
> Sent: Tuesday, January 19, 2021 1:54 AM
> To: dev@cassandra.apache.org
> Subject: Re: [DISCUSS] Revisiting the quality testing epic scope
>
> Thank you for your reply, Scott.
>
> My understanding is that Alexander is moving forward on CASSANDRA-15580
> (Repair) and that Andres is focusing with Caleb on the tickets of
> CASSANDRA-15579 (Distributed Read/Write Path). The biggest unknown here
> seems to be CASSANDRA-16262 as you mentioned.
>
> Regarding CASSANDRA-15582 (Metrics), I shifted my focus toward helping with
> reviews for the release candidate. As a consequence, aside from 2 patches
> created by Sumanth during the holidays, the epic has not been moving
> forward.
>
> > the silver lining is that it shouldn’t be long before the others wrap up.
>
> Do you have a timeframe in mind for releasing 4.0 GA? Assuming that there
> is no sudden burst in the number of issues.
>
> > We do have several flaky test tickets that could use attention, though
>
> I believe that Adam, Berenguer and Brandon have started focusing on them.
>
> On Sat, Jan 16, 2021 at 10:49 PM Scott Andreas <sc...@paradoxica.net>
> wrote:
>
> > Thanks for raising the question, Benjamin! Notes on a few tickets inline
> > below.
> >
> > Non-Blocking:
> > – CASSANDRA-15537 Local Read/Write Path: Upgrade and Diff Test
> > I think it’s reasonable to consider this ticket complete. Yifan and others
> > have worked to execute several dozen diff tests and while I’m sure others
> > will continue, it’s reasonable to say cassandra-diff has been used to
> > compare 3.0 vs. 4.0 clusters with a wide variety of data models. I’ll check
> > with Yifan on Tuesday re: updating the status of the ticket. It would be
> > wonderful to hear of diff runs and experience from additional contributors
> > if others can share.
> >
> > – CASSANDRA-15584 Tooling - External Ecosystem
> > Great collaboration on this one (including issues filed arising from this
> > coverage, such as a recent ticket related to Medusa).
> >
> > Blocking GA:
> > – CASSANDRA-15579 Distributed Read/Write Path
> > The coordination and replication subtasks (16180, 16181) are making good
> > progress. I’ll check with Caleb and David on 16262 (the fuzz testing
> > subtask) on Tuesday.
> >
> > – CASSANDRA-15581 Compaction
> > Most of these are perf tests rather than development tasks, though the
> > ones complete are listed as Patch Available. I’ll check with Yifan if it’d
> > make sense to move those for which no planned work remains to Resolved. I
> > don’t think there’s a lot left here.
> >
> > – CASSANDRA-15538 Local Read/Write Path - Other Areas
> > Will see if anything specific is planned, as scope is relatively undefined.
> >
> > With the exception of 15538, most of these look to be moving along or
> > nearly complete. I don’t think I’d shift others aside from it into the
> > non-blocking category - but the silver lining is that it shouldn’t be long
> > before the others wrap up.
> >
> > We do have several flaky test tickets that could use attention, though —
> > these may be quick to push through if anyone is able to pick them up:
> >
> > – CASSANDRA-16236: Fix flaky testTrackMaxDeletionTime
> > – CASSANDRA-16238: Fix flaky test
> > test_insert_data_during_replace_same_address -
> > replace_address_test.TestReplaceAddress
> > – CASSANDRA-16239: Fix flaky test
> > org.apache.cassandra.distributed.test.NetstatsRepairStreamingTest
> > testWithCompressionDisabled
> > – CASSANDRA-16317: Fix flaky test incompleteCommit -
> > org.apache.cassandra.distributed.test.CASTest
> > – CASSANDRA-16355: Fix flaky test incompletePropose -
> > org.apache.cassandra.distributed.test.CASTest
> > – CASSANDRA-16382: Fix flaky
> > LongSharedExecutorPoolTest.testPromptnessOfExecution
> > – CASSANDRA-16358: Minor Flakiness in
> > ProxyHandlerConnectionsTest#testExpireSomeFromBatch
> > – CASSANDRA-16229: Flaky jvm-dtest:
> > org.apache.cassandra.distributed.test.ring.NodeNotInRingTest.nodeNotInRingTest
> > – CASSANDRA-16061:
> > transient_replication_ring_test.py::TestTransientReplicationRing::test_move_forwards_and_cleanup
> >
> > Cheers,
> >
> > – Scott
> >
> > > On Jan 14, 2021, at 9:05 AM, Benjamin Lerer <benjamin.le...@datastax.com> wrote:
> > >
> > > Hi everybody,
> > >
> > > As discussed before the holidays, it might make sense to revisit the scope
> > > of the quality testing tickets for 4.0 GA to ensure that the 4.0 release is
> > > not held for longer than necessary.
> > >
> > > The current status of the quality testing tasks is the following:
> > >
> > > *DONE:*
> > >
> > > CASSANDRA-15583 <https://issues.apache.org/jira/browse/CASSANDRA-15583>
> > > Tooling, Bundled and First Party
> > > CASSANDRA-15586 <https://issues.apache.org/jira/browse/CASSANDRA-15586>
> > > Cluster Setup and Maintenance
> > > CASSANDRA-15587 <https://issues.apache.org/jira/browse/CASSANDRA-15587>
> > > Platforms and Runtimes
> > >
> > >
> > > *NON BLOCKING:*
> > >
> > > The goals of the following tickets have been reached. Once GA is closed
> > > they will be marked as done.
> > >
> > > CASSANDRA-15537 <https://issues.apache.org/jira/browse/CASSANDRA-15537>
> > > Local Read/Write Path: Upgrade and Diff Test
> > > CASSANDRA-15584 <https://issues.apache.org/jira/browse/CASSANDRA-15584>
> > > Tooling - External Ecosystem
> > >
> > > If I understood Jordan's comment on the following ticket correctly, it
> > > should also not be a blocker for 4.0:
> > > CASSANDRA-15585 <https://issues.apache.org/jira/browse/CASSANDRA-15585>
> > > Test Frameworks, Tooling, Infra / Automation
> > >
> > > *BLOCKING GA:*
> > >
> > > CASSANDRA-15579 <https://issues.apache.org/jira/browse/CASSANDRA-15579>
> > > Distributed Read/Write Path
> > >    4 sub-tasks: 1 resolved, 2 in progress, 1 open
> > >
> > > CASSANDRA-15580 <https://issues.apache.org/jira/browse/CASSANDRA-15580>
> > > Repair
> > >    Test scenarios are ready, working on integrating them to circle-ci
> > >
> > > CASSANDRA-15581 <https://issues.apache.org/jira/browse/CASSANDRA-15581>
> > > Compaction
> > >    9 sub-tasks: 5 patch available, 1 review in progress, 3 triage needed
> > >
> > > CASSANDRA-15582 <https://issues.apache.org/jira/browse/CASSANDRA-15582>
> > > Metrics
> > >    16 sub-tasks: 9 resolved, 5 patch available, 5 open
> > >
> > > CASSANDRA-15588 <https://issues.apache.org/jira/browse/CASSANDRA-15588>
> > > Cluster Upgrade
> > >    6 sub-tasks: 4 resolved, 1 in progress, 1 open
> > >
> > > CASSANDRA-15538 <https://issues.apache.org/jira/browse/CASSANDRA-15538>
> > > Local Read/Write Path
> > >    No progress has been made on that ticket. The conclusion so far is that
> > >    Harry is our best choice to uncover issues in that area but there is no
> > >    clear plan on how to move forward.
> > >
> > > We have made some progress across the quality testing tickets.
> > > Nevertheless, there is still a significant number of tickets to fix. As our
> > > time and resources are limited, it might make sense to focus on what we
> > > believe are the most critical for 4.0 and relax our constraints on others.
> > > For example, it seems to me that the metrics tickets will mainly help to
> > > discover non-critical old issues that are not blockers for 4.0. It is clear
> > > to me that they should be fixed, but that could probably be done for the
> > > 4.0.x/4.1 release (I fully volunteer for that :-)). The same could be true
> > > for some other areas of the code.
> > >
> > > In my opinion the important questions we would need to answer are:
> > >
> > >   1. Are there some tickets that we should make non-blocking for 4.0?
> > >   2. What do we do about CASSANDRA-15538
> > >   <https://issues.apache.org/jira/browse/CASSANDRA-15538> Local Read/Write
> > >   Path?
> > >
> > > Thanks in advance for your feedback :-)
> >
> >
>
