“…the jenkins_jira_integration <https://github.com/apache/cassandra-builds/blob/trunk/jenkins-jira-integration/jenkins_jira_integration.py> script updating the JIRA ticket with test results if you cause a regression + us building a muscle around reverting your commit if they break tests.”
I am not sure the problem of people finding the time to fix their breakages will be solved, but at least they will be pinged automatically. Hopefully many follow Jira updates.

“I don't take the past as strongly indicative of the future here since we've been allowing circle to validate pre-commit and haven't been multiplexing.”

I am interested to compare how many tickets for flaky tests we will have pre-5.0 now compared to pre-4.1.

On Wed, 12 Jul 2023 at 8:41, Josh McKenzie <jmcken...@apache.org> wrote:

> (This response ended up being a bit longer than intended; sorry about that)
>
> What is more common though is packaging errors, cdc/compression/system_ks_directory targeted fixes, CI w/wo upgrade tests, being less responsive post-commit as you already moved on
>
> *Two that should be resolved in the new regime:*
> * Packaging errors should be caught pre-commit as we're making the artifact builds part of pre-commit.
> * I'm hoping to merge the commit log segment allocation so the CDC allocator is the only one for 5.0 (and it just bypasses the cdc-related work on allocation if it's disabled, thus not impacting perf); the existing targeted testing of cdc-specific functionality should be sufficient to confirm its correctness, as it doesn't vary from the primary allocation path when it comes to mutation space in the buffer.
> * Upgrade tests are going to be part of the pre-commit suite.
>
> *Outstanding issues:*
> * Compression: if we just run with defaults we won't test all cases, so errors could pop up here.
> * system_ks_directory related things: is this still ongoing, or did we have a transient burst of these types of issues? And would we expect these to vary based on different JDKs, non-default configurations, etc.?
> * Being less responsive post-commit: my only ideas here are a combination of the jenkins_jira_integration <https://github.com/apache/cassandra-builds/blob/trunk/jenkins-jira-integration/jenkins_jira_integration.py> script updating the JIRA ticket with test results if you cause a regression + us building a muscle around reverting your commit if they break tests.
>
> To quote Jacek:
>
> why don't we run dtests w/wo sstable compression x w/wo internode encryption x w/wo vnodes, w/wo off-heap buffers x j8/j11/j17 x w/wo CDC x RedHat/Debian/SUSE, etc. I think this is a matter of cost vs result.
>
> I think we've organically made these decisions and tradeoffs in the past without being methodical about it. If we can:
> 1. Multiplex changed or new tests,
> 2. Tighten the feedback loop of "tests were green, now they're *consistently* not, you're the only one who changed something", and
> 3. Instill a culture of "if you can't fix it immediately, revert your commit",
>
> then I think we'll only be vulnerable to flaky failures introduced across different non-default configurations as side effects in tests that aren't touched, which *intuitively* feels like a lot less than we're facing today. We could even get clever as a day 2 effort and define packages in the primary codebase where changes take place and multiplex (on a smaller scale) their respective packages of unit tests in the future if we see problems in this area.
>
> Flaky tests are a giant pain in the ass and a huge drain on productivity, don't get me wrong. *And* we have to balance how much cost we're paying before each commit with the benefit we expect to gain from that.
>
> Does the above make sense? Are there things you've seen in the trenches that challenge or invalidate any of those perspectives?
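For illustration of the notification half of that idea, here is a rough sketch of what posting test-result feedback to a JIRA ticket over the REST API looks like. This is not the actual jenkins_jira_integration script; the server URL, credentials, ticket key, and test names are placeholders, and only the standard JIRA v2 comment endpoint is assumed:

```python
# Rough sketch: ping the JIRA ticket that introduced a post-commit regression.
# Server URL, credentials, ticket key, and test names are placeholders; this
# only shows the shape of the feedback loop, not the real cassandra-builds script.
import requests

JIRA_BASE = "https://issues.example.org"   # placeholder JIRA instance
AUTH = ("ci-bot", "api-token")             # placeholder credentials

def comment_on_regression(ticket_key: str, build_url: str, failed_tests: list[str]) -> None:
    """Post a comment listing newly failing tests on the ticket that introduced them."""
    body = (
        f"Post-commit CI detected {len(failed_tests)} new test failure(s): {build_url}\n"
        + "\n".join(f"- {name}" for name in failed_tests)
        + "\nPlease fix or revert."
    )
    resp = requests.post(
        f"{JIRA_BASE}/rest/api/2/issue/{ticket_key}/comment",  # standard JIRA v2 comment endpoint
        json={"body": body},
        auth=AUTH,
        timeout=30,
    )
    resp.raise_for_status()

if __name__ == "__main__":
    comment_on_regression(
        "CASSANDRA-00000",  # placeholder ticket key
        "https://ci-cassandra.apache.org/job/example/123/",
        ["org.apache.cassandra.SomeTest.testFoo"],
    )
```

The real script lives in cassandra-builds at the link above; the point of the sketch is simply that a regression detected post-commit turns into an automatic ping on the ticket that caused it.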
> On Wed, Jul 12, 2023, at 7:28 AM, Jacek Lewandowski wrote:
>
> Isn't novnodes a special case of vnodes with n=1?
>
> We should rather select a subset of tests for which it makes sense to run with different configurations. The set of configurations against which we run the tests currently is still only a subset of all possible cases. I could ask - why don't we run dtests w/wo sstable compression x w/wo internode encryption x w/wo vnodes, w/wo off-heap buffers x j8/j11/j17 x w/wo CDC x RedHat/Debian/SUSE, etc.? I think this is a matter of cost vs result. This equation contains the likelihood of failure in configuration X given there was no failure in the default configuration, the cost of running those tests, the time we delay merging, and the likelihood that we wait for the test results so long that our branch diverges and we have to rerun them or accept the fact that we merge code which was tested on an outdated base. Eventually, the overall new contributor experience - whether they want to participate in the future.
>
> On Wed, 12 Jul 2023 at 07:24, Berenguer Blasi <berenguerbl...@gmail.com> wrote:
>
> On our 4.0 release I remember a number of such failures, but not recently. What is more common though is packaging errors, cdc/compression/system_ks_directory targeted fixes, CI w/wo upgrade tests, being less responsive post-commit as you already moved on, ... Either the smoke pre-commit has approval steps for everything, or we should imo give a devBranch-alike job to the dev pre-commit. I find it terribly useful. My 2cts.
>
> On 11/7/23 18:26, Josh McKenzie wrote:
>
> 2: Pre-commit 'devBranch' full suite for high risk/disruptive merges: at reviewer's discretion
>
> In general, maybe offering a dev the option of choosing either "pre-commit smoke" or "post-commit full" at their discretion for any work would be the right play.
>
> A follow-on thought: even with something as significant as Accord, TCM, Trie data structures, etc., I'd be a bit surprised to see tests fail on JDK17 that didn't on 11, or with vs. without vnodes, in ways where it wasn't immediately clear that the patch stumbled across something surprising and was immediately trivially attributable if not fixable. *In theory* the things we're talking about excluding from the pre-commit smoke test suite are all things that are supposed to be identical across environments and thus opaque / interchangeable by default (JDK version outside checking the build, which we will; vnodes vs. non; etc.).
>
> Has that not proven to be the case in your experience?
>
> On Tue, Jul 11, 2023, at 10:15 AM, Derek Chen-Becker wrote:
>
> A strong +1 to getting to a single CI system. CircleCI definitely has some niceties and I understand why it's currently used, but right now we get 2 CI systems for twice the price. +1 on the proposed subsets.
>
> Derek
>
> On Mon, Jul 10, 2023 at 9:37 AM Josh McKenzie <jmcken...@apache.org> wrote:
>
> I'm personally not thinking about CircleCI at all; I'm envisioning a world where all of us have 1 CI *software* system (i.e. reproducible on any env) that we use for pre-commit validation, and then post-commit happens on reference ASF hardware.
>
> So:
> 1: Pre-commit subset of tests (suites + matrices + env) runs. On green, merge.
> 2: Post-commit tests (all suites, matrices, env) runs. If failure, link back to the JIRA where the commit took place.
>
> Circle would need to remain in lockstep with the requirements for point 1 here.
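To put a number on the cost side of Jacek's cost-vs-result equation, a back-of-the-envelope sketch of how the full configuration matrix he lists multiplies out. The axis names and values are taken from his example; nothing here reflects the project's actual CI job definitions:

```python
# Back-of-the-envelope size of the full test-config matrix from Jacek's example.
# Axis values come from the email; this is illustrative only, not a CI definition.
from itertools import product
from math import prod

axes = {
    "sstable_compression": ["on", "off"],
    "internode_encryption": ["on", "off"],
    "vnodes": ["on", "off"],
    "offheap_buffers": ["on", "off"],
    "jdk": ["8", "11", "17"],
    "cdc": ["on", "off"],
    "os": ["RedHat", "Debian", "SUSE"],
}

total_configs = prod(len(values) for values in axes.values())
print(f"full matrix: {total_configs} configurations")  # 2*2*2*2*3*2*3 = 288

# A few sample combinations, just to show what each cell of the matrix is.
for combo in list(product(*axes.values()))[:3]:
    print(dict(zip(axes.keys(), combo)))
```

Running every dtest under all 288 combinations instead of one default configuration multiplies suite cost by roughly two orders of magnitude, which is exactly the tradeoff being weighed against the likelihood of config-specific failures.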
> On Mon, Jul 10, 2023, at 1:04 AM, Berenguer Blasi wrote:
>
> +1 to Josh, which is exactly my line of thought as well. But that is only valid if we have a solid Jenkins that will eventually run all test configs. So I think I lost track a bit here. Are you proposing:
>
> 1- CircleCI: runs a single (the most common/meaningful, TBD) config of tests pre-commit
> 2- Jenkins: runs post-commit _all_ test configs and emails/notifies you in case of problems?
>
> Or something different, like having 1 also in Jenkins?
>
> On 7/7/23 17:55, Andrés de la Peña wrote:
>
> I think 500 runs combining all configs could be reasonable, since it's unlikely to have config-specific flaky tests. As in five configs with 100 repetitions each.
>
> On Fri, 7 Jul 2023 at 16:14, Josh McKenzie <jmcken...@apache.org> wrote:
>
> Maybe. Kind of depends on how long we write our tests to run, doesn't it? :)
>
> But point taken. Any non-trivial test would start to be something of a beast under this approach.
>
> On Fri, Jul 7, 2023, at 11:12 AM, Brandon Williams wrote:
>
> On Fri, Jul 7, 2023 at 10:09 AM Josh McKenzie <jmcken...@apache.org> wrote:
>
> 3. Multiplexed tests (changed, added) run against all JDKs and a broader range of configs (no-vnode, vnode default, compression, etc.)
>
> I think this is going to be too heavy... we're taking 500 iterations and multiplying that by like 4 or 5?
>
> --
> Derek Chen-Becker
> GPG Key available at https://keybase.io/dchenbecker and
> https://pgp.mit.edu/pks/lookup?search=derek%40chen-becker.org
> Fngrprnt: EB8A 6480 F0A3 C8EB C1E7 7F42 AFC5 AFEE 96E4 6ACC
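For reference on that 500-iteration concern, a rough sketch of what multiplexing one changed test across a handful of configurations amounts to. The run-test.sh wrapper, test class, config names, and counts are hypothetical placeholders, not the project's actual multiplexer job:

```python
# Sketch of the cost being weighed: repeating one changed test N times in each
# of several configurations. The run-test.sh wrapper, test class, config names,
# and counts are placeholders, not the real CircleCI/Jenkins multiplexer.
import subprocess

TEST = "org.apache.cassandra.db.SomeChangedTest"  # hypothetical changed test
CONFIGS = ["default", "novnodes", "compression", "cdc", "jdk17"]
ITERATIONS = 500                                  # per-config repetition count

def run_once(config: str) -> bool:
    """Run the test once under a named config; True means it passed."""
    result = subprocess.run(
        ["./run-test.sh", TEST, config],          # placeholder test runner
        capture_output=True,
    )
    return result.returncode == 0

if __name__ == "__main__":
    print(f"total runs: {ITERATIONS * len(CONFIGS)}")  # 500 * 5 = 2500
    failures = {
        cfg: sum(1 for _ in range(ITERATIONS) if not run_once(cfg))
        for cfg in CONFIGS
    }
    print(failures)  # flaky configs show up as non-zero failure counts
```

At 500 repetitions across five configs that is 2,500 runs of a single test, which is the "too heavy" multiplication Brandon is pointing at; Andrés' alternative of five configs at 100 repetitions each keeps the total at 500.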