Revert only for trunk patches, right? I'd say we need to completely stabilize the environment, no noise, before we go in that direction.
On Wed, 12 Jul 2023 at 8:55, Jacek Lewandowski <lewandowski.ja...@gmail.com> wrote:
> Would it be re-opening the ticket, or creating a new ticket with "revert of fix"?
>
> On Wed, 12 Jul 2023 at 14:51, Ekaterina Dimitrova <e.dimitr...@gmail.com> wrote:
>> "jenkins_jira_integration <https://github.com/apache/cassandra-builds/blob/trunk/jenkins-jira-integration/jenkins_jira_integration.py> script updating the JIRA ticket with test results if you cause a regression + us building a muscle around reverting your commit if they break tests."
>>
>> I am not sure the problem of people finding the time to fix their breakages will be solved, but at least they will be pinged automatically. Hopefully many follow the Jira updates.
>>
>> "I don't take the past as strongly indicative of the future here since we've been allowing circle to validate pre-commit and haven't been multiplexing."
>>
>> I am interested to compare how many tickets for flaky tests we will have pre-5.0 now, compared to pre-4.1.
>>
>> On Wed, 12 Jul 2023 at 8:41, Josh McKenzie <jmcken...@apache.org> wrote:
>>> (This response ended up being a bit longer than intended; sorry about that.)
>>>
>>> What is more common though is packaging errors, cdc/compression/system_ks_directory targeted fixes, CI w/wo upgrade tests, being less responsive post-commit as you already moved on
>>>
>>> *Two that **should** be resolved in the new regime:*
>>> * Packaging errors should be caught pre-commit, as we're making the artifact builds part of pre-commit.
>>> * I'm hoping to merge the commit log segment allocation so the CDC allocator is the only one for 5.0 (and it just bypasses the cdc-related work on allocation if CDC is disabled, thus not impacting perf); the existing targeted testing of cdc-specific functionality should be sufficient to confirm its correctness, as it doesn't vary from the primary allocation path when it comes to mutation space in the buffer.
>>> * Upgrade tests are going to be part of the pre-commit suite.
>>>
>>> *Outstanding issues:*
>>> * Compression: if we just run with defaults we won't test all cases, so errors could pop up here.
>>> * system_ks_directory related things: is this still ongoing, or did we have a transient burst of these types of issues? And would we expect these to vary based on different JDKs, non-default configurations, etc.?
>>> * Being less responsive post-commit: my only ideas here are a combination of the jenkins_jira_integration <https://github.com/apache/cassandra-builds/blob/trunk/jenkins-jira-integration/jenkins_jira_integration.py> script updating the JIRA ticket with test results if you cause a regression + us building a muscle around reverting your commit if they break tests.
>>>
>>> To quote Jacek:
>>>
>>> why don't we run dtests w/wo sstable compression x w/wo internode encryption x w/wo vnodes, w/wo off-heap buffers x j8/j11/j17 x w/wo CDC x RedHat/Debian/SUSE, etc.? I think this is a matter of cost vs result.
>>>
>>> I think we've organically made these decisions and tradeoffs in the past without being methodical about it. If we can:
>>> 1. Multiplex changed or new tests,
>>> 2. Tighten the feedback loop of "tests were green, now they're *consistently* not, you're the only one who changed something", and
>>> 3. Instill a culture of "if you can't fix it immediately, revert your commit",
>>>
>>> then I think we'll only be vulnerable to flaky failures introduced across different non-default configurations as side effects in tests that aren't touched, which *intuitively* feels like a lot less than we're facing today. We could even get clever as a day-2 effort and define packages in the primary codebase where changes take place and multiplex (on a smaller scale) their respective packages of unit tests in the future, if we see problems in this area.
>>>
>>> Flaky tests are a giant pain in the ass and a huge drain on productivity, don't get me wrong. *And* we have to balance how much cost we're paying before each commit against the benefit we expect to gain from it.
>>>
>>> Does the above make sense? Are there things you've seen in the trenches that challenge or invalidate any of those perspectives?
>>>
>>> On Wed, Jul 12, 2023, at 7:28 AM, Jacek Lewandowski wrote:
>>>
>>> Isn't novnodes a special case of vnodes with n=1?
>>>
>>> We should rather select a subset of tests for which it makes sense to run with different configurations.
>>>
>>> The set of configurations we currently run the tests against is still only a subset of all possible cases. I could ask: why don't we run dtests w/wo sstable compression x w/wo internode encryption x w/wo vnodes, w/wo off-heap buffers x j8/j11/j17 x w/wo CDC x RedHat/Debian/SUSE, etc.? I think this is a matter of cost vs result. This equation contains the likelihood of failure in configuration X given there was no failure in the default configuration, the cost of running those tests, the time we delay merging, and the likelihood that we wait so long for the test results that our branch diverges and we either have to rerun them or accept that we are merging code which was tested on an outdated base. And ultimately it includes the overall new-contributor experience: whether they want to participate in the future.
>>>
>>> On Wed, 12 Jul 2023 at 07:24, Berenguer Blasi <berenguerbl...@gmail.com> wrote:
>>>
>>> On our 4.0 release I remember a number of such failures, but not recently. What is more common though is packaging errors, cdc/compression/system_ks_directory targeted fixes, CI w/wo upgrade tests, being less responsive post-commit as you already moved on, ... Either the smoke pre-commit has approval steps for everything, or imo we should give a devBranch-like job to the dev pre-commit. I find it terribly useful. My 2cts.
>>>
>>> On 11/7/23 18:26, Josh McKenzie wrote:
>>>
>>> 2: Pre-commit 'devBranch' full suite for high risk/disruptive merges: at reviewer's discretion
>>>
>>> In general, maybe offering a dev the option of choosing either "pre-commit smoke" or "post-commit full" at their discretion for any work would be the right play.
>>>
>>> A follow-on thought: even with something as significant as Accord, TCM, Trie data structures, etc., I'd be a bit surprised to see tests fail on JDK17 that didn't on 11, or with vs. without vnodes, in ways where it wasn't immediately clear that the patch had stumbled across something surprising, and that weren't immediately trivially attributable if not fixable.
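One way to write out the cost-vs-result trade-off Jacek describes above, as a rough sketch (the symbols and the form of the inequality are illustrative, not taken from the thread): a non-default configuration X earns a place in the pre-commit matrix only when the expected cost of letting a failure slip past outweighs the cost of running it, roughly

    P(\text{fail in config } X \mid \text{pass in default}) \cdot C_{\text{post-commit breakage}}
        \;>\; C_{\text{running } X} + C_{\text{merge delay}} + C_{\text{rerun/rebase after divergence}}

Every term on the right-hand side grows with the size of the pre-commit matrix, which is the argument made in this thread for keeping a small pre-commit subset and pushing the full matrix post-commit.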
>>> *In theory* the things we're talking about excluding from the pre-commit smoke test suite are all things that are supposed to be identical across environments and thus opaque / interchangeable by default (JDK version, outside checking the build, which we will; vnodes vs. non; etc.).
>>>
>>> Has that not proven to be the case in your experience?
>>>
>>> On Tue, Jul 11, 2023, at 10:15 AM, Derek Chen-Becker wrote:
>>>
>>> A strong +1 to getting to a single CI system. CircleCI definitely has some niceties and I understand why it's currently used, but right now we get 2 CI systems for twice the price. +1 on the proposed subsets.
>>>
>>> Derek
>>>
>>> On Mon, Jul 10, 2023 at 9:37 AM Josh McKenzie <jmcken...@apache.org> wrote:
>>>
>>> I'm personally not thinking about CircleCI at all; I'm envisioning a world where all of us have 1 CI *software* system (i.e. reproducible on any env) that we use for pre-commit validation, and then post-commit happens on reference ASF hardware.
>>>
>>> So:
>>> 1: Pre-commit subset of tests (suites + matrices + env) runs. On green, merge.
>>> 2: Post-commit tests (all suites, matrices, env) run. If failure, link back to the JIRA where the commit took place.
>>>
>>> Circle would need to remain in lockstep with the requirements for point 1 here.
>>>
>>> On Mon, Jul 10, 2023, at 1:04 AM, Berenguer Blasi wrote:
>>>
>>> +1 to Josh, which is exactly my line of thought as well. But that is only valid if we have a solid Jenkins that will eventually run all test configs. So I think I lost track a bit here. Are you proposing:
>>>
>>> 1- CircleCI: runs pre-commit a single (the most common/meaningful, TBD) config of tests
>>> 2- Jenkins: runs post-commit _all_ test configs and emails/notifies you in case of problems?
>>>
>>> Or something different, like having 1 also in Jenkins?
>>>
>>> On 7/7/23 17:55, Andrés de la Peña wrote:
>>>
>>> I think 500 runs combining all configs could be reasonable, since it's unlikely to have config-specific flaky tests. As in five configs with 100 repetitions each.
>>>
>>> On Fri, 7 Jul 2023 at 16:14, Josh McKenzie <jmcken...@apache.org> wrote:
>>>
>>> Maybe. Kind of depends on how long we write our tests to run, doesn't it? :)
>>>
>>> But point taken. Any non-trivial test would start to be something of a beast under this approach.
>>>
>>> On Fri, Jul 7, 2023, at 11:12 AM, Brandon Williams wrote:
>>>
>>> On Fri, Jul 7, 2023 at 10:09 AM Josh McKenzie <jmcken...@apache.org> wrote:
>>> > 3. Multiplexed tests (changed, added) run against all JDK's and a broader range of configs (no-vnode, vnode default, compression, etc)
>>>
>>> I think this is going to be too heavy... we're taking 500 iterations and multiplying that by like 4 or 5?
>>>
>>> --
>>> +---------------------------------------------------------------+
>>> | Derek Chen-Becker                                              |
>>> | GPG Key available at https://keybase.io/dchenbecker and        |
>>> | https://pgp.mit.edu/pks/lookup?search=derek%40chen-becker.org  |
>>> | Fngrprnt: EB8A 6480 F0A3 C8EB C1E7 7F42 AFC5 AFEE 96E4 6ACC    |
>>> +---------------------------------------------------------------+
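To put rough numbers behind Brandon's concern, here is a small back-of-envelope sketch in Python; the 500 repetitions and the roughly five configs come from the messages above, while the JDK count, per-run duration, and CI parallelism are purely illustrative assumptions:

    # Back-of-envelope cost of multiplexing one changed/new test across configs.
    # Only the repetition and config counts come from the thread above; the JDK
    # count, per-run duration, and parallelism are assumed, illustrative values.
    repetitions = 500        # multiplexer iterations per changed/new test
    configs = 5              # e.g. vnodes/no-vnodes, compression, CDC, ...
    jdks = 3                 # assumption: run under JDK 8, 11, and 17 pre-commit
    avg_run_seconds = 30     # assumed average duration of a single test run
    parallelism = 100        # assumed number of concurrent CI executors

    total_runs = repetitions * configs * jdks
    machine_hours = total_runs * avg_run_seconds / 3600
    wall_clock_hours = machine_hours / parallelism

    print(f"{total_runs} runs, ~{machine_hours:.0f} machine-hours, "
          f"~{wall_clock_hours:.1f} h wall clock at {parallelism}-way parallelism")
    # -> 7500 runs, ~62 machine-hours, ~0.6 h wall clock with these assumptions

Whether that counts as "too heavy" then hinges almost entirely on the assumed per-run duration and the parallelism available, which is exactly the cost-vs-result equation Jacek raises earlier in the thread.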