"jenkins_jira_integration
<https://github.com/apache/cassandra-builds/blob/trunk/jenkins-jira-integration/jenkins_jira_integration.py>
script updating the JIRA ticket with test results if you cause a regression +
us building a muscle around reverting your commit if they break tests."

I am not sure the problem of people finding the time to fix their breakages
will be solved, but at least they will be pinged automatically. Hopefully many
of them follow the JIRA updates.

"I don't take the past as strongly indicative of the future here since
we've been allowing circle to validate pre-commit and haven't been
multiplexing."
I am interested in comparing how many flaky-test tickets we will have pre-5.0
now compared to pre-4.1.


On Wed, 12 Jul 2023 at 8:41, Josh McKenzie <jmcken...@apache.org> wrote:

> (This response ended up being a bit longer than intended; sorry about that)
>
> What is more common though is packaging errors,
> cdc/compression/system_ks_directory targeted fixes, CI w/wo
> upgrade tests, being less responsive post-commit as you already
> moved on
>
> *Two that **should** be resolved in the new regime:*
> * Packaging errors should be caught pre as we're making the artifact
> builds part of pre-commit.
> * I'm hoping to merge the commit log segment allocation work so the CDC
> allocator is the only one for 5.0 (it just bypasses the cdc-related work on
> allocation if CDC is disabled, thus not impacting perf); the existing
> targeted testing of cdc-specific functionality should be sufficient to
> confirm its correctness, as it doesn't vary from the primary allocation path
> when it comes to mutation space in the buffer.
> * Upgrade tests are going to be part of the pre-commit suite
>
> *Outstanding issues:*
> * Compression: if we just run with defaults we won't test all cases, so
> errors could pop up here.
> * system_ks_directory-related things: is this still ongoing, or did we have
> a transient burst of these types of issues? And would we expect these to
> vary based on different JDKs, non-default configurations, etc.?
> * Being less responsive post-commit: My only ideas here are a combination
> of the jenkins_jira_integration
> <https://github.com/apache/cassandra-builds/blob/trunk/jenkins-jira-integration/jenkins_jira_integration.py>
> script updating the JIRA ticket with test results if you cause a regression
> + us building a muscle around reverting your commit if they break tests.
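>
> (Purely as an illustration of the kind of automation I mean here, and not
> the actual jenkins_jira_integration script, a minimal Python sketch that
> posts a comment on a JIRA ticket when a Jenkins run reports new test
> failures; the base URL, credentials, and issue key are assumed inputs:)
>
>     # Hypothetical sketch only -- not the real jenkins_jira_integration.py.
>     import requests
>
>     JIRA_BASE = "https://issues.apache.org/jira"  # assumed JIRA instance
>
>     def comment_on_regression(issue_key, build_url, failed_tests, auth):
>         """Ping the ticket by commenting with the newly failing tests."""
>         body = (
>             f"Jenkins run {build_url} reports new test failures after "
>             f"this commit:\n"
>             + "\n".join(f"* {t}" for t in failed_tests)
>             + "\nPlease fix or revert."
>         )
>         resp = requests.post(
>             f"{JIRA_BASE}/rest/api/2/issue/{issue_key}/comment",
>             json={"body": body},
>             auth=auth,  # (username, api_token)
>             timeout=30,
>         )
>         resp.raise_for_status()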
>
> To quote Jacek:
>
> why don't we run dtests w/wo sstable compression x w/wo internode encryption
> x w/wo vnodes, w/wo off-heap buffers x j8/j11/j17 x w/wo CDC x
> RedHat/Debian/SUSE, etc.? I think this is a matter of cost vs. result.
>
>
> I think we've organically made these decisions and tradeoffs in the past
> without being methodical about it. If we can:
> 1. Multiplex changed or new tests
> 2. Tighten the feedback loop of "tests were green, now they're
> *consistently* not, you're the only one who changed something", and
> 3. Instill a culture of "if you can't fix it immediately, revert your
> commit"
>
> Then I think we'll only be vulnerable to flaky failures introduced across
> different non-default configurations as side effects in tests that aren't
> touched, which *intuitively* feels like a lot less than we're facing
> today. We could even get clever as a day 2 effort and define packages in
> the primary codebase where changes take place and multiplex (on a smaller
> scale) their respective packages of unit tests in the future if we see
> problems in this area.
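>
> (To make point 1 concrete, a rough sketch of what multiplexing changed or
> new tests could look like; the repeat count, the test-path filter, and the
> ant invocation are illustrative assumptions, not our actual tooling:)
>
>     # Sketch: find test files touched by a branch and run each repeatedly.
>     import subprocess
>
>     REPEATS = 100  # assumed number of multiplexed iterations per test
>
>     def changed_test_files(base="origin/trunk"):
>         """Changed/added test sources relative to the base branch."""
>         out = subprocess.run(
>             ["git", "diff", "--name-only", f"{base}...HEAD"],
>             capture_output=True, text=True, check=True,
>         ).stdout.splitlines()
>         return [p for p in out
>                 if p.startswith("test/") and p.endswith(".java")]
>
>     def multiplex(paths):
>         for path in paths:
>             # e.g. test/unit/org/apache/cassandra/db/FooTest.java
>             cls = (path.split("/", 2)[-1]
>                    .removesuffix(".java").replace("/", "."))
>             for _ in range(REPEATS):
>                 subprocess.run(["ant", "testsome", f"-Dtest.name={cls}"],
>                                check=True)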
>
> Flaky tests are a giant pain in the ass and a huge drain on productivity,
> don't get me wrong. *And* we have to balance how much cost we're paying
> before each commit with the benefit we expect to gain from that.
>
> Does the above make sense? Are there things you've seen in the trenches
> that challenge or invalidate any of those perspectives?
>
> On Wed, Jul 12, 2023, at 7:28 AM, Jacek Lewandowski wrote:
>
> Isn't novnodes a special case of vnodes with n=1?
>
> We should rather select a subset of tests for which it makes sense to run
> with different configurations.
>
> The set of configurations we currently run the tests against is still only a
> subset of all possible cases. I could ask: why don't we run dtests w/wo
> sstable compression x w/wo internode encryption x w/wo vnodes, w/wo off-heap
> buffers x j8/j11/j17 x w/wo CDC x RedHat/Debian/SUSE, etc.? I think this is
> a matter of cost vs. result. That equation includes the likelihood of a
> failure in configuration X given there was no failure in the default
> configuration, the cost of running those tests, the time we delay merging,
> and the likelihood that we wait so long for the test results that our branch
> diverges and we have to rerun them, or accept that we merge code which was
> tested on an outdated base. Ultimately, it also includes the overall
> experience of new contributors: whether they want to participate in the
> future.
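>
> (To put a rough number on that cost vs. result trade-off, a quick
> back-of-the-envelope sketch; the dimensions and value counts below are
> illustrative assumptions, not our actual CI matrix:)
>
>     # How quickly the configuration matrix explodes (assumed dimensions).
>     from itertools import product
>
>     dimensions = {
>         "sstable_compression": ["on", "off"],
>         "internode_encryption": ["on", "off"],
>         "vnodes": ["on", "off"],
>         "offheap_buffers": ["on", "off"],
>         "jdk": ["8", "11", "17"],
>         "cdc": ["on", "off"],
>         "distro": ["RedHat", "Debian", "SUSE"],
>     }
>
>     combos = list(product(*dimensions.values()))
>     print(len(combos))  # 2*2*2*2*3*2*3 = 288 distinct configurations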
>
>
>
> śr., 12 lip 2023 o 07:24 Berenguer Blasi <berenguerbl...@gmail.com>
> napisał(a):
>
> On our 4.0 release I remember a number of such failures, but not recently.
> What is more common though is packaging errors,
> cdc/compression/system_ks_directory targeted fixes, CI with/without upgrade
> tests, being less responsive post-commit as you have already moved on, ...
> Either the smoke pre-commit has approval steps for everything, or we should
> IMO give a devBranch-like job to the dev pre-commit. I find it terribly
> useful. My 2cts.
> On 11/7/23 18:26, Josh McKenzie wrote:
>
> 2: Pre-commit 'devBranch' full suite for high risk/disruptive merges: at
> reviewer's discretion
>
> In general, maybe offering a dev the option of choosing either "pre-commit
> smoke" or "post-commit full" at their discretion for any work would be the
> right play.
>
> A follow-on thought: even with something as significant as Accord, TCM,
> Trie data structures, etc., I'd be a bit surprised to see tests fail on
> JDK17 that didn't on 11, or with vs. without vnodes, in ways where it wasn't
> immediately clear that the patch had stumbled across something surprising
> that was trivially attributable if not fixable. *In theory* the things
> we're talking about excluding from the pre-commit smoke test suite are all
> things that are supposed to be identical across environments and thus
> opaque / interchangeable by default (JDK version, aside from checking the
> build, which we will; vnodes vs. non-vnodes; etc.).
>
> Has that not proven to be the case in your experience?
>
> On Tue, Jul 11, 2023, at 10:15 AM, Derek Chen-Becker wrote:
>
> A strong +1 to getting to a single CI system. CircleCI definitely has some
> niceties and I understand why it's currently used, but right now we get 2
> CI systems for twice the price. +1 on the proposed subsets.
>
> Derek
>
> On Mon, Jul 10, 2023 at 9:37 AM Josh McKenzie <jmcken...@apache.org>
> wrote:
>
>
> I'm personally not thinking about CircleCI at all; I'm envisioning a world
> where all of us have 1 CI *software* system (i.e. reproducible on any
> env) that we use for pre-commit validation, and then post-commit happens on
> reference ASF hardware.
>
> So:
> 1: Pre-commit subset of tests (suites + matrices + env) runs. On green,
> merge.
> 2: Post-commit tests (all suites, matrices, env) runs. If failure, link
> back to the JIRA ticket where the commit took place.
>
> Circle would need to remain in lockstep with the requirements for point 1
> here.
>
> On Mon, Jul 10, 2023, at 1:04 AM, Berenguer Blasi wrote:
>
> +1 to Josh, which is exactly my line of thought as well. But that is only
> valid if we have a solid Jenkins that will eventually run all test configs.
> So I think I lost track a bit here. Are you proposing:
>
> 1- CircleCI: Run pre-commit a single (the most common/meaningful, TBD)
> config of tests
>
> 2- Jenkins: Runs post-commit _all_ test configs and emails/notifies you in
> case of problems?
>
> Or something different, like also having 1 run in Jenkins?
> On 7/7/23 17:55, Andrés de la Peña wrote:
>
> I think 500 runs combining all configs could be reasonable, since it's
> unlikely to have config-specific flaky tests. As in five configs with 100
> repetitions each.
>
> On Fri, 7 Jul 2023 at 16:14, Josh McKenzie <jmcken...@apache.org> wrote:
>
> Maybe. Kind of depends on how long we write our tests to run, doesn't it? :)
>
> But point taken. Any non-trivial test would start to be something of a
> beast under this approach.
>
> On Fri, Jul 7, 2023, at 11:12 AM, Brandon Williams wrote:
>
> On Fri, Jul 7, 2023 at 10:09 AM Josh McKenzie <jmcken...@apache.org>
> wrote:
> > 3. Multiplexed tests (changed, added) run against all JDKs and a
> > broader range of configs (no-vnode, vnode default, compression, etc.)
>
> I think this is going to be too heavy...we're taking 500 iterations
> and multiplying that by like 4 or 5?
>
>
>
>
>
> --
> +---------------------------------------------------------------+
> | Derek Chen-Becker                                             |
> | GPG Key available at https://keybase.io/dchenbecker and       |
> | https://pgp.mit.edu/pks/lookup?search=derek%40chen-becker.org |
> | Fngrprnt: EB8A 6480 F0A3 C8EB C1E7  7F42 AFC5 AFEE 96E4 6ACC  |
> +---------------------------------------------------------------+