[ https://issues.apache.org/jira/browse/CASSANDRA-18137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17747878#comment-17747878 ]
Berenguer Blasi commented on CASSANDRA-18137: --------------------------------------------- Just opened https://issues.apache.org/jira/browse/CASSANDRA-18699 which might be worth including in this effort? > Repeatable ci-cassandra.a.o > ---------------------------- > > Key: CASSANDRA-18137 > URL: https://issues.apache.org/jira/browse/CASSANDRA-18137 > Project: Cassandra > Issue Type: Epic > Components: CI > Reporter: Michael Semb Wever > Assignee: Michael Semb Wever > Priority: Normal > Fix For: 5.x > > > Goals > - Reproducible reference ASF CI environment so contributors can clone it. > - An accepted “test result output” format that will certify a commit > regardless of CI env. > - Turnaround times as fast as circleci (cloned environment scales to > capacity). > - Intuitive CI implementation accessible to new contributors. > Existing Problems > - Many unknown flakies due to infrequent failure rates and limited test > history, > - time-consuming to identify flakies as infra-related, test-related, > code-related, > - ci-cassandra.a.o is hard to debug (donated heterogenous servers around the > world, ASF controlled with limited physical access, two executors per agent > [noisy neighbour]), > - slow turnaround times compared to circleci, also very variable times as the > fixed resource pool running both post- and pre-commit CI becomes easily > saturated, > - difficult to pre-commit test jenkins and cassandra-build changes, > - CI development efforts is split between ci-cassandra and circleci, despite > ci-cassandra being our canonical and non-commercial CI, > - lacking parity of what is tested between ci-cassandra and circleci > - circleci is restricted to those with access to premium commercial circleci, > creating classes of engineers in the community and an exclusive OSS culture, > - cassandra-builds as a separate repo (without release branches matching > in-tree) adds complexity to changing matrix values (jdks, pythons, dist) > - mixture of jenkins dsl groovy, declarative and scripting pipeline. > - different pre-commit and post-commit jenkins pipelines are used. > Additional Goals > - Identify all remaining test flakies. > - Thin CI implementation that builds on a common set of CI-agnostic build and > test scripts. Contributors are free to add/maintain other CI solutions while > remaining aligned on what and how we test. > - Extendable by downstream codebases (designed for re-use and extension). > Proposal > - Provide a jenkins k8s operator based script that with one command-line > spawns a ci-cassandra.a.o clone on the k8s cluster in context, runs the > Jenkinsfile pipeline, saves the test result, and tears down the > ci-cassandra.a.o clone. [turnkey solution] > - Parameters make spawn and tear-down optional, so ci-cassandra.a.o clones > can be re-usable. > - Bring build and test scripts (including their docker images) from > cassandra-builds to in-tree > - Provide a declarative jenkins pipeline that maps stages to CI-agnostic > build and test scripts. > - CI-agnostic build and test scripts can be run with docker, and without any > CI, on any machine. > - Branch specific testing context is defined outside of the CI code. > Unknowns > - with the known pipeline steps, the matrixes we desire (jdk, python, dist, > arch), and the parallelisation possible, what is the fastest turnaround time > we can expect, > - what parameterisation to the script is required for typical developer > testing pre-commit, > - what is the cost of a single pipeline run, what is the expected cost of a > year's post-commit CI, > - what is the size of the test result and how can it be saved and shared, > - what is the ideal stable agent resource specifications, can we use > heterogenous environments if different test types have different minimum > requirements, > - is multiplexing testing in jenkins a requirement to this epic (see > CASSANDRA-17932) > - how does the community provide CI to contributors that cannot afford the > k8s cluster costs > Non Goals > - deciding what to do with ci-cassandra.a.o if a ci-cassandra.a.o clone is > donated and provides a more stable post-commit environment than the original > ci-cassandra.a.o > - any work/improvements on circleci (e.g. CASSANDRA-18001) > - discussing or changing our branching and merging strategies > - introduction of a develop or staging branch for final pass pre-commit CI > optimisation > - jira integration (automatic bot comments of test results) (see > CASSANDRA-17277) > - introduction/support of additional tests: code coverage, jmh benchmarking, > or larger performance testing; or matrix axis (jdks, pythons, dists, etc) or > other checks. (e.g. CASSANDRA-18077, CASSANDRA-18072) > Timing and Priorisation > - Both 4.0 and 4.1 major releases have been delayed (and significant amounts > of engineering time spent) on addressing an unknown number of flakies > (further exacerbated by unknown CI infra problems). > - Test failures are still being caught months after commit and merge. > - 5.0 promises a significant increase in large contributions, increasing the > risk and cost of failing to do stable trunk development. > - Downstream users are requesting closer alignment to upstream, and > convergence to upstream's CI approach (including extending it for additional > QA). > History and background context can be read in the previous 'Cassandra CI > Status' dev@ threads: > https://lists.apache.org/list?d...@cassandra.apache.org:lte=5y:%22Cassandra%20CI%20Status%22 > and in this > [gdoc|https://docs.google.com/document/d/1XSlYDce3IvvRURlIawOYoMRyzeybCe94gIFS4riKze0/edit#heading=h.7u7l5ky8gzwi] -- This message was sent by Atlassian Jira (v8.20.10#820010) --------------------------------------------------------------------- To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org