Hi Zoltan!

Thank you for all of your efforts, this looks really promising.
Would moving to GitHub PRs also mean that we move away from Review Board for code review?
Also, what happens if a PR is updated? Will the tests run for both versions or just for the latest one?

Regards,
Zoltan

On Sun, May 17, 2020 at 10:51 PM Zoltan Haindrich <k...@rxd.hu> wrote:

> Hello all!
>
> The proposed system has become more stable lately - and I think I've
> solved a few sources of flakiness.
> To be really usable I also wanted to add a way to dynamically
> enable/disable a set of tests (for example, the replication tests take ~7
> hours to execute out of the total of 24 hours - and they are also a bit
> unstable, so not running them when not necessary would be beneficial in
> multiple ways) - but the best way to do this would be to bring in JUnit 5;
> unfortunately the current ptest installation uses Maven 3.0.5, which
> doesn't like this kind of thing - so instead of hacking a fix for that
> ... I've removed it from the dev branch for now.
>
> I would like to propose starting an evaluation phase of the new test
> procedures (INFRA-20269).
> The process would look something like this:
> * someone opens a PR - the tests will be run on the changes
> * on every active branch the tests will run from time to time
> * this will produce a bunch of test runs on the master branch as well,
>   which will show how well the tests behave on the master branch without
>   any patches
> * runs on branches (PRs or active development branches, e.g. master)
>   will be rate limited to 5 builds/day
> * at most ~4 builds at a time - to maximize resource usage
> * turnaround time for a build is right now 2 hours - which I feel is a
>   balanced choice between speed/response time
>
> Possible future benefits:
> * toggle features using GitHub tags
> * optional testgroups (metastore/replication) tests
> * ability to run the metastore verification tests
> * possibility to add smoke tests
>
> To enable this I will have to finish the HIVE-22942 ticket - beyond the
> new Jenkinsfile which defines the full logic;
> although I've sunk a lot of time into fixing all kinds of flaky tests, I
> would like to disable around 25 tests.
>
> I would also like to propose a method to verify the stability of a single
> test: run it 100 times in series in the same place where the precommit
> tests are running.
> This will put the bar high enough that only totally stable tests could
> satisfy it (a 99% stable test has a 36% chance of passing this without
> being caught :D)
> Once this is in service it could be used to validate that an existing
> test is unstable (before disabling it) - and then used again to prove
> that it got fixed when re-enabling it.
>
> Please let me know what you think!
>
> cheers,
> Zoltan
>
>
>
> On 4/29/20 4:28 PM, Zoltan Haindrich wrote:
> > Hey All!
> >
> > I have been planning to replace the ptest stuff with something less
> > complex for a while now - I see that we struggle a lot because ptest is
> > more complicated than it should be...
> > It would be much better if it were constructed from well-made existing
> > CI pieces - because of that I started working on [1] a few months ago.
> >
> > It has its pros and cons... but it's not the same as the existing ptest
> > stuff.
> > I've collected some info about how it compares against the existing one
> > - but it became too long so I've moved it into a Google Docs document at
> > [3].
> >
> > It's not yet ready... I still have some remaining problems/concerns/etc
> > * what do you think about changing to a GitHub PR based workflow?
> > * it will not support things like "isolation" at all - so we will have
> >   to make our tests work with each other without bending the rules...
> > * I've tried to overcommit the CPU resources, which creates a noisier
> >   environment for the actual tests - this squeezes out some new problems
> >   which should be fixed before this could be enabled.
> > * for every PR the first run is somewhat sub-optimal... there are some
> >   reasons for this - the resources actually used are the same, but the
> >   overall execution time is not optimal; I could accept this as a
> >   compromise because right now I wait >24 hours for a precommit run.
> >
> > It's deployed at [2] and anyone can start a testrun on it:
> > * merge my HIVE-22942-ptest-alt branch from [4] into your branch
> > * open a PR against my hive repo on GitHub [5]
> >
> > cheers,
> > Zoltan
> >
> >
> > [1] https://issues.apache.org/jira/browse/HIVE-22942
> > [2] http://34.66.156.144:8080/job/hive-precommit
> > [3] https://docs.google.com/document/d/1dhL5B-eBvYNKEsNV3kE6RrkV5w-LtDgw5CtHV5pdoX4/edit?usp=sharing
> > [4] https://github.com/kgyrtkirk/hive/tree/HIVE-22942-ptest-alt
> > [5] https://github.com/kgyrtkirk/hive/
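
A side note on the "run it 100 times in series" proposal quoted above: the 36% figure can be checked with a few lines of Python. This is only a sketch. It assumes every run is independent and has the same pass rate, which real flaky tests may not satisfy, and the function name pass_probability is made up for illustration; it is not part of the proposed Jenkins setup.

```python
def pass_probability(per_run_stability: float, runs: int = 100) -> float:
    """Chance that a test passes every one of `runs` executions.

    Assumes independent runs with an identical pass rate (an idealization).
    """
    return per_run_stability ** runs


if __name__ == "__main__":
    # Compare a few hypothetical per-run pass rates against the 100-run bar.
    for stability in (0.9, 0.99, 0.999):
        chance = pass_probability(stability)
        print(f"{stability:.1%} stable per run: "
              f"{chance:.1%} chance to survive 100 runs")
```

For a 99% stable test this gives 0.99^100, which is roughly 0.366 and matches the figure in the proposal. Under the same assumptions, raising `runs` lowers the chance that a flaky test slips through, at the cost of a longer verification job.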