Hey all!

The patch is now in master - so every new PR, or a push to an existing one,
will trigger a new run.

Please decide which one you would like to use: open a PR to see the new one work, or upload a patch file to the jira - but please don't do both, because in that case 2 executions will happen.

The job execution time (2-4 hours) of a single run is a bit higher than 
usual on the ptest server - this is mostly to increase throughput.

The patch also disabled a set of tests; I will send the full list of skipped 
tests shortly.

cheers,
Zoltan


On 5/27/20 1:50 PM, Zoltan Haindrich wrote:
Hello all!

The new stuff is ready to be switched on. It needs to be merged into master 
- and after that anyone who opens a PR will get a run by the new HiveQA infra.
I propose to run the 2 systems side-by-side for some time - the regular master 
builds will start, and we will see how frequently they are polluted by flaky 
tests.

Note that the current patch also disables around 25 more tests to increase stability. To get a better overview of the disabled tests, I think the "direction of the information flow" should be altered. What I mean by that is: instead of filing one jira for "disable test x" and opening another for "fix test x", only open the latter and place the jira reference into the ignore message. Meanwhile, also add a regular report about the tests that are actually disabled - so people who know about the importance of a particular test can get involved.

Note: the builds.apache.org instance will also be shut down sometime in the future...but I think the new one is a good enough alternative that we don't have to migrate the Hive-precommit job over to https://ci-hadoop.apache.org/.

http://34.66.156.144:8080/job/hive-precommit/job/PR-948/5/
https://issues.apache.org/jira/browse/HIVE-22942
https://github.com/apache/hive/pull/948/files

cheers,
Zoltan

On 5/18/20 1:42 PM, Zoltan Haindrich wrote:
Hey!

On 5/18/20 11:51 AM, Zoltan Chovan wrote:
Thank you for all of your efforts, this looks really promising. With moving
to github PRs, would that also mean that we move away from the reviewboard
for code review?
I hadn't thought about that. I think using github's review interface will remain optional, because both review systems have their own strong points - I wouldn't force anyone to use one over the other. (For some patches reviewboard is much better, because it's able to track content moves a bit better than github - meanwhile github has a small feature that lets you mark files as reviewed.) As a matter of fact, we sometimes had patches on the jiras which had neither an RB nor a PR to review them - having a PR there will at least make it easier for reviewers to comment.

Also, what happens if a PR is updated? Will the tests run for both or just
for the latest version?
It will trigger a new build. If there is already a build in progress, that will prevent the new build from starting until it finishes...and there is also a limit of 5 builds/day, which might induce some wait.

cheers,
Zoltan


Regards,
Zoltan

On Sun, May 17, 2020 at 10:51 PM Zoltan Haindrich <k...@rxd.hu> wrote:

Hello all!

The proposed system has become more stable lately - and I think I've
solved a few sources of flakiness.
To be really usable I also wanted to add a way to dynamically
enable/disable a set of tests (for example the replication tests take ~7
hours to execute out of the total of 24 hours - and they are also a bit
unstable, so not running them when not necessary would be beneficial in
multiple ways) - but to do this the best option would be to throw in junit5;
unfortunately the current ptest installation uses maven 3.0.5, which doesn't
like this kind of thing - so instead of hacking a fix for that...I've
removed it from the dev branch for now.

I would like to propose starting an evaluation phase of the new test
procedures (INFRA-20269).
The process would look something like this:
* someone opens a PR - the tests will be run on the changes
* on every active branch the tests will run from time to time
    * this will produce a bunch of test runs on the master branch as well,
which will show how well the tests behave on the master branch without any
patches
* runs on branches (PRs or active development branches, e.g. master) will be
rate limited to 5 builds/day
* at most ~4 builds at a time - to maximize resource usage
* turnaround time for a build is right now 2 hours - which I feel is a
balanced choice between speed/response time

Possible future benefits:
* toggle features using github tags
* optional test groups (metastore/replication tests)
* ability to run the metastore verification tests
* possibility to add smoke tests

To enable this I will have to finish the HIVE-22942 ticket. Beyond the
new Jenkinsfile, which defines the full logic, it would also disable around
25 tests - although I've sunk a lot of time into fixing all kinds of flaky
tests.

I would also like to propose a method to verify the stability of a single
test: run it 100 times in series at the same place where the precommit
tests are running.
This will put the bar high enough that only totally stable tests could
satisfy it (a 99% stable test has about a 37% chance to pass this without
being caught :D)
Once this is in service it could be used to validate that an existing test
is unstable (before disabling it) - and then used again to prove that it
got fixed when re-enabling it.
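
The figure for the 100-run check can be reproduced with a few lines of Python. This is only a sketch: it assumes every run is independent and has the same success rate, and the function name is made up for illustration. With those assumptions, a test that passes 99% of the time gets through 100 runs in a row with probability 0.99^100, which is about 36.6%.

```python
def pass_probability(per_run_success: float, runs: int) -> float:
    """Chance that a test passes every one of `runs` runs.

    Assumes the runs are independent and share the same success rate
    (a simplification; real flaky tests may not behave this way).
    """
    return per_run_success ** runs


if __name__ == "__main__":
    # Compare a few hypothetical per-run stability levels against 100 runs.
    for rate in (0.90, 0.99, 0.999):
        chance = pass_probability(rate, 100)
        print(f"{rate:.1%} stable -> {chance:.1%} chance to pass 100 runs")
```

Under the same assumptions, raising the run count makes it less likely that an unstable test slips through, at the cost of a longer check.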

Please let me know what you think!

cheers,
Zoltan



On 4/29/20 4:28 PM, Zoltan Haindrich wrote:
Hey All!

I have been planning to replace the ptest stuff with something less complex
for a while now - I see that we struggle a lot because ptest is more
complicated than it should be...
It would be much better if it were built from well-made existing CI
pieces - because of that I started working on [1] a few months ago.

It has its pros and cons...but it's not the same as the existing ptest
stuff.
I've collected some information about how it compares against the existing
one - but it became too long, so I've moved it into a google docs document
at [3].

It's not yet ready... I still have some remaining problems/concerns/etc
* what do you think about changing to a github PR based workflow?
* it will not support things like "isolation" at all - so we will have
to make our tests work with each other without bending the rules...
* I've tried to overcommit the cpu resources, which creates a noisier
environment for the actual tests - this squeezes out some new problems
which should be fixed before this could be enabled.
* for every PR the first run is somewhat sub-optimal...there are some
reasons for this - the resources actually used are the same, but the
overall execution time is not optimal; I could accept this as a compromise
because right now I wait 24 hours for a precommit run.

It's deployed at [2] and anyone can start a testrun on it:
* merge my HIVE-22942-ptest-alt branch from [4] into your branch
* open a PR against my hive repo on github [5]

cheers,
Zoltan


[1] https://issues.apache.org/jira/browse/HIVE-22942
[2] http://34.66.156.144:8080/job/hive-precommit
[3] https://docs.google.com/document/d/1dhL5B-eBvYNKEsNV3kE6RrkV5w-LtDgw5CtHV5pdoX4/edit?usp=sharing
[4] https://github.com/kgyrtkirk/hive/tree/HIVE-22942-ptest-alt
[5] https://github.com/kgyrtkirk/hive/

