bq. I feel it would certainly help contributors like me to improve the
quality of a patch before it is committed to trunk. There are many Apache
projects which provide this type of infrastructure support, e.g. when we
submit a patch in Hadoop, an automated Jenkins bot provides feedback about
various aspects of the patch: check-style errors, unit test failures,
javadocs, etc.

Well, at some point, if this is going to work or survive over the longer
term, it will have to evolve, automate, and improve like that to some
degree.

We will see though. One of the biggest problems is the hardware. We have
access to some Apache Jenkins machines that are already quite busy and
probably not the greatest candidates for the job (if it were even possible
to steal one from the Jenkins cluster). Meanwhile, last I knew, Apache does
not allow companies to donate hardware to Apache for a specific purpose or
project. So hardware for full automation is one issue. Beasting a test well
can be done in 5-30 minutes in most cases on decent hardware with enough
RAM (tests can each need 512MB for heap alone, so a decent amount of RAM or
fast swap is required for lots of them to run in parallel). More in
parallel tends to produce failures faster (though too many will obviously
overload the hardware and not be very useful either).
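
For concreteness, here is a rough sketch of the kind of driver I mean, in
Python. This is not the actual tooling behind the reports; the test class
name, the per-worker checkout paths, and the ant properties (-Dtestcase,
-Dtests.seed) are examples from memory and may need adjusting for your
setup.

    #!/usr/bin/env python3
    # Beast one test class: ITERATIONS total runs, PARALLEL at a time, each
    # parallel worker pinned to its own checkout so ant builds don't collide.
    import random
    import subprocess
    from concurrent.futures import ThreadPoolExecutor

    TEST_CLASS = "TestExampleCloud"   # hypothetical test class
    ITERATIONS = 100
    PARALLEL = 10                     # mind RAM: ~512MB heap per test JVM
    WORKSPACES = ["/ssd/beast/lucene-solr-%d/solr/core" % i
                  for i in range(PARALLEL)]

    def beast(workspace, runs):
        failing_seeds = []
        for _ in range(runs):
            seed = "%X" % random.getrandbits(64)
            result = subprocess.run(
                ["ant", "test", "-Dtestcase=" + TEST_CLASS,
                 "-Dtests.seed=" + seed],
                cwd=workspace, stdout=subprocess.DEVNULL,
                stderr=subprocess.DEVNULL)
            if result.returncode != 0:
                failing_seeds.append(seed)
        return failing_seeds

    with ThreadPoolExecutor(max_workers=PARALLEL) as pool:
        per_worker = ITERATIONS // PARALLEL
        fails = [s for seeds in pool.map(lambda w: beast(w, per_worker),
                                         WORKSPACES) for s in seeds]

    print("%d/%d runs failed (%.1f%%)" % (len(fails), ITERATIONS,
                                          100.0 * len(fails) / ITERATIONS))
    for seed in fails:
        print("  failing seed:", seed)

The failure percentage that falls out of a run like that is the number a
report line boils down to.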

Back when I was talking with Greg Chanan about a full beasting test run a
couple of years back, my main thought was that we don't have the hardware
for it. Some company could do it, but how do you actually integrate that
into our automated process over the long term? Companies come and go,
interest and money come and go, etc. So what is the sustainable plan?
I'm not sure, but I think driving in the right direction might present some
new possibilities.

When I brought up how unlikely it is that the build system, reporting, or
hardware will show up in a way that works for the project anytime soon,
Greg brought up the same type of idea: a post-commit job that could
automatically track and beast new tests (and perhaps altered tests) and
then post the results to JIRA. I still think that is a great idea if we
could figure out the hardware side.
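
To sketch what the detection side of that could look like (this is not an
existing job; the checkout path, commit range, and the src/test layout and
Test* naming checks are just assumptions about the tree), a post-commit
task would only need to pull new or modified test classes out of recent
commits and hand them to a beasting run:

    #!/usr/bin/env python3
    # Sketch: find test classes added or modified in the last N commits;
    # these become candidates for an automatic post-commit beasting run.
    import subprocess

    def changed_test_classes(repo, since="HEAD~20"):
        out = subprocess.run(
            ["git", "diff", "--name-only", "--diff-filter=AM", since, "HEAD"],
            cwd=repo, capture_output=True, text=True, check=True).stdout
        classes = set()
        for path in out.splitlines():
            if not path.endswith(".java") or "/src/test/" not in path:
                continue
            name = path.rsplit("/", 1)[-1][:-len(".java")]
            if name.startswith("Test") or name.endswith("Test"):
                classes.add(name)
        return sorted(classes)

    print(changed_test_classes("/path/to/lucene-solr"))  # hypothetical path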

For a while I'm just going to produce reports in the cloud on GCE** and
provide the initial hand-holding and support this needs to try to get off
the ground and become reliable. If it gets any steam or helps produce any
results, perhaps others will have further ideas on how to automate, find
available hardware/resources, and improve the strategy.

Once something is fully automated, beasting all the tests *could*
theoretically become a much less common exercise. You could have pre- or
post-commit hooks to look at tests in patches or recent commits, you could
have a job that randomly selects tests to beast every day or every other
day, some of those beastings could be for a lot of runs, and of course you
could provide extra coverage to tests that have a poor history.
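
As a toy illustration of that kind of scheduling policy (the iteration
counts, thresholds, and the shape of the history data here are all made
up), a daily job might look something like this:

    # Pick a random sample of tests to beast at a baseline iteration count,
    # then bump the count for tests that are brand new or have a poor
    # recent failure history.
    import random

    def plan_beasting(all_tests, history, sample_size=20):
        """history: test name -> failure rates (0.0-1.0) from past reports."""
        sample = random.sample(all_tests, min(sample_size, len(all_tests)))
        plan = {test: 100 for test in sample}
        for test, rates in history.items():
            if not rates or max(rates[-3:]) > 0.01:  # new, or >1% recently
                plan[test] = 300                     # give it extra runs
        return plan

    # Example: a brand new test and a known flaky test both get extra runs.
    print(plan_beasting(["TestA", "TestB", "TestC"],
                        {"TestNew": [], "TestFlaky": [0.05, 0.02, 0.04]}))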

Anyhow, I'll fake a bit of that for a while once I get a little more ramped
up. But yeah, eventually more automation and strategy will be important.

- Mark


** Thank you, Cloudera! That last report was probably $24, excluding my
labor, with 10 machines - the first report on a single machine cost much
more.



On Thu, Feb 9, 2017 at 6:02 PM Hrishikesh Gadre <gadre.s...@gmail.com>
wrote:

> Hi Mark,
>
> Thanks for taking care of this.
>
> >>preventing new tests that are not solid.
>
> What are your thoughts on this? While keeping track of recently introduced
> and flaky tests is one way to go forward, would it make sense to have some
> sort of automated test run *before* committing the changes? I feel it would
> certainly help contributors like me to improve the quality of a patch
> before it is committed to trunk. There are many Apache projects which
> provide this type of infrastructure support, e.g. when we submit a patch in
> Hadoop, an automated Jenkins bot provides feedback about various aspects of
> the patch: check-style errors, unit test failures, javadocs, etc.
>
> Thoughts?
>
> Thanks
> -Hrishikesh
>
>
>
>
> On Thu, Feb 9, 2017 at 2:45 PM, Mark Miller <markrmil...@gmail.com> wrote:
>
> bq. a long time ago and sadly failed.
>
> It's really quite a difficult problem. Other than cutting out tons of test
> coverage, there has been no easy way to get to the top of the hill and stay
> there.
>
> I've gone after this with mostly sweat and time in the past. I still
> remember one Christmas day 4 or 5 years ago when I had all the Apache and
> Policeman and my own Jenkins jobs green for the main branches I tracked
> (like 10-14 jobs?) for the first (and, as it turned out, only) time at
> once. I've also set up my own Jenkins machine and jobs at least 3 or 4
> times for months on end, with simple job pass-percentage plugins and other
> things that many others have done. It's all such a narrow, tiny view into
> the real data though. You can just try to tread water, or go out and try
> to physically burn down tests.
>
> It's really not as simple as just donating time. Not unless everyone did a
> lot more of it. First off, you can spend a ton of time taking many, many
> tests from failing 5 in 25 runs to something like 1 in 60 or 1 in 100, but
> dozens of tests that run all the time failing even that often is still a
> huge problem that will grow and fester in our current situation. So how do
> you even see where you are or ensure any progress made is kept? Random
> changes from many other areas bring once-solid tests into worse shape,
> other tests accumulate new problems because no one is looking into tests
> that commonly fail, etc, etc. Many of these tests provide critical test
> coverage.
>
> You really have to know where to focus the time. Sometimes hardening a
> test is pretty quick, but often it's hours or even days of time
> fighting mischievous little issues.
>
> The only way I can see us getting anywhere that we can hold is by
> generating the proper data to tell us what the situation is and by making
> it very simple to track that situation over time.
>
> It's also something a bit more authoritative than one committer's
> opinion when it comes to pushing authors to harden tests. Some tests
> mainly fail on Jenkins, some tests mainly fail for this guy or on that
> platform, "how can you bug me about my test, I see your test fail", etc.
> It's because there are just so many ways for tests to have a problem and so
> many ways to run the unit tests (I run ant tests with 10 JVMs and 6 cores,
> others with 2 and 2, or even more). But in a fair and reasonable
> environment, running a test 100 times, 10 at a time in parallel, is a
> fantastic, neutral, and probably useful data point. If a test fails 40% of
> the time in a setup that the large majority of other tests can survive,
> then "I can't see it in my env" loses most of its weight. Improve or
> @BadApple.
>
> I've banged my head against this wall more than once, and maybe this
> amounts to about as much, but I've been thinking of this 'beasting test run
> report' for many years and I really think current and reliable data is a
> required step to get this right.
>
> Beyond that, there are not too many people comfortable just hopping around
> all areas of the code and hardening a broad range of tests, but I'll put in
> my fair share again and see if something sparks.
>
> If we can get to a better point, I'll be happy to help police
> the appearance of new flaky tests.
>
> Even now I will place extra focus on preventing new tests that are not
> solid. The report helps me monitor that by using git to get a creation
> date to display for each test and by listing the last 3 failure-percentage
> results (newer tests will have 0, 1, or 2 entries).
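>
> (Roughly, the creation-date lookup is just asking git when the test file
> was first added; the real report scripts may differ in the details, but
> the idea is along these lines:)
>
>     # Sketch: get the date a test file was first added to the repo.
>     import subprocess
>
>     def creation_date(repo, test_path):
>         out = subprocess.run(
>             ["git", "log", "--follow", "--diff-filter=A",
>              "--format=%ad", "--date=short", "--", test_path],
>             cwd=repo, capture_output=True, text=True,
>             check=True).stdout.splitlines()
>         return out[-1] if out else None  # oldest add commit = creation date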
>
> I've spent a lot of time just getting it all automated, building my
> confidence in its results, and driving down the time it takes to generate a
> report so that I can do it more frequently and/or for much longer
> iterations.
>
> Over time there are a lot of little ways to improve efficiency on that
> front though. For example, new tests can be run for many more iterations,
> tests that have failed before can be run for more iterations, etc. Linking
> the reports together gives us some ammo for choosing how much time and
> beasting we should spend on a test, and for seeing whether a test is
> improving, getting worse, etc. Data we can act on with confidence. I also
> have all these reports as tsv files so they can be easily imported into
> just about any system for longer or more intense cross-report tracking or
> something.
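>
> (As a toy example of that kind of cross-report tracking - and the column
> layout here is just an assumption, test name then failure percentage per
> line - flagging tests that got worse between two reports could be as
> simple as:)
>
>     # Sketch: read two report tsv files and flag tests whose failure
>     # percentage went up between them.
>     import csv
>
>     def load(path):
>         with open(path, newline="") as f:
>             return {row[0]: float(row[1])
>                     for row in csv.reader(f, delimiter="\t")}
>
>     old, new = load("report-old.tsv"), load("report-new.tsv")
>     for test in sorted(new):
>         if new[test] > old.get(test, 0.0):
>             print("worse: %s  %.1f%% -> %.1f%%"
>                   % (test, old.get(test, 0.0), new[test]))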
>
> - Mark
>
>
> On Thu, Feb 9, 2017 at 3:39 PM Dawid Weiss <dawid.we...@gmail.com> wrote:
>
> This is a very important, hard, and thankless task. Thanks for doing
> this, Mark. As you know, I tried (with your help!) to clean up some of
> this mess a long time ago and sadly failed. It'd be great to speed those
> tests up and make them more robust.
>
> Dawid
>
> --
> - Mark
> about.me/markrmiller

--
- Mark
about.me/markrmiller
