On Wed, Oct 11, 2017 at 10:19 AM, Stack <[email protected]> wrote:

> That's a lovely report Busbey.
>
> Let me see if I can get a rough answer to your question on minicluster cores.
>

On a clean machine w/ 48 cores, we spend an hour or so on 'smalltests' (no fork). We're using less than 10% of the CPUs (vmstat says ~95% idle). No io. When we get to the second part of the test run (medium+large), CPU goes up (fork = 5) and we move up to maybe 15% of CPU (vmstat is >85% idle).
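A minimal sketch of sizing those fork counts off the machine instead of hard-coding them (the 1/4 and 1/2 ratios below match the upped-forking experiment described in the next paragraph). It assumes mvn is on the PATH and that the build exposes surefire.firstPartForkCount / surefire.secondPartForkCount for the small and medium/large parts of the run -- property names assumed here, not verified against the poms. Note surefire itself also accepts core-multiplier values such as -DforkCount=0.5C:

    #!/usr/bin/env python
    # Sketch: derive surefire fork counts from the local core count
    # (1/4 of the CPUs for small tests, 1/2 for medium/large).
    import multiprocessing
    import subprocess

    cores = multiprocessing.cpu_count()
    small_forks = max(1, cores // 4)
    big_forks = max(1, cores // 2)

    subprocess.run([
        "mvn", "test",
        "-Dsurefire.firstPartForkCount=%d" % small_forks,   # assumed property name
        "-Dsurefire.secondPartForkCount=%d" % big_forks,    # assumed property name
    ], check=True)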
I can't go beyond because tests are failing and timing out, even on a 'clean' machine (let me try w/ the flakies list in place). If I up the forking -- 1/4 of the CPUs for small tests and 1/2 for medium/large -- we seem to spin through the smalls fast (15 mins or less -- all pass). The mediums seem to fluctuate between 15-60% of CPU. Overall, I got through more tests in 1/4 of the time w/ the upped forking (30-odd mins vs. two hours). It would seem that our defaults are anemic (currently we use ~3-4 cores for the small test run and 8-10 for medium/large). Could have fun setting the fork count based off the hardware. Could bring down our elapsed time for test runs. In the past, surefire used to lose a few tests when running at high concurrency. It might be better now.

St.Ack

> S
>
>
> On Wed, Oct 11, 2017 at 6:43 AM, Sean Busbey <[email protected]> wrote:
>
>> Currently our precommit build has a history of ~233 builds.
>>
>> Looking across[1] those for the ones with unit test logs, and treating the string "timeout" as an indicator that things failed because of a timeout rather than a known bad answer, we have 80 builds that had one or more tests time out.
>>
>> Breaking this down by host:
>>
>> | Host | % timeout | Success | Timeout Failure | General Failure |
>> | ---- | ---------:| -------:| ---------------:| ---------------:|
>> | H0   | 42%       | 10      | 15              | 11              |
>> | H1   | 54%       | 6       | 14              | 6               |
>> | H2   | 45%       | 18      | 35              | 24              |
>> | H3   | 100%      | 0       | 1               | 0               |
>> | H4   | 0%        | 1       | 0               | 2               |
>> | H5   | 20%       | 1       | 1               | 3               |
>> | H6   | 44%       | 4       | 4               | 1               |
>> | H9   | 35%       | 2       | 7               | 11              |
>> | H10  | 26%       | 4       | 8               | 19              |
>> | H11  | 0%        | 0       | 0               | 2               |
>> | H12  | 43%       | 1       | 3               | 3               |
>> | H13  | 22%       | 1       | 2               | 6               |
>> | H26  | 0%        | 0       | 0               | 1               |
>>
>> It's odd that we so strongly favor H2. But I don't see evidence that we have a bad host that we could just exclude.
>>
>> Scaling our concurrency by number of CPU cores is something surefire can do. Let me see what the H* hosts look like to figure out some example mappings. Do we have a rough bound on how many cores a single test using MiniCluster should need? 3?
>>
>> -busbey
>>
>> [1]: By "looking across" I mean using the python-jenkins library
>>
>> https://gist.github.com/busbey/ff5f7ae3a292164cc110fdb934935c8c
>>
>>
>> On Mon, Oct 9, 2017 at 4:40 PM, Stack <[email protected]> wrote:
>> > On Mon, Oct 9, 2017 at 7:38 AM, Sean Busbey <[email protected]> wrote:
>> >
>> >> Hi folks!
>> >>
>> >> Lately our precommit runs have had a large amount of noise around unit test failures due to timeout, especially for the hbase-server module.
>> >
>> > I've not looked at why the timeouts. Anyone? Usually there is a cause.
>> >
>> > ...
>> >
>> >> I'd really like to get us back to a place where a precommit -1 doesn't just result in a reflexive "precommit is unreliable."
>> >
>> > This is the default. The exception is when one of us works on stabilizing the test suite. It takes a while and a bunch of effort, but stabilization has been doable in the past. Once stable, it stays that way a while before the rot sets in.
>> >
>> >> * Do fewer parallel executions. We do 5 tests at once now and the hbase-server module takes ~1.5 hours. We could tune down just the hbase-server module to do fewer.
>> >
>> > Is it the loading that is the issue or tests stamping on each other? If the latter, I'd think we'd want to fix it.
>> > If the former, we'd want to look at it too; I'd think our tests shouldn't be such that they fall over if the context is other than 'perfect'.
>> >
>> > I've not looked at a machine while five concurrent hbase tests are running. Is it even putting up a load? Over the extent of the full test suite? Or is it just a few tests that, when run together, cause the issue? Could we stagger these, give them their own category, or have them burn less brightly?
>> >
>> > If tests are failing because of contention for resources, we should fix the test. Given a machine, we should burn it up rather than pussy-foot it, I'd say (can we size the concurrency off a query of the underlying OS so we step by CPUs, say?).
>> >
>> > Tests could do with an edit. Generally, tests are written once and then never touched again. Meantime the system evolves. An edit could look for redundancy. An edit could look for cases where we start clusters -- time-consuming -- and we don't have to (use mocks or start standalone instances instead). We also have some crazy tests that spin up lots of clusters all inside a single JVM even though the context is the same as that of a simple method evaluation.
>> >
>> > St.Ack
>> >
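For reference, the per-host scan Sean describes in [1] above could look roughly like the following python-jenkins sketch. The Jenkins URL, the job name, and the reliance on the build's 'builtOn' field plus a console grep for "timeout" are assumptions; the actual script is in the gist linked above.

    #!/usr/bin/env python
    # Sketch: bucket precommit builds per host into success / timeout / general failure.
    import collections
    import jenkins  # pip install python-jenkins

    server = jenkins.Jenkins("https://builds.apache.org")  # assumed Jenkins instance
    job = "PreCommit-HBASE-Build"                          # assumed job name

    counts = collections.Counter()
    for build in server.get_job_info(job)["builds"]:
        number = build["number"]
        info = server.get_build_info(job, number)
        host = info.get("builtOn") or "unknown"
        console = server.get_build_console_output(job, number)
        if info.get("result") == "SUCCESS":
            outcome = "success"
        elif "timeout" in console.lower():
            # treat "timeout" in the console log as a timeout failure
            outcome = "timeout failure"
        else:
            outcome = "general failure"
        counts[(host, outcome)] += 1

    for (host, outcome), n in sorted(counts.items()):
        print("%s\t%s\t%d" % (host, outcome, n))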
