Two things: (1) Adding minutes to test runs is no big deal to me. Every distributed system I've built has ended up like this, in order to cover failure scenarios and the like. The tests for my Paxos implementation currently total around four minutes. That excludes a bunch of soak tests that I run separately, for hours at a time.
(2) Concurrency tests are hit and miss because of the inherent non-determinism introduced by variation in the number of cores, processor speed, external load, the number of threads in the app, and operating system characteristics (e.g. scheduler behaviour). You can run them for hours and hours without hitting a problem, then re-run them and find a problem in mere seconds. So I would say: add the more stressful versions and don't worry too much about the time to test. Equally, don't expect to surface all concurrency bugs this way.

Dan.

On 10 February 2011 21:40, Patricia Shanahan <[email protected]> wrote:

> The investigation that revealed problems in outrigger's FastList and some
> of the corresponding test code started from a single failure of a QA test
> that usually runs correctly. The failure did not reproduce with the test as
> currently checked in. To investigate, I had to drastically increase the
> numbers of entries and threads.
>
> Should I check in more stressful versions of some or all of the outrigger
> stress tests?
>
> Doing so would further slow down QA, by order minutes for each enhanced
> test. On the other hand, I have clear evidence that the current versions of
> the tests have a low probability of finding concurrency bugs that could be
> detected by enhanced versions.
>
> Patricia
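P.S. One way to get both the quick QA run and the stressful variant from the same test is to make the intensity configurable. This is only an illustrative sketch, not the actual outrigger tests; the property names `stress.threads` and `stress.ops` are made up for the example. It releases all workers at once with a latch to maximise contention, then checks no updates were lost:

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class StressSketch {
    public static void main(String[] args) throws Exception {
        // Hypothetical knobs: defaults keep QA fast; a soak run can crank
        // them up with -Dstress.threads=64 -Dstress.ops=1000000.
        int threads = Integer.getInteger("stress.threads", 8);
        int opsPerThread = Integer.getInteger("stress.ops", 100_000);

        AtomicInteger counter = new AtomicInteger();
        CountDownLatch start = new CountDownLatch(1);
        ExecutorService pool = Executors.newFixedThreadPool(threads);

        for (int t = 0; t < threads; t++) {
            pool.execute(() -> {
                try {
                    // Hold every worker at the gate, then release them
                    // together so they contend from the first operation.
                    start.await();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
                for (int i = 0; i < opsPerThread; i++) {
                    counter.incrementAndGet();
                }
            });
        }

        start.countDown();
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);

        int expected = threads * opsPerThread;
        if (counter.get() != expected) {
            throw new AssertionError(
                "lost updates: " + counter.get() + " != " + expected);
        }
        System.out.println("OK: " + counter.get() + " ops");
    }
}
```

The point is that the checked-in defaults needn't decide the question: QA keeps the cheap settings, and the hours-long soak runs override them.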
