The QA harness is also supposed to be able to work in distributed mode, i.e. having multiple machines work together on one test run (splitting the work, so to speak). I haven't looked into that feature much yet, though.
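For reference, the tiered runs discussed below could probably be expressed as thin Ant targets layered over the existing category selection. A rough sketch follows; the target names, the "categories" property and the particular groupings are only illustrative guesses based on the category list quoted below, not the actual qa build file:

  <!-- Illustrative sketch only: target names, the "categories" property and
       the groupings are assumptions, not the real qa build file. Each tier
       just hands a different category list to the existing harness run. -->
  <target name="qa.run.quick" description="quick check after checkout/build">
    <antcall target="qa.run">
      <param name="categories" value="joinmanager,locatordiscovery"/>
    </antcall>
  </target>

  <target name="qa.run.daily" description="larger set, meant to finish within 24 hours">
    <antcall target="qa.run">
      <param name="categories"
             value="lookupservice,lookupdiscovery,txnmanager,jeri,io,security"/>
    </antcall>
  </target>

  <target name="qa.run.full" description="complete suite, for release candidates">
    <antcall target="qa.run">
      <!-- the real target would enumerate every category; only a few are
           shown here (constraint, discoveryservice, end2end, ...) -->
      <param name="categories" value="constraint,discoveryservice,end2end,eventmailbox"/>
    </antcall>
  </target>

With something along those lines, "ant qa.run.quick" would be the post-build smoke test, a Hudson job could drive qa.run.daily every night, and qa.run.full would be reserved for release candidates.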
2010/8/27 Patricia Shanahan <[email protected]>

> Based on some experiments, I am convinced a full run may take more than 24
> hours, so even that may have to be selective. Jonathan Costers reports
> killing a full run after several days. We may need three targets, in
> addition to problem-specific categories:
>
> 1. A quick test that one would do, for example, after checking out and
> building.
>
> 2. A more substantive test that would run in less than 24 hours, to do each
> day.
>
> 3. A complete test that might take several machine-days, and that would be
> run against a release candidate prior to release.
>
> Note that even if a test sequence takes several machine-days, that does not
> necessarily mean days of elapsed time. Maybe some tests can be run in
> parallel under the same OS copy. Even if that is not possible, we may be
> able to gang up several physical or virtual machines, each running a subset
> of the tests.
>
> I think virtual machines may work quite well because a lot of the tests do
> something then wait around a minute or two to see what happens. They are
> not very intensive resource users.
>
> Patricia
>
>
> Peter Firmstone wrote:
>
>> Hi JC,
>>
>> Can we have an ant target for running all the tests?
>>
>> And how about a qa.run.hudson target?
>>
>> I usually use run-categories, to isolate what I'm working on, but we
>> definitely need a target that runs everything that should be, even if it
>> does take overnight.
>>
>> Regards,
>>
>> Peter.
>>
>> Jonathan Costers wrote:
>>
>>> 2010/8/24 Patricia Shanahan <[email protected]>
>>>
>>>> On 8/22/2010 4:57 PM, Peter Firmstone wrote:
>>>> ...
>>>>
>>>>> Thanks Patricia, that's very helpful. I'll figure out where I went
>>>>> wrong this week; it really shows the importance of full test coverage.
>>>>>
>>>> ...
>>>>
>>>> I strongly agree that test coverage is important. Accordingly, I've
>>>> done some analysis of the "ant qa.run" output.
>>>>
>>>> There are 1059 test description (*.td) files that exist, and are
>>>> loaded at the start of "ant qa.run", but that do not seem to be run.
>>>> I've extracted the top level categories from those files:
>>>>
>>>> constraint
>>>> discoveryproviders_impl
>>>> discoveryservice
>>>> end2end
>>>> eventmailbox
>>>> export_spec
>>>> io
>>>> javaspace
>>>> jeri
>>>> joinmanager
>>>> jrmp
>>>> loader
>>>> locatordiscovery
>>>> lookupdiscovery
>>>> lookupservice
>>>> proxytrust
>>>> reliability
>>>> renewalmanager
>>>> renewalservice
>>>> scalability
>>>> security
>>>> start
>>>> txnmanager
>>>>
>>>> I'm sure some of these tests are obsolete, duplicates of tests in
>>>> categories that are being run, or otherwise inappropriate, but there
>>>> does seem to be a rich vein of tests we could mine.
>>>>
>>> The QA harness loads all .td files under the "spec" and "impl"
>>> directories when starting, and only retains the ones that are tagged
>>> with the categories we specify from the Ant target.
>>> Whenever a test is really obsolete or otherwise not supposed to run,
>>> it is marked with a "SkipTestVerifier" in its .td file.
>>> Most of these are genuine and should be run, though.
>>> There are more categories than the ones you mention above, for
>>> instance: "spec", "id", "id_spec", etc.
>>> Also, some tests are tagged with multiple categories, and as such
>>> duplicates can exist when assembling the list of tests to run.
>>>
>>> The reason not all of them are run (by Hudson) now is that we give a
>>> specific set of test categories that are known (to me) to run smoothly.
>>> There are many others that are not run (by default) because issues are
>>> present with one or more of the tests in that category.
>>>
>>> I completely agree that we should not exclude complete test categories
>>> because of one test failing.
>>> What we probably should do is tag any problematic test (due to
>>> infrastructure or other reasons) with a SkipTestVerifier for the time
>>> being, so that it is not taken into account by the QA harness for now.
>>> That way, we can add all test categories to the default Ant run.
>>> However, this would take a large amount of time to run (I've tried it
>>> once, and killed the process after several days), which brings us to
>>> your next point:
>>>
>>>> Part of the problem may be time to run the tests. I'd like to propose
>>>> splitting the tests into two sets:
>>>>
>>>> 1. A small set that one would run in addition to the relevant tests,
>>>> whenever making a small change. It should *not* be based on skipping
>>>> complete categories, but on doing those tests from each category that
>>>> are most likely to detect regression, especially regression due to
>>>> changes in other areas.
>>>>
>>> Completely agree. However, most of the QA tests are not clear unit or
>>> regression tests. They are more integration/conformance tests that
>>> test the requirements of the spec and its implementation.
>>> Identifying the list of "right" tests to run as part of the small set
>>> you mention would require going through all 1059 test descriptions and
>>> their sources.
>>>
>>>> 2. A full test set that may take a lot longer. In many projects, there
>>>> is a "nightly build" and a test sequence that is run against that
>>>> build. That test sequence can take up to 24 hours to run, and should
>>>> be as complete as possible. Does Apache have infrastructure to support
>>>> this sort of operation?
>>>>
>>> Again, completely agree. I'm sure Apache supports this through Hudson.
>>> We could request to set up a second build job, doing nightly builds and
>>> running the whole test suite. I think this is the only way to make
>>> running the complete QA suite automatically practical.
>>>
>>>> Are there any tests that people *know* should not run? I'm thinking of
>>>> running the lot just to see what happens, but knowing ones that are
>>>> not expected to work would help with result interpretation.
>>>>
>>> See above: tests of that type should have already been tagged to be
>>> skipped by the good people that donated this test suite.
>>> I've noticed that usually, when a SkipTestVerifier is used in a .td
>>> file, someone has put some comments in there to explain why it was
>>> tagged as such.
>>>
>>>> Patricia
