I just ran the set of tests that is currently selected when executing the
qa.run target, after adding a couple more categories:
# of tests started = 497
# of tests completed = 497
# of tests skipped = 21
# of tests passed = 497
# of tests failed = 0
-----------------------------------------
Date finished:
Fri Aug 27 12:21:04 CEST 2010
Time elapsed:
27258 seconds
BUILD SUCCESSFUL (total time: 454 minutes 20 seconds)
The categories that are run are:
id,loader,policyprovider,locatordiscovery,activation,config,discoverymanager,joinmanager,url,iiop,jrmp,reliability,thread,renewalmanager,constraint,export,lookupdiscovery
Looks like we almost have 50% coverage now (about 500 tests out of 1000+).
On my system (an Intel Quad Core with 4GB of memory), this took 7-8 hours to run.
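A minimal sketch of what such an invocation might look like, assuming the
category list is passed to the harness through an Ant property (the
run.categories property name below is an assumption, not necessarily what the
build file that defines qa.run actually reads):

  # Hypothetical command line; "run.categories" is an assumed property name.
  ant -Drun.categories=id,loader,policyprovider,locatordiscovery,activation,config,discoverymanager,joinmanager,url,iiop,jrmp,reliability,thread,renewalmanager,constraint,export,lookupdiscovery qa.run
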
2010/8/27 Patricia Shanahan <[email protected]>
> That would be ideal. However, an infrequent run of a very large test set
> can be managed manually, with checklists.
>
> Patricia
>
>
>
> Jonathan Costers wrote:
>
>> The QA harness is also supposed to be able to work in distributed mode,
>> i.e. having multiple machines work together on one test run (splitting the
>> work, so to speak).
>> I haven't looked into that feature much, though.
>>
>> 2010/8/27 Patricia Shanahan <[email protected]>
>>
>>> Based on some experiments, I am convinced a full run may take more than 24
>>> hours, so even that may have to be selective. Jonathan Costers reports
>>> killing a full run after several days. We may need three targets, in
>>> addition to problem-specific categories:
>>>
>>> 1. A quick test that one would do, for example, after checking out and
>>> building.
>>>
>>> 2. A more substantive test that would run in less than 24 hours, to do each
>>> day.
>>>
>>> 3. A complete test that might take several machine-days, and that would be
>>> run against a release candidate prior to release.
>>>
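A rough sketch of what such a set of targets could look like in the build file,
assuming the harness can be handed a category list through a property (the
target names, the run.categories property, and the category groupings below are
all illustrative, not existing parts of the build):

  <!-- Illustrative only: target names, the run.categories property and the
       category groupings are assumptions, not existing build targets. -->
  <target name="qa.run.quick" description="smoke test after checkout and build">
    <antcall target="qa.run">
      <param name="run.categories" value="id,url,constraint"/>
    </antcall>
  </target>

  <target name="qa.run.daily" description="broader run, under 24 hours">
    <antcall target="qa.run">
      <param name="run.categories"
             value="id,loader,policyprovider,locatordiscovery,activation,config,discoverymanager,joinmanager,url,iiop,jrmp,reliability,thread,renewalmanager,constraint,export,lookupdiscovery"/>
    </antcall>
  </target>

  <target name="qa.run.full" description="release-candidate run, may take days">
    <antcall target="qa.run">
      <!-- placeholder: the complete category list would go here -->
      <param name="run.categories" value="..."/>
    </antcall>
  </target>
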
>>> Note that even if a test sequence takes several machine-days, that does not
>>> necessarily mean days of elapsed time. Maybe some tests can be run in
>>> parallel under the same OS copy. Even if that is not possible, we may be
>>> able to gang up several physical or virtual machines, each running a subset
>>> of the tests.
>>>
>>> I think virtual machines may work quite well because a lot of the tests do
>>> something and then wait around a minute or two to see what happens. They are
>>> not very intensive resource users.
>>>
>>> Patricia
>>>
>>>
>>>
>>> Peter Firmstone wrote:
>>>
>>>> Hi JC,
>>>>
>>>> Can we have an ant target for running all the tests?
>>>>
>>>> And how about a qa.run.hudson target?
>>>>
>>>> I usually use run-categories to isolate what I'm working on, but we
>>>> definitely need a target that runs everything that should be run, even if
>>>> it does take overnight.
>>>>
>>>> Regards,
>>>>
>>>> Peter.
>>>>
>>>> Jonathan Costers wrote:
>>>>
>>>> 2010/8/24 Patricia Shanahan <[email protected]>
>>>>>
>>>>>
>>>>>
>>>>>> On 8/22/2010 4:57 PM, Peter Firmstone wrote:
>>>>>>> ...
>>>>>>>
>>>>>>> Thanks Patricia, that's very helpful. I'll figure out where I went wrong
>>>>>>> this week; it really shows the importance of full test coverage.
>>>>>>>
>>>>>>> ...
>>>>>>
>>>>>> I strongly agree that test coverage is important. Accordingly, I've done
>>>>>> some analysis of the "ant qa.run" output.
>>>>>>
>>>>>> There are 1059 test description (*.td) files that exist and are loaded at
>>>>>> the start of "ant qa.run", but that do not seem to be run. I've extracted
>>>>>> the top-level categories from those files:
>>>>>>
>>>>>> constraint
>>>>>> discoveryproviders_impl
>>>>>> discoveryservice
>>>>>> end2end
>>>>>> eventmailbox
>>>>>> export_spec
>>>>>> io
>>>>>> javaspace
>>>>>> jeri
>>>>>> joinmanager
>>>>>> jrmp
>>>>>> loader
>>>>>> locatordiscovery
>>>>>> lookupdiscovery
>>>>>> lookupservice
>>>>>> proxytrust
>>>>>> reliability
>>>>>> renewalmanager
>>>>>> renewalservice
>>>>>> scalability
>>>>>> security
>>>>>> start
>>>>>> txnmanager
>>>>>>
>>>>>> I'm sure some of these tests are obsolete, duplicates of tests in
>>>>>> categories that are being run, or otherwise inappropriate, but there does
>>>>>> seem to be a rich vein of tests we could mine.
>>>>>>
>>>>>>
>>>>>>
>>>>> The QA harness loads all .td files under the "spec" and "impl" directories
>>>>> when starting, and then runs only the ones that are tagged with the
>>>>> categories that we specify from the Ant target.
>>>>> Whenever a test is really obsolete or otherwise not supposed to run, it is
>>>>> marked with a "SkipTestVerifier" in its .td file.
>>>>> Most of those 1059 tests are genuine, though, and should be run.
>>>>> There are more categories than the ones you mention above, for instance
>>>>> "spec", "id", "id_spec", etc.
>>>>> Also, some tests are tagged with multiple categories, so duplicates can
>>>>> exist when assembling the list of tests to run.
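To make the tagging concrete, a skipped test description might look roughly
like the sketch below; the property keys, the class names, and the verifier's
package are illustrative guesses, not text copied from an actual .td file:

  # Hypothetical .td entry; key names, class names and the verifier's package
  # are assumptions rather than excerpts from a real test description.
  testClass=SomeLookupServiceTest
  testCategories=lookupservice,lookupservice_impl
  # Skipped for now: depends on infrastructure not available in this run.
  testVerifiers=com.sun.jini.qa.harness.SkipTestVerifier
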
>>>>>
>>>>> The reason not all of them are run (by Hudson) now is that we give a
>>>>> specific set of test categories that are known (to me) to run smoothly.
>>>>> There are many other categories that are not run (by default) because
>>>>> issues are present with one or more of the tests in those categories.
>>>>>
>>>>> I completely agree that we should not exclude complete test categories
>>>>> because of one test failing.
>>>>> What we probably should do is tag any problematic test (whether due to
>>>>> infrastructure or other reasons) with a SkipTestVerifier for the time
>>>>> being, so that it is not taken into account by the QA harness for now.
>>>>> That way, we can add all test categories to the default Ant run.
>>>>> However, this would take a large amount of time to run (I've tried it once
>>>>> and killed the process after several days), which brings us to your next
>>>>> point:
>>>>>
>>>>>> Part of the problem may be time to run the tests. I'd like to propose
>>>>>> splitting the tests into two sets:
>>>>>>
>>>>>> 1. A small set that one would run in addition to the relevant tests,
>>>>>> whenever making a small change. It should *not* be based on skipping
>>>>>> complete categories, but on doing those tests from each category that are
>>>>>> most likely to detect regression, especially regression due to changes in
>>>>>> other areas.
>>>>>>
>>>>>>
>>>>>>
>>>>> Completely agree. However, most of the QA tests are not clear unit or
>>>>> regression tests. They are more integration/conformance tests that test the
>>>>> requirements of the spec and its implementation.
>>>>> Identifying the list of "right" tests to run as part of the small set you
>>>>> mention would require going through all 1059 test descriptions and their
>>>>> sources.
>>>>>
>>>>>> 2. A full test set that may take a lot longer. In many projects, there is
>>>>>> a "nightly build" and a test sequence that is run against that build. That
>>>>>> test sequence can take up to 24 hours to run, and should be as complete as
>>>>>> possible. Does Apache have infrastructure to support this sort of
>>>>>> operation?
>>>>>>
>>>>>>
>>>>>>
>>>>> Again, completely agree. I'm sure Apache supports this through Hudson. We
>>>>> could request to set up a second build job, doing nightly builds and
>>>>> running the whole test suite. I think this is the only way to make running
>>>>> the complete QA suite automatically practical.
>>>>>
>>>>>> Are there any tests that people *know* should not run? I'm thinking of
>>>>>> running the lot just to see what happens, but knowing ones that are not
>>>>>> expected to work would help with result interpretation.
>>>>>>
>>>>>>
>>>>>>
>>>>> See above: tests of that type should already have been tagged to be
>>>>> skipped by the good people who donated this test suite.
>>>>> I've noticed that usually, when a SkipTestVerifier is used in a .td file,
>>>>> someone has put some comments in there to explain why it was tagged as such.
>>>>>
>>>>> Patricia
>>>>>>
>>>>>
>>>>
>>
>