That would be ideal. However, an infrequent run of a very large test set can be managed manually, with checklists.

Patricia


Jonathan Costers wrote:
The QA harness is also supposed to be able to work in distributed mode, i.e.
having multiple machines work together on one test run (splitting the work,
so to speak).
I haven't looked into that feature too much, though.

2010/8/27 Patricia Shanahan <[email protected]>

Based on some experiments, I am convinced a full run may take more than 24
hours, so even that may have to be selective. Jonathan Costers reports
killing a full run after several days. We may need three targets, in
addition to problem-specific categories:

1. A quick test that one would do, for example, after checking out and
building.

2. A more substantive test that would run in less than 24 hours, to do each
day.

3. A complete test that might take several machine-days, and that would be
run against a release candidate prior to release.

Note that even if a test sequence takes several machine-days, that does not
necessarily mean days of elapsed time. Maybe some tests can be run in
parallel under the same OS copy. Even if that is not possible, we may be
able to gang up several physical or virtual machines, each running a subset
of the tests.

I think virtual machines may work quite well because a lot of the tests do
something, then wait around for a minute or two to see what happens. They are
not very resource-intensive.
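
As a rough sketch of what a split might look like (the run-categories target
is the one Peter mentions below; the property name used to pass the category
list is a guess on my part, so the real invocation may differ):

    # machine or VM 1
    ant run-categories -Dcategories=joinmanager,lookupservice,locatordiscovery
    # machine or VM 2
    ant run-categories -Dcategories=jeri,security,txnmanager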

Patricia



Peter Firmstone wrote:

Hi JC,

Can we have an ant target for running all the tests?

And how about a qa.run.hudson target?

I usually use run-categories to isolate what I'm working on, but we
definitely need a target that runs everything that should be run, even if it
takes overnight.
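
Something along these lines is what I have in mind. This is purely a sketch:
the new target name and the "categories" property name are guesses, and the
real qa build file may pass categories differently.

    <!-- hypothetical target chaining the existing qa.run with all categories -->
    <target name="qa.run.all">
      <antcall target="qa.run">
        <param name="categories"
               value="constraint,discoveryservice,end2end,jeri,joinmanager,lookupservice,security,txnmanager"/>
      </antcall>
    </target>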

Regards,

Peter.

Jonathan Costers wrote:

2010/8/24 Patricia Shanahan <[email protected]>



On 8/22/2010 4:57 PM, Peter Firmstone wrote:
...

Thanks Patricia, that's very helpful. I'll figure out where I went wrong
this week. It really shows the importance of full test coverage.

...

I strongly agree that test coverage is important. Accordingly, I've done
some analysis of the "ant qa.run" output.

There are 1059 test description (*.td) files that exist, and are loaded at
the start of "ant qa.run", but that do not seem to be run. I've extracted
the top-level categories from those files:

constraint
discoveryproviders_impl
discoveryservice
end2end
eventmailbox
export_spec
io
javaspace
jeri
joinmanager
jrmp
loader
locatordiscovery
lookupdiscovery
lookupservice
proxytrust
reliability
renewalmanager
renewalservice
scalability
security
start
txnmanager

I'm sure some of these tests are obsolete, duplicates of tests in
categories that are being run, or otherwise inappropriate, but there does
seem to be a rich vein of tests we could mine.
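
(For anyone who wants to repeat this kind of survey, something like the
following is one rough way to pull the category names out of the .td files.
The "testCategories" property name and the qa/ directory are my assumptions
about the layout, so adjust as needed.)

    # list the distinct category tags used in the test description files
    grep -rh "^testCategories" --include="*.td" qa/ | \
        sed 's/testCategories=//' | tr ',' '\n' | sort -u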



The QA harness loads all .td files under the "spec" and "impl" directories
when starting, and then runs only the ones tagged with the categories that we
specify from the Ant target.
Whenever a test is really obsolete or otherwise not supposed to run, it is
marked with a "SkipTestVerifier" in its .td file.
Most of these tests are genuine and should be run, though.
There are more categories than the ones you mention above, for instance
"spec", "id", "id_spec", etc.
Also, some tests are tagged with multiple categories, so duplicates can
exist when assembling the list of tests to run.
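
To give an idea of what that looks like in a .td file, the entries are
roughly as below. I'm quoting the property names and the verifier's package
from memory, so treat them as approximate rather than authoritative.

    # categories this test belongs to (a test can appear in several)
    testCategories=renewalservice,spec
    # tests known not to run get a skip verifier, usually with a comment explaining why
    testSkipVerifiers=com.sun.jini.qa.harness.SkipTestVerifier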

The reason not all of them are run (by Hudson) now is that we give a
specific set of test categories that are known (to me) to run smoothly.
There are many others that are not run (by default) because issues are
present with one or more of the tests in those categories.

I completely agree that we should not exclude complete test categories
because of one test failing.
What we probably should do is tag any problematic test (whether due to
infrastructure or other reasons) with a SkipTestVerifier for the time being,
so that it is not taken into account by the QA harness for now.
That way, we can add all test categories to the default Ant run.
However, this would take a large amount of time to run (I've tried it once,
and killed the process after several days), which brings us to your next
point:

Part of the problem may be the time it takes to run the tests. I'd like to
propose splitting the tests into two sets:

1. A small set that one would run, in addition to the relevant tests,
whenever making a small change. It should *not* be based on skipping
complete categories, but on doing those tests from each category that are
most likely to detect regression, especially regression due to changes in
other areas.



Completely agree. However, most of the QA tests are not clear unit or
regression tests. They are more integration/conformance tests that test the
requirements of the spec and its implementation.
Identifying the list of "right" tests to run as part of the small set you
mention would require going through all 1059 test descriptions and their
sources.

2. A full test set that may take a lot longer. In many projects, there is a
"nightly build" and a test sequence that is run against that build. That
test sequence can take up to 24 hours to run, and should be as complete as
possible. Does Apache have infrastructure to support this sort of
operation?



Again, completely agree. I'm sure Apache supports this through Hudson. We
could request to set up a second build job, doing nightly builds and running
the whole test suite. I think this is the only way to make automated runs of
the complete QA suite practical.
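
Concretely, that second job would just be a scheduled build (Hudson's
"Build periodically" trigger takes cron-style syntax) invoking something
like the full target sketched earlier in the thread. The target names here
are assumptions, not what the build file actually provides today:

    # hypothetical nightly Hudson build step
    # trigger: 0 2 * * *   (every night at 02:00)
    ant clean qa.run.all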




Are there any tests that people *know* should not run? I'm thinking of
running the lot just to see what happens, but knowing which ones are not
expected to work would help with interpreting the results.



See above: tests of that type should already have been tagged to be skipped
by the good people who donated this test suite.
I've noticed that usually, when a SkipTestVerifier is used in a .td file,
someone has put comments in there to explain why it was tagged as such.




Patricia