Can you give any guidance on how to find out which tests need what
infrastructure? Is it documented somewhere? I'm still learning my way
around the River files.
Also, I'm interested in tests that fail unexpectedly, especially any
tests that have regressed or fail intermittently without related source
code changes.
I have a suspicion, based on source code reading, of a race condition in
ServiceDiscoveryManager, and of problems related to retries in some
subclasses of RetryTask. If these problems are real, they would tend to
lead to unreproducible, intermittent failures rather than solid failures.
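For anyone wondering why a race shows up that way, here is a minimal,
hypothetical sketch of the usual check-then-act pattern. It is not taken
from ServiceDiscoveryManager or RetryTask; the class and field names are
made up purely for illustration. Each list operation is individually
thread-safe, but the compound "check, then add" is not, so most runs pass
and an occasional run registers the service twice:

// Hypothetical illustration only -- not River code.
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class RaceSketch {
    // Individually thread-safe operations; the compound operation is not.
    private final List<String> registrations =
            Collections.synchronizedList(new ArrayList<String>());

    void registerIfAbsent(String serviceId) {
        if (!registrations.contains(serviceId)) {   // check
            registrations.add(serviceId);           // act: another thread may
        }                                           // have added it in between
    }

    public static void main(String[] args) throws InterruptedException {
        for (int run = 0; run < 10000; run++) {
            final RaceSketch sketch = new RaceSketch();
            Runnable task = new Runnable() {
                public void run() { sketch.registerIfAbsent("svc"); }
            };
            Thread t1 = new Thread(task);
            Thread t2 = new Thread(task);
            t1.start(); t2.start();
            t1.join();  t2.join();
            if (sketch.registrations.size() != 1) {
                System.out.println("run " + run + ": service registered "
                        + sketch.registrations.size() + " times");
            }
        }
    }
}

A test that hits such a window only occasionally will fail intermittently,
with no related source change, which is exactly the signature I am asking
about.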
Patricia
On 8/25/2010 2:30 PM, Jonathan Costers wrote:
There is one more test category that we could add to the list used by
Hudson: "renewalmanager".
All the other categories have one or more issues (I have run all these tests
myself many, many times), mostly because of missing infrastructure, but some
also fail unexpectedly.
2010/8/24 Patricia Shanahan <[email protected]>
I'm not sure how much that would tell us, done on a bulk basis, because
some of the tests will be specific to bugs that were found and fixed since
then.
I will be doing something similar for individual tests, but taking into
account what their comments tell me about which versions are expected to
pass.
Patricia
On 8/24/2010 1:02 PM, Patrick Wright wrote:
Hi Patricia
Is there perhaps a solid baseline to test against, for example Jini
2.1, to see how many passes and failures we get?
Thanks for all the hard work
Patrick
On Tue, Aug 24, 2010 at 9:58 PM, Patricia Shanahan <[email protected]> wrote:
I ran a batch of the previously ignored QA tests overnight. I got 156
passes and 64 failures. This is nowhere near as bad as it sounds, because
many of the failures were clusters of related tests failing in similar
ways, suggesting a single problem affecting the base infrastructure for
the test category. Some of the failures may relate to the known regression
that Peter is going to look at this week.
Also, it is important to remember that the bugs may be in the tests, not
in the code under test. A test may be obsolete, depending on behavior that
is no longer supported.
I do think there is a good enough chance that at least one of the failures
represents a real problem, and an opportunity to improve River, that I
plan to start a background activity looking at failed tests to see what is
going on. The objective is to do one of three things for each cluster of
failures:
1. Fix River.
2. Fix the test.
3. Decide the test is unfixable, and delete it. There is no point spending
disk space, file transfer time, and test load time on tests we are never
going to run.
Running the subset I did last night took about 15 hours, but that included
a lot of timeouts.
Patricia