On 4/5/13 10:43 AM, Mike Matrigali wrote:
I have been looking at the nightly results for the last few weeks, with an
eye to making sure the 10.10 release does not have regressions over previous
releases.
I think in the past we have tried to get clean nightly test runs
before making a release. That is a problem here, as there are known
intermittent errors which make it hard to tell whether new failures are
regressions or not.
Looking at the public nightly test runs for 10.10 I see:
Java DB Testing:
http://download.java.net/javadesktop/derby/10.10.html
currently has 1 clean run in a row, and 3 of the most recent 7 runs
have failures. I think it only runs on check-ins, so coverage
of intermittent bugs is sparse.
IBM Testing:
http://people.apache.org/~myrnavl/derby_test_results/v10_10/windows/derbyall_history.html
http://people.apache.org/~myrnavl/derby_test_results/v10_10/windows/suites.All_history.html
http://people.apache.org/~myrnavl/derby_test_results/v10_10/linux/derbyall_history.html
http://people.apache.org/~myrnavl/derby_test_results/v10_10/linux/suites.All_history.html
currently has 0 clean runs in a row, and I don't think there
has been a "clean" day in the last 2 weeks.
I have not had time to look at all the failures to determine whether they
are regressions or not. While not totally clean, the 10.9 runs
in the IBM testing are much cleaner, so by that metric alone it
seems 10.10 is not ready to ship:
http://people.apache.org/~myrnavl/derby_test_results/v10_9/windows/suites.All_history.html
Thanks to Mike for raising this issue, and thanks to Knut for analyzing
the problems in the Oracle test lab. I am also disappointed by the
signal-to-noise ratio in the nightly/continuous test results coming out
of the Oracle lab. That lab is still being debugged. I did a quick
calculation of bad runs vs. total runs for Oracle tests on the 10.10,
10.9, and 10.8 branches:
10.10: 30% failure rate
10.9: 44% failure rate
10.8: 41% failure rate
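For reference, here is a minimal sketch of that bad-runs-vs-total-runs
calculation. The run counts below are hypothetical placeholders chosen only
to reproduce the percentages quoted above; they are not the actual lab
numbers.

    // Sketch of the failure-rate calculation: bad runs / total runs,
    // expressed as a percentage. Counts are hypothetical placeholders.
    public class FailureRate {
        static double failureRate(int badRuns, int totalRuns) {
            return 100.0 * badRuns / totalRuns;
        }

        public static void main(String[] args) {
            System.out.printf("10.10: %.0f%% failure rate%n", failureRate(30, 100));
            System.out.printf("10.9:  %.0f%% failure rate%n", failureRate(44, 100));
            System.out.printf("10.8:  %.0f%% failure rate%n", failureRate(41, 100));
        }
    }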
We clearly need to stabilize the Oracle lab. And the Derby tests have
too many heisenbugs. But the results for the 10.10 branch don't look
worse to me than the results for the 10.9 and 10.8 branches.
Moving on to the release candidate itself, here's a comparison of
distinct test failures reported during platform testing of the last 3
feature releases:
10.10.1: 4 distinct failures
10.9.1: 8 distinct failures
10.8.1: 10 distinct failures
Again, the platform test results for 10.10.1 don't look worse to me than
the results for 10.9.1 and 10.8.1.
I'm prepared to extend the vote by a week if that would help people
analyze the failures seen in the IBM lab. Let me know if I should do that.
Thanks,
-Rick