On 4/5/13 10:43 AM, Mike Matrigali wrote:
I have been looking at nightly results of the last few weeks, with an
eye to making sure 10.10 release does not have regressions over previous
releases.

I think in the past we have tried to get clean nightly test runs before making a release. This is difficult because there are known intermittent errors, which make it hard to tell whether failures are regressions or not.

Looking at the public nightly test runs for 10.10 I see:

Java DB Testing:
http://download.java.net/javadesktop/derby/10.10.html
currently has 1 clean run in a row; 3 of the most recent 7 runs have failures. I think it only runs on check-ins, so coverage
of intermittent bugs is sparse.

IBM Testing:
http://people.apache.org/~myrnavl/derby_test_results/v10_10/windows/derbyall_history.html
http://people.apache.org/~myrnavl/derby_test_results/v10_10/windows/suites.All_history.html
http://people.apache.org/~myrnavl/derby_test_results/v10_10/linux/derbyall_history.html
http://people.apache.org/~myrnavl/derby_test_results/v10_10/linux/suites.All_history.html
currently has 0 clean runs in a row, and I don't think there
has been a "clean" day in 2 weeks.

I have not had time to look at all the failures to determine whether they
are regressions or not. While not totally clean, the 10.9 runs
for the IBM testing are much cleaner, so by that metric alone it
seems 10.10 is not ready to ship:
http://people.apache.org/~myrnavl/derby_test_results/v10_9/windows/suites.All_history.html

Thanks to Mike for raising this issue and thanks to Knut for analyzing the problems in the Oracle test lab. I am also disappointed by the signal-to-noise ratio in the nightly/continuous test results coming out of the Oracle lab. That lab is still being debugged. I did a quick calculation of bad runs vs. total runs for Oracle tests on the 10.10, 10.9, and 10.8 branches:

10.10: 30% failure rate
10.9: 44% failure rate
10.8: 41% failure rate
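For clarity, the percentages above are simply bad runs divided by total runs. A minimal sketch of that calculation, using hypothetical run counts (the message does not give the actual totals):

```python
# Sketch of the failure-rate calculation: bad runs / total runs as a
# whole-number percentage. The run counts used below are hypothetical
# examples, not the actual totals from the Oracle lab.
def failure_rate(bad_runs, total_runs):
    """Return the failure rate as a whole-number percentage."""
    return round(100 * bad_runs / total_runs)

# Hypothetical counts chosen to illustrate the reported percentages:
for branch, bad, total in [("10.10", 15, 50), ("10.9", 22, 50), ("10.8", 41, 100)]:
    print(f"{branch}: {failure_rate(bad, total)}% failure rate")
```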

We clearly need to stabilize the Oracle lab. And the Derby tests have too many heisenbugs. But the results for the 10.10 branch don't look worse to me than the results for the 10.9 and 10.8 branches.

Moving on to the release candidate itself, here's a comparison of distinct test failures reported during platform testing of the last 3 feature releases:

10.10.1: 4 distinct failures
10.9.1: 8 distinct failures
10.8.1: 10 distinct failures

Again, the platform test results for 10.10.1 don't look worse to me than the results for 10.9.1 and 10.8.1.

I'm prepared to extend the vote by a week if that would help people analyze the failures seen in the IBM lab. Let me know if I should do that.

Thanks,
-Rick

