I'm somewhat unimpressed by the way some doctests are constrained. An
example was at

http://trac.sagemath.org/sage_trac/ticket/10187

where I raised an issue.

There was this test:

 sage: taylor(gamma(1/3+x),x,0,3)

-1/432*((36*(pi*sqrt(3) + 9*log(3))*euler_gamma^2 + 27*pi^2*log(3) +
72*euler_gamma^3 + 243*log(3)^3 + 18*(6*pi*sqrt(3)*log(3) + pi^2 +
27*log(3)^2 + 12*psi(1, 1/3))*euler_gamma + 324*psi(1, 1/3)*log(3) +
(pi^3 + 9*(9*log(3)^2 + 4*psi(1, 1/3))*pi)*sqrt(3))*gamma(1/3) -
72*gamma(1/3)*psi(2, 1/3))*x^3 + 1/24*(6*pi*sqrt(3)*log(3) +
4*(pi*sqrt(3) + 9*log(3))*euler_gamma + pi^2 + 12*euler_gamma^2 +
27*log(3)^2 + 12*psi(1, 1/3))*x^2*gamma(1/3) - 1/6*(6*euler_gamma +
pi*sqrt(3) + 9*log(3))*x*gamma(1/3) + gamma(1/3)

 sage: map(lambda f:f[0].n(), _.coeffs())
 [2.6789385347..., -8.3905259853..., 26.662447494..., -80.683148377...]

I asked the author who added the numerical coefficients
( [2.6789385347..., -8.3905259853..., 26.662447494...,
-80.683148377...]) on the ticket to justify them, since I wanted to
know they were right before giving this a positive review. He remarked
that he was not the original author of the long analytic expression,
and doubted it had ever been checked, but he did agree to check the
numerical results he had added. He did this using Maple 12 and got the
same answer as Sage.

In this case I'm satisfied that the bit of code added to get the
numerical results is probably OK, as it has been independently verified
by another package. The probability of both being wrong is very small,
since the two packages should have been developed largely independently
of each other. The analytic expression is therefore probably OK too.
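
As a sanity check of the kind I have in mind, the first two
coefficients can be reproduced with plain Python floats, without Sage
or Maple: the constant term is gamma(1/3), and the linear coefficient
is -(6*euler_gamma + pi*sqrt(3) + 9*log(3))*gamma(1/3)/6, read off the
analytic expression above. A minimal sketch (euler_gamma is hardcoded,
since the math module does not provide it):

```python
# Independent check of the first two Taylor coefficients of
# gamma(1/3 + x) using only the standard library.
import math

euler_gamma = 0.5772156649015329  # Euler-Mascheroni constant (hardcoded)

# Constant term of the expansion: gamma(1/3)
c0 = math.gamma(1.0 / 3.0)

# Linear coefficient, taken from the analytic expression in the doctest:
# -1/6*(6*euler_gamma + pi*sqrt(3) + 9*log(3))*gamma(1/3)
c1 = -(6 * euler_gamma + math.pi * math.sqrt(3) + 9 * math.log(3)) * c0 / 6

print(c0)  # agrees with 2.6789385347...
print(c1)  # agrees with -8.3905259853...
```

Even a check this crude would have caught a wrong sign or a mistyped
constant in the expected output.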

I really feel people should use doctests whose analytic results can be
verified, or at least justified in some way. If the results are then
expressed as numerical results, those numerical results should,
whenever possible, be independently verified, as was done on this
ticket after I requested verification.

Methods of verification could include:

 * Results given in a decent book.
 * Results computed by programs like Mathematica and Maple.
 * Showing results are similar to those of an approximate method.

For example, if a bit of code claims to compute prime_pi(n) exactly
with
n=10000000000000000000000000000000000000000000000000000000000000000000000000000000000
then that would be difficult to verify by other means. Mathematica, for
example, can't do it, and I doubt any computer could do it in my
lifetime. [1]

But there are numerical approximations for prime_pi, so computing such
an approximation and showing that it is similar to the numerical
equivalent of what was computed would be a reasonable verification that
the function is correct.
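
A minimal sketch of that idea in plain Python, under these assumptions:
the logarithmic integral Li(x), the integral of dt/ln(t) from 2 to x,
is used as the approximation; it is evaluated by Simpson's rule after
the substitution t = e^u; and the reference value prime_pi(10^8) =
5761455 is taken from standard tables:

```python
# Sketch: check a claimed prime_pi value against the logarithmic
# integral Li(x), a classical approximation to the prime-counting
# function.
import math

def Li(x, n=100_000):
    """Approximate Li(x) = int_2^x dt/ln(t) by Simpson's rule after
    substituting t = e^u, giving int_{ln 2}^{ln x} e^u / u du."""
    a, b = math.log(2), math.log(x)
    h = (b - a) / n  # n must be even for Simpson's rule
    f = lambda u: math.exp(u) / u
    s = f(a) + f(b)
    for i in range(1, n):
        s += (4 if i % 2 else 2) * f(a + i * h)
    return s * h / 3

claimed = 5761455          # prime_pi(10**8), from standard tables
approx = Li(10**8)
rel_err = abs(approx - claimed) / claimed
print(rel_err)             # small (order 1e-4): the claim is plausible
```

This obviously can't confirm the exact value, but a relative error far
from the expected order of magnitude would immediately flag a bogus
expected value in a doctest.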

It seems to me that many of the doctests have, as expected values,
basically whatever someone got on their computer. Sometimes the authors
have the sense to realise that different floating point processors will
give different results, so they add a few dots so that not every digit
is expected to be the same.
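
The "few dots" convention can be sketched with Python's own doctest
module: the ELLIPSIS option lets "..." absorb the trailing digits that
vary between floating point implementations (third_gamma is a
hypothetical example function, not Sage code):

```python
# Sketch of ellipsis matching in expected doctest output, using the
# standard library doctest module directly.
import doctest
import math

def third_gamma():
    """
    >>> third_gamma()
    2.6789385347...
    """
    return math.gamma(1 / 3)

# Run the docstring example with "..." treated as a wildcard.
runner = doctest.DocTestRunner(optionflags=doctest.ELLIPSIS)
for test in doctest.DocTestFinder().find(third_gamma):
    runner.run(test)
print(runner.failures)  # 0: the dots absorb the platform-dependent digits
```

The dots make the test robust across processors, but of course they do
nothing to justify the leading digits themselves.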

To me at least, tests where the results are totally unjustified are
very poor tests, yet they seem to be quite common.

I was reading the other day about how one of the large Mersenne primes
was verified. I can't be exact, but it was something like:

 * Found by one person on his computer using an AMD or Intel CPU
 * Checked by another person using a different program on an Intel or AMD CPU
 * Checked by a third person, on a Sun M9000 using a SPARC processor.

I'm not expecting us to go to such lengths, but I do feel expected
values should be justified.

Whenever we run the tests on the Python package we get failures. If we
run the Maxima test suite, we get failures which appear with ECL as the
Lisp interpreter but not with some other interpreters. This suggests to
me that we should not put too much trust in tests which are not
justified.

Comments?

Dave

[1] An interesting experiment would be to find a proof that such a
number could not be computed before the Sun runs out of energy and all
life on Earth is terminated. The designers of the 128-bit file system
used on Solaris have argued that the energy required to fill the file
system would exceed the energy required to boil all the water in the
oceans. I suspect similar arguments could be used to prove one can't
compute prime_pi(n) for sufficiently large n.
