Moin,
On 20-Mar-02 Green, Paul tried to scribble about:
> Tels said:
>> Currently it tells me that I miss a couple of thousand tests (or about
>> 70% of all unary and 94% of all binary operation ones) - and I am still
>> thinking about how I can add these without driving the testsuite's
>> running time through the roof, as well as tying me up for 12 weeks
>> entering testcases.
>
> I am a big fan of testing numeric functions by randomly generating input

;-) Thanx for your response! I agree with what you said, just some random
thoughts:

> operands. I don't mean to rule out basic tests, or special-case tests
> written to drive the source code into a specific situation; such tests
> are well worth the time and effort. But I once had a math package that
> had been carefully tested using manually generated inputs, and which
> nonetheless had a big gaping hole. We had a customer discover it after
> some months in the field. We wrote a new program to randomly generate
> operands and perform the calculations two ways, and managed to generate
> a test case almost as soon as we fired it up.(*)

I have thought about this quite a lot. You see, the current testsuite is
by no means complete, and the "hole" documented at
http://bloodgate.com/perl/bigint/errata.html shows just that. (Very
similar to your situation!)

Now, test coverage is quite nice (Math::Big has quite high coverage,
thanx to Devel::Cover ;) - but coverage alone is not enough, since it
doesn't guarantee that the right statement is executed (or not!) at the
right time.

Basically, these are the kinds of errors that can occur in Math::Big:

* errors in the functional interface, e.g. some functions don't take all
  the types of input they should (scalar vs. object etc., or code that is
  not subclass-proof)
* errors in shortcuts, e.g. a shortcut saying that $x * 0 == 0 placed
  before the NaN check would wrongly produce NaN * 0 == 0 instead of NaN
* errors in the actual math code

The first kind is quite easily caught, since there are only a few
functions and input types.
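As a tiny illustration of the second kind - the shortcut-ordering trap -
here is a sketch in Python (used only for brevity; the names and the
multiply itself are hypothetical, not BigInt's actual code):

```python
NAN = "NaN"  # sentinel standing in for Math::BigInt's NaN result


def mul_buggy(x, y):
    # BUG: the zero shortcut runs before the NaN check,
    # so NaN * 0 wrongly yields 0.
    if x == 0 or y == 0:
        return 0
    if x == NAN or y == NAN:
        return NAN
    return x * y


def mul_fixed(x, y):
    # Correct order: anything involving NaN must produce NaN first.
    if x == NAN or y == NAN:
        return NAN
    if x == 0 or y == 0:
        return 0
    return x * y


print(mul_buggy(NAN, 0))  # 0   - wrong
print(mul_fixed(NAN, 0))  # NaN - right
```

Plain random operands almost never hit NaN or 0, which is why these
shortcut orderings need their own contrived testcases.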
Tiresome, yes, but doable. The second type of error is harder.
Fortunately, some general rules exist (for instance, anything that gets a
NaN produces a NaN), plus some special rules for inf and 0, and with a
small testsuite this can be covered quite easily. Also, some of these
tests (0*0, 0+0, etc.) fall under point three, too.

After quite some thinking I had several ideas on how to test this, more
below. First: there are also two types of tests: some that ship in every
distribution, and some that are external. For instance, blasting through
100,000 random tests each time someone installs Perl and/or Math::BigInt
might take too much time. However, just testing on my system isn't very
good either, since the actual math code will work differently (and thus
produce different errors, or no error vs. an error) on different systems.

I am currently working on enhancing the testsuite that is bundled with
Math::BigInt. My plan is to make the new testcases, at least the
numerical ones, follow semi-random patterns, like '100...00',
'111...111', '222...222', '123456789012...90', etc. Then I would insert
these testcases into the testsuite with a script, along with the intended
results.

One thing that inhibits big testsuites is also the distribution size. A
random tester that does a few calculations without a table of the results
(by checking identities like int(A/B)*B + (A%B) == A) would be a win.
Maybe I will include one and let it do 1000 calculations or so.
Distributed over the many people that install Math::BigInt, that would
add up to a fairly high testcase count. (There is a bug in Math::BigInt
with and/xor/ior that I discovered while doing $x & $y for all $x,$y =
1...65536 - sort of a random test.)

But having a random-test generator and running it on just a few selected
platforms might also be worthwhile (I need 36 hours a day ;)

Back to the testing methods.
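The table-free random tester could look roughly like this (a sketch in
Python, where the built-in ints stand in for whatever BigInt backend is
installed; the function name and parameters are made up):

```python
import random


def check_divmod_identity(trials=1000, digits=40):
    # Self-checking random test: no table of expected results is shipped,
    # because int(A/B)*B + (A%B) must always reconstruct A exactly.
    for _ in range(trials):
        a = random.randrange(10 ** digits)
        b = random.randrange(1, 10 ** (digits // 2))
        q, r = divmod(a, b)
        assert q * b + r == a, f"identity failed for A={a}, B={b}"
    return trials


print(check_divmod_identity())
```

In the real bundled tester each divmod would of course go through
Math::BigInt, so every installation exercises fresh operands on its own
platform's Calc base.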
I tried/thought about the following ways to crosscheck the test output
(but I had not thought of your self-checking example, thanx!):

* Using bc or dc: Unfortunately, these don't parse most inputs that
  BigInt/BigFloat can handle (1.23e4 etc.), so it won't work for all
  tests :/
* Using Python: Ditto.
* Using BigInt itself: This sounds silly, but isn't, actually. The catch
  is that if I use BigInt to create both the test and the expected
  result, it wouldn't find a bug right away (for instance, a buggy
  -1 + 1000000001 would result in 0 both in the test table and in
  practice). But I would, as usual, run the same testsuite under
  Math::BigInt::GMP and Math::BigInt::Pari, and the failure would show up
  there, indicating that something is wrong with BigInt.

Also, apart from testing random numbers, one needs to test "patterns":
e.g. the -1 + 1xxxxxxxx1 test relies on the fact that there are more than
$BASE_LEN zeros in the second operand and that the first operand is very
small (< $BASE). With purely random numbers it might take too long until
a random input satisfies these conditions. Ah well ;)

> We had the good fortune of having both a hardware and software
> implementation that we could test against each other. Clearly, the
> situation with Math::Big is a little different. But perhaps you can
> generate random inputs to some equations that will massage the numbers
> and give you back your input operand; that way you would not need a
> second, parallel implementation to test against. Say, A=B*(A/B) for
> cases known to produce integral results.

That is a good idea, but not strictly necessary, as outlined above - I
can check against Pari and GMP, so at least the purely numerical tests
can be cross-referenced. But the idea is certainly neat.

> (*)FWIW, if you have a copy of Knuth, Volume 2, turn to Algorithm D in
> section 4.3.1. This is the algorithm for division of extended-precision
> nonnegative integers. We blew step D6.
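A pattern generator for the shapes mentioned above might be sketched like
this (Python again for brevity; the helper names and the exact pattern
list are illustrative, not BigInt's actual ones):

```python
def patterns(length):
    # Deterministic digit patterns of a given length, in the spirit of
    # '100...00', '111...111', '123456789012...' described above.
    yield int("1" + "0" * (length - 1))                      # '100...00'
    for d in "129":
        yield int(d * length)                                # '111...1' etc.
    yield int(("1234567890" * (length // 10 + 1))[:length])  # '1234567890...'


def carry_case(zeros):
    # '1', then `zeros` zeros, then a trailing '1' - e.g. 1000000001.
    # With more than $BASE_LEN zeros, the number spans several internal
    # limbs while the tiny first operand touches only the lowest one -
    # the contrived condition that random operands rarely hit.
    return 10 ** (zeros + 1) + 1


for p in patterns(12):
    print(p)
print(-1 + carry_case(8))
```

Feeding such patterns (instead of purely random digits) into the
cross-checks against GMP and Pari targets exactly the limb-boundary code
paths.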
> Presciently, Knuth says "The probability that this step is necessary is
> very small...test data that activates this step should therefore be
> specifically contrived when debugging." We didn't do that, and paid the
> price. Had we tested with random inputs, we'd have found it in the lab.

Funnily enough, I do not own the book, but I have read it and remember
this part clearly ;-) Unfortunately, Knuth doesn't give specific test
examples, and the actual implementation in BigInt wasn't done by me and
might well differ. This is complicated by the fact that Calc uses
different bases on different systems, which might make it necessary to
have more than one test (like testing test patterns).

Thanx again,

Tels

-- 
 perl -MDev::Bollocks -e'print Dev::Bollocks->rand(),"\n"'
 widespreadedly optimize bleeding-edge systems

 http://bloodgate.com/perl My current Perl projects
 PGP key available on http://bloodgate.com/tels.asc or via email
