Moin,
On 20-Mar-02 Green, Paul tried to scribble about:
> Tels said:
>> Currently it tells me that I miss a couple of thousand tests (or about
>> 70% of all unary and 94% of all binary operation ones) - and I am still
>> thinking about how I can add these without driving the testsuite's
>> running time through the roof, as well as tying me up for 12 weeks
>> entering testcases.
>
> I am a big fan of testing numeric functions by randomly generating input

;-) Thanx for your response! I agree with what you said, just some random
thoughts:

> operands. I don't mean to rule out basic tests, or special-case tests
> written to drive the source code into a specific situation; such tests
> are well worth the time and effort. But I once had a math package that
> had been carefully tested using manually generated inputs, and which
> nonetheless had a big gaping hole. We had a customer discover it after
> some months in the field. We wrote a new program to randomly generate
> operands and perform the calculations two ways, and managed to generate
> a test case almost as soon as we fired it up.(*)

I have thought about this quite a lot. You see, the current testsuite is
by no means complete, and the "hole" documented at
http://bloodgate.com/perl/bigint/errata.html shows just that. (Very
similar to your situation!)

Now, test coverage is quite nice (Math::Big has quite high coverage,
thanx to Devel::Cover ;) - but coverage alone is not enough, since it
doesn't guarantee that the right statement is executed (or not!) at the
right time.

Basically, these are the kinds of errors that can occur in Math::Big:

* errors in the functional interface, e.g. some functions don't take all
  the types of input they should (scalar vs. object etc., or code that is
  not subclass-proof)
* errors in shortcuts, e.g. a shortcut saying that $x * 0 == 0 placed
  before the NaN check would wrongly produce NaN * 0 == 0 instead of NaN
* errors in the actual math code

The first kind is quite easily caught, since there are only a few
functions and input types.
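As a tiny illustration of the second kind - the shortcut-ordering trap -
here is a sketch in Python (used only for brevity; the names and the
multiply itself are hypothetical, not BigInt's actual code):

```python
NAN = "NaN"  # sentinel standing in for Math::BigInt's NaN result


def mul_buggy(x, y):
    # BUG: the zero shortcut runs before the NaN check,
    # so NaN * 0 wrongly yields 0.
    if x == 0 or y == 0:
        return 0
    if x == NAN or y == NAN:
        return NAN
    return x * y


def mul_fixed(x, y):
    # Correct order: anything involving NaN must produce NaN first.
    if x == NAN or y == NAN:
        return NAN
    if x == 0 or y == 0:
        return 0
    return x * y


print(mul_buggy(NAN, 0))  # 0   - wrong
print(mul_fixed(NAN, 0))  # NaN - right
```

Plain random operands almost never hit NaN or 0, which is why these
shortcut orderings need their own contrived testcases.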
Tiresome, yes, but doable. The second type of error is harder.
Fortunately, some general rules exist (for instance, anything that gets a
NaN produces a NaN), plus some special rules for inf and 0, and with a
small testsuite this can be covered quite easily. Also, some of these
tests (0*0, 0+0, etc.) fall under point three, too.

After quite some thinking I had several ideas on how to test this, more
below. First: there are also two types of tests: some that ship in every
distribution, and some that are external. For instance, blasting through
100,000 random tests each time someone installs Perl and/or Math::BigInt
might take too much time. However, just testing on my system isn't very
good either, since the actual math code will work differently (and thus
produce different errors, or no error vs. an error) on different systems.

I am currently working on enhancing the testsuite that is bundled with
Math::BigInt. My plan is to make the new testcases, at least the
numerical ones, follow semi-random patterns, like '100...00',
'111...111', '222...222', '123456789012...90', etc. Then I would insert
these testcases into the testsuite with a script, along with the intended
results.

One thing that inhibits big testsuites is also the distribution size. A
random tester that does a few calculations without a table of the results
(by checking identities like int(A/B)*B + (A%B) == A) would be a win.
Maybe I will include one and let it do 1000 calculations or so.
Distributed over the many people that install Math::BigInt, that would
add up to a fairly high testcase count. (There is a bug in Math::BigInt
with and/xor/ior that I discovered while doing $x & $y for all $x,$y =
1...65536 - sort of a random test.)

But having a random-test generator and running it on just a few selected
platforms might also be worthwhile (I need 36 hours a day ;)

Back to the testing methods.
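The table-free random tester could look roughly like this (a sketch in
Python, where the built-in ints stand in for whatever BigInt backend is
installed; the function name and parameters are made up):

```python
import random


def check_divmod_identity(trials=1000, digits=40):
    # Self-checking random test: no table of expected results is shipped,
    # because int(A/B)*B + (A%B) must always reconstruct A exactly.
    for _ in range(trials):
        a = random.randrange(10 ** digits)
        b = random.randrange(1, 10 ** (digits // 2))
        q, r = divmod(a, b)
        assert q * b + r == a, f"identity failed for A={a}, B={b}"
    return trials


print(check_divmod_identity())
```

In the real bundled tester each divmod would of course go through
Math::BigInt, so every installation exercises fresh operands on its own
platform's Calc base.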
I tried/thought about the following ways to crosscheck the test output
(but I had not thought of your self-checking example, thanx!):

* Using bc or dc: Unfortunately, these don't parse most inputs that
  BigInt/BigFloat can handle (1.23e4 etc.), so it won't work for all
  tests :/
* Using Python: Ditto.
* Using BigInt itself: This sounds silly, but isn't, actually. The catch
  is that if I use BigInt to create both the test and the expected
  result, it wouldn't find a bug right away (for instance, a buggy
  -1 + 1000000001 would result in 0 both in the test table and in
  practice). But I would, as usual, run the same testsuite under
  Math::BigInt::GMP and Math::BigInt::Pari, and the failure would show up
  there, indicating that something is wrong with BigInt.

Also, apart from testing random numbers, one needs to test "patterns":
e.g. the -1 + 1xxxxxxxx1 test relies on the fact that there are more than
$BASE_LEN zeros in the second operand and that the first operand is very
small (< $BASE). With purely random numbers it might take too long until
a random input satisfies these conditions. Ah well ;)

> We had the good fortune of having both a hardware and software
> implementation that we could test against each other. Clearly, the
> situation with Math::Big is a little different. But perhaps you can
> generate random inputs to some equations that will massage the numbers
> and give you back your input operand; that way you would not need a
> second, parallel implementation to test against. Say, A=B*(A/B) for
> cases known to produce integral results.

That is a good idea, but not strictly necessary, as outlined above - I
can check against Pari and GMP, so at least the purely numerical tests
can be cross-referenced. But the idea is certainly neat.

> (*)FWIW, if you have a copy of Knuth, Volume 2, turn to Algorithm D in
> section 4.3.1. This is the algorithm for division of extended-precision
> nonnegative integers. We blew step D6.
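A pattern generator for the shapes mentioned above might be sketched like
this (Python again for brevity; the helper names and the exact pattern
list are illustrative, not BigInt's actual ones):

```python
def patterns(length):
    # Deterministic digit patterns of a given length, in the spirit of
    # '100...00', '111...111', '123456789012...' described above.
    yield int("1" + "0" * (length - 1))                      # '100...00'
    for d in "129":
        yield int(d * length)                                # '111...1' etc.
    yield int(("1234567890" * (length // 10 + 1))[:length])  # '1234567890...'


def carry_case(zeros):
    # '1', then `zeros` zeros, then a trailing '1' - e.g. 1000000001.
    # With more than $BASE_LEN zeros, the number spans several internal
    # limbs while the tiny first operand touches only the lowest one -
    # the contrived condition that random operands rarely hit.
    return 10 ** (zeros + 1) + 1


for p in patterns(12):
    print(p)
print(-1 + carry_case(8))
```

Feeding such patterns (instead of purely random digits) into the
cross-checks against GMP and Pari targets exactly the limb-boundary code
paths.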
> Presciently, Knuth says "The probability that this step is necessary is
> very small...test data that activates this step should therefore be
> specifically contrived when debugging." We didn't do that, and paid the
> price. Had we tested with random inputs, we'd have found it in the lab.

Funnily enough, I do not own the book, but I have read it and remember
this part clearly ;-) Unfortunately, Knuth doesn't give specific test
examples, and the actual implementation in BigInt wasn't done by me and
might well differ. This is complicated by the fact that Calc uses
different bases on different systems, which might make it necessary to
have more than one test (like testing test patterns).

Thanx again,

Tels

-- 
 perl -MDev::Bollocks -e'print Dev::Bollocks->rand(),"\n"'
 widespreadedly optimize bleeding-edge systems

 http://bloodgate.com/perl My current Perl projects
 PGP key available on http://bloodgate.com/tels.asc or via email
