> And as I don't know the economics of NHibernate development, I will not
> argue about the bearable length of test suite runs.
I'm not sure what's acceptable either. James Shore suggested 10 minutes as
a maximum (http://jamesshore.com/Agile-Book/ten_minute_build.html), with a
preference for 5 minutes, and I agree with that.
I personally think around 5 minutes is the point at which development
starts to get painful. If we end up having to wait 15 minutes for a
successful build (or hours, if we have to wait in suspense for the nightly
run), then I suspect we'll run into much larger problems: people not
running the build before check-in, builds staying broken for days if not
weeks, and so on.
> It seems people believe that full-coverage testing is somehow bad.
I didn't say that (I'll sit uncomfortably on the fence until I have to
handle a failure in that test fixture myself). However, I will say that
long builds are bad - or, more importantly, that slow feedback is. Nothing
that hasn't been said already, I guess.
-----Original Message-----
From: Harald Mueller
Sent: Wednesday, April 27, 2011 11:12 AM
To: [email protected]
Subject: Re: [nhibernate-development] Re: My feeling of our Tests
> I guess I'm just stating the obvious, but does that mean we agree
> (generally) not to make this any worse, then? I.e., no more generative
> tests, but focused, small tests instead.
It seems people believe that full-coverage testing is somehow bad.
Exactly the opposite is true: full-coverage testing is the benchmark
against which all other test methods have to measure up. Long years of
testing research have tried to find ways of attaining the same quality as
full-coverage testing without its (sometimes nearly infinite) cost. But
when full-coverage testing is possible, it is the way to go.
Of course, agreeing on when it is *possible* to do full-coverage testing is
an economic question. And as I don't know the economics of NHibernate
development, I will not argue about the bearable length of test suite runs.
Still, I'd like to add a few loosely connected thoughts - some not so
constructive, some (the last one) hopefully more so:
a. As far as I know (though I'd have to look it up), generated tests are
standard for tools like compilers; they are used, for example, to create
many different expressions from grammars. For ways to test grammar-based
programs, see also the section on "syntax testing" in Boris Beizer's "Black
Box Testing" (which implicitly assumes manual testing, but aims at wide
coverage).
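
To make this concrete: a toy C# sketch (hypothetical - the grammar, the
single atom "a = 1", and the depth limit are my own choices, not anything
from a real compiler suite) that enumerates every boolean where-clause
derivable from a three-production grammar:

    using System;
    using System.Collections.Generic;

    static class GrammarGen
    {
        // expr := atom | "not (" expr ")" | "(" expr " or " expr ")"
        // Enumerates every expression derivable within the given nesting depth.
        public static IEnumerable<string> Exprs(int depth)
        {
            yield return "a = 1";                       // the single atom
            if (depth == 0) yield break;
            foreach (string e in Exprs(depth - 1))
                yield return "not (" + e + ")";
            foreach (string l in Exprs(depth - 1))
                foreach (string r in Exprs(depth - 1))
                    yield return "(" + l + " or " + r + ")";
        }

        static void Main()
        {
            // Depth 2 already yields 13 distinct clauses to feed the parser.
            foreach (string e in Exprs(2))
                Console.WriteLine("where " + e);
        }
    }

Each generated clause would then be parsed and its SQL checked against a
reference evaluation; the point is how quickly even a tiny grammar outgrows
what anyone would write by hand.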
b. Hibernate and NHibernate have bugs (or at least very wide behavioral
deviations from previous versions - I count NH-2583 and NH-2648/HHH-6151,
which I posted, among them) that could easily have been found by a medium
amount of standard syntax testing, even in the first version of that huge
HQL rewrite. For anyone dealing with ORMs - or even with SQL or predicate
calculus - the first few "complex" tests to write are obvious: "not"
operators around an "or", as well as "not" around "or" inside "any", are
standard problems over which SQL novices and older ORMs (TopLink comes to
mind) often stumble - and which should therefore be used as test cases for
an ORM. I don't know why this wasn't done.
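
For illustration only - a hedged sketch of such a focused test with NUnit,
using a hypothetical Order/Customer model (the session factory
configuration and test data are assumed to exist elsewhere; none of this is
actual NHibernate test-suite code):

    using System.Linq;
    using NHibernate;
    using NUnit.Framework;

    public class Customer
    {
        public virtual int Id { get; set; }
    }

    public class Order
    {
        public virtual int Id { get; set; }
        public virtual string Status { get; set; }
        public virtual Customer Customer { get; set; }
    }

    [TestFixture]
    public class NegatedDisjunctionFixture
    {
        private ISessionFactory _factory;  // assumed configured and seeded elsewhere

        [Test]
        public void NotAroundOr()
        {
            using (ISession s = _factory.OpenSession())
            {
                // "not" over "or" must come out as "and" of the negations -
                // exactly the spot where SQL novices and older ORMs stumble.
                var viaHql = s.CreateQuery(
                        "from Order o where not (o.Status = 'Open' or o.Customer is null)")
                    .List<Order>();

                // Recompute the expected rows in plain C# instead of
                // hard-coding them, so the test checks the semantics of the
                // generated SQL. (Assumes the seeded Status values are
                // non-null; with NULLs, SQL's three-valued logic makes "not"
                // trickier still - which is the point of testing it.)
                var expected = s.CreateQuery("from Order o").List<Order>()
                    .Where(o => !(o.Status == "Open" || o.Customer == null))
                    .ToList();

                CollectionAssert.AreEquivalent(expected, viaHql);
            }
        }
        // The next obvious case is the same pattern one level down, e.g.
        // "from Customer c where not exists (from c.Orders o where ... or ...)".
    }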
c. If one reads the code dealing with JOINs in HqlSqlWalker and SqlGenerator
alone, it is obvious that this is a *very complicated design*, with at least
one risky design decision (parts of the join information are kept
redundantly, both in the HQL result AST tree and in member variables of AST
nodes). Each change to this code runs the risk of destroying working
behavior - I would personally assume in an uncontrolled way (but this is
only my impression; I have not talked to the designers of that code).
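
As a generic illustration of that risk (emphatically not NHibernate's
actual code): the same structural information kept in two places means
every mutator must update both copies, or they silently drift apart.

    using System.Collections.Generic;

    class JoinNode
    {
        public List<JoinNode> Children = new List<JoinNode>();  // structural copy
        public JoinNode Parent;                                 // redundant field copy

        public void Add(JoinNode child)
        {
            Children.Add(child);
            child.Parent = this;  // forget this line in just one code path and
                                  // the two representations silently disagree
        }
    }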
d. I do not see that even the standard white-box testing concepts, which
rely on coverage to identify "hot spots to test", are enough to guarantee
standard quality for a nearly-compiler tool like the N/Hib query part -
see, e.g., Brian Marick's texts on using coverage and its implications (I
don't know whether the NHib and Hib development crews employ such
techniques).
e. I would very much expect that correctness outranks performance, and even
more so the performance of the test suite.
f. That does *not* mean that we shouldn't make tests as fast as possible.
However, drawing on all my practical experience and my theoretical
background, I am still at a loss to define a suitably clear *and*
sufficient class of "more focused tests" than the ones I provided.
So, to provide at least one candidate:
"Pairwise" testing of the value vectors I provided could be a candidate.
That would mean that every combination of two null assignments occurs in at
least one test; and as null assignments are the troublesome events with
inner and outer joins, this might be a sufficiently covering set of tests.
On the other hand, it might create only test cases with *more* than two
null assignments, which might mask problems that occur only with exactly
two (specific) null assignments.
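
For concreteness, a minimal greedy all-pairs generator in C# - a sketch
under the assumption that each slot in a value vector is simply "set" (0)
or "null" (1); real tools (PICT and the like) do considerably better:

    using System;
    using System.Collections.Generic;
    using System.Linq;

    static class PairwiseSketch
    {
        // values[i] = the admissible values of parameter i.
        public static List<int[]> Generate(int[][] values)
        {
            int n = values.Length;
            // All pairs "parameter i has value a, parameter j has value b", i < j.
            var uncovered = new HashSet<(int pi, int va, int pj, int vb)>();
            for (int i = 0; i < n; i++)
                for (int j = i + 1; j < n; j++)
                    foreach (int a in values[i])
                        foreach (int b in values[j])
                            uncovered.Add((i, a, j, b));

            var cases = new List<int[]>();
            while (uncovered.Count > 0)
            {
                // Seed the next test case with one still-uncovered pair, so
                // every round removes at least that pair and the loop ends.
                var seed = uncovered.First();
                var tc = new int?[n];
                tc[seed.pi] = seed.va;
                tc[seed.pj] = seed.vb;

                // Fill the remaining slots greedily: pick the value covering
                // the most still-uncovered pairs against already-fixed slots.
                for (int i = 0; i < n; i++)
                {
                    if (tc[i].HasValue) continue;
                    tc[i] = values[i].OrderByDescending(v =>
                        Enumerable.Range(0, n).Count(j =>
                            tc[j].HasValue &&
                            (j < i ? uncovered.Contains((j, tc[j].Value, i, v))
                                   : uncovered.Contains((i, v, j, tc[j].Value)))))
                        .First();
                }

                var done = tc.Select(x => x.Value).ToArray();
                for (int i = 0; i < n; i++)
                    for (int j = i + 1; j < n; j++)
                        uncovered.Remove((i, done[i], j, done[j]));
                cases.Add(done);
            }
            return cases;
        }

        static void Main()
        {
            // Five nullable slots, each "set" (0) or "null" (1): pairwise
            // needs only a handful of cases instead of 2^5 = 32 exhaustive ones.
            var domain = Enumerable.Repeat(new[] { 0, 1 }, 5).ToArray();
            foreach (var tc in Generate(domain))
                Console.WriteLine(string.Join(",", tc));
        }
    }

Whether such a set really pins down two-specific-null problems is exactly
the open question: all *pairs* occur somewhere, but each individual test
case typically sets several slots to null at once.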
Could any of you - especially the test gurus - provide ideas for, or an
analysis of, why this type of test data selection (pairwise) is or is not
sufficient for finding join errors?
Regards
Harald M.