Interesting.  I guess one of my concerns with such generated test-cases is that 
they (perhaps) achieve 100% coverage, but I fear that the pursuit of that comes 
at the expense of readability/usability.

As an example, if/when any of these tests fail, it doesn't seem as if it would 
be trivial (or even straightforward) to determine the *meaning* of that failure 
without an inordinate amount of time spent in the debugger (or, much worse IMO, 
spent staring at the test data) just to understand what exactly failed.  Some 
of these tests seem to lack clear assertions.  Presumably they are testing for 
no-exceptions-thrown(?), but that kind of test failure isn't going to yield any 
clarity about the reason for the failure.
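
For contrast, here's the shape of what I'd hope for (a hypothetical sketch; 
the entity, the data, and the expected ids are all made up, not taken from 
our actual suite):

    using System.Linq;
    using NHibernate.Linq;
    using NUnit.Framework;

    public class MyEntity
    {
        public virtual int Id { get; set; }
        public virtual int? A { get; set; }
        public virtual int? B { get; set; }
    }

    [TestFixture]
    public class NullableOrFixture
    {
        private NHibernate.ISession session; // assume opened/populated per test

        [Test]
        public void WhereWithNullableOr_ReturnsRowsMatchingEitherBranch()
        {
            var ids = session.Query<MyEntity>()
                .Where(e => e.A == 1 || e.B == null)
                .Select(e => e.Id)
                .ToList();

            // One clear assertion: if this fails, the test name plus the
            // expected ids point straight at the || translation - no debugger
            // archaeology over thousands of generated rows.
            Assert.That(ids, Is.EquivalentTo(new[] { 1, 3 }));
        }
    }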

Fabio is expressing a very real and practical concern re: the time it takes to 
run these tests, but I'm more concerned about their opacity/usefulness with 
regard to long-term maintenance of the codebase.

It could be that "this is in fact the best way to go about this" but 
intuitively it just doesn't feel 'right' to me for these tests to survive as-is 
in the project.

Generative tests seem to have their purpose as a tool to 'prove out', via 
probing, the proper behavior of the codebase (all edge cases, etc.), but I'm 
honestly ambivalent about their long-term maintenance prospects.  In a sense, 
this feels like the results of Pex, where you look at the generated test suite 
and say "this was a valuable exercise, but there's little value in hanging on 
to it long-term".

As an example, I'd wonder what would happen in the following scenario: we 
decide to change the semantic behavior of some part of the LINQ provider (and 
thus, presumably, a number of these tests would then be broken).  Would the 
next step be modifying all the now-broken tests + dependent test data to get 
them to pass, or would it be to discard all the existing tests and regenerate 
them from scratch?

I honestly don't have a well-formed opinion on this, but it sort of seems to me 
as if the answer to this Q would lead us to a better-informed choice about the 
long-term viability of this generated test suite.

Does this make any sense --?

-Steve B.  
-----Original Message-----
From: "Harald Mueller" <[email protected]>
Sender: [email protected]
Date: Tue, 26 Apr 2011 08:39:35 
To: <[email protected]>
Reply-To: [email protected]
Subject: Re: [nhibernate-development] Re: My feeling of our Tests

As the culprit who wrote those full-coverage tests for NH-2583, I probably have 
to stand up and answer for this.

First, I confess I never thought that those full-coverage tests would create 
that many objects. Of course, we all know that full-coverage testing has this 
effect/problem of exponential explosion. Therefore, I limited the property 
values to zero and one, plus the null values necessary for NH-2583. Adding a 
few more nulls yesterday seems to have driven the object count up above 
Fabio's threshold ...
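
Schematically, the generator is just a Cartesian product over the candidate 
values. A sketch (with made-up names - this is not the actual NH-2583 code):

    using System;
    using System.Linq;

    class FullCoverageSketch
    {
        static void Main()
        {
            // Candidate property values, limited to keep the explosion in check.
            int?[] values = { 0, 1, null };

            // Full input coverage = Cartesian product: v candidate values
            // over p properties yields v^p combinations.
            var combinations =
                from a in values
                from b in values
                from c in values
                select new { A = a, B = b, C = c };

            Console.WriteLine(combinations.Count()); // 3^3 = 27
            // Every additional property (or additional null variant)
            // multiplies the count again - which is how the entity count
            // climbed above the threshold.
        }
    }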

I actually chose full-coverage testing because it is obviously *simpler* than 
selective testing - simply because you need no additional oracle that helps you 
select the data that will be best at detecting errors: which in itself can be a 
source of bugs. This simplicity pays off in four ways:

(1) The testing code was much easier to write ("just a generator").
(2) It tests so much ... maybe you'll believe me if I say how much better I 
sleep when I do full-coverage tests.
(3) You guys (and girls - or are they included in "you guys"?) are easier to 
convince that something is wrong in existing code. Full coverage "hits 
broad-side" *if* there is a problem at all (but it still does not find problems 
it's not designed to detect: ?? and ?: work incorrectly in my implementation, 
and only Patrick's *thinking about this* could find that).
(4) You and I are also easier to convince that a modification corrects a 
problem "once and for all".

And with the || problems having gone unnoticed for - if I see it correctly - 
quite some time, and given their complexity in semantics and design, I had the 
feeling that this was the right point to use the "sledge-hammer method" for 
once. Processor time is cheaper than my thinking time ;-)

There are, as I see it, 2 ways to go:

(a) Pull back from full-coverage testing. This introduces the awful "Which 
data do I select, based on *risk*?" problem.
- Please let's *not* run a fixed random subset (i.e., use a fixed seed for the 
data-selection random generator).
- And please also *not* a *different* random subset on every test run 
(reproducing problems becomes awful).

(b) Take smaller subsets *algorithmically* that are still equivalent to 
full-(input-)coverage tests. This can span the whole range from a doctoral 
dissertation in test-case reduction down to some simple heuristic ... but I'm 
at a loss to see any "obvious" heuristic (pairwise coverage comes closest - see 
the sketch below - but whether it is really "equivalent" for our semantics is 
exactly the question).
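
A rough greedy sketch of pairwise (all-pairs) reduction - made-up names, and 
only an illustration of the idea, not a claim that it preserves what our LINQ 
tests need: keep every *pair* of property values occurring together at least 
once, instead of keeping every full combination.

    using System;
    using System.Collections.Generic;
    using System.Linq;

    class PairwiseSketch
    {
        const int Props = 3;
        static readonly int?[] Values = { 0, 1, null };

        static void Main()
        {
            // All v^p full combinations - the set we want to shrink.
            var all = new List<int?[]> { new int?[0] };
            for (int p = 0; p < Props; p++)
                all = all.SelectMany(c => Values.Select(v =>
                          c.Concat(new[] { v }).ToArray())).ToList();

            // Every (property, value) x (property, value) pair to keep covered.
            var uncovered = new HashSet<string>(all.SelectMany(PairsOf));

            // Greedy: repeatedly pick the combination that covers the most
            // still-uncovered pairs.
            var chosen = new List<int?[]>();
            while (uncovered.Count > 0)
            {
                var best = all.OrderByDescending(
                    c => PairsOf(c).Count(uncovered.Contains)).First();
                chosen.Add(best);
                foreach (var pair in PairsOf(best)) uncovered.Remove(pair);
            }

            // Typically ~9-10 of 27 here: far fewer cases, with every value
            // pair still exercised together at least once.
            Console.WriteLine("{0} of {1} combinations cover all pairs.",
                              chosen.Count, all.Count);
        }

        static IEnumerable<string> PairsOf(int?[] c)
        {
            for (int i = 0; i < Props; i++)
                for (int j = i + 1; j < Props; j++)
                    yield return i + "=" + Show(c[i]) + "|" + j + "=" + Show(c[j]);
        }

        static string Show(int? v)
        {
            return v.HasValue ? v.ToString() : "null";
        }
    }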

So much for a first head-scratching ...

Harald M.


-------- Original Message --------
> Date: Tue, 26 Apr 2011 01:01:49 +0000
> From: "Stephen Bohlen" <[email protected]>
> To: [email protected]
> Subject: Re: [nhibernate-development] Re: My feeling of our Tests

> Is it unrealistic of me to expect that it should be possible to validate
> the behavior of the LINQ provider with perhaps 2-3 rows of data for each
> narrowly-targeted testcase rather than requiring such massive amounts of test
> data?
> 
> And even if more are needed, it's hard for me to believe that 5-10 rows per
> test scenario (rather than the *thousands* mentioned here) wouldn't be
> sufficient for all but the most complex test scenarios. 
> 
> Is this an unrealistic expectation (and if so, can someone help me
> understand why this is a gross over-simplification of what's really needed)?
> 
> I may be misunderstanding this but it almost sounds like we're building a
> perf-test suite for the LINQ provider rather than validating its
> correctness ;)
> 
> Am I just being obtuse here (entirely possible <g>) --?
> 
> -Steve B.
> 
> -----Original Message-----
> From: Fabio Maulo <[email protected]>
> Sender: [email protected]
> Date: Mon, 25 Apr 2011 19:58:26 
> To: <[email protected]>
> Reply-To: [email protected]
> Subject: [nhibernate-development] Re: My feeling of our Tests
> 
> If you are experiencing the sensation that our tests are even slower than
> before, it is because, for NH2583, we have some test methods (note: *per
> test-method*) storing *4608* entities (yes! that is not a mistake; they are
> really 4608).
> Some others store "just" *1008*.
> 
> I have reduced the time to run those tests from 2+ minutes to less than 1
> minute on my machine...
> We have done the possible and the impossible; for miracles I don't have any
> more time to spend there.
> If you have a little bit of time, try to reduce the number of entities
> needed to run a test that checks the LINQ behavior.
> 
> Thanks.
> 
> On Mon, Apr 25, 2011 at 6:20 PM, Fabio Maulo <[email protected]> wrote:
> 
> > There are 2 areas where I hope I will never see a broken test (until
> > yesterday there was only one):
> > The first area was NHibernate.Test.Legacy, but now it is second in the
> > ranking.
> > The first one is now NHibernate.Test.NHSpecificTest.NH2583
> >
> > God save the Queen!!
> >
> > --
> > Fabio Maulo
> >
> >
> 
> 
> -- 
> Fabio Maulo
> 
