Interesting. I guess one of my concerns with such generated test cases is that they (perhaps) achieve 100% coverage, but I fear that coverage is pursued at the expense of readability/usability.
As an example, if/when any of these tests fail, it doesn't seem as if it would be trivial (or even straightforward) to determine the *meaning* of that failure without an inordinate amount of time spent in the debugger (or much worse IMO, spent staring at the test data) to understand just what exactly failed. Some of these tests seem to lack clear assertions. Presumably these are testing for no-exceptions-thrown(?), but this kind of test failure isn't going to result in clarity-of-reason-for-failure.

Fabio is expressing a very real and practical concern re: the time it takes to run these tests, but I'm more concerned about their opacity/usefulness in re: long-term maintenance of the codebase. It could be that "this is in fact the best way to go about this" but intuitively it just doesn't feel 'right' to me for these tests to survive as-is in the project.

Generative tests seem to have their purpose as a tool to 'prove out', via probing, the proper behavior of the codebase (all edge cases, etc.), but I'm honestly ambivalent about their long-term maintenance prospects. In a sense, this feels like the results of Pex, where you look at the generated test suite and say "this was a valuable exercise, but there's little value in hanging on to it long-term".

As an example, I'd wonder what would happen in the following scenario: we decide to change the semantic behavior of some part of the LINQ provider (and thus, presumably, a number of these tests would then be broken). Would the next step be modifying all the now-broken tests + dependent test data to get them to pass, or would the next step be to discard all the existing tests and regenerate them all again? I honestly don't have a well-formed opinion on this, but it sort of seems to me as if the answer to this Q would lead us to a better-informed choice about the long-term viability of this generated test suite.

Does this make any sense --?

-Steve B.

-----Original Message-----
From: "Harald Mueller" <[email protected]>
Sender: [email protected]
Date: Tue, 26 Apr 2011 08:39:35
To: <[email protected]>
Reply-To: [email protected]
Subject: Re: [nhibernate-development] Re: My feeling of our Tests

As the culprit who wrote those full-coverage tests for NH-2583, I probably have to stand up to this.

First, I confess I never thought that those full-coverage tests would create that many objects. Of course, we all know that full-coverage testing has this effect/problem of exponential explosion. Therefore, I limited the property values to zero and one and the - for NH-2583 necessary - null values. Adding a few more nulls yesterday seems to have driven the object count above Fabio's threshold ...

I actually chose full-coverage testing because it is obviously *simpler* than selective testing - simply because you need no additional oracle that helps you select the data that will be best at detecting errors: which in itself can be a source of bugs. This simplicity pays off in four ways:

(1) The testing code was much easier to write ("just a generator").

(2) It tests so much ... maybe you believe me if I say how much better I sleep when I do full-coverage tests.

(3) You guys (and girls - or is that included in "you guys"?) are easier to convince that something is wrong in existing code. Full coverage "hits broad-side" *if* there is a problem at all (but it still does not find problems it's not designed to detect: ?? and ?: work incorrectly in my implementation, and only Patrick's *thinking about this* could find that).
(4) You and I are also easier to convince that a modification does correct a problem "once and for all".

And with the || problems having gone unnoticed for - if I see it correctly - quite some time, I had, together with their complexity in semantics and design, the feeling that this is the right point to use the "sledge-hammer method" for once. Processor time is cheaper than my thinking time ;-)

There are, as I see it, 2 ways to go:

(a) Pull back from full-coverage testing. This introduces that awful "Which data do I select based on *risk*?" problem. - Please let's *not* run a fixed random subset (i.e., use a fixed seed for the data-selection random generator). - And please also *not* a *different* random subset for every test run (reproduction of problems is awful).

(b) Take smaller subsets *algorithmically* that are still equivalent to full-(input-)coverage tests. This can span the whole range from a doctoral dissertation in test case reduction to some simple heuristic ... but I'm at a loss to see any "obvious" heuristic.

So much for a first head-scratching ...

Harald M.

-------- Original Message --------
> Date: Tue, 26 Apr 2011 01:01:49 +0000
> From: "Stephen Bohlen" <[email protected]>
> To: [email protected]
> Subject: Re: [nhibernate-development] Re: My feeling of our Tests
>
> Is it unrealistic of me to expect that it should be possible to validate
> the behavior of the LINQ provider with perhaps 2-3 rows of data for each
> narrowly-targeted test case rather than requiring such massive amounts of
> test data?
>
> And even if more are needed, it's hard for me to believe that 5-10 rows per
> test scenario (rather than the *thousands* mentioned here) wouldn't be
> sufficient for all but the most complex test scenarios.
>
> Is this an unrealistic expectation (and if so, can someone help me
> understand why this is a gross over-simplification of what's really needed)?
>
> I may be misunderstanding this, but it almost sounds like we're building a
> perf-test suite for the LINQ provider rather than validation of its
> correctness ;)
>
> Am I just being obtuse here (entirely possible <g>) --?
>
> -Steve B.
>
> -----Original Message-----
> From: Fabio Maulo <[email protected]>
> Sender: [email protected]
> Date: Mon, 25 Apr 2011 19:58:26
> To: <[email protected]>
> Reply-To: [email protected]
> Subject: [nhibernate-development] Re: My feeling of our Tests
>
> If you are experiencing the sensation that our tests are again slower than
> before, it is just because, for NH2583, we have some test methods (note:
> *test-method*) storing *4608* entities (yes! that is not a mistake, they
> are really 4608).
> Some others store "just" *1008*.
>
> I have reduced the time to run those tests from 2+ minutes to less than 1
> minute on my machine...
> We have done the possible and the impossible; for miracles I don't have
> more time to spend there.
> If you have a little bit of time, try to reduce the amount of entities
> needed to run a test to check the LINQ behavior.
>
> Thanks.
>
> On Mon, Apr 25, 2011 at 6:20 PM, Fabio Maulo <[email protected]> wrote:
>
> > There are 2 areas where I hope I will never see a broken test (until
> > yesterday there was only one):
> > The first area was NHibernate.Test.Legacy, but now it is second in the
> > ranking.
> > The first one is now NHibernate.Test.NHSpecificTest.NH2583
> >
> > God save the Queen!!
> >
> > --
> > Fabio Maulo
>
> --
> Fabio Maulo
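[Editor's note: for readers following the NH-2583 thread, a minimal sketch of the kind of full cross-product generator Harald describes - each property's value domain limited to zero, one, and null, and every combination emitted. The entity and property names are hypothetical, not the actual NH2583 fixture.]

    using System.Collections.Generic;

    // Hypothetical entity; the real NH2583 fixture differs.
    public class Entity
    {
        public int? A { get; set; }
        public int? B { get; set; }
        public int? C { get; set; }
    }

    public static class FullCoverageData
    {
        // Per-property domain limited to zero, one and null, as in Harald's description.
        static readonly int?[] Domain = { 0, 1, null };

        // Full cross-product: 3^3 = 27 rows here; 3^k rows for k properties.
        public static IEnumerable<Entity> AllCombinations()
        {
            foreach (var a in Domain)
                foreach (var b in Domain)
                    foreach (var c in Domain)
                        yield return new Entity { A = a, B = b, C = c };
        }
    }

With three values per property, every additional property multiplies the row count by three, which is the exponential explosion Harald mentions.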
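[Editor's note: for option (b), one simple heuristic - not the equivalence-preserving reduction Harald asks for - is all-pairs coverage: keep every combination of values for every *pair* of properties, but not every full combination. A sketch, again with illustrative names only.]

    using System.Collections.Generic;

    public static class PairwiseData
    {
        static readonly int?[] Domain = { 0, 1, null };
        const int PropertyCount = 3;        // k properties, indexed 0..k-1
        static readonly int? Baseline = 0;  // value used for properties outside the current pair

        // Each row is an array of k property values; every (property pair, value pair)
        // combination appears at least once. Row count: C(k,2) * |Domain|^2.
        public static IEnumerable<int?[]> AllPairs()
        {
            for (int i = 0; i < PropertyCount; i++)
                for (int j = i + 1; j < PropertyCount; j++)
                    foreach (var vi in Domain)
                        foreach (var vj in Domain)
                        {
                            var row = new int?[PropertyCount];
                            for (int p = 0; p < PropertyCount; p++)
                                row[p] = Baseline;
                            row[i] = vi;
                            row[j] = vj;
                            yield return row;
                        }
        }
    }

Note that the saving only appears once the property count grows: for 3 properties this still yields 27 rows, but for 9 properties it yields 324 rows instead of 3^9 = 19683. It is weaker than full input coverage, so it trades away exactly the "hits broad-side" guarantee Harald values.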
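[Editor's note: the kind of narrowly-targeted test Steve asks about might look like the sketch below - a handful of rows and one explicit assertion, here around ?? in a where clause. This is only an illustration of the shape of such a test: the fixture base class, the OpenSession() helper, and the assumption that exactly three rows (A = null, 0, 1) have been persisted are hypothetical, reusing the Entity shape from the first sketch; it is not actual NH2583 code.]

    using System.Linq;
    using NHibernate.Linq;
    using NUnit.Framework;

    [TestFixture]
    public class NullCoalescingFixture : TestCase   // hypothetical fixture base providing OpenSession()
    {
        [Test]
        public void WhereWithNullCoalescingTreatsNullAsZero()
        {
            // Assumes the fixture persisted exactly three rows: A = null, A = 0, A = 1.
            using (var s = OpenSession())
            {
                var matches = s.Query<Entity>()
                               .Where(e => (e.A ?? 0) == 0)
                               .ToList();

                // The null row and the zero row should match; a failure here points
                // directly at the ?? translation rather than at a pile of generated data.
                Assert.That(matches.Count, Is.EqualTo(2));
            }
        }
    }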
