Hi -

> As an example, if/when any of these tests fail, it doesn't seem as if it
> would be trivial (or even straightforward) to determine the *meaning* of
> that failure without an inordinate amount of time spent in the debugger (or
> much worse IMO, spent staring at the test data) to understand just what
> exactly failed.  

No, that's not so. I know this because some of the tests failed "quite nicely" 
during development (I don't know about Patrick's experience, however). The full 
input coverage creates quite a nice output. I number all the objects per class 
with a static variable; so if you run a single test (that failed in a 
regression run), you get a failure like

    Expected: <1, 3, 4, 5, 7>
    But was: <1, 4, 5>

So, you know that the condition did not work as expected on object 3. But if 
you now look at the "Setters" definition, you see e.g. that it is

    Setters<TK, TBO1_I>(SetK1, SetBO1_I1)

you know that there are 3 * 4 = 12 records in the primary table ("MyBO"), which 
will have the values

    K1        BO1.I1
    ----------------
    0 (null)  null
    0 (null)  null
    0 (null)  0
    0 (null)  1
    0 (zero)  null
    0 (zero)  null
    0 (zero)  0
    0 (zero)  1
etc.

So you know that the first difference is in the 3rd line, which in this case is 
where K1 is 0 and BO1.I1 is also 0.
After a few failing tests, you can almost read the "failure data" off the test 
output.
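
(For completeness, the numbering mechanism is nothing fancy - a sketch with 
illustrative names, not the actual test code:)

    // Each business object class numbers its instances with a static
    // counter, so failure output can refer to "object 3" etc.
    public class MyBO
    {
        private static int _count;

        public int Number { get; private set; }

        public MyBO()
        {
            Number = ++_count;  // 1, 2, 3, ... in generation order
        }
    }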

HOWEVER: My introduction of ValueNull for all possible values creates multiple 
absolutely identical object graphs. The reason is that - as you can see above - 
for a non-nullable int property (like K1), both TK.ValueNull and TK.Zero 
produce the same value: I cannot assign null to K1, so I assign 0. A better 
implementation would throw an exception when trying to assign null to K1 and 
thereby tell the data generator that this value does not make sense --> then, 
these objects would not be generated at all. I'll try to change this shortly - 
and check how much smaller the number of objects then gets.
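
(A sketch of what I mean - the exception type and the mapping are hypothetical:)

    // Instead of silently mapping TK.ValueNull to 0, the setter rejects
    // it; the data generator catches this and skips the combination, so
    // the identical object graph is never generated in the first place.
    class ValueNotApplicableException : Exception { }

    static void SetK1(MyBO bo, TK value)
    {
        if (value == TK.ValueNull)
            throw new ValueNotApplicableException();
        bo.K1 = value == TK.Zero ? 0 : 1;  // K1 is a non-nullable int
    }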

> Some of these tests seem to lack clear assertions.  

All of them have the assertion that the Linq2Objects result is equal to the 
NHib.Linq result, except for exceptional objects. That's just the - current - 
definition of the semantics of NHib.Linq, except for special cases which have 
to be handwritten anyway.
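
(In a sketch - the helper names are made up, but this is the shape of the 
assertion; CollectionAssert is NUnit's:)

    // The same query expression is evaluated once in memory (Linq2Objects,
    // the reference semantics) and once through NHib.Linq; the assertion
    // compares the object numbers, which yields output like the
    // "Expected: <1, 3, 4, 5, 7> / But was: <1, 4, 5>" above.
    var expected = queryFunc(allGeneratedBOs.AsQueryable())
        .Select(bo => bo.Number).ToList();
    var actual = queryFunc(session.Query<MyBO>())
        .Select(bo => bo.Number).ToList();
    CollectionAssert.AreEqual(expected, actual);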

> Presumably
> these are testing for no-exceptions-thrown(?) but this kind of test failure
> isn't going to result in clarity-of-reason-for-failure. 

Yes, that would be too little to get out of tests. I avoid such trivial tests 
as much as possible (if one has slipped through, it can only be a hand-written 
one I forgot to upgrade to a useful one).

> It could be that "this is in fact the best way to go about this" but
> intuitively it just doesn't feel 'right' to me for these tests to survive 
> as-is
> in the project.

I agree - that would be useless.

> Generative tests seem to have their purpose in re: their being a tool to
> 'prove out' via probing to find the proper behavior of the codebase (all
> edge cases, etc) but I'm honestly ambivalent about their long-term maintenance
> prospects.  

The problem here seems to be understanding the right *space* to be tested: It 
is *not* the states of objects, but the possible variations of conditions. So 
each of these tests is, like all other unit tests, hand-crafted for a specific 
condition construct (e.g. the doubly nested plus in the most recent tests; 
something Patrick wanted because he had found an error there). The assertion to 
be fulfilled by the NHib.Linq machinery is then that the SQL is equivalent (in 
a defined sense) to some behavior. The only alternative I can see is to write a 
prover that shows that the resulting SQL is equivalent in that sense to the 
Linq query - something no-one would try, nor accept as a "test", IMHO.
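
(To make "condition construct" concrete - an illustration only, not a 
verbatim test:)

    // A hand-crafted condition construct - here a "doubly nested plus" -
    // which the generator then exercises against the full input space:
    var result = from bo in session.Query<MyBO>()
                 where bo.K1 + (bo.K1 + bo.BO1.I1) > 0
                 select bo;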

> In a sense, this feels like the results of PEX where you look
> at the generated test suite and say "this was a valuable exercise, but
> there's little value in hanging on to it long-term".

Well, did you try to "failurize" (or whatever you'd call that - "change the 
code so that it fails in a way you designed") any test, e.g. by modifying the 
condition in a semantically interesting way (e.g. adding existence guards or 
the like), and then scrutinize the result? I cannot believe that you would then 
have this "feeling" ... but I may be wrong if you did it! Then, please, tell us 
about it!
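
(What I mean, as an illustration - the concrete condition is made up:)

    // "Failurize" a test: change the condition under test in a
    // semantically interesting way and look at what the output tells you.
    //   before:  where bo.BO1.I1 == 0
    //   after:   where bo.BO1 != null && bo.BO1.I1 == 0  // existence guard
    // The object numbers that disappear from (or appear in) the "But was:"
    // list are exactly those whose K1/BO1.I1 combination is affected -
    // readable directly off the value table above.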

> 
> As an example, I'd wonder what would happen in the following scenario: we
> decide to change the semantic behavior of some part of the LINQ provider
> (and thus, presumably, a number of these tests would then be broken).  Would
> the next step be modifying all the now-broken tests + dependent test data
> to get them to pass or would the next step be to discard all the existing
> tests and regenerate them all again?

As I did just that this weekend (look into the changeset!), you look at each 
failing test; decide why it fails - in these cases, the semantics definition of 
NHib.Linq differs from Linq2Objects in cases that we (at least Patrick and I) 
have not discussed before; and then either change the implementation (if the 
test is right) or change the test (if you want the implementation as is, and 
have a good explanation of why the test is wrong - this is why I sent all of 
you the modified tests with the comments that explain the complexity of the 
change).

So, as far as I see it, you do exactly what you do with other tests.

Why should you discard all these well-understood tests??

> 
> I honestly don't have a well-formed opinion on this, but it sort of seems
> to me as if the answer to this Q would lead us to a better-informed choice
> about the long-term viability of this generated test suite.
> 
> Does this make any sense --?

Do my answers make any? (Patrick, may I ask what you say ...?)

Regards
Harald M.


