On Thu, 26 Oct 2006, Iago Toral Quiroga wrote:
- in the common case, test results should be reduced to a single boolean:
all tests passed vs. at least one test failed
many test frameworks provide means to count and report failing tests
(even automake's standard check: rule), but there's little to no merit
to this functionality.
letting more than one test fail while continuing to work in an
unrelated area rapidly leads to confusion about which tests are
supposed to work and which aren't, especially in multi-contributor setups.
figuring out whether the right test passed suddenly requires scanning
the test logs and remembering the last count of tests that may validly
fail. this defeats the purpose of using a single quick make check run to
be confident that one's changes didn't introduce breakage.
as a result, the whole test harness should always either succeed or
be immediately fixed.
I understand your point; however, I still think that being able to get a
wider report with all the tests failing at a given moment is also
useful (for example in a buildbot continuous integration loop, like
the one being prepared by the build-brigade). Besides, if there is a
group of people who want to work on fixing bugs at the same time, they
would need to get a list of all failing tests, not only the first one.
well, you can get that to some extent automatically, if you invoke
$ make -k check
going beyond that would be a bad trade off i think because:
a) tests are primarily in place to ensure certain functionality is
implemented correctly and continues to work;
b) if things break, tests need to be easy to debug. basically, most of the
time a test fails, you have to engage the debugger, read code/docs,
analyse and fix. tests that need forking or are hard to understand
get in the way of this process, so should be avoided.
c) implementation and test code often have dependencies that don't allow
testing beyond the occurrence of an error. a simple example is:
  o = object_new();
  ASSERT (o != NULL); /* no point to continue beyond this on error */
  test_function (o);
a more intimidating case is:
  main() {
    test_gtk_1();
    test_gtk_2();
    test_gtk_3();
    test_gtk_4();
    test_gtk_5();
    // ...
  }
if any of those test functions (say test_gtk_3) produces a gtk/glib
error/assertion/warning/critical, the remaining test functions (4, 5, ...)
are likely to fail for bogus reasons because the libraries entered an
undefined state.
reporting those subsequent failures (which are likely to be very
misleading) is useless at best and confusing (in terms of which error really
matters) at worst.
yes, forking for each of the test functions works around that (provided
they are as independent of one another as in the example above), but again,
this complicates the test implementation (it's not an easy-to-understand
test program anymore) and debuggability, i.e. it affects the 2 main
properties of a good test program.
to sum this up, reporting multiple fine-grained test failures may have some
benefits, mainly those you outlined. but it comes at a certain cost, i.e.
test code complexity and hindered debuggability, which compromise the two
main properties of a good test program.
also, consider that make -k check can still get you reports on multiple
test failures, just at a somewhat lower granularity. in fact, it's just low
enough to avoid bogus reports.
so, weighing the options against each other, adding fork mode when you
don't have to (i.e. other than for checking the g_error implementation)
provides questionable benefits at significant cost.
that's not an optimal trade off for gtk test programs i'd say, and i'd expect
the same to hold for most other projects.
- GLib based test programs should never produce a CRITICAL **: or
WARNING **: message and succeed. the reasoning here is that CRITICALs
and WARNINGs are indicators for an invalid program or library state,
anything can follow from this.
since tests are in place to verify correct implementation/operation, an
invalid program state should never be reached. as a consequence, all tests
should upon initialization make CRITICALs and WARNINGs fatal (as if
--g-fatal-warnings was given).
Maybe you would like to test how the library handles invalid input. For
example, let's say we have a function that accepts a pointer as a
parameter; I think it is worth knowing whether that function safely
handles the case where the pointer is NULL (if NULL is not an allowed
value for that parameter) or whether it produces a segmentation fault in
that case.
no, it really doesn't make sense to test functions outside the defined
value ranges. that's because when implementing, the only thing you need
to actually care about from an API perspective is: the defined value ranges.
besides that, value ranges may compatibly be *extended* in future