First of all, while I use TextTest, I'm fortunate to be surrounded
by TextTest experts such as Geoff and Johan here at Carmen, so I'm
not a TextTest expert by any measure. I probably use it in a
non-optimal way. For really good answers, I suggest using the mailing
list at sourceforge:
http://lists.sourceforge.net/lists/listinfo/texttest-users
For those who don't know, TextTest is a testing framework which
is based on executing a program under test, and storing all
output from such runs (stdout, stderr, various files written
etc). On each test run, the output is compared to output which is
known to be correct. When there are differences, you can inspect
these with a normal diff tool such as tkdiff. In other words,
there are no test scripts that compare actual and expected values,
but there might be scripts for setting up, extracting results and
cleaning up.
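To make the contrast with assertion-based testing concrete, here
is a minimal sketch of the golden-output idea in plain Python (not
TextTest itself; the program name and file paths are made up):

import difflib
import subprocess
import sys
from pathlib import Path

def check_against_golden(cmd, golden_path):
    """Run the program under test, capture stdout and diff it against
    the stored reference ("golden") output. Return True on a match."""
    result = subprocess.run(cmd, capture_output=True, text=True)
    expected = Path(golden_path).read_text().splitlines(keepends=True)
    actual = result.stdout.splitlines(keepends=True)
    diff = list(difflib.unified_diff(expected, actual,
                                     fromfile=str(golden_path),
                                     tofile="current run"))
    sys.stdout.writelines(diff)
    return not diff

if __name__ == "__main__":
    ok = check_against_golden(["python", "make_report.py"],
                              "golden/report.txt")
    print("OK" if ok else "DIFFERS")

TextTest does the running, storing and diffing for you; the point
is just that the expected values live in files, not in test code.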
Grig Gheorghiu wrote:
> I've been writing TextTest tests lately for an application that will be
> presented at a PyCon tutorial on "Agile development and testing". I
> have to say that if your application does a lot of logging, then the
> TextTest tests become very fragile in the presence of changes. So I had
> to come up with this process in order for the tests to be of any use at
> all:
The TextTest approach is different from typical unittest work,
and it takes some time to get used to. Having worked with it
for almost a year, I'm fairly happy with it, and feel that it
would mean much more work to achieve the same kind of confidence
in the software we write if we used e.g. the unittest module.
YMMV.
> 1) add new code in, with no logging calls
> 2) run texttest and see if anything got broken
> 3) if nothing got broken, add logging calls for new code and
> re-generate texttest golden images
I suppose its usefulness depends partly on what your application
looks like. We've traditionally built applications that produce
a lot of text files which have to look a certain way. There are
several things to think about though:
We use logging a lot and we have a clear strategy for what log
levels to use in different cases. While debugging we can set a
high log level, but during ordinary tests, we have a fairly low
log level. Which log messages get emitted is controlled by an
environment variable here, but you could also filter out e.g.
all INFO and DEBUG messages from your test comparisons.
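As an illustration of that kind of setup (the variable name and the
format here are made up, not our actual configuration), the log
level can be read from an environment variable so test runs stay
quiet while debugging runs can be verbose:

import logging
import os

# Hypothetical variable name; use whatever your project prefers.
level_name = os.environ.get("MYAPP_LOG_LEVEL", "WARNING")
logging.basicConfig(
    level=getattr(logging, level_name.upper(), logging.WARNING),
    format="%(levelname)s %(name)s: %(message)s")

log = logging.getLogger("myapp")
log.debug("not emitted during ordinary test runs")
log.warning("emitted, and therefore part of what TextTest compares")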
We're not really interested in exactly what the program does when
we test it. We want to make sure that it produces the right results.
As you suggested, we'll get lots of false negatives if we log
too much detail during test runs.
We filter out stuff like timestamps, machine names etc. There are
also features that can be used to verify that texts occur even if
they might appear in different orders etc (I haven't meddled with
that myself).
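TextTest has configuration for declaring such run-dependent text;
I won't quote its exact syntax from memory, but the underlying idea
is just pattern-based normalisation of the output before comparison,
something like this sketch (the patterns are examples only):

import re
import socket

# Text that legitimately differs between runs; adjust the patterns
# to your own log format.
TIMESTAMP = re.compile(r"\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}")
HOSTNAME = re.compile(re.escape(socket.gethostname()))

def normalise(text):
    """Replace run-dependent details with fixed placeholders so that
    otherwise identical outputs compare equal."""
    text = TIMESTAMP.sub("<TIMESTAMP>", text)
    text = HOSTNAME.sub("<HOST>", text)
    return text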
We don't just compare program progress logs, but make sure that
the programs produce appropriate results files. I personally don't
work with any GUI applications, being confined to the database
dungeons here at Carmen, but I guess that if you did, you'd need
to instrument your GUI so that it can dump the contents of e.g.
list views to files, if you want to verify the display contents.
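For instance (a made-up tkinter example, since I don't write GUIs
myself), dumping a list view to a plain text file is usually enough
to let a text-based comparison see what the user would have seen:

import tkinter as tk

def dump_listbox(listbox, path):
    """Write the contents of a Listbox to a text file, one row per
    line, so a text-based test can compare it."""
    with open(path, "w") as f:
        for index in range(listbox.size()):
            f.write(listbox.get(index) + "\n")

# Hypothetical usage inside the application when a dump flag is set:
# dump_listbox(results_view, "listview_dump.txt")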
Different output files are assigned different severities. If
there are no changes, you get a green/green result in the test
list. If a high severity file (say a file showing produced
results) differs, you get a red/red result in the test list.
If a low severity file (e.g. program progress log or stderr)
changes, you get a green/red result in the test list.
You can obviously use TextTest for regression testing legacy
software, where you can't influence the amount of logs spewed
out, but in those cases, it's probable that the appearance of
the logs will be stable over time. Otherwise you have to fiddle
with filters etc.
In a case like you describe above, I could imagine the following
situation. You're changing some code, and that change will add
some new feature, and as a side effect change the process (but
not the results) for some existing test. You prepare a new test,
but want to make sure that existing tests don't break.
For your new test, you have no previous results, so all is red,
and you have to carefully inspect the files that describe the
generated results. If you're a strict test-first guy, you've
written a results file already, and with some luck it's green
or just has some cosmetic differences that you can accept. You
look at the other produced files as well and see that there
are no unexpected error messages etc, but you might accept the
program progress log without caring too much about every line.
For your existing tests, you might have some green/red cases,
where low severity files changed. If there are red/red test
cases among the existing tests, you are in trouble. Your new
code has messed up the results of other runs. Well, this
might be expected. It's all text files, and if it's hundreds
of tests, you can write a little script to deal with the changes.
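Something along these lines (the directory layout and file names
are made up, and TextTest has its own facilities for saving new
results; this is only to show that plain text keeps such chores
scriptable):

import filecmp
import shutil
from pathlib import Path

def review_results(test_root, result_name="results.txt", accept=False):
    """Walk a tree of test directories, report every test whose newly
    generated result file differs from the stored golden one, and
    optionally copy the new file over the golden one."""
    for new_file in Path(test_root).rglob("new_" + result_name):
        golden = new_file.with_name(result_name)
        if golden.exists() and not filecmp.cmp(new_file, golden,
                                               shallow=False):
            print("differs:", new_file.parent)
            if accept:
                shutil.copy(new_file, golden)

if __name__ == "__main__":
    review_results("tests")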