Hi:

Nice discussion; as you know, I'm all for increasing the number and the
quality of our tests.  Here are my comments on some of the raised points:

> What I noticed is that we always test our code when coding.  The
> problem is that test is manual.

As Sam mentioned, you can get some automation with my little devscripts.
In the realm of testing, see `invenio-retest-demo-site', which runs the
full unit/regression/web test suite, compares the results against the
last run, and warns you of any differences.

Could be extended to run only some tests, like:

   $ invenio-retest-demo-site ./modules/bibknowledge

Or it could run only the tests of those modules that had some files
changed by the given branch, quite like `invenio-check-branch' already
does for tracking code kwalitee changes.  So running tests could
optionally become part of `invenio-check-branch', which would be
especially useful when you are on Atlantis.
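
To illustrate, a rough sketch of such branch-aware test selection could
look like this (module layout, script name and the call at the end are
just illustrative assumptions, not our actual devscript code):

   # select-changed-module-tests.py -- illustrative sketch only
   import os
   import subprocess

   def changed_modules(base_branch="master"):
       """Return the Invenio module directories touched by the current branch."""
       out = subprocess.Popen(
           ["git", "diff", "--name-only", base_branch + "...HEAD"],
           stdout=subprocess.PIPE).communicate()[0]
       modules = set()
       for path in out.splitlines():
           parts = path.split(os.sep)
           if len(parts) > 1 and parts[0] == "modules":
               modules.add(os.path.join(parts[0], parts[1]))
       return sorted(modules)

   if __name__ == "__main__":
       for module in changed_modules():
           # hypothetical call; in reality this would invoke the devscript
           subprocess.call(["invenio-retest-demo-site", module])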

Are you using my little helper tools in this way, and are you interested
in these improvements?  If so, I'll commit some of them.

> For example, we are thinking of requiring the implementation of tests
> for bugs that appear and gets fixed

This is nothing new; it has been `gently suggested' as a `should-have
feature' since ~2006:

   ``This is especially important if a bug had been previously
   found. Then a regression test case should be written to assure that
   it will never reappear.''

   <http://invenio-demo.cern.ch/help/hacking/test-suite#3.1>
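
To make that concrete, such a regression test could look roughly like
this (the helper function and the bug here are purely invented for
illustration):

   import unittest

   def normalize_title(title):
       # stand-in for a real Invenio helper; imagine the trailing-blank
       # bug was fixed here at some point
       return title.strip()

   class NormalizeTitleRegressionTest(unittest.TestCase):

       def test_trailing_whitespace_is_stripped(self):
           """normalize_title - trailing blanks (regression test for a fixed bug)"""
           self.assertEqual(normalize_title("On testing   "), "On testing")

   if __name__ == "__main__":
       unittest.main()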

If people want to write more tests, then I'm all for it; I think we have
quite a usable ecosystem already... but one has to use it!

> we have 616 successful unit tests out of 616 (!) and 479 successful
> regression tests out of 490(!), so I don't see Invenio that broken in
> that sense

+1.  However, 11 failing regression tests on `master' is still too much.
As I recall, we have only 3 regression tests that are `historically
failing' in this TDD style.  (Plus there are a few more failing because
of the First-Day-Of-A-Month problem, plus there may be some LibreOffice
ones on some platforms, etc.)  So we should try once again to clean up
`master' better.

> (For BibClassify I instead don't know the reason for the failure)

It's been a recurrent issue; see:

   <http://invenio-software.org/ticket/817>

I recall that when I reran the tests, the results were different.  I
have not looked at it yet, but it seems like it may be a simple
setup/teardown issue, a test-case ordering issue, or something similar.
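
FWIW, this kind of order dependence usually comes from state shared
between test methods; a generic sketch (not the actual BibClassify code)
of the pattern and its setUp-based cure:

   import unittest

   CACHE = {}   # module-level state shared by all test methods

   class OrderSensitiveTest(unittest.TestCase):

       def setUp(self):
           # without this reset, test_reads_empty_cache passes or fails
           # depending on whether test_fills_cache ran before it
           CACHE.clear()

       def test_fills_cache(self):
           CACHE['taxonomy'] = ['keyword']
           self.assertEqual(len(CACHE), 1)

       def test_reads_empty_cache(self):
           self.assertEqual(CACHE, {})

   if __name__ == "__main__":
       unittest.main()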

> the decorator approach to marking tests looks very useful.

I fully agree that it will be useful to differentiate should-fail tests.
However, we cannot use the decorator approach, since unittest's skip and
expected-failure decorators appeared only in Python 2.7 and our minimal
supported version is Python 2.4.

Until we upgrade our minimal Python version, I would propose to simply
use a `def FIXME_TDD_foo()' naming technique, as we sometimes did in the
past via `def xtest_foo()', together with opening Trac tickets.  History
shows that it may take a long time to implement a TDD-meant feature, so
let's mark such tests distinctly in this way, which would better address
the original problem.  I'll modify the Invenio codebase in this respect.
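
Here is a minimal sketch of the idea (test names are hypothetical):
since unittest only collects methods whose names start with `test', the
renamed method is simply not run until the feature lands and it is
renamed back:

   import unittest

   class WebSearchTest(unittest.TestCase):   # hypothetical test class

       def test_existing_behaviour(self):
           """this one runs as usual"""
           self.assertEqual(2 + 2, 4)

       def FIXME_TDD_test_planned_feature(self):
           """not collected by unittest until renamed back to test_...;
           tracked in a corresponding Trac ticket"""
           self.fail("feature not implemented yet")

   if __name__ == "__main__":
       unittest.main()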

> Make failure very prominent in the build server (currently tests fail
> and the server reports success)

Yes, I agree.  I had mentioned in the past that I'd like to extend the
usage of the red flag on Bitten after we fix all the tests and even the
kwalitee issues, e.g. on 2010-11-30:

   ``If we clear `make kwalitee-check-sql-queries' of false positives,
     then this could be plugged into Bitten reports that would raise a
     red flag and stuff.  Anyone to join in a codefest?''

It seems that this may take a very long time, though, so I'll implement
the above-mentioned FIXME technique and I'll configure the red flag in
Bitten to also appear when a test fails.
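
One generic way a build server ends up reporting success despite failing
tests is that the test-runner script always exits with status 0; here is
a sketch (not the actual Invenio/Bitten wiring) of propagating the test
result so that a red flag can be raised:

   import sys
   import unittest

   class DemoTest(unittest.TestCase):   # stand-in for a real test suite
       def test_something(self):
           self.assertEqual(1 + 1, 2)

   def run_and_report(suite):
       """Run SUITE and exit non-zero on failure, so the build goes red."""
       result = unittest.TextTestRunner(verbosity=2).run(suite)
       if result.wasSuccessful():
           sys.exit(0)
       sys.exit(1)

   if __name__ == "__main__":
       run_and_report(unittest.TestLoader().loadTestsFromTestCase(DemoTest))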

This, together with using the devscript helpers, should address the
feelings that led to this thread; WDYT?

> We should refuse to commit

That's what we've usually been doing, but sometimes tests fail only when
they are run at a particular hour (ticket:421), sometimes only when not
run repetitively (ticket:817), sometimes as part of an independent test
data change (ticket:842), sometimes only on Python-2.4 while we
quick-integrated production hotfixes using Python-2.6 only (ticket:715),
sometimes tests fail only on boxes with low memory (intbitset with VMM),
etc.  So stuff happens.

(But virtually all of these are then caught after-merge by Bitten
builds.)

Best regards
-- 
Tibor Simko
