Hi devs,

As part of the STAMP research project, we’ve developed a new tool (Descartes, 
based on Pitest) to measure the quality of tests. It generates a mutation score 
for your tests, indicating how good the tests are. Technically, Descartes 
performs some extreme mutations on the code under test (e.g. removing the body 
of void methods, making methods that return a boolean always return true, etc.; 
see https://github.com/STAMP-project/pitest-descartes). If a test still passes 
against a mutated version of the code, it has not killed that mutant, and the 
mutation score decreases.
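To make the idea concrete, here is a small hand-written Java simulation of two 
"return a constant" mutants of a boolean method. Descartes works on bytecode 
and generates mutants automatically, so this only mimics by hand what such 
mutants amount to. The names isEven, weakTest and strongTest are hypothetical, 
and the score is computed as killed mutants over total mutants, which is the 
usual definition of a mutation score.

```java
import java.util.List;
import java.util.function.IntPredicate;
import java.util.function.Predicate;

class ExtremeMutationDemo {

    // Hypothetical code under test.
    static boolean isEven(int n) {
        return n % 2 == 0;
    }

    // Hand-written equivalents of two extreme mutants of isEven:
    // the whole method body is replaced by a constant return value.
    static final List<IntPredicate> MUTANTS = List.of(
            n -> true,   // "return true" mutant
            n -> false   // "return false" mutant
    );

    // A weak test: only checks one even input.
    static boolean weakTest(IntPredicate impl) {
        return impl.test(4);
    }

    // A stronger test: checks both an even and an odd input.
    static boolean strongTest(IntPredicate impl) {
        return impl.test(4) && !impl.test(3);
    }

    // A mutant is "killed" when the test fails against it.
    // Mutation score = killed mutants / total mutants, as a percentage.
    static double mutationScore(Predicate<IntPredicate> test) {
        long killed = MUTANTS.stream().filter(m -> !test.test(m)).count();
        return 100.0 * killed / MUTANTS.size();
    }

    public static void main(String[] args) {
        // Both tests pass against the original code...
        System.out.println(weakTest(ExtremeMutationDemo::isEven));   // true
        System.out.println(strongTest(ExtremeMutationDemo::isEven)); // true
        // ...but only the stronger one kills every mutant.
        System.out.println(mutationScore(ExtremeMutationDemo::weakTest));   // 50.0
        System.out.println(mutationScore(ExtremeMutationDemo::strongTest)); // 100.0
    }
}
```

Both tests give the same code coverage of isEven, yet the weak test lets the 
"return true" mutant survive, which is exactly the difference a mutation score 
exposes and a coverage percentage does not.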

So in short:
* Jacoco/Clover: measure how much of the code is executed by the tests
* Pitest/Descartes: measure how good the tests are

Both provide a percentage value.

I’m proposing to compute the current mutation scores for xwiki-commons and 
xwiki-rendering, use them as thresholds, and fail the build when new code 
lowers the mutation score below the threshold (exactly the same strategy as 
for our Jacoco threshold).
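A minimal sketch of the proposed build check, assuming each module has a 
recorded threshold (in %) and a freshly computed mutation score. The class, 
method and module names and the numbers are hypothetical. In practice the 
pitest Maven plugin already has a mutationThreshold parameter that fails the 
build when the score is below it, so this is only meant to show the logic.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class MutationThresholdCheck {

    // Returns the modules whose current mutation score (in %) is below the
    // threshold recorded for them. An empty list means the build may pass.
    static List<String> failingModules(Map<String, Double> thresholds,
                                       Map<String, Double> scores) {
        List<String> failing = new ArrayList<>();
        // TreeMap gives a stable, alphabetical order for reporting.
        for (Map.Entry<String, Double> e : new TreeMap<>(thresholds).entrySet()) {
            Double score = scores.get(e.getKey());
            // A module with no computed score is treated as failing
            // rather than silently skipped.
            if (score == null || score < e.getValue()) {
                failing.add(e.getKey());
            }
        }
        return failing;
    }

    public static void main(String[] args) {
        // Hypothetical module names and made-up numbers, for illustration only.
        Map<String, Double> thresholds = Map.of("module-a", 60.0, "module-b", 45.0);
        Map<String, Double> scores = Map.of("module-a", 62.5, "module-b", 41.0);
        System.out.println(failingModules(thresholds, scores)); // [module-b]
    }
}
```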

I consider this an experiment to push the limits of software engineering a 
bit further. I don’t know how well it’ll work. I propose to do the work, try it 
out for 2-3 months, and then decide whether it works (i.e. whether the gains 
it brings outweigh the problems it causes).

Here’s my +1 to try this out.

Some links:
* pitest: http://pitest.org/
* descartes: https://github.com/STAMP-project/pitest-descartes
* http://massol.myxwiki.org/xwiki/bin/view/Blog/ControllingTestQuality
* http://massol.myxwiki.org/xwiki/bin/view/Blog/MutationTestingDescartes

If you’re curious, you can see a screenshot of a mutation score report at 
http://massol.myxwiki.org/xwiki/bin/download/Blog/MutationTestingDescartes/report.png

Please cast your votes.

Thanks
-Vincent
