If you’re curious, I’ve also added some more information at https://github.com/STAMP-project/pitest-descartes/issues/54
ATM, even with a runtime of “only” 10 minutes for pitest/descartes on xwiki-commons, it’s still too much IMO, so I’ll work on turning it off by default and having a CI job run it instead. Let me know if you have remarks.

Thanks
-Vincent

> On 27 Mar 2018, at 20:09, Vincent Massol <vinc...@massol.net> wrote:
>
>> On 27 Mar 2018, at 19:32, Vincent Massol <vinc...@massol.net> wrote:
>>
>> FYI I’ve implemented it locally for all modules of xwiki-commons and did
>> some build time measurements:
>>
>> * With pitest/descartes: 37:16 minutes
>> * Without pitest/descartes: 5:10 minutes
>
> Actually I was able to reduce the time to 15:12 minutes by configuring
> pitest with 4 threads.
>
> Thanks
> -Vincent
>
>> So that’s a pretty important hit…
>>
>> So I think one strategy could be to not run pitest/descartes by default in
>> the quality profile (i.e. have it off by default with
>> <xwiki.pitest.skip>true</xwiki.pitest.skip>) and run it on the CI from time
>> to time, like once per day or once per week, for example.
>>
>> Small issue: I need to find/test a way to run a crontab type of job in a
>> Jenkins pipeline script. I know how to do it in theory but I need to test it
>> and verify that it works. I still have some doubts ATM...
>>
>> WDYT?
>>
>> Thanks
>> -Vincent
>>
>>> On 15 Mar 2018, at 09:30, Vincent Massol <vinc...@massol.net> wrote:
>>>
>>> Hi devs,
>>>
>>> As part of the STAMP research project, we’ve developed a new tool
>>> (Descartes, based on Pitest) to measure the quality of tests. It generates
>>> a mutation score for your tests, indicating how good the tests are.
>>> Technically, Descartes performs some extreme mutations on the code under
>>> test (e.g. remove the content of void methods, return true for methods
>>> returning a boolean, etc. See
>>> https://github.com/STAMP-project/pitest-descartes). If a test continues to
>>> pass, it means it’s not killing the mutant, and thus its mutation score
>>> decreases.
>>>
>>> So in short:
>>> * Jacoco/Clover: measures how much of the code is tested
>>> * Pitest/Descartes: measures how good the tests are
>>>
>>> Both provide a percentage value.
>>>
>>> I’m proposing to compute the current mutation scores for xwiki-commons and
>>> xwiki-rendering and to fail the build when new code is added that reduces
>>> the mutation score below the threshold (exactly the same as our Jacoco
>>> threshold and strategy).
>>>
>>> I consider this an experiment to push the limits of software engineering a
>>> bit further. I don’t know how well it’ll work. I propose to do the work and
>>> test this for 2-3 months; at that point we can decide whether it works or
>>> not (i.e. whether the gains it brings are more important than the problems
>>> it causes).
>>>
>>> Here’s my +1 to try this out.
>>>
>>> Some links:
>>> * pitest: http://pitest.org/
>>> * descartes: https://github.com/STAMP-project/pitest-descartes
>>> * http://massol.myxwiki.org/xwiki/bin/view/Blog/ControllingTestQuality
>>> * http://massol.myxwiki.org/xwiki/bin/view/Blog/MutationTestingDescartes
>>>
>>> If you’re curious, you can see a screenshot of a mutation score report at
>>> http://massol.myxwiki.org/xwiki/bin/download/Blog/MutationTestingDescartes/report.png
>>>
>>> Please cast your votes.
>>>
>>> Thanks
>>> -Vincent
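
PS: for reference, the setup discussed above (Descartes as the pitest mutation engine, 4 threads, skippable via a property) could look roughly like the sketch below in the quality profile. This is only a sketch, not the actual xwiki-commons pom: the plugin/dependency versions are illustrative, and the exact coordinates of the Descartes artifact should be checked against its README.

```xml
<!-- Sketch only: versions and the descartes artifact coordinates are assumptions -->
<plugin>
  <groupId>org.pitest</groupId>
  <artifactId>pitest-maven</artifactId>
  <version>1.3.2</version>
  <configuration>
    <!-- Off by default; a CI job can re-enable with -Dxwiki.pitest.skip=false -->
    <skip>${xwiki.pitest.skip}</skip>
    <!-- Run mutation analysis in parallel; this is what cut the
         xwiki-commons build from ~37 to ~15 minutes -->
    <threads>4</threads>
    <!-- Use the Descartes extreme-mutation engine instead of pitest's default -->
    <mutationEngine>descartes</mutationEngine>
  </configuration>
  <dependencies>
    <dependency>
      <groupId>eu.stamp-project</groupId>
      <artifactId>descartes</artifactId>
      <version>1.2</version>
    </dependency>
  </dependencies>
</plugin>
```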
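
PPS: about the “crontab type of job in a Jenkins pipeline” question above — declarative pipelines support a `triggers { cron(...) }` block, so a once-per-day run could be sketched like this (job layout, profile name, and schedule are illustrative, not tested on our CI yet):

```groovy
// Sketch of a daily-scheduled Jenkinsfile; stage names and the Maven
// invocation are assumptions, only the triggers/cron syntax is standard
pipeline {
  agent any
  triggers {
    // "H" lets Jenkins spread the start time within the given range (here 0-5 AM)
    cron('H H(0-5) * * *')
  }
  stages {
    stage('Mutation testing') {
      steps {
        // Re-enable the otherwise-skipped pitest/descartes execution
        sh 'mvn clean install -Pquality -Dxwiki.pitest.skip=false'
      }
    }
  }
}
```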