If you’re curious, I’ve also added some more information at 
https://github.com/STAMP-project/pitest-descartes/issues/54

ATM, even with a run time of “only” 10mn for pitest/descartes on 
xwiki-commons, it’s still too much IMO, so I’ll work on turning it off by 
default and adding a CI job to run it.
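
For reference, here’s roughly what the wiring could look like in the quality 
profile. This is only a sketch: the xwiki.pitest.skip property name is the one 
from the proposal quoted below, and I’m assuming pitest-maven exposes a skip 
configuration flag (plugin version omitted):

```xml
<properties>
  <!-- Off by default; the CI job overrides this with -Dxwiki.pitest.skip=false -->
  <xwiki.pitest.skip>true</xwiki.pitest.skip>
</properties>

<build>
  <plugins>
    <plugin>
      <groupId>org.pitest</groupId>
      <artifactId>pitest-maven</artifactId>
      <configuration>
        <skip>${xwiki.pitest.skip}</skip>
      </configuration>
    </plugin>
  </plugins>
</build>
```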

Let me know if you have remarks.

Thanks
-Vincent

> On 27 Mar 2018, at 20:09, Vincent Massol <vinc...@massol.net> wrote:
> 
> 
> 
>> On 27 Mar 2018, at 19:32, Vincent Massol <vinc...@massol.net> wrote:
>> 
>> FYI I’ve implemented it locally for all modules of xwiki-commons and did 
>> some build time measurements:
>> 
>> * With pitest/descartes: 37:16 minutes
>> * Without pitest/descartes: 5:10 minutes
> 
> Actually I was able to reduce the time to 15:12 minutes by configuring 
> pitest to use 4 threads.
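
For reference, the change boils down to pitest-maven’s threads option; a 
minimal sketch (plugin version omitted):

```xml
<plugin>
  <groupId>org.pitest</groupId>
  <artifactId>pitest-maven</artifactId>
  <configuration>
    <!-- Run the mutation analysis with 4 worker threads -->
    <threads>4</threads>
  </configuration>
</plugin>
```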
> 
> Thanks
> -Vincent
> 
>> 
>> So that’s a pretty significant hit…
>> 
>> So I think one strategy could be to not run pitest/descartes by default in 
>> the quality profile (i.e. have it off by default with 
>> <xwiki.pitest.skip>true</xwiki.pitest.skip>) and run it on the CI from time 
>> to time, e.g. once per day or once per week.
>> 
>> Small issue: I need to find and test a way to run a crontab-type job in a 
>> Jenkins pipeline script. I know how to do it in theory but I need to test it 
>> and verify it works. I still have some doubts ATM...
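
In case it helps, a declarative Jenkinsfile can express this with a cron 
trigger; a sketch (the schedule, stage name, profile and property names are 
illustrative, not our actual setup):

```groovy
pipeline {
  agent any
  triggers {
    // 'H' hashes the start time so jobs don't all fire at once;
    // this runs once a day ('@weekly' would also work for a weekly run).
    cron('H H * * *')
  }
  stages {
    stage('Mutation testing') {
      steps {
        // Re-enable pitest/descartes for this scheduled run only.
        sh 'mvn install -Pquality -Dxwiki.pitest.skip=false'
      }
    }
  }
}
```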
>> 
>> WDYT?
>> 
>> Thanks
>> -Vincent
>> 
>>> On 15 Mar 2018, at 09:30, Vincent Massol <vinc...@massol.net> wrote:
>>> 
>>> Hi devs,
>>> 
>>> As part of the STAMP research project, we’ve developed a new tool 
>>> (Descartes, based on Pitest) to measure the quality of tests. It generates 
>>> a mutation score for your tests, indicating how good they are. Technically, 
>>> Descartes performs some extreme mutations on the code under test (e.g. 
>>> removing the content of void methods, returning true for methods returning 
>>> a boolean, etc. - see https://github.com/STAMP-project/pitest-descartes). 
>>> If a test continues to pass, it means it’s not killing the mutant, and 
>>> thus the mutation score decreases.
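
To make the “extreme mutation” idea concrete, here’s a minimal sketch (a 
hypothetical example, not code from XWiki) of a boolean-returning method and 
the “return true” mutant Descartes would generate for it:

```java
public class MutationExample {

    // Original method under test (hypothetical example).
    static boolean canWithdraw(int balance, int amount) {
        return amount > 0 && amount <= balance;
    }

    // What Descartes' "return true" extreme mutation turns it into.
    static boolean canWithdrawMutant(int balance, int amount) {
        return true;
    }

    public static void main(String[] args) {
        // A test that only checks the happy path passes against both
        // versions, so the mutant survives and the mutation score drops:
        System.out.println(canWithdraw(100, 50));        // true
        System.out.println(canWithdrawMutant(100, 50));  // true

        // A test that also checks a case that must return false kills the
        // mutant, because the mutant wrongly returns true here:
        System.out.println(canWithdraw(100, 200));       // false
        System.out.println(canWithdrawMutant(100, 200)); // true
    }
}
```

In other words, a test suite kills this mutant only if it asserts on at least 
one input where the method must return false.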
>>> 
>>> So in short:
>>> * Jacoco/Clover: measure how much of the code is tested
>>> * Pitest/Descartes: measure how good the tests are
>>> 
>>> Both provide a percentage value.
>>> 
>>> I’m proposing to compute the current mutation scores for xwiki-commons and 
>>> xwiki-rendering and fail the build when new code is added that reduces the 
>>> mutation score below the threshold (exactly the same as our jacoco 
>>> threshold and strategy).
>>> 
>>> I consider this an experiment to push the limits of software engineering 
>>> a bit further. I don’t know how well it’ll work. I propose to do the work 
>>> and test this for 2-3 months. At that point we can decide whether to keep 
>>> it (i.e. whether the gains it brings outweigh the problems it causes).
>>> 
>>> Here’s my +1 to try this out.
>>> 
>>> Some links:
>>> * pitest: http://pitest.org/
>>> * descartes: https://github.com/STAMP-project/pitest-descartes
>>> * http://massol.myxwiki.org/xwiki/bin/view/Blog/ControllingTestQuality
>>> * http://massol.myxwiki.org/xwiki/bin/view/Blog/MutationTestingDescartes
>>> 
>>> If you’re curious, you can see a screenshot of a mutation score report at 
>>> http://massol.myxwiki.org/xwiki/bin/download/Blog/MutationTestingDescartes/report.png
>>> 
>>> Please cast your votes.
>>> 
>>> Thanks
>>> -Vincent
>> 
> 
