On Wed, Mar 28, 2018 at 1:25 PM, Vincent Massol <[email protected]> wrote:
> If you’re curious I’ve also added some more information at 
> https://github.com/STAMP-project/pitest-descartes/issues/54
>
> ATM, even with a run time of “only” 10 minutes for pitest/descartes on 
> xwiki-commons, it’s still too much IMO, so I’ll work on turning it off by 
> default and having a CI job run it instead.

Sounds good. I agree that we can't enable it all the time given the build time it adds.

>
> Let me know if you have remarks.
>
> Thanks
> -Vincent
>
>> On 27 Mar 2018, at 20:09, Vincent Massol <[email protected]> wrote:
>>
>>
>>
>>> On 27 Mar 2018, at 19:32, Vincent Massol <[email protected]> wrote:
>>>
>>> FYI I’ve implemented it locally for all modules of xwiki-commons and did 
>>> some build time measurements:
>>>
>>> * With pitest/descartes: 37:16 minutes
>>> * Without pitest/descartes: 5:10 minutes
>>
>> Actually I was able to reduce the time to 15:12 minutes by configuring 
>> pitest with 4 threads.
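>>
>> For reference that’s just the threads parameter of the pitest-maven plugin 
>> (rough sketch; the exact placement in our quality profile may differ):
>>
>>   <plugin>
>>     <groupId>org.pitest</groupId>
>>     <artifactId>pitest-maven</artifactId>
>>     <configuration>
>>       <!-- Run the mutation analysis with 4 threads to cut wall-clock time -->
>>       <threads>4</threads>
>>     </configuration>
>>   </plugin>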
>>
>> Thanks
>> -Vincent
>>
>>>
>>> So that’s a pretty significant hit…
>>>
>>> So I think one strategy could be to not run pitest/descartes by default in 
>>> the quality profile (i.e. have it off by default with 
>>> <xwiki.pitest.skip>true</xwiki.pitest.skip>) and run it on the CI from time 
>>> to time, for example once per day or once per week.
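>>>
>>> Concretely it could look like this in the quality profile (rough sketch, 
>>> not tested; the descartes coordinates/version and the exact skip flag of 
>>> the pitest plugin need to be double-checked against the pitest-maven and 
>>> descartes docs):
>>>
>>>   <properties>
>>>     <!-- Off by default; the CI job would pass -Dxwiki.pitest.skip=false -->
>>>     <xwiki.pitest.skip>true</xwiki.pitest.skip>
>>>   </properties>
>>>
>>>   <plugin>
>>>     <groupId>org.pitest</groupId>
>>>     <artifactId>pitest-maven</artifactId>
>>>     <configuration>
>>>       <!-- Assumed skip parameter, bound to our property -->
>>>       <skip>${xwiki.pitest.skip}</skip>
>>>       <mutationEngine>descartes</mutationEngine>
>>>     </configuration>
>>>     <dependencies>
>>>       <dependency>
>>>         <!-- Descartes mutation engine; groupId/version to verify -->
>>>         <groupId>eu.stamp-project</groupId>
>>>         <artifactId>descartes</artifactId>
>>>         <version>1.2</version>
>>>       </dependency>
>>>     </dependencies>
>>>   </plugin>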
>>>
>>> Small issue: I need to find/test a way to run a crontab type of job in a 
>>> Jenkins pipeline script. I know how to do it in theory but I need to test 
>>> it and verify it works. I still have some doubts ATM...
>>>
>>> WDYT?
>>>
>>> Thanks
>>> -Vincent
>>>
>>>> On 15 Mar 2018, at 09:30, Vincent Massol <[email protected]> wrote:
>>>>
>>>> Hi devs,
>>>>
>>>> As part of the STAMP research project, we’ve developed a new tool 
>>>> (Descartes, based on Pitest) to measure the quality of tests. It generates 
>>>> a mutation score for your tests, indicating how good the tests are. 
>>>> Technically, Descartes performs some extreme mutations on the code under 
>>>> test (e.g. removing the content of void methods, returning true for 
>>>> methods returning a boolean, etc. - see 
>>>> https://github.com/STAMP-project/pitest-descartes). If a test continues to 
>>>> pass, it means it’s not killing the mutant, and thus the mutation score 
>>>> decreases.
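>>>>
>>>> To give an idea, the descartes engine is selected in the pitest maven 
>>>> plugin and the extreme mutation operators can be listed explicitly in its 
>>>> configuration, something like this (sketch based on the descartes README; 
>>>> operator names to double-check):
>>>>
>>>>   <configuration>
>>>>     <mutationEngine>descartes</mutationEngine>
>>>>     <mutators>
>>>>       <!-- Empty the body of void methods -->
>>>>       <mutator>void</mutator>
>>>>       <!-- Replace the method body with "return true;" / "return false;" -->
>>>>       <mutator>true</mutator>
>>>>       <mutator>false</mutator>
>>>>       <!-- Return null instead of the computed value -->
>>>>       <mutator>null</mutator>
>>>>       <!-- Return an "empty" value (e.g. empty string or collection) -->
>>>>       <mutator>empty</mutator>
>>>>     </mutators>
>>>>   </configuration>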
>>>>
>>>> So in short:
>>>> * Jacoco/Clover: measure how much of the code is tested
>>>> * Pitest/Descartes: measure how good the tests are
>>>>
>>>> Both provide a percentage value.
>>>>
>>>> I’m proposing to compute the current mutation scores for xwiki-commons and 
>>>> xwiki-rendering and fail the build when new code is added that reduces the 
>>>> mutation score below the threshold (exactly the same as our jacoco 
>>>> threshold and strategy).
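>>>>
>>>> In practice that would mean setting the mutationThreshold of the pitest 
>>>> maven plugin per module, the same way we set the jacoco threshold (sketch; 
>>>> 70 is just an example value, the real value for each module would come 
>>>> from the computed score):
>>>>
>>>>   <plugin>
>>>>     <groupId>org.pitest</groupId>
>>>>     <artifactId>pitest-maven</artifactId>
>>>>     <configuration>
>>>>       <!-- Fail the build if the mutation score drops below this value -->
>>>>       <mutationThreshold>70</mutationThreshold>
>>>>     </configuration>
>>>>   </plugin>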
>>>>
>>>> I consider this an experiment to push the limits of software engineering a 
>>>> bit further. I don’t know how well it’ll work. I propose to do the work, 
>>>> test this for 2-3 months and see how well it goes. At that point we can 
>>>> decide whether to keep it (i.e. whether the gains it brings outweigh the 
>>>> problems it causes).
>>>>
>>>> Here’s my +1 to try this out.
>>>>
>>>> Some links:
>>>> * pitest: http://pitest.org/
>>>> * descartes: https://github.com/STAMP-project/pitest-descartes
>>>> * http://massol.myxwiki.org/xwiki/bin/view/Blog/ControllingTestQuality
>>>> * http://massol.myxwiki.org/xwiki/bin/view/Blog/MutationTestingDescartes
>>>>
>>>> If you’re curious, you can see a screenshot of a mutation score report at 
>>>> http://massol.myxwiki.org/xwiki/bin/download/Blog/MutationTestingDescartes/report.png
>>>>
>>>> Please cast your votes.
>>>>
>>>> Thanks
>>>> -Vincent
>>>
>>
>



-- 
Thomas Mortagne
