On Wed, Mar 28, 2018 at 1:25 PM, Vincent Massol <[email protected]> wrote:
> If you’re curious I’ve also added some more information at
> https://github.com/STAMP-project/pitest-descartes/issues/54
>
> ATM, and even with a time of “only” 10mn to run pitest/descartes on
> xwiki-commons, it’s still too much IMO so I’ll work on turning it off by
> default but having a job to run it on the CI.
Sounds good. I agree that we can't enable it all the time given the time lost.

> Let me know if you have remarks.
>
> Thanks
> -Vincent
>
>> On 27 Mar 2018, at 20:09, Vincent Massol <[email protected]> wrote:
>>
>>
>>
>>> On 27 Mar 2018, at 19:32, Vincent Massol <[email protected]> wrote:
>>>
>>> FYI I’ve implemented it locally for all modules of xwiki-commons and did
>>> some build time measurements:
>>>
>>> * With pitest/descartes: 37:16 minutes
>>> * Without pitest/descartes: 5:10 minutes
>>
>> Actually I was able to reduce the time to 15:12 minutes by configuring
>> pitest with 4 threads.
>>
>> Thanks
>> -Vincent
>>
>>>
>>> So that’s a pretty significant hit…
>>>
>>> So I think one strategy could be to not run pitest/descartes by default in
>>> the quality profile (i.e. have it off by default with
>>> <xwiki.pitest.skip>true</xwiki.pitest.skip>) and run it on the CI from
>>> time to time, like once per day or once per week.
>>>
>>> Small issue: I need to find/test a way to run a crontab-type job in a
>>> Jenkins pipeline script. I know how to do it in theory but I need to test
>>> it and verify it works. I still have some doubts ATM...
>>>
>>> WDYT?
>>>
>>> Thanks
>>> -Vincent
>>>
>>>> On 15 Mar 2018, at 09:30, Vincent Massol <[email protected]> wrote:
>>>>
>>>> Hi devs,
>>>>
>>>> As part of the STAMP research project, we’ve developed a new tool
>>>> (Descartes, based on Pitest) to measure the quality of tests. It generates
>>>> a mutation score for your tests, defining how good the tests are.
>>>> Technically, Descartes performs some extreme mutations on the code under
>>>> test (e.g. remove the content of void methods, return true for methods
>>>> returning a boolean, etc. - see
>>>> https://github.com/STAMP-project/pitest-descartes). If a test continues
>>>> to pass then it means it’s not killing the mutant and thus its mutation
>>>> score decreases.
>>>>
>>>> So in short:
>>>> * Jacoco/Clover: measure how much of the code is tested
>>>> * Pitest/Descartes: measure how good the tests are
>>>>
>>>> Both provide a percentage value.
>>>>
>>>> I’m proposing to compute the current mutation scores for xwiki-commons and
>>>> xwiki-rendering and fail the build when new code is added that reduces the
>>>> mutation score below the threshold (exactly the same as our jacoco
>>>> threshold and strategy).
>>>>
>>>> I consider this an experiment to push the limits of software engineering
>>>> a bit further. I don’t know how well it’ll work. I propose to do the work,
>>>> test this over 2-3 months and see how well it works. At that point we can
>>>> decide whether to keep it (i.e. whether the gains it brings are more
>>>> important than the problems it causes).
>>>>
>>>> Here’s my +1 to try this out.
>>>>
>>>> Some links:
>>>> * pitest: http://pitest.org/
>>>> * descartes: https://github.com/STAMP-project/pitest-descartes
>>>> * http://massol.myxwiki.org/xwiki/bin/view/Blog/ControllingTestQuality
>>>> * http://massol.myxwiki.org/xwiki/bin/view/Blog/MutationTestingDescartes
>>>>
>>>> If you’re curious, you can see a screenshot of a mutation score report at
>>>> http://massol.myxwiki.org/xwiki/bin/download/Blog/MutationTestingDescartes/report.png
>>>>
>>>> Please cast your votes.
>>>>
>>>> Thanks
>>>> -Vincent
>>>
>>
>
--
Thomas Mortagne
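
To make the “extreme mutation” idea above concrete, here is a small hypothetical illustration (the class, method and test names are invented for the example, they are not from the thread):

// StringUtils.java - hypothetical code under test
public class StringUtils
{
    public static boolean isEmpty(String value)
    {
        return value == null || value.length() == 0;
    }
}

// StringUtilsTest.java - the test that Descartes evaluates
import static org.junit.Assert.assertFalse;
import static org.junit.Assert.assertTrue;

import org.junit.Test;

public class StringUtilsTest
{
    // Descartes replaces the whole body of isEmpty() with "return true;"
    // (and, as a separate mutant, with "return false;").
    @Test
    public void verifyIsEmpty()
    {
        // This assertion alone would not kill the "return true;" mutant...
        assertTrue(StringUtils.isEmpty(""));
        // ...this one does: the mutant returns true here, the test fails and the
        // mutant is counted as killed, so the mutation score stays up.
        assertFalse(StringUtils.isEmpty("xwiki"));
    }
}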

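For reference, the Maven wiring discussed in the thread could look roughly like the sketch below, placed in the quality profile (plugin under build/plugins, with <xwiki.pitest.skip>true</xwiki.pitest.skip> declared as a property). The versions, the threshold value and the way the property is wired to the plugin’s skip flag are assumptions based on the thread and the pitest-descartes README, not the actual xwiki-commons configuration:

<!-- Sketch only: versions, the threshold value and the property wiring are
     illustrative assumptions, not the actual xwiki-commons configuration. -->
<plugin>
  <groupId>org.pitest</groupId>
  <artifactId>pitest-maven</artifactId>
  <version>1.3.2</version>
  <configuration>
    <!-- Off by default; a CI job would pass -Dxwiki.pitest.skip=false -->
    <skip>${xwiki.pitest.skip}</skip>
    <!-- Use the Descartes extreme-mutation engine instead of pitest's default one -->
    <mutationEngine>descartes</mutationEngine>
    <!-- 4 threads brought the xwiki-commons run from ~37 down to ~15 minutes -->
    <threads>4</threads>
    <!-- Fail the build when the mutation score drops below the module's threshold -->
    <mutationThreshold>70</mutationThreshold>
  </configuration>
  <dependencies>
    <dependency>
      <groupId>eu.stamp-project</groupId>
      <artifactId>descartes</artifactId>
      <version>1.2</version>
    </dependency>
  </dependencies>
</plugin>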

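As for the “crontab-type job in a Jenkins pipeline script” question, a scripted pipeline can declare its own cron trigger. A minimal sketch, reusing the quality profile and xwiki.pitest.skip property assumed above (the schedule is an example, not a decided value):

// Sketch of a Jenkinsfile fragment running the mutation analysis on a schedule.
// The cron expression, Maven profile and xwiki.pitest.skip property are examples only.
properties([
    // "H H(0-5) * * *" = once a day, at a hashed time between 00:00 and 05:59
    pipelineTriggers([cron('H H(0-5) * * *')])
])

node {
    checkout scm
    // Run the quality profile with pitest/descartes explicitly re-enabled
    sh 'mvn clean install -Pquality -Dxwiki.pitest.skip=false'
}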