Sounds good.

On Mon, Sep 3, 2018 at 9:55 AM, Vincent Massol <vinc...@massol.net> wrote:
>
>> On 3 Sep 2018, at 09:55, Vincent Massol <vinc...@massol.net> wrote:
>>
>> I propose to do this tomorrow (Tuesday), starting with an intro from me, 
>> using YouTube Live.
>
> Say, 10AM Paris time.
>
> Thanks
> -Vincent
>
>> WDYT?
>>
>> Thanks
>> -Vincent
>>
>>> On 30 Aug 2018, at 12:27, Adel Atallah <adel.atal...@xwiki.com> wrote:
>>>
>>> Just to be clear, when I proposed "having a whole day dedicated to
>>> using these tools", I didn't mean having it every week but only once,
>>> so we can properly start improving the tests. It would be some kind
>>> of training.
>>> On my side, I don't think I can dedicate one day a week to tests and
>>> another to bug fixing: I wouldn't have any time left for the roadmap,
>>> as I only work on the product 50% of the time.
>>>
>>>
>>> On Thu, Aug 30, 2018 at 12:18 PM, Vincent Massol <vinc...@massol.net> wrote:
>>>> Hi,
>>>>
>>>> I don’t remember discussing this with you Thomas. Actually I’m not 
>>>> convinced we should have a fixed day:
>>>> * we already have a fixed BFD, and having a second one doesn’t leave much 
>>>> flexibility for working on roadmap items when that’s the best use of a day
>>>> * test sessions can be short (0.5-1 hour) and it’s easy to fit them 
>>>> between other tasks
>>>> * it can be boring to spend a full day on them
>>>>
>>>> Now, I agree that not having a fixed day will make it hard to ensure 
>>>> that we actually spend 20% of our time on that topic.
>>>>
>>>> So if you prefer we can define a day, knowing that some people won’t 
>>>> always be able to attend on that day, in which case they should do it on 
>>>> another day. What’s important is to get the 20% done each week (i.e. 
>>>> enough work done on it).
>>>>
>>>> In terms of day, if we have to choose one, I’d say Tuesday. That’s the 
>>>> most logical to me.
>>>>
>>>> WDYT? What do you prefer?
>>>>
>>>> Thanks
>>>> -Vincent
>>>>
>>>>> On 30 Aug 2018, at 10:38, Thomas Mortagne <thomas.morta...@xwiki.com> 
>>>>> wrote:
>>>>>
>>>>> Indeed we discussed this, but I don't see it in your mail, Vincent.
>>>>>
>>>>> On Thu, Aug 30, 2018 at 10:33 AM, Adel Atallah <adel.atal...@xwiki.com> 
>>>>> wrote:
>>>>>> Hello,
>>>>>>
>>>>>> Maybe we should agree on having a whole day dedicated to using these
>>>>>> tools, with as many developers as possible.
>>>>>> That way we'll be able to help each other, and it may make the
>>>>>> process easier to carry out in the future.
>>>>>>
>>>>>> WDYT?
>>>>>>
>>>>>> Thanks,
>>>>>> Adel
>>>>>>
>>>>>>
>>>>>> On Wed, Aug 29, 2018 at 11:20 AM, Vincent Massol <vinc...@massol.net> 
>>>>>> wrote:
>>>>>>> Hi devs (and anyone else interested in improving the tests of XWiki),
>>>>>>>
>>>>>>> History
>>>>>>> ======
>>>>>>>
>>>>>>> It all started when I analyzed our global TPC and found that it was 
>>>>>>> going down even though we have the fail-build-on-jacoco-threshold 
>>>>>>> strategy.
>>>>>>>
>>>>>>> I sent several email threads:
>>>>>>>
>>>>>>> - Loss of TPC: http://markmail.org/message/hqumkdiz7jm76ya6
>>>>>>> - TPC evolution: http://markmail.org/message/up2gc2zzbbe4uqgn
>>>>>>> - Improve our TPC strategy: http://markmail.org/message/grphwta63pp5p4l7
>>>>>>>
>>>>>>> Note: As a consequence of this last thread, I implemented a Jenkins 
>>>>>>> Pipeline that sends us a mail when the global TPC of an XWiki module 
>>>>>>> goes down, so that we can fix it ASAP. This is still a work in 
>>>>>>> progress: a first version is done and running at 
>>>>>>> https://ci.xwiki.org/view/Tools/job/Clover/ but I still need to debug 
>>>>>>> and fix it (it’s not working ATM).
>>>>>>>
>>>>>>> As a result of the global TPC going down/stagnating, I proposed to 
>>>>>>> have 10.7 focus on Tests + BFD.
>>>>>>> - Initially I proposed to focus on increasing the global TPC by looking 
>>>>>>> at the reports from 1) above 
>>>>>>> (http://markmail.org/message/qjemnip7hjva2rjd). See the last report at 
>>>>>>> https://up1.xwikisas.com/#mJ0loeB6nBrAgYeKA7MGGw (we need to fix the 
>>>>>>> red parts).
>>>>>>> - Then, with the STAMP mid-term review, a bigger urgency surfaced and I 
>>>>>>> asked if we could instead focus on fixing tests as reported by 
>>>>>>> Descartes, to increase both coverage and mutation score (i.e. test 
>>>>>>> quality). Those are 2 metrics/KPIs measured by STAMP, and since 
>>>>>>> XWiki participates in STAMP we need to work on them and increase them 
>>>>>>> substantially. See http://markmail.org/message/ejmdkf3hx7drkj52
>>>>>>>
>>>>>>> The results of XWiki 10.7 have been quite poor on test improvements 
>>>>>>> (more focus on BFD than on tests, lots of devs on holidays, etc.). This 
>>>>>>> forces us to adopt a different strategy.
>>>>>>>
>>>>>>> Full Strategy proposal
>>>>>>> =================
>>>>>>>
>>>>>>> 1) As many XWiki SAS devs as possible (and anyone else from the 
>>>>>>> community who’s interested ofc! :)) should spend 1 day per week working 
>>>>>>> on improving the STAMP metrics.
>>>>>>> * Currently the agreement is that Thomas and I will do this for 
>>>>>>> the foreseeable future, till we get some good-enough metric progress.
>>>>>>> * Some other devs from XWiki SAS will help out for XWiki 10.8 only FTM 
>>>>>>> (Marius, Adel if he can, Simon in the future). The idea is to see where 
>>>>>>> substantial manpower can get us.
>>>>>>>
>>>>>>> 2) All committers: more generally, the global-TPC-failure check is 
>>>>>>> also already active, and devs need to fix modules whose global TPC 
>>>>>>> goes down.
>>>>>>>
>>>>>>> 3) All committers: of course, the jacoco threshold strategy is also 
>>>>>>> active at each module level.
>>>>>>>
>>>>>>> STAMP tools
>>>>>>> ==========
>>>>>>>
>>>>>>> There are 4 tools developed by STAMP:
>>>>>>> * Descartes: Helps improve the quality of tests by computing their 
>>>>>>> mutation scores. See http://markmail.org/message/bonb5f7f37omnnog and also 
>>>>>>> https://massol.myxwiki.org/xwiki/bin/view/Blog/MutationTestingDescartes
>>>>>>> * DSpot: Automatically generates new tests, based on existing tests. See 
>>>>>>> https://massol.myxwiki.org/xwiki/bin/view/Blog/TestGenerationDspot
>>>>>>> * CAMP: Takes a Dockerfile and generates mutations of it, then deploys 
>>>>>>> and executes tests on the software to see whether each mutation works. 
>>>>>>> Note that this currently doesn’t fit the needs of XWiki, so I’ve been 
>>>>>>> developing another tool as an experiment (which may go back into CAMP 
>>>>>>> one day), based on TestContainers (see the sketch after this list); see 
>>>>>>> https://massol.myxwiki.org/xwiki/bin/view/Blog/EnvironmentTestingExperimentations
>>>>>>> * EvoCrash: Takes a stack trace from production logs and generates a 
>>>>>>> test that, when executed, reproduces the crash. See 
>>>>>>> https://markmail.org/message/v74g3tsmflquqwra. See also 
>>>>>>> https://github.com/SERG-Delft/EvoCrash
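>>>>>>>
>>>>>>> To make the TestContainers experiment more concrete, here’s a minimal 
>>>>>>> sketch (not our actual code; the image name and the way XWiki would be 
>>>>>>> wired to it are just assumptions) of how a test can spin up one 
>>>>>>> environment and run the existing tests against it:
>>>>>>>
>>>>>>> // Minimal sketch, assuming the testcontainers Java library is on the
>>>>>>> // classpath. The "mysql:5.7" image is illustrative only.
>>>>>>> import org.testcontainers.containers.GenericContainer;
>>>>>>>
>>>>>>> public class EnvironmentTestSketch {
>>>>>>>     public static void main(String[] args) {
>>>>>>>         // Start a MySQL container for an "XWiki on MySQL" configuration.
>>>>>>>         GenericContainer<?> mysql = new GenericContainer<>("mysql:5.7")
>>>>>>>             .withEnv("MYSQL_ROOT_PASSWORD", "xwiki")
>>>>>>>             .withExposedPorts(3306);
>>>>>>>         mysql.start();
>>>>>>>         try {
>>>>>>>             // A real test would now configure XWiki to use
>>>>>>>             // mysql.getContainerIpAddress() + ":" + mysql.getMappedPort(3306)
>>>>>>>             // and execute the existing functional tests on top of it.
>>>>>>>             System.out.println("MySQL is up on port " + mysql.getMappedPort(3306));
>>>>>>>         } finally {
>>>>>>>             mysql.stop();
>>>>>>>         }
>>>>>>>     }
>>>>>>> }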
>>>>>>>
>>>>>>> Since XWiki is part of the STAMP research project, we need to use those 
>>>>>>> 4 tools to increase the KPIs associated with them. See below.
>>>>>>>
>>>>>>> Objectives/KPIs/Metrics for STAMP
>>>>>>> ===========================
>>>>>>>
>>>>>>> The STAMP project has defined 9 KPIs that all partners (and thus XWiki) 
>>>>>>> need to work on:
>>>>>>>
>>>>>>> 1) K01: Increase test coverage
>>>>>>> * Global increase by reducing the non-covered code by 40%. For XWiki, 
>>>>>>> since we’re at about 70%, this means reaching about 80% before the end 
>>>>>>> of STAMP (i.e. before the end of 2019): cutting the 30% of uncovered 
>>>>>>> code by 40% leaves 18% uncovered, so ~82% covered.
>>>>>>> * Increase the coverage contributions of each tool developed by STAMP.
>>>>>>>
>>>>>>> Strategy:
>>>>>>> * Primary goal:
>>>>>>> ** Increase coverage by executing Descartes and improving our tests 
>>>>>>> (see the small example after this list). This is 
>>>>>>> http://markmail.org/message/ejmdkf3hx7drkj52
>>>>>>> ** Don’t do anything with DSpot; I’ll do that part. Note that the goal 
>>>>>>> is to write a Jenkins pipeline that automatically executes DSpot from 
>>>>>>> time to time, commits the generated tests in a separate test source 
>>>>>>> root, and has our build execute both src/test/java and this new test 
>>>>>>> source.
>>>>>>> ** Don’t do anything with TestContainers FTM since I need to finish a 
>>>>>>> first working version. I may need help in the future to implement 
>>>>>>> Docker images for more configurations (on Oracle, in a cluster, with 
>>>>>>> LibreOffice, with an external SOLR server, etc.).
>>>>>>> ** For EvoCrash: we’ll count the contributions of EvoCrash to coverage 
>>>>>>> in K08.
>>>>>>> * Secondary goal:
>>>>>>> ** Increase our global TPC as mentioned above by fixing the modules in 
>>>>>>> red.
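>>>>>>>
>>>>>>> Here’s a small, hypothetical example (not real XWiki code) of the kind 
>>>>>>> of fix Descartes drives. Descartes applies extreme mutations (e.g. 
>>>>>>> replacing a whole method body with "return null"); a test that calls a 
>>>>>>> method without asserting on its result lets such mutants survive:
>>>>>>>
>>>>>>> import static org.junit.Assert.assertEquals;
>>>>>>> import org.junit.Test;
>>>>>>>
>>>>>>> public class TitleFormatterTest {
>>>>>>>     // Hypothetical class under test.
>>>>>>>     static class TitleFormatter {
>>>>>>>         String format(String title) {
>>>>>>>             return title == null ? "" : title.trim();
>>>>>>>         }
>>>>>>>     }
>>>>>>>
>>>>>>>     @Test
>>>>>>>     public void formatTrimsWhitespace() {
>>>>>>>         // Before: the test only called format() without asserting, so
>>>>>>>         // the "return null" mutant survived (a pseudo-tested method).
>>>>>>>         // After: asserting on the result kills the mutant and raises
>>>>>>>         // the mutation score, which is what K01/K03 ask for.
>>>>>>>         assertEquals("Hello", new TitleFormatter().format("  Hello  "));
>>>>>>>     }
>>>>>>> }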
>>>>>>>
>>>>>>> 2) K02: Reduce flaky tests
>>>>>>> * Objective: reduce the number of flaky tests by 20%
>>>>>>>
>>>>>>> Strategy:
>>>>>>> * Record flaky tests in jira.
>>>>>>> * Fix as many of them as possible (a typical fix is sketched below).
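>>>>>>>
>>>>>>> For instance, a very common source of flakiness is a fixed sleep that 
>>>>>>> is too short on a loaded CI agent. A minimal, hypothetical sketch of 
>>>>>>> the usual fix (replace the blind sleep with a bounded poll):
>>>>>>>
>>>>>>> import java.time.Duration;
>>>>>>> import java.time.Instant;
>>>>>>> import java.util.function.BooleanSupplier;
>>>>>>>
>>>>>>> public final class Waiter {
>>>>>>>     // Polls the condition until it holds or the timeout expires,
>>>>>>>     // instead of one Thread.sleep(5000) that fails on slow agents.
>>>>>>>     static void waitUntil(BooleanSupplier condition, Duration timeout)
>>>>>>>         throws InterruptedException {
>>>>>>>         Instant deadline = Instant.now().plus(timeout);
>>>>>>>         while (!condition.getAsBoolean()) {
>>>>>>>             if (Instant.now().isAfter(deadline)) {
>>>>>>>                 throw new AssertionError("Condition not met within " + timeout);
>>>>>>>             }
>>>>>>>             Thread.sleep(100);
>>>>>>>         }
>>>>>>>     }
>>>>>>> }
>>>>>>>
>>>>>>> // In a test, instead of Thread.sleep(5000):
>>>>>>> //   Waiter.waitUntil(() -> job.isFinished(), Duration.ofSeconds(30));
>>>>>>> // (job.isFinished() is a made-up call, just for illustration.)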
>>>>>>>
>>>>>>> 3) K03: Better test quality
>>>>>>> * Objective: increase mutation score by 20%
>>>>>>>
>>>>>>> Strategy:
>>>>>>> * Same strategy as K01.
>>>>>>>
>>>>>>> 4) K04: More configuration-related paths tested
>>>>>>> * Objective: increase the code coverage of configuration-related paths 
>>>>>>> in our code by 20% (e.g. DB schema creation code, cluster-related code, 
>>>>>>> SOLR-related code, LibreOffice-related code, etc.).
>>>>>>>
>>>>>>> Strategy:
>>>>>>> * Leave it FTM. The idea is to measure the Clover TPC with the base 
>>>>>>> configuration, then execute all the other configurations (with 
>>>>>>> TestContainers) and regenerate the Clover report to see how much the 
>>>>>>> TPC has increased.
>>>>>>>
>>>>>>> 5) K05: Reduce system-specific bugs
>>>>>>> * Objective: 30% improvement
>>>>>>>
>>>>>>> Strategy:
>>>>>>> * Run TestContainers, execute the existing tests, and find new bugs 
>>>>>>> related to configurations. Record them.
>>>>>>>
>>>>>>> 6) K06: More configurations/Faster tests
>>>>>>> * Objective: increase the number of automatically tested configurations 
>>>>>>> by 50%
>>>>>>>
>>>>>>> Strategy:
>>>>>>> * Increase the # of configurations we test with TestContainers (see the 
>>>>>>> sketch below). I’ll do that part initially.
>>>>>>> * Reduce the time it takes to deploy the software under a given 
>>>>>>> configuration vs the time it used to take when done manually before 
>>>>>>> STAMP. I’ll do this one; I’ve already worked on it over the past year 
>>>>>>> with the dockerization of XWiki.
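>>>>>>>
>>>>>>> To illustrate what "more configurations" means in practice, a tiny 
>>>>>>> hypothetical sketch (the image list is made up; the real matrix would 
>>>>>>> also cover servlet containers, clustering, etc.):
>>>>>>>
>>>>>>> import java.util.Arrays;
>>>>>>> import java.util.List;
>>>>>>>
>>>>>>> public class ConfigurationMatrix {
>>>>>>>     // Each entry is one configuration in the K06 sense; adding an
>>>>>>>     // image to this list is how the tested-configuration count grows.
>>>>>>>     static final List<String> DATABASE_IMAGES = Arrays.asList(
>>>>>>>         "mysql:5.7", "postgres:10", "mariadb:10.3");
>>>>>>>
>>>>>>>     public static void main(String[] args) {
>>>>>>>         for (String image : DATABASE_IMAGES) {
>>>>>>>             // A TestContainers-based runner (as sketched earlier) would
>>>>>>>             // start this image and run the functional tests against it.
>>>>>>>             System.out.println("Would run the functional tests on " + image);
>>>>>>>         }
>>>>>>>     }
>>>>>>> }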
>>>>>>>
>>>>>>> 7) K07: Pending, nothing to do FTM
>>>>>>>
>>>>>>> 8) K08: More crash-replicating test cases
>>>>>>> * Objective: increase the number of crash-replicating test cases by at 
>>>>>>> least 70%
>>>>>>>
>>>>>>> Strategy:
>>>>>>> * For all issues that are still open and have stack traces, and for 
>>>>>>> all issues closed but without tests, run EvoCrash on them to try to 
>>>>>>> generate a test.
>>>>>>> * Record and count the number of successful EvoCrash-generated test 
>>>>>>> cases.
>>>>>>> * Derive a regression test (which can be very different from the 
>>>>>>> negative of the test generated by EvoCrash!). See the sketch below.
>>>>>>> * Measure the new coverage increase.
>>>>>>> * Note that I haven’t experimented much with this myself yet.
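>>>>>>>
>>>>>>> To show why the regression test is not just the negation of the 
>>>>>>> generated one, here’s a hypothetical sketch (AttachmentStore is made 
>>>>>>> up, and this is not EvoCrash’s actual output format):
>>>>>>>
>>>>>>> import static org.junit.Assert.assertEquals;
>>>>>>> import org.junit.Test;
>>>>>>>
>>>>>>> public class AttachmentStoreRegressionTest {
>>>>>>>     // Hypothetical class under test, shown after the fix. The unfixed
>>>>>>>     // version dereferenced 'path' directly and threw the
>>>>>>>     // NullPointerException that EvoCrash would start from.
>>>>>>>     static class AttachmentStore {
>>>>>>>         String getFilename(String path) {
>>>>>>>             return path == null ? "" : path.substring(path.lastIndexOf('/') + 1);
>>>>>>>         }
>>>>>>>     }
>>>>>>>
>>>>>>>     // An EvoCrash-generated test asserts that getFilename(null) throws
>>>>>>>     // the NPE from the production stack trace; it reproduces the crash
>>>>>>>     // but starts failing once the bug is fixed. The derived regression
>>>>>>>     // test below asserts the intended behavior instead, so it keeps
>>>>>>>     // passing after the fix (and contributes to coverage, see K01).
>>>>>>>     @Test
>>>>>>>     public void returnsEmptyFilenameForNullPath() {
>>>>>>>         assertEquals("", new AttachmentStore().getFilename(null));
>>>>>>>     }
>>>>>>> }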
>>>>>>>
>>>>>>> 9) K09: Pending, nothing to do FTM.
>>>>>>>
>>>>>>> Conclusion
>>>>>>> =========
>>>>>>>
>>>>>>> Right now, I need your help for the following KPIs: K01, K02, K03, K08.
>>>>>>>
>>>>>>> Since there’s a lot to digest in this email, I’m open to:
>>>>>>> * Organizing a meeting on YouTube Live to discuss all this
>>>>>>> * Answering any questions on this thread, ofc
>>>>>>> * Also feel free to ask on IRC/Matrix.
>>>>>>>
>>>>>>> Here’s an extract from STAMP which has more details about the 
>>>>>>> KPIs/metrics:
>>>>>>> https://up1.xwikisas.com/#QJyxqspKXSzuWNOHUuAaEA
>>>>>>>
>>>>>>> Thanks
>>>>>>> -Vincent
>>>>>>>
>>>>>
>>>>> --
>>>>> Thomas Mortagne
>>>>
>>
>

-- 
Thomas Mortagne
