Hi there,

We need some more DSpot results. Would be great if you could help out.

See below for instructions.

> On 29 Aug 2018, at 11:20, Vincent Massol <[email protected]> wrote:
> 
> Hi devs (and anyone else interested to improve the tests of XWiki),
> 
> History
> ======
> 
> It all started when I analyzed our global TPC and found that it was going
> down even though we have the fail-build-on-jacoco-threshold strategy.
> 
> I started several email threads:
> 
> - Loss of TPC: http://markmail.org/message/hqumkdiz7jm76ya6
> - TPC evolution: http://markmail.org/message/up2gc2zzbbe4uqgn
> - Improve our TPC strategy: http://markmail.org/message/grphwta63pp5p4l7
> 
> Note: As a consequence of this last thread, I implemented a Jenkins Pipeline
> to send us a mail when the global TPC of an XWiki module goes down, so that we
> can fix it ASAP. This is still a work in progress. A first version is done
> and running at https://ci.xwiki.org/view/Tools/job/Clover/ but I still need to
> debug and fix it (it’s not working ATM).
> 
> As a result of the global TPC going down/stagnating, I proposed to focus
> XWiki 10.7 on Tests + BFD.
> - Initially I proposed to focus on increasing the global TPC by looking at 
> the reports from 1) above (http://markmail.org/message/qjemnip7hjva2rjd). See 
> the last report at https://up1.xwikisas.com/#mJ0loeB6nBrAgYeKA7MGGw (we need 
> to fix the red parts).
> - Then with the STAMP mid-term review, a bigger urgency surfaced and I asked
> if we could instead focus on fixing tests as reported by Descartes, to
> increase both coverage and mutation score (i.e. test quality), since those are
> 2 metrics/KPIs measured by STAMP, and since XWiki participates in STAMP we
> need to work on them and increase them substantially. See
> http://markmail.org/message/ejmdkf3hx7drkj52
> 
> The results of XWiki 10.7 have been quite poor on test improvements (more
> focus on BFD than on tests, lots of devs on holidays, etc.). This forces us
> to adopt a different strategy.
> 
> Full Strategy proposal
> =================
> 
> 1) As many XWiki SAS devs as possible (and anyone else from the community 
> who’s interested ofc! :)) should spend 1 day per week working on improving 
> STAMP metrics
> * Currently the agreement is that Thomas and I will do this for the
> foreseeable future, until we get some good-enough metric progress.
> * Some other devs from XWiki SAS will help out for XWiki 10.8 only FTM 
> (Marius, Adel if he can, Simon in the future). The idea is to see where that 
> could get us by using substantial manpower.
> 
> 2) All committers: More generally, the global TPC check is also already
> active, and devs need to fix modules whose global TPC goes down.
> 
> 3) All committers: Of course, the jacoco threshold strategy is also active at
> the module level.
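> 
> For reference, this module-level threshold is what the jacoco-maven-plugin’s
> check goal enforces. A minimal sketch (the 0.70 minimum below is a
> placeholder; each module sets its own value):
> 
>   <plugin>
>     <groupId>org.jacoco</groupId>
>     <artifactId>jacoco-maven-plugin</artifactId>
>     <executions>
>       <execution>
>         <goals>
>           <!-- Instrument the tests, then fail the build if coverage
>                drops below the configured minimum -->
>           <goal>prepare-agent</goal>
>           <goal>check</goal>
>         </goals>
>         <configuration>
>           <rules>
>             <rule>
>               <element>BUNDLE</element>
>               <limits>
>                 <limit>
>                   <counter>INSTRUCTION</counter>
>                   <value>COVEREDRATIO</value>
>                   <minimum>0.70</minimum> <!-- placeholder value -->
>                 </limit>
>               </limits>
>             </rule>
>           </rules>
>         </configuration>
>       </execution>
>     </executions>
>   </plugin>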
> 
> STAMP tools
> ==========
> 
> There are 4 tools developed by STAMP:
> * Descartes: Helps improve test quality by measuring mutation scores and
> revealing weakly-tested code (extreme mutation testing).
> See http://markmail.org/message/bonb5f7f37omnnog and also
> https://massol.myxwiki.org/xwiki/bin/view/Blog/MutationTestingDescartes
> * DSpot: Automatically generates new tests based on existing tests. See
> https://massol.myxwiki.org/xwiki/bin/view/Blog/TestGenerationDspot

Process to run DSpot:
1) Pick a module. Measure its coverage and mutation score (or take the values
already recorded in its pom.xml, if present). Same as for Descartes testing.
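
The mutation score is measured by running Descartes as a mutation engine inside
PIT. As a minimal sketch of the plugin setup (the version numbers below are
placeholders; use the latest releases):

  <plugin>
    <groupId>org.pitest</groupId>
    <artifactId>pitest-maven</artifactId>
    <version>1.4.0</version> <!-- placeholder version -->
    <configuration>
      <!-- Use Descartes' extreme mutation engine instead of PIT's default -->
      <mutationEngine>descartes</mutationEngine>
    </configuration>
    <dependencies>
      <dependency>
        <groupId>eu.stamp-project</groupId>
        <artifactId>descartes</artifactId>
        <version>1.2.4</version> <!-- placeholder version -->
      </dependency>
    </dependencies>
  </plugin>

Then running mvn org.pitest:pitest-maven:mutationCoverage computes the mutation
score for the module.
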
2) Run DSpot on the module, see 
https://massol.myxwiki.org/xwiki/bin/view/Blog/TestGenerationDspot for 
explanations
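
As a sketch of what the run looks like with the dspot-maven plugin (the blog
post above has the authoritative instructions; the version below is a
placeholder), declare the plugin in the module’s pom:

  <plugin>
    <groupId>eu.stamp-project</groupId>
    <artifactId>dspot-maven</artifactId>
    <version>2.0.0</version> <!-- placeholder; use the latest release -->
  </plugin>

and run mvn eu.stamp-project:dspot-maven:amplify-unit-tests on the module. The
amplified tests are written under the module’s target directory.
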
3) If DSpot has generated tests, add them to XWiki’s source code in 
src/test/dspot and add the following to the pom of that module:

<build>
  <plugins>
    <!-- Add test source root for executing DSpot-generated tests -->
    <plugin>
      <groupId>org.codehaus.mojo</groupId>
      <artifactId>build-helper-maven-plugin</artifactId>
      <!-- If not already configured in a parent POM, register src/test/dspot: -->
      <executions>
        <execution>
          <goals><goal>add-test-source</goal></goals>
          <configuration>
            <sources><source>src/test/dspot</source></sources>
          </configuration>
        </execution>
      </executions>
    </plugin>
  </plugins>
</build>
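
Once the extra test source root is registered, a plain mvn test compiles and
runs both src/test/java and src/test/dspot.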

Example: 
https://github.com/xwiki/xwiki-commons/tree/244ee07976c691c335b7f54c48e6308004ba3d82/xwiki-commons-core/xwiki-commons-crypto/xwiki-commons-crypto-cipher

Note: The generated tests sometimes need to be modified a bit to pass.
Personally, I’ve only committed tests that were passing, and I’ve reported
issues for those that were not.

4) File the various reports:
a) https://github.com/STAMP-project/dspot-usecases-output/tree/master/xwiki 
both for success and failures
b) 
https://docs.google.com/spreadsheets/d/1LULpGpsJirmFyvHNstLGv-Gv5DVBdpLTM2hm0jgCKUw/edit#gid=2061481816
c) for failures, file a github issue at 
https://github.com/STAMP-project/dspot/issues and link to the place on 
https://github.com/STAMP-project/dspot-usecases-output/tree/master/xwiki where 
we put the failing result.

Note: The reason we need to report failures too is that DSpot fails a lot, so
we need to show what we have tested.

Thanks
-Vincent

> * CAMP: Takes a Dockerfile and generates mutations of it, then deploys and
> executes tests on the software to see if the mutation works or not. Note that
> this currently doesn’t fit XWiki’s needs, and thus I’ve been developing
> another tool as an experiment (which may be merged back into CAMP one day),
> based on TestContainers, see
> https://massol.myxwiki.org/xwiki/bin/view/Blog/EnvironmentTestingExperimentations
> * EvoCrash: Takes a stack trace from production logs and generates a test 
> that, when executed, reproduces the crash. See 
> https://markmail.org/message/v74g3tsmflquqwra. See also 
> https://github.com/SERG-Delft/EvoCrash
> 
> Since XWiki is part of the STAMP research project, we need to use those 4
> tools to improve the KPIs associated with them. See below.
> 
> Objectives/KPIs/Metrics for STAMP
> ===========================
> 
> The STAMP project has defined 9 KPIs that all partners (and thus XWiki) need 
> to work on:
> 
> 1) K01: Increase test coverage
> * Global increase by reducing the non-covered code by 40%. For XWiki, since
> we’re at about 70% coverage, 40% of the remaining 30% is 12 points, so this
> means reaching about 80% before the end of STAMP (i.e. before the end of 2019)
> * Increase the coverage contributions of each tool developed by STAMP.
> 
> Strategy:
> * Primary goal: 
> ** Increase coverage by executing Descartes and improving our tests. This is 
> http://markmail.org/message/ejmdkf3hx7drkj52
> ** Don’t do anything with DSpot. I’ll do that part. Note that the goal is to
> write a Jenkins pipeline that automatically executes DSpot from time to time,
> commits the generated tests in a separate test source root, and has our build
> execute both src/test/java and this new source root.
> ** Don’t do anything with TestContainers FTM since I need to finish a first 
> working version. I may need help in the future to implement docker images for 
> more configurations (on Oracle, in a cluster, with LibreOffice, with an 
> external SOLR server, etc).
> ** For EvoCrash: We’ll count contributions of EvoCrash to coverage in K08.
> * Secondary goal:
> ** Increase our global TPC as mentioned above by fixing the modules in red.
> 
> 2) K02: Reduce flaky tests.
> * Objective: reduce the number of flaky tests by 20%
> 
> Strategy:
> * Record flaky tests in jira
> * Fix as many of them as possible
> 
> 3) K03: Better test quality
> * Objective: increase mutation score by 20%
> 
> Strategy:
> * Same strategy as K01.
> 
> 4) K04: More configuration-related paths tested
> * Objective: increase the code coverage of configuration-related paths in our
> code by 20% (e.g. DB schema creation code, cluster-related code, SOLR-related
> code, LibreOffice-related code, etc.).
> 
> Strategy:
> * Leave it aside FTM. The idea is to measure the Clover TPC with the base
> configuration, then execute all other configurations (with TestContainers)
> and regenerate the Clover report to see how much the TPC increases.
> 
> 5) K05: Reduce system-specific bugs
> * Objective: 30% improvement
> 
> Strategy:
> * Run TestContainers, execute the existing tests, and find new bugs related
> to configurations. Record them.
> 
> 6) K06: More configurations/Faster tests
> * Objective: increase the number of automatically tested configurations by 50%
> 
> Strategy:
> * Increase the # of configurations we test with TestContainers. I’ll do that 
> part initially.
> * Reduce the time it takes to deploy the software under a given configuration
> vs the time it used to take when done manually before STAMP. I’ll do this one;
> I’ve already worked on it in the past year with the dockerization of XWiki.
> 
> 7) K07: Pending, nothing to do FTM
> 
> 8) K08: More crash replicating test cases
> * Objective: increase the number of crash replicating test cases by at least 
> 70%
> 
> Strategy:
> * For all issues that are still open and that have stack traces and for all 
> issues closed but without tests, run EvoCrash on them to try to generate a 
> test.
> * Record and count the number of successful EvoCrash-generated test cases.
> * Derive a regression test (which can be very different from the negative of
> the test generated by EvoCrash!).
> * Measure the new coverage increase
> * Note that I haven’t experimented much with this yet myself.
> 
> 9) K09: Pending, nothing to do FTM.
> 
> Conclusion
> =========
> 
> Right now, I need your help for the following KPIs: K01, K02, K03, K08.
> 
> Since there’s a lot to understand in this email, I’m open to:
> * Organizing a meeting on YouTube Live to discuss all this
> * Answering any questions on this thread ofc
> * Also feel free to ask on IRC/Matrix.
> 
> Here’s an extract from STAMP which has more details about the KPIs/metrics:
> https://up1.xwikisas.com/#QJyxqspKXSzuWNOHUuAaEA
> 
> Thanks
> -Vincent
> 
