Hi there,

We need some more DSpot results. It would be great if you could help out.
See below for instructions.

> On 29 Aug 2018, at 11:20, Vincent Massol <[email protected]> wrote:
>
> Hi devs (and anyone else interested in improving the tests of XWiki),
>
> History
> ======
>
> It all started when I analyzed our global TPC and found that it was going
> down globally even though we have the fail-build-on-jacoco-threshold strategy.
>
> I sent several email threads:
>
> - Loss of TPC: http://markmail.org/message/hqumkdiz7jm76ya6
> - TPC evolution: http://markmail.org/message/up2gc2zzbbe4uqgn
> - Improve our TPC strategy: http://markmail.org/message/grphwta63pp5p4l7
>
> Note: As a consequence of this last thread, I implemented a Jenkins Pipeline
> to send us a mail when the global TPC of an XWiki module goes down, so that
> we fix it ASAP. This is still a work in progress. A first version is done
> and running at https://ci.xwiki.org/view/Tools/job/Clover/ but I need to
> debug and fix it (it's not working ATM).
>
> As a result of the global TPC going down/stagnating, I proposed to have
> 10.7 focused on Tests + BFD.
> - Initially I proposed to focus on increasing the global TPC by looking at
> the reports from 1) above (http://markmail.org/message/qjemnip7hjva2rjd). See
> the last report at https://up1.xwikisas.com/#mJ0loeB6nBrAgYeKA7MGGw (we need
> to fix the red parts).
> - Then, with the STAMP mid-term review, a bigger urgency surfaced and I asked
> if we could instead focus on fixing tests as reported by Descartes, to
> increase both coverage and mutation score (i.e. test quality), since those
> are 2 metrics/KPIs measured by STAMP, and since XWiki participates in STAMP
> we need to work on them and increase them substantially. See
> http://markmail.org/message/ejmdkf3hx7drkj52
>
> The results of XWiki 10.7 have been quite poor on test improvements (more
> focus on BFD than tests, lots of devs on holidays, etc.). This forces us to
> adopt a different strategy.
>
> Full Strategy proposal
> =================
>
> 1) As many XWiki SAS devs as possible (and anyone else from the community
> who's interested ofc! :)) should spend 1 day per week working on improving
> the STAMP metrics.
> * Currently the agreement is that Thomas and myself will do this for the
> foreseeable future, till we get some good-enough metric progress.
> * Some other devs from XWiki SAS will help out for XWiki 10.8 only FTM
> (Marius, Adel if he can, Simon in the future). The idea is to see where that
> could get us by using substantial manpower.
>
> 2) All committers: more generally, the global TPC failure is also already
> active, and devs need to fix the modules that see their global TPC go down.
>
> 3) All committers: of course, the jacoco strategy is also active at each
> module level.
>
> STAMP tools
> ==========
>
> There are 4 tools developed by STAMP:
> * Descartes: Improves the quality of tests by increasing their mutation
> score. See http://markmail.org/message/bonb5f7f37omnnog and also
> https://massol.myxwiki.org/xwiki/bin/view/Blog/MutationTestingDescartes
> * DSpot: Automatically generates new tests, based on existing tests. See
> https://massol.myxwiki.org/xwiki/bin/view/Blog/TestGenerationDspot
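To give a rough idea of what DSpot produces before you run it yourself, here is
a hypothetical sketch (a made-up JUnit 4 test around java.util.Properties, not
actual DSpot output): an amplified test is typically one of our existing
handwritten tests with its inputs varied and extra assertions added for the
values DSpot observed while executing the code under test.

import static org.junit.Assert.assertEquals;
import static org.junit.Assert.assertFalse;
import static org.junit.Assert.assertNull;

import java.util.Properties;

import org.junit.Test;

// Hypothetical sketch, not real DSpot output: the typical shape of an
// amplified test. DSpot starts from an existing handwritten test, varies its
// inputs, and adds assertions for the values it observes at generation time.
public class PropertiesAmplifiedTest
{
    @Test
    public void getPropertyWithUnknownKey()
    {
        Properties properties = new Properties();
        properties.setProperty("key", "value");

        // Amplified input: the original test only looked up an existing key.
        String missing = properties.getProperty("does-not-exist");

        // Amplified assertions: observed values turned into assertions.
        assertNull(missing);
        assertEquals(1, properties.size());
        assertFalse(properties.isEmpty());
    }
}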
Process to run DSpot:

1) Pick a module. Measure its coverage and mutation score (or take the values
already there if they're recorded in the pom.xml). Same as for Descartes
testing.

2) Run DSpot on the module; see
https://massol.myxwiki.org/xwiki/bin/view/Blog/TestGenerationDspot for
explanations.

3) If DSpot has generated tests, add them to XWiki's source code in
src/test/dspot and add the following to the pom of that module:

   <build>
     <plugins>
       <!-- Add test source root for executing DSpot-generated tests -->
       <plugin>
         <groupId>org.codehaus.mojo</groupId>
         <artifactId>build-helper-maven-plugin</artifactId>
       </plugin>
     </plugins>
   </build>

   Example:
   https://github.com/xwiki/xwiki-commons/tree/244ee07976c691c335b7f54c48e6308004ba3d82/xwiki-commons-core/xwiki-commons-crypto/xwiki-commons-crypto-cipher

   Note: The generated tests sometimes need to be modified a bit to pass.
   Personally I've only committed tests that were passing, and I reported
   issues for those that were not.

4) File the various reports:
   a) https://github.com/STAMP-project/dspot-usecases-output/tree/master/xwiki
   both for successes and failures
   b) https://docs.google.com/spreadsheets/d/1LULpGpsJirmFyvHNstLGv-Gv5DVBdpLTM2hm0jgCKUw/edit#gid=2061481816
   c) for failures, file a GitHub issue at
   https://github.com/STAMP-project/dspot/issues and link to the place on
   https://github.com/STAMP-project/dspot-usecases-output/tree/master/xwiki
   where we put the failing result.

   Note: The reason we need to report failures too is that DSpot fails a lot,
   so we need to show what we have tested.

Thanks
-Vincent

> * CAMP: Takes a Dockerfile and generates mutations of it, then deploys the
> result and executes tests on the software to see if the mutation works or
> not. Note this currently doesn't fit the needs of XWiki, and thus I've been
> developing another tool as an experiment (which may go back into CAMP one
> day), based on TestContainers; see
> https://massol.myxwiki.org/xwiki/bin/view/Blog/EnvironmentTestingExperimentations
> * EvoCrash: Takes a stack trace from production logs and generates a test
> that, when executed, reproduces the crash. See
> https://markmail.org/message/v74g3tsmflquqwra. See also
> https://github.com/SERG-Delft/EvoCrash
>
> Since XWiki is part of the STAMP research project, we need to use those 4
> tools to increase the KPIs associated with them. See below.
>
> Objectives/KPIs/Metrics for STAMP
> ===========================
>
> The STAMP project has defined 9 KPIs that all partners (and thus XWiki) need
> to work on:
>
> 1) K01: Increase test coverage
> * Global increase by reducing the non-covered code by 40%. For XWiki, since
> we're at about 70%, this means reaching about 80% before the end of STAMP
> (i.e. before the end of 2019).
> * Increase the coverage contributions of each tool developed by STAMP.
>
> Strategy:
> * Primary goal:
> ** Increase coverage by executing Descartes and improving our tests. This is
> http://markmail.org/message/ejmdkf3hx7drkj52
> ** Don't do anything with DSpot. I'll do that part. Note that the goal is to
> write a Jenkins pipeline to automatically execute DSpot from time to time,
> commit the generated tests in a separate test source root, and have our build
> execute both src/test/java and this new test source root.
> ** Don't do anything with TestContainers FTM since I need to finish a first
> working version. I may need help in the future to implement Docker images for
> more configurations (on Oracle, in a cluster, with LibreOffice, with an
> external SOLR server, etc.). A rough sketch of the idea follows right after
> this KPI.
> ** For EvoCrash: we'll count the contributions of EvoCrash to coverage in K08.
> * Secondary goal:
> ** Increase our global TPC as mentioned above by fixing the modules in red.
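The TestContainers experiment mentioned in the K01 strategy above is not
something to act on FTM, but to give a rough idea of the approach, here is a
minimal hypothetical sketch (a made-up JUnit 4 test assuming the
org.testcontainers:mysql dependency, not the actual tool): start a real
database in Docker via TestContainers so the same tests can be run against a
configuration we don't normally cover.

import static org.junit.Assert.assertTrue;

import java.sql.Connection;
import java.sql.DriverManager;

import org.junit.ClassRule;
import org.junit.Test;
import org.testcontainers.containers.MySQLContainer;

// Minimal hypothetical sketch of the TestContainers idea (not the actual
// experiment): start a real MySQL in Docker and exercise the code against it
// instead of the default configuration.
public class MySQLConfigurationIT
{
    @ClassRule
    public static MySQLContainer<?> mysql = new MySQLContainer<>("mysql:5.7");

    @Test
    public void databaseIsReachable() throws Exception
    {
        // In a real setup the JDBC URL would be injected into the XWiki
        // configuration before running the existing functional tests.
        try (Connection connection = DriverManager.getConnection(
            mysql.getJdbcUrl(), mysql.getUsername(), mysql.getPassword())) {
            assertTrue(connection.isValid(5));
        }
    }
}

The same pattern would extend to the other configurations mentioned (Oracle,
LibreOffice, an external SOLR server, a cluster) by swapping or adding
containers.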
> 2) K02: Reduce flaky tests
> * Objective: reduce the number of flaky tests by 20%
>
> Strategy:
> * Record flaky tests in JIRA
> * Fix as many of them as possible
>
> 3) K03: Better test quality
> * Objective: increase the mutation score by 20%
>
> Strategy:
> * Same strategy as for K01.
>
> 4) K04: More configuration-related paths tested
> * Objective: increase the code coverage of configuration-related paths in our
> code by 20% (e.g. DB schema creation, cluster-related code, SOLR-related
> code, LibreOffice-related code, etc.).
>
> Strategy:
> * Leave it to me FTM. The idea is to measure the Clover TPC with the base
> configuration, then execute all the other configurations (with
> TestContainers) and regenerate the Clover report to see how much the TPC has
> increased.
>
> 5) K05: Reduce system-specific bugs
> * Objective: 30% improvement
>
> Strategy:
> * Run TestContainers, execute the existing tests and find new bugs related to
> configurations. Record them.
>
> 6) K06: More configurations/Faster tests
> * Objective: increase the number of automatically tested configurations by 50%
>
> Strategy:
> * Increase the number of configurations we test with TestContainers. I'll do
> that part initially.
> * Reduce the time it takes to deploy the software under a given configuration
> vs the time it used to take when done manually before STAMP. I'll do this
> one; I've already worked on it in the past year with the dockerization of
> XWiki.
>
> 7) K07: Pending, nothing to do FTM
>
> 8) K08: More crash-replicating test cases
> * Objective: increase the number of crash-replicating test cases by at least
> 70%
>
> Strategy:
> * For all issues that are still open and that have stack traces, and for all
> issues closed but without tests, run EvoCrash on them to try to generate a
> test.
> * Record and count the number of successful EvoCrash-generated test cases.
> * Derive a regression test (which can be very different from the negative of
> the test generated by EvoCrash!). See the sketch right after this list.
> * Measure the new coverage increase.
> * Note that I haven't experimented much with this myself yet.
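To illustrate the "derive a regression test" bullet above, here is a
hypothetical sketch (made-up class and behaviour, not actual EvoCrash output):
the generated test only aims to reproduce the crash from the stack trace,
while the regression test we derive by hand asserts the behaviour expected
once the bug is fixed, which is usually much more than just "no exception
thrown".

import static org.junit.Assert.assertEquals;

import org.junit.Ignore;
import org.junit.Test;

// Hypothetical sketch with made-up names: the difference between the
// crash-reproducing test generated by EvoCrash and the regression test we
// derive from it by hand.
public class TitleRendererRegressionTest
{
    // Shape of the generated test: its only goal is to reach the crash from
    // the reported stack trace. It passed before the fix; kept here disabled,
    // for illustration only.
    @Ignore("Reproduced the NPE before the fix")
    @Test(expected = NullPointerException.class)
    public void reproducesCrashFromStackTrace()
    {
        new TitleRenderer().render(null);
    }

    // Derived regression test: asserts the behaviour expected after the fix,
    // not just the absence of the exception.
    @Test
    public void renderNullTitleFallsBackToEmptyString()
    {
        assertEquals("", new TitleRenderer().render(null));
    }

    // Made-up stand-in for the real class under test, with the fix applied.
    private static class TitleRenderer
    {
        String render(String title)
        {
            return title == null ? "" : title.trim();
        }
    }
}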
> 9) K09: Pending, nothing to do FTM.
>
> Conclusion
> =========
>
> Right now, I need your help for the following KPIs: K01, K02, K03, K08.
>
> Since there's a lot to understand in this email, I'm open to:
> * Organizing a meeting on YouTube Live to discuss all this
> * Answering any questions on this thread ofc
> * Also feel free to ask on IRC/Matrix.
>
> Here's an extract from STAMP which has more details about the KPIs/metrics:
> https://up1.xwikisas.com/#QJyxqspKXSzuWNOHUuAaEA
>
> Thanks
> -Vincent
