Sounds good.

On Mon, Sep 3, 2018 at 9:55 AM, Vincent Massol <vinc...@massol.net> wrote:
>
>> On 3 Sep 2018, at 09:55, Vincent Massol <vinc...@massol.net> wrote:
>>
>> I propose to do this tomorrow Tuesday, starting with an intro from me,
>> using youtube live.
>
> Say, 10AM Paris time.
>
> Thanks
> -Vincent
>
>> WDYT?
>>
>> Thanks
>> -Vincent
>>
>>> On 30 Aug 2018, at 12:27, Adel Atallah <adel.atal...@xwiki.com> wrote:
>>>
>>> Just to be clear, when I proposed "having a whole day dedicated to
>>> using these tools", I didn't mean having it every week but only once,
>>> so we can properly start improving the tests. It would be some kind of
>>> training.
>>> On my side, I don't think I'll be able to dedicate one day a week to
>>> tests and another one to bug fixing: I wouldn't have any time left for
>>> the roadmap, as I will only work on the product 50% of the time.
>>>
>>> On Thu, Aug 30, 2018 at 12:18 PM, Vincent Massol <vinc...@massol.net> wrote:
>>>> Hi,
>>>>
>>>> I don't remember discussing this with you, Thomas. Actually I'm not
>>>> convinced we should have a fixed day:
>>>> * we already have a fixed BFD, and having a second fixed day doesn't
>>>> leave much flexibility for working on roadmap items when that's the
>>>> best use of our time
>>>> * test sessions can be short (0.5-1 hours) and it's easy to do them
>>>> between other tasks
>>>> * it can be boring to spend a full day on them
>>>>
>>>> Now, I agree that not having a fixed day will make it hard to ensure
>>>> that we spend 20% of our time on this topic.
>>>>
>>>> So if you prefer, we can define a day, knowing that some won't always
>>>> be able to attend on that day, in which case they should do it on
>>>> another day. What's important is to have 20% done each week (i.e.
>>>> enough work done on it).
>>>>
>>>> In terms of day, if we have to choose one, I'd say Tuesday. That's
>>>> the most logical choice to me.
>>>>
>>>> WDYT? What do you prefer?
>>>>
>>>> Thanks
>>>> -Vincent
>>>>
>>>>> On 30 Aug 2018, at 10:38, Thomas Mortagne <thomas.morta...@xwiki.com> wrote:
>>>>>
>>>>> Indeed we discussed this, but I don't see it in your mail, Vincent.
>>>>>
>>>>> On Thu, Aug 30, 2018 at 10:33 AM, Adel Atallah <adel.atal...@xwiki.com> wrote:
>>>>>> Hello,
>>>>>>
>>>>>> Maybe we should agree on having a whole day dedicated to using
>>>>>> these tools with as many developers as possible.
>>>>>> That way we will be able to help each other, and maybe it will make
>>>>>> the process easier to carry out in the future.
>>>>>>
>>>>>> WDYT?
>>>>>>
>>>>>> Thanks,
>>>>>> Adel
>>>>>>
>>>>>> On Wed, Aug 29, 2018 at 11:20 AM, Vincent Massol <vinc...@massol.net> wrote:
>>>>>>> Hi devs (and anyone else interested in improving the tests of
>>>>>>> XWiki),
>>>>>>>
>>>>>>> History
>>>>>>> ======
>>>>>>>
>>>>>>> It all started when I analyzed our global TPC (Test Percentage
>>>>>>> Coverage) and found that it was going down globally, even though
>>>>>>> we have the fail-build-on-jacoco-threshold strategy.
>>>>>>>
>>>>>>> I sent several email threads:
>>>>>>>
>>>>>>> - Loss of TPC: http://markmail.org/message/hqumkdiz7jm76ya6
>>>>>>> - TPC evolution: http://markmail.org/message/up2gc2zzbbe4uqgn
>>>>>>> - Improve our TPC strategy: http://markmail.org/message/grphwta63pp5p4l7
>>>>>>>
>>>>>>> Note: As a consequence of this last thread, I implemented a
>>>>>>> Jenkins Pipeline to send us a mail when the global TPC of an XWiki
>>>>>>> module goes down, so that we fix it ASAP. This is still a work in
>>>>>>> progress. A first version is done and running at
>>>>>>> https://ci.xwiki.org/view/Tools/job/Clover/ but I need to debug
>>>>>>> and fix it (it's not working ATM).
>>>>>>>
>>>>>>> As a result of the global TPC going down/stagnating, I proposed to
>>>>>>> have 10.7 focused on Tests + BFD.
>>>>>>> - Initially I proposed to focus on increasing the global TPC by
>>>>>>> looking at the reports from 1) above
>>>>>>> (http://markmail.org/message/qjemnip7hjva2rjd). See the last
>>>>>>> report at https://up1.xwikisas.com/#mJ0loeB6nBrAgYeKA7MGGw (we
>>>>>>> need to fix the red parts).
>>>>>>> - Then, with the STAMP mid-term review, a bigger urgency surfaced
>>>>>>> and I asked if we could instead focus on fixing the tests reported
>>>>>>> by Descartes, to increase both coverage and mutation score (i.e.
>>>>>>> test quality). Those are 2 metrics/KPIs measured by STAMP and,
>>>>>>> since XWiki participates in STAMP, we need to increase them
>>>>>>> substantially. See http://markmail.org/message/ejmdkf3hx7drkj52
>>>>>>>
>>>>>>> The results of XWiki 10.7 have been quite poor on test
>>>>>>> improvements (more focus on BFD than on tests, lots of devs on
>>>>>>> holidays, etc.). This forces us to adopt a different strategy.
>>>>>>>
>>>>>>> Full Strategy proposal
>>>>>>> =================
>>>>>>>
>>>>>>> 1) As many XWiki SAS devs as possible (and anyone else from the
>>>>>>> community who's interested ofc! :)) should spend 1 day per week
>>>>>>> working on improving the STAMP metrics.
>>>>>>> * Currently the agreement is that Thomas and myself will do this
>>>>>>> for the foreseeable future, till we get some good-enough metric
>>>>>>> progress.
>>>>>>> * Some other devs from XWiki SAS will help out for XWiki 10.8 only
>>>>>>> FTM (Marius, Adel if he can, Simon in the future). The idea is to
>>>>>>> see where that could get us by using substantial manpower.
>>>>>>>
>>>>>>> 2) All committers: More generally, the global TPC check is also
>>>>>>> already active, and devs need to fix the modules whose global TPC
>>>>>>> goes down.
>>>>>>>
>>>>>>> 3) All committers: Of course, the jacoco strategy is also active
>>>>>>> at each module level.
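>>>>>>>
>>>>>>> For those who haven't looked at a module pom recently, the
>>>>>>> per-module jacoco strategy is enforced with the jacoco-maven-plugin
>>>>>>> "check" goal. The snippet below is only an illustrative sketch:
>>>>>>> the plugin version and the 0.70 minimum are example values, not
>>>>>>> necessarily what a given XWiki module uses.

```xml
<!-- Illustrative jacoco threshold configuration: fails the build when
     the module's instruction coverage drops below the configured
     minimum. The version and the 0.70 ratio are example values. -->
<plugin>
  <groupId>org.jacoco</groupId>
  <artifactId>jacoco-maven-plugin</artifactId>
  <version>0.8.2</version>
  <executions>
    <execution>
      <!-- Attach the jacoco agent so test execution is instrumented -->
      <goals>
        <goal>prepare-agent</goal>
      </goals>
    </execution>
    <execution>
      <id>check-coverage</id>
      <goals>
        <goal>check</goal>
      </goals>
      <configuration>
        <rules>
          <rule>
            <element>BUNDLE</element>
            <limits>
              <limit>
                <counter>INSTRUCTION</counter>
                <value>COVEREDRATIO</value>
                <minimum>0.70</minimum>
              </limit>
            </limits>
          </rule>
        </rules>
      </configuration>
    </execution>
  </executions>
</plugin>
```

>>>>>>> When a commit makes the module's covered instruction ratio drop
>>>>>>> below the minimum, the build fails and the committer has to add
>>>>>>> tests (or justify lowering the threshold in a reviewable commit).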
>>>>>>>
>>>>>>> STAMP tools
>>>>>>> ==========
>>>>>>>
>>>>>>> There are 4 tools developed by STAMP:
>>>>>>> * Descartes: Improves the quality of tests by increasing their
>>>>>>> mutation scores. See http://markmail.org/message/bonb5f7f37omnnog
>>>>>>> and also
>>>>>>> https://massol.myxwiki.org/xwiki/bin/view/Blog/MutationTestingDescartes
>>>>>>> * DSpot: Automatically generates new tests, based on existing
>>>>>>> tests. See
>>>>>>> https://massol.myxwiki.org/xwiki/bin/view/Blog/TestGenerationDspot
>>>>>>> * CAMP: Takes a Dockerfile and generates mutations of it, then
>>>>>>> deploys the software and executes tests on it to see if the
>>>>>>> mutation works or not. Note that this currently doesn't fit
>>>>>>> XWiki's needs, so I've been developing another tool as an
>>>>>>> experiment (which may go back into CAMP one day), based on
>>>>>>> TestContainers; see
>>>>>>> https://massol.myxwiki.org/xwiki/bin/view/Blog/EnvironmentTestingExperimentations
>>>>>>> * EvoCrash: Takes a stack trace from production logs and generates
>>>>>>> a test that, when executed, reproduces the crash. See
>>>>>>> https://markmail.org/message/v74g3tsmflquqwra. See also
>>>>>>> https://github.com/SERG-Delft/EvoCrash
>>>>>>>
>>>>>>> Since XWiki is part of the STAMP research project, we need to use
>>>>>>> these 4 tools to increase the KPIs associated with them. See
>>>>>>> below.
>>>>>>>
>>>>>>> Objectives/KPIs/Metrics for STAMP
>>>>>>> ===========================
>>>>>>>
>>>>>>> The STAMP project has defined 9 KPIs that all partners (and thus
>>>>>>> XWiki) need to work on:
>>>>>>>
>>>>>>> 1) K01: Increase test coverage
>>>>>>> * Global increase by reducing the non-covered code by 40%. For
>>>>>>> XWiki, since we're at about 70%, this means reaching about 80%
>>>>>>> before the end of STAMP (i.e. before the end of 2019).
>>>>>>> * Increase the coverage contributions of each tool developed by
>>>>>>> STAMP.
>>>>>>>
>>>>>>> Strategy:
>>>>>>> * Primary goal:
>>>>>>> ** Increase coverage by executing Descartes and improving our
>>>>>>> tests.
>>>>>>> This is http://markmail.org/message/ejmdkf3hx7drkj52
>>>>>>> ** Don't do anything with DSpot; I'll do that part. Note that the
>>>>>>> goal is to write a Jenkins pipeline to automatically execute DSpot
>>>>>>> from time to time, commit the generated tests in a separate test
>>>>>>> source root, and have our build execute both src/test/java and
>>>>>>> this new test source root.
>>>>>>> ** Don't do anything with TestContainers FTM, since I need to
>>>>>>> finish a first working version. I may need help in the future to
>>>>>>> implement docker images for more configurations (on Oracle, in a
>>>>>>> cluster, with LibreOffice, with an external SOLR server, etc.).
>>>>>>> ** For EvoCrash: we'll count the contributions of EvoCrash to
>>>>>>> coverage in K08.
>>>>>>> * Secondary goal:
>>>>>>> ** Increase our global TPC as mentioned above by fixing the
>>>>>>> modules in red.
>>>>>>>
>>>>>>> 2) K02: Reduce flaky tests
>>>>>>> * Objective: reduce the number of flaky tests by 20%
>>>>>>>
>>>>>>> Strategy:
>>>>>>> * Record flaky tests in jira
>>>>>>> * Fix as many of them as possible
>>>>>>>
>>>>>>> 3) K03: Better test quality
>>>>>>> * Objective: increase the mutation score by 20%
>>>>>>>
>>>>>>> Strategy:
>>>>>>> * Same strategy as for K01.
>>>>>>>
>>>>>>> 4) K04: More configuration-related paths tested
>>>>>>> * Objective: increase the code coverage of configuration-related
>>>>>>> paths in our code by 20% (e.g. DB schema creation, cluster-related
>>>>>>> code, SOLR-related code, LibreOffice-related code, etc.).
>>>>>>>
>>>>>>> Strategy:
>>>>>>> * Leave it to me FTM. The idea is to measure the Clover TPC with
>>>>>>> the base configuration, then execute all other configurations
>>>>>>> (with TestContainers) and regenerate the Clover report to see how
>>>>>>> much the TPC has increased.
>>>>>>>
>>>>>>> 5) K05: Reduce system-specific bugs
>>>>>>> * Objective: 30% improvement
>>>>>>>
>>>>>>> Strategy:
>>>>>>> * Run TestContainers, execute existing tests and find new bugs
>>>>>>> related to configurations.
>>>>>>> Record them.
>>>>>>>
>>>>>>> 6) K06: More configurations/Faster tests
>>>>>>> * Objective: increase the number of automatically tested
>>>>>>> configurations by 50%
>>>>>>>
>>>>>>> Strategy:
>>>>>>> * Increase the number of configurations we test with
>>>>>>> TestContainers. I'll do that part initially.
>>>>>>> * Reduce the time it takes to deploy the software under a given
>>>>>>> configuration, vs the time it used to take when done manually
>>>>>>> before STAMP. I'll do this one; I've already worked on it in the
>>>>>>> past year with the dockerization of XWiki.
>>>>>>>
>>>>>>> 7) K07: Pending, nothing to do FTM
>>>>>>>
>>>>>>> 8) K08: More crash-replicating test cases
>>>>>>> * Objective: increase the number of crash-replicating test cases
>>>>>>> by at least 70%
>>>>>>>
>>>>>>> Strategy:
>>>>>>> * For all issues that are still open and have stack traces, and
>>>>>>> for all issues that are closed but without tests, run EvoCrash on
>>>>>>> them to try to generate a test.
>>>>>>> * Record and count the number of successful EvoCrash-generated
>>>>>>> test cases.
>>>>>>> * Derive a regression test (which can be very different from the
>>>>>>> negative of the test generated by EvoCrash!).
>>>>>>> * Measure the new coverage increase.
>>>>>>> * Note that I haven't experimented much with this yet myself.
>>>>>>>
>>>>>>> 9) K09: Pending, nothing to do FTM.
>>>>>>>
>>>>>>> Conclusion
>>>>>>> =========
>>>>>>>
>>>>>>> Right now, I need your help for the following KPIs: K01, K02, K03
>>>>>>> and K08.
>>>>>>>
>>>>>>> Since there's a lot to understand in this email, I'm open to:
>>>>>>> * Organizing a meeting on youtube live to discuss all this
>>>>>>> * Answering any questions on this thread ofc
>>>>>>> * Also feel free to ask on IRC/Matrix.
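>>>>>>>
>>>>>>> To lower the barrier to entry for K01/K03, here's roughly how
>>>>>>> Descartes can be wired into a Maven module via the pitest plugin.
>>>>>>> This is a sketch only; the versions below are indicative, check
>>>>>>> the Descartes documentation for the current ones.

```xml
<!-- Illustrative pitest + Descartes setup; the plugin and engine
     versions are example values. Descartes replaces pitest's default
     mutation operators with extreme mutations (e.g. emptying method
     bodies) to find pseudo-tested methods. -->
<plugin>
  <groupId>org.pitest</groupId>
  <artifactId>pitest-maven</artifactId>
  <version>1.4.0</version>
  <configuration>
    <!-- Use the Descartes mutation engine instead of the default one -->
    <mutationEngine>descartes</mutationEngine>
  </configuration>
  <dependencies>
    <dependency>
      <groupId>eu.stamp-project</groupId>
      <artifactId>descartes</artifactId>
      <version>1.2.4</version>
    </dependency>
  </dependencies>
</plugin>
```

>>>>>>> Then run "mvn clean test org.pitest:pitest-maven:mutationCoverage"
>>>>>>> on the module and look at the report for surviving mutants and
>>>>>>> pseudo-tested methods: those point at the tests to improve first.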
>>>>>>>
>>>>>>> Here's an extract from STAMP with more details about the
>>>>>>> KPIs/metrics:
>>>>>>> https://up1.xwikisas.com/#QJyxqspKXSzuWNOHUuAaEA
>>>>>>>
>>>>>>> Thanks
>>>>>>> -Vincent
>>>>>
>>>>> --
>>>>> Thomas Mortagne
--
Thomas Mortagne