Re: [ovs-dev] [RFC] Federating the 0-day robot, and improving the testing
Ben Pfaff writes:
> On Thu, Sep 06, 2018 at 04:56:18AM -0400, Aaron Conole wrote:
>> As of June, the 0-day robot has tested over 450 patch series.
>> Occasionally it spams the list (apologies for that), but for the
>> majority of the time it has caught issues before they made it to the
>> tree - so it's accomplishing the initial goal just fine.
>>
>> I see lots of ways it can improve.  Currently, the bot runs on a light
>> system.  It takes ~20 minutes to complete a set of tests, including all
>> the checkpatch and rebuild runs.  That's not a big issue.  BUT, it does
>> mean that the machine isn't able to perform all the kinds of regression
>> tests that we would want.  I want to improve this in a way that various
>> contributors can bring their own hardware and regression tests to the
>> party.  In that way, various projects can detect potential issues before
>> they would ever land on the tree and it could flag functional changes
>> earlier in the process.
>>
>> I'm not sure the best way to do that.  One thing I'll be doing is
>> updating the bot to push a series that successfully builds and passes
>> checkpatch to a special branch on a github repository to kick off travis
>> builds.  That will give us a more complete regression coverage, and we
>> could be confident that a series won't break something major.  After
>> that, I'm not sure how to notify various alternate test infrastructures
>> how to kick off their own tests using the patched sources.
>
> That's pretty exciting.
>
> Don't forget about appveyor, either.  Hardly any of us builds on
> Windows, so appveyor is likely to catch things that we won't.

I haven't forgotten about this.  Unveiling (after some quick testing...
hopefully it continues to work):

  https://github.com/ovsrobot/ovs
  https://travis-ci.org/ovsrobot/ovs
  https://ci.appveyor.com/project/ovsrobot/ovs

I don't know if the appveyor stuff is set up correctly, so if anyone who
is more knowledgeable in that area wants to assist, contact me.
I only have one patch from one series built.  BUT - every series now is
submitted to a github branch, and that kicks off Travis (and hopefully
appveyor).  They can be found by their series id in patchwork:

  series_SERIES_ID

I don't have any mechanism to clean them up.  Presumably I'll need to
figure out how to do that at some point... :)

>> My goal is to get really early feedback on patch series.  I've sent this
>> out to the folks I know are involved in testing and test discussions in
>> the hopes that we can talk about how best to get more CI happening.  The
>> open questions:
>>
>> 1. How can we notify various downstream consumers of OvS of these
>>    0-day builds?  Should we just rely on people rolling their own?
>>    Should there be a more formalized framework?  How will these other
>>    test frameworks report any kind of failures?
>
> Do you mean notify of successes or failures?  I assumed that the robot's
> email would notify us of that.
>
> Do you mean actually provide the builds?  I don't know a good way to do
> that.
>
>> 2. What kinds of additional testing do we want to see the robot include?
>>    Should the test results be made available in general on some kind of
>>    public facing site?  Should it just stay as a "bleep bloop -
>>    failure!" marker?
>
> It would be super awesome if we could run the various additional
> testsuites that we have: check-system-userspace, check-kernel, etc.  We
> can't run them easily on travis because they require superuser (and bugs
> sometimes crash the system).
>
>> 3. What other concerns should be addressed?

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev
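The branch-per-series scheme described above (push each applied series
to a `series_SERIES_ID` branch to kick off CI, then clean it up later)
could look roughly like this.  A hedged sketch, not the bot's actual
code: the commands and names are assumptions, and a throwaway local
"origin" stands in for the github remote so the sketch is runnable.

```python
import os
import subprocess
import tempfile

def sh(cmd, cwd):
    """Run a git command quietly, raising on failure."""
    subprocess.run(cmd, cwd=cwd, check=True,
                   stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)

# Throwaway local remote standing in for github.com/ovsrobot/ovs.
tmp = tempfile.mkdtemp()
origin = os.path.join(tmp, "origin.git")
work = os.path.join(tmp, "work")
sh(["git", "init", "--bare", origin], tmp)
sh(["git", "clone", origin, work], tmp)
sh(["git", "config", "user.email", "robot@example.com"], work)
sh(["git", "config", "user.name", "ovsrobot"], work)
sh(["git", "commit", "--allow-empty", "-m", "series applied"], work)

series_id = 42  # would come from the patchwork series id
branch = f"series_{series_id}"
sh(["git", "checkout", "-b", branch], work)
sh(["git", "push", "origin", branch], work)  # this push kicks off CI

heads = subprocess.run(["git", "ls-remote", "--heads", "origin"],
                       cwd=work, capture_output=True, text=True).stdout
print(branch in heads)  # True

# One answer to the cleanup question: delete the branch once the
# CI results have been collected.
sh(["git", "push", "origin", "--delete", branch], work)
```

Deleting with `git push origin --delete` after results are gathered is
one plausible cleanup mechanism; expiring branches by age would be
another.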
Re: [ovs-dev] [RFC] Federating the 0-day robot, and improving the testing
"Eelco Chaudron" writes:

> On 11 Sep 2018, at 17:51, Aaron Conole wrote:
>
>> "Eelco Chaudron" writes:
>>
>>> On 6 Sep 2018, at 10:56, Aaron Conole wrote:
>>>
>>>> [original RFC quoted in full; trimmed]
>>>>
>>>> 2. What kinds of additional testing do we want to see the robot
>>>>    include?
>>>
>>> First of all thanks for the 0-day robot, I really like the idea…
>>>
>>> One thing I feel would really benefit is some basic performance
>>> testing, like a PVP test for the kernel/dpdk datapath.  This will help
>>> easily identifying performance impacting patches as they happen…
>>> Rather than people figuring out after a release why their performance
>>> has dropped.
>>
>> Yes - I hope to pull in the work you've done for ovs_perf to have some
>> kind of baselines.
>>
>> For this to make sense, I think it also needs to have a bunch of
>> hardware that we can benchmark (hint hint to some of the folks in the CC
>> list :).  Not for absolute numbers, but at least to detect significant
>> changes.
>>
>> I'm also not sure how to measure a 'problem.'  Do we run a test
>> pre-series, and then run it post-series?  In that case, we could slowly
>> degrade performance over time without anyone noticing.  Do we take it
>> from the previous release, and compare?  Might make more sense, but I
>> don't know if it has other problems associated.  What are the
>> thresholds we use for saying something is a regression?  How do we
>> report it to developers?
>
> Guess both in an ideal world, and maybe add a weekly baseline for
> master :)
>
> Having a graph of this would be really nice.  However, this might be a
> whole project on itself, i.e. performance runs on all commits to
> master…

I consider that a useful thing to have (a dashboard with information on
series tested, etc).  But I agree, it's another whole project.

>>>> Should the test results be made available in general on some kind of
>>>> public facing site?  Should it just stay as a "bleep bloop -
>>>> failure!" marker?
>>>>
>>>> 3. What other concerns should be addressed?
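The threshold question raised in the thread ("what counts as a
regression, and against which baseline?") can be prototyped in a few
lines.  This is a sketch of the weekly-baseline idea, not an agreed
policy: the 5% cutoff and the Mpps numbers are arbitrary assumptions.

```python
def is_regression(baseline_mpps, measured_mpps, threshold=0.05):
    """Flag a run as a regression when throughput drops more than
    `threshold` (a fraction) below the stored baseline run."""
    if baseline_mpps <= 0:
        raise ValueError("baseline must be positive")
    drop = (baseline_mpps - measured_mpps) / baseline_mpps
    return drop > threshold

# Weekly-baseline model from the thread: compare each series run
# against the most recent baseline taken on master, so slow drift
# between releases still shows up against a recent reference.
print(is_regression(10.0, 9.7))  # 3% drop -> False
print(is_regression(10.0, 9.0))  # 10% drop -> True
```

Comparing against a periodically refreshed master baseline, rather than
only pre-series vs. post-series, is what avoids the slow-degradation
problem Aaron describes.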
Re: [ovs-dev] [RFC] Federating the 0-day robot, and improving the testing
Ian Stokes writes:

> On 9/11/2018 4:51 PM, Aaron Conole wrote:
>> "Eelco Chaudron" writes:
>>
>>> On 6 Sep 2018, at 10:56, Aaron Conole wrote:
>>>
>>>> [original RFC quoted in full; trimmed]
>>>>
>>>> 2. What kinds of additional testing do we want to see the robot
>>>>    include?
>>>
>>> First of all thanks for the 0-day robot, I really like the idea…
>
> +1, great work on this.
>
>>> One thing I feel would really benefit is some basic performance
>>> testing, like a PVP test for the kernel/dpdk datapath.  This will help
>>> easily identifying performance impacting patches as they happen…
>>> Rather than people figuring out after a release why their performance
>>> has dropped.
>
> To date I've been using vsperf to conduct p2p, pvp, pvvp, vxlan tests
> etc.  The framework for a lot of these are already in place.  It
> supports a number of traffic gens also such as t-rex, moongen etc. as
> well as the commercial usual suspects.

I think we also have some VSPerf tests, as well.  I'm happy to use
whatever :)

> The vsperf CI also published a large number of tests with both OVS
> DPDK and OVS kernel.  Not sure if it is still running however, I'll
> look into it as the graphs in the link below seem out of date.
>
> https://wiki.opnfv.org/display/vsperf/VSPERF+CI+Results#VSPERFCIResults-OVSwithDPDK
>
> Currently it uses DPDK 17.08 and OVS 2.9 by default but I have it
> working with DPDK 17.11 and OVS master on my own system easily enough.

Awesome.  I'll pull a Bane: "Your precious test suite, gratefully
accepted!"

>> Yes - I hope to pull in the work you've done for ovs_perf to have some
>> kind of baselines.
>>
>> For this to make sense, I think it also needs to have a bunch of
>> hardware that we can benchmark (hint hint to some of the folks in the CC
>> list :).  Not for absolute numbers, but at least to detect significant
>> changes.
>
> Working on it :).  It leads to another discussion though, if we have
> hardware ready to ship then where should it go?  Where's the best place
> to host and maintain the CI system?
>
>> I'm also not sure how to measure a 'problem.'  Do we run a test
>> pre-series, and then run it post-series?  In that case, we could slowly
>> degrade performance over time without anyone noticing.  Do we take it
>> from the previous release, and compare?  Might make more sense, but I
>> don't know if it has other problems associated.  What are the
>> thresholds we use for saying something is a regression?  How do we
>> report it to developers?
>
> It's a good point, we typically run perf tests nightly in order to
> gauge any degradation on OVS master.  Possibly this could help in
> comparison as long as the HW is the same.  I would be anxious not to
> overburden the robot test system from the get go however.  Its primary
> purpose initially would be to provide feedback on patch series so I'd
> like to avoid having it tied up with performance checking what has
> been upstreamed already.

I agree with the burdening part.  I think from our side we'd
Re: [ovs-dev] [RFC] Federating the 0-day robot, and improving the testing
On 11 Sep 2018, at 17:51, Aaron Conole wrote:

> "Eelco Chaudron" writes:
>
>> On 6 Sep 2018, at 10:56, Aaron Conole wrote:
>>
>>> [original RFC quoted in full; trimmed]
>>>
>>> 2. What kinds of additional testing do we want to see the robot
>>>    include?
>>
>> First of all thanks for the 0-day robot, I really like the idea…
>>
>> One thing I feel would really benefit is some basic performance
>> testing, like a PVP test for the kernel/dpdk datapath.  This will help
>> easily identifying performance impacting patches as they happen…
>> Rather than people figuring out after a release why their performance
>> has dropped.
>
> Yes - I hope to pull in the work you've done for ovs_perf to have some
> kind of baselines.
>
> For this to make sense, I think it also needs to have a bunch of
> hardware that we can benchmark (hint hint to some of the folks in the CC
> list :).  Not for absolute numbers, but at least to detect significant
> changes.
>
> I'm also not sure how to measure a 'problem.'  Do we run a test
> pre-series, and then run it post-series?  In that case, we could slowly
> degrade performance over time without anyone noticing.  Do we take it
> from the previous release, and compare?  Might make more sense, but I
> don't know if it has other problems associated.  What are the
> thresholds we use for saying something is a regression?  How do we
> report it to developers?

Guess both in an ideal world, and maybe add a weekly baseline for
master :)

Having a graph of this would be really nice.  However, this might be a
whole project on itself, i.e. performance runs on all commits to
master…

>>> Should the test results be made available in general on some kind of
>>> public facing site?  Should it just stay as a "bleep bloop -
>>> failure!" marker?
>>>
>>> 3. What other concerns should be addressed?
Re: [ovs-dev] [RFC] Federating the 0-day robot, and improving the testing
Ophir Munk writes:

>> -----Original Message-----
>> From: Aaron Conole [mailto:acon...@bytheb.org]
>> Sent: Thursday, September 06, 2018 11:56 AM
>> To: Ian Stokes ; Kevin Traynor ; Ophir Munk ; Ferruh Yigit ; Luca
>> Boccassi ; Jeremy Plsek ; Sugesh Chandran ; Jean-Tsung Hsiao ;
>> Christian Trautman ; Ben Pfaff ; Bala Sankaran
>> Cc: d...@openvswitch.org
>> Subject: [RFC] Federating the 0-day robot, and improving the testing
>>
>> [original RFC quoted in full; trimmed]
>
> First of all - lots of thanks for this great work.
> A few questions/comments:
>
> 1. Are the tests mentioned above considered core/sanity tests to make
>    sure the basic functionality is not broken?

Yes - actually, I haven't re-enabled reporting the make check, so it's
basically:

  1. git am
  2. checkpatch
  3. make

If any of those fails, they get reported.  Future work, we'll re-enable
reporting the other checks.

> 2. Is there a link to the tests which are executed?  How can they be
>    reviewed?

Documentation/topics/testing.rst covers the high level overview
(including the testsuites run by doing make
check{-dpdk,-kernel,-kmod,-system-userspace,-ryu,-oftest,-valgrind,-lcov,}).
The various tests are primarily wired up through m4, although they can
be written in any language provided there's a binary to execute.

> 3. Is there a link to the tests results?  How can they be viewed?

For the bot, right now, there isn't a link.  I think a dashboard
functionality is probably worthwhile to write.

> 4. Is the test environment documented?  I think it would be beneficial
>    if in parallel to the 0-day robot each vendor would be able to build
>    the same environment locally in order to test his patches before
>    sending them.

Yes and no.  For example, the exact steps the bot takes are all
documented at:

  https://github.com/orgcandman/pw-ci/blob/master/3rd-party/openvswitch/config.xml

But, as I wrote above, we just report failures from the steps above.

> 5. I am interested in having Mellanox NICs taking part of these
>    tests.  We will have some internal discussions regarding this, then
>    I will be more specific.

Awesome!  Look forward to hearing more.

>> I'm not sure the best way to do that.  One thing I'll be doing is
>> updating the bot to push a series that successfully builds and passes
>> checkpatch to a special branch on a github repository to kick off
>> travis builds.  That will give us a more complete regression coverage,
>> and we could be confident that a series won't break something major.
>
> I suggest to tag the daily regression series and to have public access
> to it.  In case anything is broken we should get an email notifying on
> this and be able to bisect the tree (based on tag) to find which
> commit is causing issues.  It is even better to have the bot doing the
> bisect.

Not sure what it means.  I don't think there should be anything to
bisect yet - but that's probably because I'm focused on the submission
side of testing.  Of course, a future effort would be some kind of full
regression.  I guess that's what you're referring to here.

>> After that, I'm not sure how to notify various alternate test
>> infrastructures how to kick off their own tests using the patched
>> sources.
>>
>> My goal is to get really early feedback on patch series.  I've sent
>> this out to the folks I know are involved in testing and test
>> discussions in the hopes that we can talk about how best to get more
>> CI happening.  The open questions:
>>
>> 1. How can we notify various downstream consumers of OvS of these
>>    0-day builds?  Should we just rely on people rolling their own?
>>    Should there be a more formalized framework?  How will these other
>>    test frameworks report any kind of failures?
>>
>> 2. What kinds of additional testing do we want to see the robot include?
>>    Should the test results be made available in general on some kind of
>>    public facing site?  Should it just stay as a "bleep bloop -
>>    failure!" marker?
>>
>> 3. What other concerns should be addressed?
>
> I am looking forward to start running even with just basic tests
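The three-step gate Aaron lists (git am, checkpatch, make) might be
wired up along these lines.  A sketch under assumptions: the command
strings are illustrative, not the bot's actual invocations, and the
example run substitutes stand-in commands (`true`/`false`) so it
executes anywhere.

```python
import subprocess

def run_gate(steps):
    """Run each (name, command) step in order; stop and report at the
    first failure, mirroring the bot's fail-fast behavior."""
    for name, cmd in steps:
        if subprocess.run(cmd, shell=True).returncode != 0:
            return f"FAIL: {name}"
    return "PASS"

# Hypothetical per-series gate (command lines are assumptions):
series_gate = [
    ("git am",     "git am series.mbox"),
    ("checkpatch", "./utilities/checkpatch.py series.mbox"),
    ("make",       "make -j4"),
]

# Stand-in commands so the sketch runs without an OVS tree:
print(run_gate([("step-ok", "true"), ("step-bad", "false")]))
# -> FAIL: step-bad
```

Only failures get reported back to the list, which matches the "just
report failures from the steps above" behavior described here.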
Re: [ovs-dev] [RFC] Federating the 0-day robot, and improving the testing
> -----Original Message-----
> From: Aaron Conole [mailto:acon...@bytheb.org]
> Sent: Thursday, September 06, 2018 11:56 AM
> To: Ian Stokes ; Kevin Traynor ; Ophir Munk ; Ferruh Yigit ; Luca
> Boccassi ; Jeremy Plsek ; Sugesh Chandran ; Jean-Tsung Hsiao ;
> Christian Trautman ; Ben Pfaff ; Bala Sankaran
> Cc: d...@openvswitch.org
> Subject: [RFC] Federating the 0-day robot, and improving the testing
>
> [original RFC quoted in full; trimmed]

First of all - lots of thanks for this great work.
A few questions/comments:

1. Are the tests mentioned above considered core/sanity tests to make
   sure the basic functionality is not broken?

2. Is there a link to the tests which are executed?  How can they be
   reviewed?

3. Is there a link to the tests results?  How can they be viewed?

4. Is the test environment documented?  I think it would be beneficial
   if in parallel to the 0-day robot each vendor would be able to build
   the same environment locally in order to test his patches before
   sending them.

5. I am interested in having Mellanox NICs taking part of these tests.
   We will have some internal discussions regarding this, then I will
   be more specific.

> I'm not sure the best way to do that.  One thing I'll be doing is
> updating the bot to push a series that successfully builds and passes
> checkpatch to a special branch on a github repository to kick off
> travis builds.  That will give us a more complete regression coverage,
> and we could be confident that a series won't break something major.

I suggest to tag the daily regression series and to have public access
to it.  In case anything is broken we should get an email notifying on
this and be able to bisect the tree (based on tag) to find which commit
is causing issues.  It is even better to have the bot doing the bisect.

> After that, I'm not sure how to notify various alternate test
> infrastructures how to kick off their own tests using the patched
> sources.
>
> My goal is to get really early feedback on patch series.  I've sent
> this out to the folks I know are involved in testing and test
> discussions in the hopes that we can talk about how best to get more
> CI happening.  The open questions:
>
> 1. How can we notify various downstream consumers of OvS of these
>    0-day builds?  Should we just rely on people rolling their own?
>    Should there be a more formalized framework?  How will these other
>    test frameworks report any kind of failures?
>
> 2. What kinds of additional testing do we want to see the robot include?
>    Should the test results be made available in general on some kind of
>    public facing site?  Should it just stay as a "bleep bloop -
>    failure!" marker?
>
> 3. What other concerns should be addressed?

I am looking forward to start running even with just basic tests to see
how this whole framework works, then improving along the way.

Can you please make sure to add the dpdk-latest and dpdk-hwol branches
to the bot tests in addition to the master branch?
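The bot-driven bisect suggested here could look roughly like the
following runnable toy.  Everything is an assumption for illustration:
it builds a tiny history where a "regression" lands in the third
commit, then lets `git bisect run` find it by grepping a file; a real
bot would run the testsuite instead of grep.

```python
import subprocess
import tempfile

def sh(*cmd, cwd):
    """Run a git command, capturing its output."""
    return subprocess.run(cmd, cwd=cwd, capture_output=True, text=True)

# Build a 4-commit history; the "regression" appears in commit 3.
repo = tempfile.mkdtemp()
sh("git", "init", cwd=repo)
sh("git", "config", "user.email", "robot@example.com", cwd=repo)
sh("git", "config", "user.name", "ovsrobot", cwd=repo)
for i, state in enumerate(["ok", "ok", "broken", "broken"], 1):
    with open(f"{repo}/state", "w") as f:
        f.write(f"{state} {i}\n")  # distinct content per commit
    sh("git", "add", "state", cwd=repo)
    sh("git", "commit", "-m", f"commit {i}", cwd=repo)

# bad = HEAD, good = HEAD~3; `git bisect run` invokes the test command
# on each candidate: exit 0 means good, nonzero means bad.
sh("git", "bisect", "start", "HEAD", "HEAD~3", cwd=repo)
out = sh("git", "bisect", "run",
         "sh", "-c", "! grep -q broken state", cwd=repo).stdout
print("is the first bad commit" in out)  # True: commit 3 is fingered
```

Given nightly tags as Ophir proposes, the same `git bisect start
<bad-tag> <good-tag>` plus a testsuite command would pinpoint the
offending commit automatically.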
Re: [ovs-dev] [RFC] Federating the 0-day robot, and improving the testing
On 9/11/2018 4:51 PM, Aaron Conole wrote:

> "Eelco Chaudron" writes:
>
>> On 6 Sep 2018, at 10:56, Aaron Conole wrote:
>>
>>> [original RFC quoted in full; trimmed]
>>>
>>> 2. What kinds of additional testing do we want to see the robot
>>>    include?
>>
>> First of all thanks for the 0-day robot, I really like the idea…

+1, great work on this.

>> One thing I feel would really benefit is some basic performance
>> testing, like a PVP test for the kernel/dpdk datapath.  This will help
>> easily identifying performance impacting patches as they happen…
>> Rather than people figuring out after a release why their performance
>> has dropped.

To date I've been using vsperf to conduct p2p, pvp, pvvp, vxlan tests
etc.  The framework for a lot of these are already in place.  It
supports a number of traffic gens also such as t-rex, moongen etc. as
well as the commercial usual suspects.

The vsperf CI also published a large number of tests with both OVS DPDK
and OVS kernel.  Not sure if it is still running however, I'll look
into it as the graphs in the link below seem out of date.

https://wiki.opnfv.org/display/vsperf/VSPERF+CI+Results#VSPERFCIResults-OVSwithDPDK

Currently it uses DPDK 17.08 and OVS 2.9 by default but I have it
working with DPDK 17.11 and OVS master on my own system easily enough.

> Yes - I hope to pull in the work you've done for ovs_perf to have some
> kind of baselines.
>
> For this to make sense, I think it also needs to have a bunch of
> hardware that we can benchmark (hint hint to some of the folks in the CC
> list :).  Not for absolute numbers, but at least to detect significant
> changes.

Working on it :).  It leads to another discussion though, if we have
hardware ready to ship then where should it go?  Where's the best place
to host and maintain the CI system?

> I'm also not sure how to measure a 'problem.'  Do we run a test
> pre-series, and then run it post-series?  In that case, we could slowly
> degrade performance over time without anyone noticing.  Do we take it
> from the previous release, and compare?  Might make more sense, but I
> don't know if it has other problems associated.  What are the
> thresholds we use for saying something is a regression?  How do we
> report it to developers?

It's a good point, we typically run perf tests nightly in order to
gauge any degradation on OVS master.  Possibly this could help in
comparison as long as the HW is the same.  I would be anxious not to
overburden the robot test system from the get go however.  Its primary
purpose initially would be to provide feedback on patch series so I'd
like to avoid having it tied up with performance checking what has been
upstreamed already.

In this case maybe it would make sense to run a baseline performance
test once a week and use this against incoming patch series for
comparison?

Ian

>>> Should the test results be made available in general on some kind of
>>> public facing site?  Should it just stay as a "bleep bloop -
>>> failure!" marker?
>>>
>>> 3. What other concerns should be addressed?
Re: [ovs-dev] [RFC] Federating the 0-day robot, and improving the testing
"Eelco Chaudron" writes: > On 6 Sep 2018, at 10:56, Aaron Conole wrote: > >> As of June, the 0-day robot has tested over 450 patch series. >> Occasionally it spams the list (apologies for that), but for the >> majority of the time it has caught issues before they made it to the >> tree - so it's accomplishing the initial goal just fine. >> >> I see lots of ways it can improve. Currently, the bot runs on a light >> system. It takes ~20 minutes to complete a set of tests, including >> all >> the checkpatch and rebuild runs. That's not a big issue. BUT, it >> does >> mean that the machine isn't able to perform all the kinds of >> regression >> tests that we would want. I want to improve this in a way that >> various >> contributors can bring their own hardware and regression tests to the >> party. In that way, various projects can detect potential issues >> before >> they would ever land on the tree and it could flag functional changes >> earlier in the process. >> >> I'm not sure the best way to do that. One thing I'll be doing is >> updating the bot to push a series that successfully builds and passes >> checkpatch to a special branch on a github repository to kick off >> travis >> builds. That will give us a more complete regression coverage, and we >> could be confident that a series won't break something major. After >> that, I'm not sure how to notify various alternate test >> infrastructures >> how to kick off their own tests using the patched sources. >> >> My goal is to get really early feedback on patch series. I've sent >> this >> out to the folks I know are involved in testing and test discussions >> in >> the hopes that we can talk about how best to get more CI happening. >> The >> open questions: >> >> 1. How can we notify various downstream consumers of OvS of these >>0-day builds? Should we just rely on people rolling their own? >>Should there be a more formalized framework? How will these other >>test frameworks report any kind of failures? >> >> 2. 
What kinds of additional testing do we want to see the robot >> include? > > First of all thanks for the 0-day robot, I really like the idea… > > One thing I feel would really benefit is some basic performance > testing, like a PVP test for the kernel/dpdk datapath. This will help > easily identifying performance impacting patches as they happen… > Rather than people figuring out after a release why their performance > has dropped. Yes - I hope to pull in the work you've done for ovs_perf to have some kind of baselines. For this to make sense, I think it also needs to have a bunch of hardware that we can benchmark (hint hint to some of the folks in the CC list :). Not for absolute numbers, but at least to detect significant changes. I'm also not sure how to measure a 'problem.' Do we run a test pre-series, and then run it post-series? In that case, we could slowly degrade performance over time without any noticing. Do we take it from the previous release, and compare? Might make more sense, but I don't know if it has other problems associated. What are the thresholds we use for saying something is a regression? How do we report it to developers? >>Should the test results be made available in general on some kind >> of >>public facing site? Should it just stay as a "bleep bloop - >>failure!" marker? >> >> 3. What other concerns should be addressed? >> ___ >> dev mailing list >> d...@openvswitch.org >> https://mail.openvswitch.org/mailman/listinfo/ovs-dev ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
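The pre-series vs. post-series comparison Aaron raises above could be reduced to a simple threshold check. The sketch below is purely illustrative (the function name, Mpps units, and the 5% threshold are assumptions, not anything the robot actually implements), but it shows the shape of the decision the thread is debating:

```python
# Hypothetical sketch of a PVP regression check: compare a series'
# measured throughput against a stored baseline and flag drops that
# exceed a threshold. All names and the 5% default are illustrative.

def check_regression(baseline_mpps, series_mpps, threshold=0.05):
    """Return (is_regression, relative_change)."""
    if baseline_mpps <= 0:
        raise ValueError("baseline must be positive")
    change = (series_mpps - baseline_mpps) / baseline_mpps
    return (change < -threshold, change)

# Example: 10.0 Mpps baseline, 9.2 Mpps with the series applied.
regressed, change = check_regression(10.0, 9.2)
# change is roughly -0.08 (an 8% drop), so this counts as a regression.
```

Whether the baseline comes from the previous run, the previous release, or a rolling window is exactly the open question in the thread; the slow-degradation problem Aaron mentions argues for anchoring the baseline to a release rather than the immediately preceding run.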
Re: [ovs-dev] [RFC] Federating the 0-day robot, and improving the testing
Ben Pfaff writes:

> On Thu, Sep 06, 2018 at 04:56:18AM -0400, Aaron Conole wrote:
>> As of June, the 0-day robot has tested over 450 patch series.
>> Occasionally it spams the list (apologies for that), but for the
>> majority of the time it has caught issues before they made it to the
>> tree - so it's accomplishing the initial goal just fine.
>>
>> I see lots of ways it can improve. Currently, the bot runs on a light
>> system. It takes ~20 minutes to complete a set of tests, including
>> all the checkpatch and rebuild runs. That's not a big issue. BUT, it
>> does mean that the machine isn't able to perform all the kinds of
>> regression tests that we would want. I want to improve this in a way
>> that various contributors can bring their own hardware and regression
>> tests to the party. In that way, various projects can detect
>> potential issues before they would ever land on the tree and it could
>> flag functional changes earlier in the process.
>>
>> I'm not sure the best way to do that. One thing I'll be doing is
>> updating the bot to push a series that successfully builds and passes
>> checkpatch to a special branch on a github repository to kick off
>> travis builds. That will give us more complete regression coverage,
>> and we could be confident that a series won't break something major.
>> After that, I'm not sure how to notify various alternate test
>> infrastructures how to kick off their own tests using the patched
>> sources.
>
> That's pretty exciting.
>
> Don't forget about appveyor, either. Hardly any of us builds on
> Windows, so appveyor is likely to catch things that we won't.

:) I did forget it, but it's true. I'm working on some scripts to poll
the status. That way the bot can bundle up the emails together on the
series.

>> My goal is to get really early feedback on patch series. I've sent
>> this out to the folks I know are involved in testing and test
>> discussions in the hopes that we can talk about how best to get more
>> CI happening. The open questions:
>>
>> 1. How can we notify various downstream consumers of OvS of these
>>    0-day builds? Should we just rely on people rolling their own?
>>    Should there be a more formalized framework? How will these other
>>    test frameworks report any kind of failures?
>
> Do you mean notify of successes or failures? I assumed that the
> robot's email would notify us of that.

I will keep the emails. I was thinking of some kind of public
dashboard, or even just using the patchwork 'checks' API to report the
status of the various tests that the robot runs.

> Do you mean actually provide the builds? I don't know a good way to
> do that.

I didn't know if anyone would find it useful to have something like a
PPA / COPR or other kind of repo available. That way, they can just
update their package manager configuration to point at the appropriate
place and install a pre-applied series. But I guess apart from
.deb/.rpm for the various distros that are in-tree, it's difficult to
provide something. Maybe a tarball of the sources with the series
applied, and a tarball of the binaries that were spit out (but the
configurations can be quite varied, so it probably wouldn't make
sense).

>> 2. What kinds of additional testing do we want to see the robot
>>    include? Should the test results be made available in general on
>>    some kind of public facing site? Should it just stay as a "bleep
>>    bloop - failure!" marker?
>
> It would be super awesome if we could run the various additional
> testsuites that we have: check-system-userspace, check-kernel, etc.
> We can't run them easily on travis because they require superuser
> (and bugs sometimes crash the system).

I agree. I'm hoping to take advantage of the poc sub-system to do
various builds, to have a superuser environment available that's
insulated from the host system.

>> 3. What other concerns should be addressed?

Thanks for the input, Ben!
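Patchwork's REST API does expose a per-patch 'checks' endpoint like the one Aaron mentions. A minimal sketch of how the robot might report a result is below; the instance URL, token handling, and the context name are placeholder assumptions, not the bot's actual configuration:

```python
# Sketch of reporting a test result through patchwork's per-patch
# 'checks' endpoint (POST /api/patches/{id}/checks/). Context names
# and authentication details here are illustrative only.
import json
import urllib.request

def build_check(state, context, description, target_url=""):
    """Build the JSON payload for a patchwork check."""
    assert state in ("pending", "success", "warning", "fail")
    return {
        "state": state,
        "context": context,
        "description": description,
        "target_url": target_url,
    }

def post_check(api_base, patch_id, token, payload):
    # Not invoked in this sketch; shows the request shape only.
    req = urllib.request.Request(
        f"{api_base}/api/patches/{patch_id}/checks/",
        data=json.dumps(payload).encode(),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Token {token}",
        },
    )
    return urllib.request.urlopen(req)

payload = build_check("success", "0day-robot-build", "build succeeded")
```

A dashboard could then be generated straight from patchwork's stored checks rather than maintained separately.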
Re: [ovs-dev] [RFC] Federating the 0-day robot, and improving the testing
On Thu, Sep 06, 2018 at 04:56:18AM -0400, Aaron Conole wrote:
> As of June, the 0-day robot has tested over 450 patch series.
> Occasionally it spams the list (apologies for that), but for the
> majority of the time it has caught issues before they made it to the
> tree - so it's accomplishing the initial goal just fine.
>
> I see lots of ways it can improve. Currently, the bot runs on a light
> system. It takes ~20 minutes to complete a set of tests, including
> all the checkpatch and rebuild runs. That's not a big issue. BUT, it
> does mean that the machine isn't able to perform all the kinds of
> regression tests that we would want. I want to improve this in a way
> that various contributors can bring their own hardware and regression
> tests to the party. In that way, various projects can detect
> potential issues before they would ever land on the tree and it could
> flag functional changes earlier in the process.
>
> I'm not sure the best way to do that. One thing I'll be doing is
> updating the bot to push a series that successfully builds and passes
> checkpatch to a special branch on a github repository to kick off
> travis builds. That will give us more complete regression coverage,
> and we could be confident that a series won't break something major.
> After that, I'm not sure how to notify various alternate test
> infrastructures how to kick off their own tests using the patched
> sources.

That's pretty exciting.

Don't forget about appveyor, either. Hardly any of us builds on
Windows, so appveyor is likely to catch things that we won't.

> My goal is to get really early feedback on patch series. I've sent
> this out to the folks I know are involved in testing and test
> discussions in the hopes that we can talk about how best to get more
> CI happening. The open questions:
>
> 1. How can we notify various downstream consumers of OvS of these
>    0-day builds? Should we just rely on people rolling their own?
>    Should there be a more formalized framework? How will these other
>    test frameworks report any kind of failures?

Do you mean notify of successes or failures? I assumed that the
robot's email would notify us of that.

Do you mean actually provide the builds? I don't know a good way to do
that.

> 2. What kinds of additional testing do we want to see the robot
>    include? Should the test results be made available in general on
>    some kind of public facing site? Should it just stay as a "bleep
>    bloop - failure!" marker?

It would be super awesome if we could run the various additional
testsuites that we have: check-system-userspace, check-kernel, etc. We
can't run them easily on travis because they require superuser (and
bugs sometimes crash the system).

> 3. What other concerns should be addressed?
Re: [ovs-dev] [RFC] Federating the 0-day robot, and improving the testing
On 6 Sep 2018, at 10:56, Aaron Conole wrote:

> As of June, the 0-day robot has tested over 450 patch series.
> Occasionally it spams the list (apologies for that), but for the
> majority of the time it has caught issues before they made it to the
> tree - so it's accomplishing the initial goal just fine.
>
> I see lots of ways it can improve. Currently, the bot runs on a light
> system. It takes ~20 minutes to complete a set of tests, including
> all the checkpatch and rebuild runs. That's not a big issue. BUT, it
> does mean that the machine isn't able to perform all the kinds of
> regression tests that we would want. I want to improve this in a way
> that various contributors can bring their own hardware and regression
> tests to the party. In that way, various projects can detect
> potential issues before they would ever land on the tree and it could
> flag functional changes earlier in the process.
>
> I'm not sure the best way to do that. One thing I'll be doing is
> updating the bot to push a series that successfully builds and passes
> checkpatch to a special branch on a github repository to kick off
> travis builds. That will give us more complete regression coverage,
> and we could be confident that a series won't break something major.
> After that, I'm not sure how to notify various alternate test
> infrastructures how to kick off their own tests using the patched
> sources.
>
> My goal is to get really early feedback on patch series. I've sent
> this out to the folks I know are involved in testing and test
> discussions in the hopes that we can talk about how best to get more
> CI happening. The open questions:
>
> 1. How can we notify various downstream consumers of OvS of these
>    0-day builds? Should we just rely on people rolling their own?
>    Should there be a more formalized framework? How will these other
>    test frameworks report any kind of failures?
>
> 2. What kinds of additional testing do we want to see the robot
>    include?

First of all thanks for the 0-day robot, I really like the idea…

One thing I feel would really benefit is some basic performance
testing, like a PVP test for the kernel/dpdk datapath. This will help
easily identify performance-impacting patches as they happen, rather
than people figuring out after a release why their performance has
dropped.

>    Should the test results be made available in general on some kind
>    of public facing site? Should it just stay as a "bleep bloop -
>    failure!" marker?
>
> 3. What other concerns should be addressed?
[ovs-dev] [RFC] Federating the 0-day robot, and improving the testing
As of June, the 0-day robot has tested over 450 patch series.
Occasionally it spams the list (apologies for that), but for the
majority of the time it has caught issues before they made it to the
tree - so it's accomplishing the initial goal just fine.

I see lots of ways it can improve. Currently, the bot runs on a light
system. It takes ~20 minutes to complete a set of tests, including all
the checkpatch and rebuild runs. That's not a big issue. BUT, it does
mean that the machine isn't able to perform all the kinds of regression
tests that we would want. I want to improve this in a way that various
contributors can bring their own hardware and regression tests to the
party. In that way, various projects can detect potential issues before
they would ever land on the tree and it could flag functional changes
earlier in the process.

I'm not sure the best way to do that. One thing I'll be doing is
updating the bot to push a series that successfully builds and passes
checkpatch to a special branch on a github repository to kick off
travis builds. That will give us more complete regression coverage, and
we could be confident that a series won't break something major. After
that, I'm not sure how to notify various alternate test infrastructures
how to kick off their own tests using the patched sources.

My goal is to get really early feedback on patch series. I've sent this
out to the folks I know are involved in testing and test discussions in
the hopes that we can talk about how best to get more CI happening. The
open questions:

1. How can we notify various downstream consumers of OvS of these
   0-day builds? Should we just rely on people rolling their own?
   Should there be a more formalized framework? How will these other
   test frameworks report any kind of failures?

2. What kinds of additional testing do we want to see the robot
   include? Should the test results be made available in general on
   some kind of public facing site? Should it just stay as a "bleep
   bloop - failure!" marker?

3. What other concerns should be addressed?