Re: [ovs-dev] [RFC] Federating the 0-day robot, and improving the testing

2018-12-04 Thread Aaron Conole
Ben Pfaff  writes:

> On Thu, Sep 06, 2018 at 04:56:18AM -0400, Aaron Conole wrote:
>> As of June, the 0-day robot has tested over 450 patch series.
>> Occasionally it spams the list (apologies for that), but for the
>> majority of the time it has caught issues before they made it to the
>> tree - so it's accomplishing the initial goal just fine.
>> 
>> I see lots of ways it can improve.  Currently, the bot runs on a light
>> system.  It takes ~20 minutes to complete a set of tests, including all
>> the checkpatch and rebuild runs.  That's not a big issue.  BUT, it does
>> mean that the machine isn't able to perform all the kinds of regression
>> tests that we would want.  I want to improve this in a way that various
>> contributors can bring their own hardware and regression tests to the
>> party.  In that way, various projects can detect potential issues before
>> they would ever land on the tree and it could flag functional changes
>> earlier in the process.
>> 
>> I'm not sure the best way to do that.  One thing I'll be doing is
>> updating the bot to push a series that successfully builds and passes
>> checkpatch to a special branch on a github repository to kick off travis
>> builds.  That will give us a more complete regression coverage, and we
>> could be confident that a series won't break something major.  After
>> that, I'm not sure how to notify various alternate test infrastructures
>> how to kick off their own tests using the patched sources.
>
> That's pretty exciting.
>
> Don't forget about appveyor, either.  Hardly any of us builds on
> Windows, so appveyor is likely to catch things that we won't.

I haven't forgotten about this.

Unveiling (after some quick testing... hopefully it continues to work):

   https://github.com/ovsrobot/ovs
   https://travis-ci.org/ovsrobot/ovs
   https://ci.appveyor.com/project/ovsrobot/ovs

I don't know if the appveyor stuff is set up correctly, so if anyone who
is more knowledgeable in that area wants to assist, please contact me.

So far I only have one patch from one series built.  BUT - every series is
now pushed to a github branch, and that kicks off Travis (and hopefully
appveyor).  The branches are named after their series id in patchwork:

   series_SERIES_ID

I don't have any mechanism to clean them up.  Presumably I'll need to
figure out how to do that at some point... :)
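
Something like this would probably be enough for the cleanup (an untested
sketch, not what the bot runs today - the remote name and the 30-day
retention window are arbitrary):

    #!/usr/bin/env python3
    """Hypothetical cleanup pass for stale series_* branches."""
    import subprocess
    import time

    REMOTE = "origin"        # assumed to point at the ovsrobot/ovs mirror
    MAX_AGE_DAYS = 30        # arbitrary retention window

    def run(*args):
        """Run a git command and return its stdout as text."""
        return subprocess.run(["git", *args], check=True,
                              capture_output=True, text=True).stdout

    def stale_series_branches(max_age_days=MAX_AGE_DAYS):
        """Yield series_* branches whose tip commit is older than the cutoff."""
        run("fetch", "--prune", REMOTE)
        cutoff = time.time() - max_age_days * 86400
        refs = run("for-each-ref",
                   "--format=%(refname:short) %(committerdate:unix)",
                   "refs/remotes/%s/series_*" % REMOTE)
        for line in refs.splitlines():
            ref, committed = line.rsplit(" ", 1)
            if int(committed) < cutoff:
                yield ref.split("/", 1)[1]   # drop the "origin/" prefix

    if __name__ == "__main__":
        for branch in stale_series_branches():
            print("deleting %s" % branch)
            run("push", REMOTE, "--delete", branch)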

>> My goal is to get really early feedback on patch series.  I've sent this
>> out to the folks I know are involved in testing and test discussions in
>> the hopes that we can talk about how best to get more CI happening.  The
>> open questions:
>> 
>> 1. How can we notify various downstream consumers of OvS of these
>>0-day builds?  Should we just rely on people rolling their own?
>>Should there be a more formalized framework?  How will these other
>>test frameworks report any kind of failures?
>
> Do you mean notify of successes or failures?  I assumed that the robot's
> email would notify us of that.
>
> Do you mean actually provide the builds?  I don't know a good way to do
> that.
>
>> 2. What kinds of additional testing do we want to see the robot include?
>>Should the test results be made available in general on some kind of
>>public facing site?  Should it just stay as a "bleep bloop -
>>failure!" marker?
>
> It would be super awesome if we could run the various additional
> testsuites that we have: check-system-userspace, check-kernel, etc.  We
> can't run them easily on travis because they require superuser (and bugs
> sometimes crash the system).
>
>> 3. What other concerns should be addressed?


Re: [ovs-dev] [RFC] Federating the 0-day robot, and improving the testing

2018-09-12 Thread Aaron Conole
"Eelco Chaudron"  writes:

> On 11 Sep 2018, at 17:51, Aaron Conole wrote:
>
>> "Eelco Chaudron"  writes:
>>
>>> On 6 Sep 2018, at 10:56, Aaron Conole wrote:
>>>
 As of June, the 0-day robot has tested over 450 patch series.
 Occasionally it spams the list (apologies for that), but for the
 majority of the time it has caught issues before they made it to the
 tree - so it's accomplishing the initial goal just fine.

 I see lots of ways it can improve.  Currently, the bot runs on a light
 system.  It takes ~20 minutes to complete a set of tests, including all
 the checkpatch and rebuild runs.  That's not a big issue.  BUT, it does
 mean that the machine isn't able to perform all the kinds of regression
 tests that we would want.  I want to improve this in a way that various
 contributors can bring their own hardware and regression tests to the
 party.  In that way, various projects can detect potential issues before
 they would ever land on the tree and it could flag functional changes
 earlier in the process.

 I'm not sure the best way to do that.  One thing I'll be doing is
 updating the bot to push a series that successfully builds and passes
 checkpatch to a special branch on a github repository to kick off travis
 builds.  That will give us a more complete regression coverage, and we
 could be confident that a series won't break something major.  After
 that, I'm not sure how to notify various alternate test infrastructures
 how to kick off their own tests using the patched sources.

 My goal is to get really early feedback on patch series.  I've sent this
 out to the folks I know are involved in testing and test discussions in
 the hopes that we can talk about how best to get more CI happening.  The
 open questions:

 1. How can we notify various downstream consumers of OvS of these
    0-day builds?  Should we just rely on people rolling their own?
    Should there be a more formalized framework?  How will these other
    test frameworks report any kind of failures?

 2. What kinds of additional testing do we want to see the robot include?
>>>
>>> First of all thanks for the 0-day robot, I really like the idea…
>>>
>>> One thing I feel would really benefit is some basic performance
>>> testing, like a PVP test for the kernel/dpdk datapath. This will help
>>> easily identifying performance impacting patches as they happen…
>>> Rather than people figuring out after a release why their performance
>>> has dropped.
>>
>> Yes - I hope to pull in the work you've done for ovs_perf to have some
>> kind of baselines.
>>
>> For this to make sense, I think it also needs to have a bunch of
>> hardware that we can benchmark (hint hint to some of the folks in the CC
>> list :).  Not for absolute numbers, but at least to detect significant
>> changes.
>>
>> I'm also not sure how to measure a 'problem.'  Do we run a test
>> pre-series, and then run it post-series?  In that case, we could slowly
>> degrade performance over time without any noticing.  Do we take it from
>> the previous release, and compare?  Might make more sense, but I don't
>> know if it has other problems associated.  What are the thresholds we
>> use for saying something is a regression?  How do we report it to
>> developers?
>
> Guess both in an ideal world, and maybe add a weekly baseline for
> master :)
>
> Having a graph of this would be really nice. However, this might be a
> whole project on itself, i.e. performance runs on all commits to
> master…

I consider that a useful thing to have (a dashboard with information on
series tested, etc).  But I agree, it's another whole project.
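
Just to make the comparison idea concrete, a toy sketch of flagging a series
against a weekly master baseline (the 5% threshold and the Mpps samples are
made up for illustration):

    """Toy baseline comparison; threshold and numbers are illustrative."""
    from statistics import mean

    THRESHOLD = 0.05   # flag anything more than 5% below baseline (arbitrary)

    def regression(baseline_mpps, candidate_mpps, threshold=THRESHOLD):
        """Return (is_regression, relative_change) comparing mean throughputs."""
        base, cand = mean(baseline_mpps), mean(candidate_mpps)
        change = (cand - base) / base
        return change < -threshold, change

    if __name__ == "__main__":
        baseline = [14.2, 14.1, 14.3]    # weekly PVP run on master (hypothetical Mpps)
        candidate = [13.1, 13.0, 13.2]   # same test with the series applied
        bad, change = regression(baseline, candidate)
        print("change: %+.1f%%  regression: %s" % (change * 100, bad))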

    Should the test results be made available in general on some kind of
    public facing site?  Should it just stay as a "bleep bloop -
    failure!" marker?

 3. What other concerns should be addressed?


Re: [ovs-dev] [RFC] Federating the 0-day robot, and improving the testing

2018-09-12 Thread Aaron Conole
Ian Stokes  writes:

> On 9/11/2018 4:51 PM, Aaron Conole wrote:
>> "Eelco Chaudron"  writes:
>>
>>> On 6 Sep 2018, at 10:56, Aaron Conole wrote:
>>>
 As of June, the 0-day robot has tested over 450 patch series.
 Occasionally it spams the list (apologies for that), but for the
 majority of the time it has caught issues before they made it to the
 tree - so it's accomplishing the initial goal just fine.

 I see lots of ways it can improve.  Currently, the bot runs on a light
 system.  It takes ~20 minutes to complete a set of tests, including all
 the checkpatch and rebuild runs.  That's not a big issue.  BUT, it does
 mean that the machine isn't able to perform all the kinds of regression
 tests that we would want.  I want to improve this in a way that various
 contributors can bring their own hardware and regression tests to the
 party.  In that way, various projects can detect potential issues before
 they would ever land on the tree and it could flag functional changes
 earlier in the process.

 I'm not sure the best way to do that.  One thing I'll be doing is
 updating the bot to push a series that successfully builds and passes
 checkpatch to a special branch on a github repository to kick off travis
 builds.  That will give us a more complete regression coverage, and we
 could be confident that a series won't break something major.  After
 that, I'm not sure how to notify various alternate test infrastructures
 how to kick off their own tests using the patched sources.

 My goal is to get really early feedback on patch series.  I've sent this
 out to the folks I know are involved in testing and test discussions in
 the hopes that we can talk about how best to get more CI happening.  The
 open questions:

 1. How can we notify various downstream consumers of OvS of these
    0-day builds?  Should we just rely on people rolling their own?
    Should there be a more formalized framework?  How will these other
    test frameworks report any kind of failures?

 2. What kinds of additional testing do we want to see the robot include?
>>>
>>> First of all thanks for the 0-day robot, I really like the idea…
> +1, great work on this.
>
>>>
>>> One thing I feel would really benefit is some basic performance
>>> testing, like a PVP test for the kernel/dpdk datapath. This will help
>>> easily identifying performance impacting patches as they happen…
>>> Rather than people figuring out after a release why their performance
>>> has dropped.
>>
>
> To date I've been using vsperf to conduct p2p, pvp, pvvp, vxlan tests
> etc. The framework for a lot of these are already in place. It
> supports a number of traffic gens also such as t-rex, moongen etc. as
> well as the commercial usual suspects.

I think we have some VSPerf tests as well.  I'm happy to use
whatever :)

> The vsperf CI also published a large number of tests with both OVS
> DPDK and OVS kernel. Not sure if it is still running however, I'll
> look into it as the graphs in the link below seem out of date.
>
> https://wiki.opnfv.org/display/vsperf/VSPERF+CI+Results#VSPERFCIResults-OVSwithDPDK
>
> Currently it uses DPDK 17.08 and OVS 2.9 by default but I have it
> working with DPDK 17.11 and OVS master on my own system easily enough.

Awesome.  I'll pull a Bane:

"Your precious test suite, gratefully accepted!"

>> Yes - I hope to pull in the work you've done for ovs_perf to have some
>> kind of baselines.
>>
>> For this to make sense, I think it also needs to have a bunch of
>> hardware that we can benchmark (hint hint to some of the folks in the CC
>> list :).  Not for absolute numbers, but at least to detect significant
>> changes.
>
> Working on it :). It leads to another discussion though, if we have
> hardware ready to ship then where should it go? Where's the best place
> to host and maintain the CI system.
>
>>
>> I'm also not sure how to measure a 'problem.'  Do we run a test
>> pre-series, and then run it post-series?  In that case, we could slowly
>> degrade performance over time without any noticing.  Do we take it from
>> the previous release, and compare?  Might make more sense, but I don't
>> know if it has other problems associated.  What are the thresholds we
>> use for saying something is a regression?  How do we report it to
>> developers?
>
> It's a good point, we typically run perf tests nightly in order to
> gauge any degradation on OVS master. Possibly this could help in
> comparison as long as the HW is the same. I would be anxious not to
> overburden the robot test system from the get go however. It's primary
> purpose initially would be to provide feedback on patch series so I'd
> like to avoid having it tied up with performance checking what has
> been upstreamed already.

I agree with the burdening part.  I think from our side we'd 

Re: [ovs-dev] [RFC] Federating the 0-day robot, and improving the testing

2018-09-12 Thread Eelco Chaudron



On 11 Sep 2018, at 17:51, Aaron Conole wrote:

> "Eelco Chaudron"  writes:
>
>> On 6 Sep 2018, at 10:56, Aaron Conole wrote:
>>
>>> As of June, the 0-day robot has tested over 450 patch series.
>>> Occasionally it spams the list (apologies for that), but for the
>>> majority of the time it has caught issues before they made it to the
>>> tree - so it's accomplishing the initial goal just fine.
>>>
>>> I see lots of ways it can improve.  Currently, the bot runs on a light
>>> system.  It takes ~20 minutes to complete a set of tests, including all
>>> the checkpatch and rebuild runs.  That's not a big issue.  BUT, it does
>>> mean that the machine isn't able to perform all the kinds of regression
>>> tests that we would want.  I want to improve this in a way that various
>>> contributors can bring their own hardware and regression tests to the
>>> party.  In that way, various projects can detect potential issues before
>>> they would ever land on the tree and it could flag functional changes
>>> earlier in the process.
>>>
>>> I'm not sure the best way to do that.  One thing I'll be doing is
>>> updating the bot to push a series that successfully builds and passes
>>> checkpatch to a special branch on a github repository to kick off travis
>>> builds.  That will give us a more complete regression coverage, and we
>>> could be confident that a series won't break something major.  After
>>> that, I'm not sure how to notify various alternate test infrastructures
>>> how to kick off their own tests using the patched sources.
>>>
>>> My goal is to get really early feedback on patch series.  I've sent this
>>> out to the folks I know are involved in testing and test discussions in
>>> the hopes that we can talk about how best to get more CI happening.  The
>>> open questions:
>>>
>>> 1. How can we notify various downstream consumers of OvS of these
>>>    0-day builds?  Should we just rely on people rolling their own?
>>>    Should there be a more formalized framework?  How will these other
>>>    test frameworks report any kind of failures?
>>>
>>> 2. What kinds of additional testing do we want to see the robot
>>>    include?
>>
>> First of all thanks for the 0-day robot, I really like the idea…
>>
>> One thing I feel would really benefit is some basic performance
>> testing, like a PVP test for the kernel/dpdk datapath. This will help
>> easily identifying performance impacting patches as they happen…
>> Rather than people figuring out after a release why their performance
>> has dropped.
>
> Yes - I hope to pull in the work you've done for ovs_perf to have some
> kind of baselines.
>
> For this to make sense, I think it also needs to have a bunch of
> hardware that we can benchmark (hint hint to some of the folks in the CC
> list :).  Not for absolute numbers, but at least to detect significant
> changes.
>
> I'm also not sure how to measure a 'problem.'  Do we run a test
> pre-series, and then run it post-series?  In that case, we could slowly
> degrade performance over time without any noticing.  Do we take it from
> the previous release, and compare?  Might make more sense, but I don't
> know if it has other problems associated.  What are the thresholds we
> use for saying something is a regression?  How do we report it to
> developers?

Guess both in an ideal world, and maybe add a weekly baseline for
master :)

Having a graph of this would be really nice. However, this might be a
whole project on itself, i.e. performance runs on all commits to
master…

>>>    Should the test results be made available in general on some kind of
>>>    public facing site?  Should it just stay as a "bleep bloop -
>>>    failure!" marker?
>>>
>>> 3. What other concerns should be addressed?


Re: [ovs-dev] [RFC] Federating the 0-day robot, and improving the testing

2018-09-11 Thread Aaron Conole
Ophir Munk  writes:

>> -Original Message-
>> From: Aaron Conole [mailto:acon...@bytheb.org]
>> Sent: Thursday, September 06, 2018 11:56 AM
>> To: Ian Stokes ; Kevin Traynor
>> ; Ophir Munk ; Ferruh
>> Yigit ; Luca Boccassi ; Jeremy
>> Plsek ; Sugesh Chandran
>> ; Jean-Tsung Hsiao ;
>> Christian Trautman ; Ben Pfaff ;
>> Bala Sankaran 
>> Cc: d...@openvswitch.org
>> Subject: [RFC] Federating the 0-day robot, and improving the testing
>> 
>> As of June, the 0-day robot has tested over 450 patch series.
>> Occasionally it spams the list (apologies for that), but for the majority
>> of the time it has caught issues before they made it to the tree - so it's
>> accomplishing the initial goal just fine.
>> 
>> I see lots of ways it can improve.  Currently, the bot runs on a light
>> system.  It takes ~20 minutes to complete a set of tests, including all the
>> checkpatch and rebuild runs.  That's not a big issue.  BUT, it does mean
>> that the machine isn't able to perform all the kinds of regression tests
>> that we would want.  I want to improve this in a way that various
>> contributors can bring their own hardware and regression tests to the
>> party.  In that way, various projects can detect potential issues before
>> they would ever land on the tree and it could flag functional changes
>> earlier in the process.
>> 
>
> First of all - lots of thanks for this great work. 
> A few questions/comments:
> 1. Are the tests mentioned above considered core/sanity tests to make
> sure the basic functionality is not broken?

Yes - actually, I haven't re-enabled reporting the make check, so it's
basically:

1. git am

2. checkpatch

3. make

If any of those fails, the failure gets reported.  As future work, we'll
re-enable reporting the other checks.
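
Roughly, the sequence amounts to something like this (illustrative only - the
paths, the checkpatch invocation and the report format are assumptions, not
the bot's actual code):

    """Minimal sketch of the apply/checkpatch/build sequence."""
    import subprocess

    STEPS = [
        ("git am",     ["git", "am", "series.mbox"]),
        ("checkpatch", ["./utilities/checkpatch.py", "series.mbox"]),
        ("make",       ["sh", "-c", "./boot.sh && ./configure && make -j4"]),
    ]

    def run_series_checks(workdir):
        """Run each step in order; stop at and report the first failure."""
        for name, cmd in STEPS:
            result = subprocess.run(cmd, cwd=workdir,
                                    capture_output=True, text=True)
            if result.returncode != 0:
                return name, result.stdout + result.stderr   # what gets mailed back
        return None, ""   # everything passed, nothing to report

    if __name__ == "__main__":
        failed, log = run_series_checks("/path/to/ovs")   # hypothetical checkout
        print("all checks passed" if failed is None
              else "step '%s' failed" % failed)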

> 2. Is there a link to the tests which are executed? How can they be reviewed?

Documentation/topics/testing.rst covers the high level overview
(including the testsuites run by doing make
 check{-dpdk,-kernel,-kmod,-system-userspace,-ryu,-oftest,-valgrind,-lcov,})

The various tests are primarily wired up through m4, although they can
be written in any language provided there's a binary to execute.

> 3. Is there a link to the tests results? How can they be viewed?

For the bot, right now, there isn't a link.  I think some kind of
dashboard functionality is probably worth writing.

> 4. Is the test environment documented? I think it would be beneficial
> if in parallel to the 0-day robot each vendor would be able to build
> the same environment locally in order to test his patches before
> sending them.

Yes and no.  For example, the exact steps the bot takes are all
documented at:

https://github.com/orgcandman/pw-ci/blob/master/3rd-party/openvswitch/config.xml

But, as I wrote above, we currently only report failures from those steps.

> 5. I am interested in having Mellanox NICs taking part of these
> tests. We will have some internal discussions regarding this, then I
> will be more specific.

Awesome!  Look forward to hearing more.

>> I'm not sure the best way to do that.  One thing I'll be doing is updating
>> the bot to push a series that successfully builds and passes checkpatch to
>> a special branch on a github repository to kick off travis builds.  That
>> will give us a more complete regression coverage, and we could be confident
>> that a series won't break something major.
>
> I suggest to tag the daily regression series and to have public access to it.
> In case anything is broken we should get an email notifying on this
> and be able to bisect the tree (based on tag) to find which commit is
> causing issues. It is even better to have the bot doing the bisect.

I'm not sure what that means.  I don't think there should be anything to
bisect yet - but that's probably because I'm focused on the submission
side of testing.  Of course, a future effort would be some kind of full
regression run.  I guess that's what you're referring to here.
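
If we ever do tag nightly runs, the bisect part could be as simple as the
sketch below (the tag names and the test command are hypothetical, and a real
run would of course need a more meaningful test than just a build):

    """Hypothetical bisect driver for a regressed nightly tag."""
    import subprocess

    def git(*args):
        subprocess.run(["git", *args], check=True)

    def bisect(good_tag, bad_tag, test_cmd):
        """Use `git bisect run` to find the first commit where test_cmd fails."""
        git("bisect", "start", bad_tag, good_tag)
        try:
            # git checks out each candidate commit and runs test_cmd;
            # exit status 0 means good, anything else means bad.
            subprocess.run(["git", "bisect", "run"] + test_cmd)
            git("bisect", "log")
        finally:
            git("bisect", "reset")

    if __name__ == "__main__":
        bisect("nightly-2018-09-10", "nightly-2018-09-11",
               ["sh", "-c", "./boot.sh && ./configure && make -j4 && make check"])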

>> After that, I'm not sure how to notify
>> various alternate test infrastructures how to kick off their own tests using
>> the patched sources.
>> 
>> My goal is to get really early feedback on patch series.  I've sent this
>> out to the folks I know are involved in testing and test discussions in the
>> hopes that we can talk about how best to get more CI happening.  The open
>> questions:
>> 
>> 1. How can we notify various downstream consumers of OvS of these
>>0-day builds?  Should we just rely on people rolling their own?
>>Should there be a more formalized framework?  How will these other
>>test frameworks report any kind of failures?
>> 
>> 2. What kinds of additional testing do we want to see the robot include?
>>Should the test results be made available in general on some kind of
>>public facing site?  Should it just stay as a "bleep bloop -
>>failure!" marker?
>> 
>> 3. What other concerns should be addressed?
>
> I am looking forward to start running even with just basic tests 

Re: [ovs-dev] [RFC] Federating the 0-day robot, and improving the testing

2018-09-11 Thread Ophir Munk



> -Original Message-
> From: Aaron Conole [mailto:acon...@bytheb.org]
> Sent: Thursday, September 06, 2018 11:56 AM
> To: Ian Stokes ; Kevin Traynor
> ; Ophir Munk ; Ferruh
> Yigit ; Luca Boccassi ; Jeremy
> Plsek ; Sugesh Chandran
> ; Jean-Tsung Hsiao ;
> Christian Trautman ; Ben Pfaff ;
> Bala Sankaran 
> Cc: d...@openvswitch.org
> Subject: [RFC] Federating the 0-day robot, and improving the testing
> 
> As of June, the 0-day robot has tested over 450 patch series.
> Occasionally it spams the list (apologies for that), but for the majority
> of the time it has caught issues before they made it to the tree - so it's
> accomplishing the initial goal just fine.
> 
> I see lots of ways it can improve.  Currently, the bot runs on a light
> system.  It takes ~20 minutes to complete a set of tests, including all the
> checkpatch and rebuild runs.  That's not a big issue.  BUT, it does mean
> that the machine isn't able to perform all the kinds of regression tests
> that we would want.  I want to improve this in a way that various
> contributors can bring their own hardware and regression tests to the
> party.  In that way, various projects can detect potential issues before
> they would ever land on the tree and it could flag functional changes
> earlier in the process.
> 

First of all - lots of thanks for this great work.
A few questions/comments:
1. Are the tests mentioned above considered core/sanity tests to make sure
   the basic functionality is not broken?
2. Is there a link to the tests which are executed? How can they be reviewed?
3. Is there a link to the tests results? How can they be viewed?
4. Is the test environment documented? I think it would be beneficial if in
   parallel to the 0-day robot each vendor would be able to build the same
   environment locally in order to test his patches before sending them.
5. I am interested in having Mellanox NICs taking part of these tests. We will
   have some internal discussions regarding this, then I will be more specific.

> I'm not sure the best way to do that.  One thing I'll be doing is updating
> the bot to push a series that successfully builds and passes checkpatch to
> a special branch on a github repository to kick off travis builds.  That
> will give us a more complete regression coverage, and we could be confident
> that a series won't break something major.

I suggest to tag the daily regression series and to have public access to it.
In case anything is broken we should get an email notifying on this and be able 
to bisect the tree (based on tag) to find which commit is causing issues. It is 
even better to have the bot doing the bisect.

> After that, I'm not sure how to notify
> various alternate test infrastructures how to kick off their own tests using
> the patched sources.
> 
> My goal is to get really early feedback on patch series.  I've sent this
> out to the folks I know are involved in testing and test discussions in the
> hopes that we can talk about how best to get more CI happening.  The open
> questions:
> 
> 1. How can we notify various downstream consumers of OvS of these
>0-day builds?  Should we just rely on people rolling their own?
>Should there be a more formalized framework?  How will these other
>test frameworks report any kind of failures?
> 
> 2. What kinds of additional testing do we want to see the robot include?
>Should the test results be made available in general on some kind of
>public facing site?  Should it just stay as a "bleep bloop -
>failure!" marker?
> 
> 3. What other concerns should be addressed?

I am looking forward to starting to run even just the basic tests to see how
this whole framework works, and then improving along the way. Can you please
make sure to add the dpdk-latest and dpdk-hwol branches to the bot tests in
addition to the master branch?



Re: [ovs-dev] [RFC] Federating the 0-day robot, and improving the testing

2018-09-11 Thread Ian Stokes

On 9/11/2018 4:51 PM, Aaron Conole wrote:
> "Eelco Chaudron"  writes:
>
>> On 6 Sep 2018, at 10:56, Aaron Conole wrote:
>>
>>> As of June, the 0-day robot has tested over 450 patch series.
>>> Occasionally it spams the list (apologies for that), but for the
>>> majority of the time it has caught issues before they made it to the
>>> tree - so it's accomplishing the initial goal just fine.
>>>
>>> I see lots of ways it can improve.  Currently, the bot runs on a light
>>> system.  It takes ~20 minutes to complete a set of tests, including all
>>> the checkpatch and rebuild runs.  That's not a big issue.  BUT, it does
>>> mean that the machine isn't able to perform all the kinds of regression
>>> tests that we would want.  I want to improve this in a way that various
>>> contributors can bring their own hardware and regression tests to the
>>> party.  In that way, various projects can detect potential issues before
>>> they would ever land on the tree and it could flag functional changes
>>> earlier in the process.
>>>
>>> I'm not sure the best way to do that.  One thing I'll be doing is
>>> updating the bot to push a series that successfully builds and passes
>>> checkpatch to a special branch on a github repository to kick off travis
>>> builds.  That will give us a more complete regression coverage, and we
>>> could be confident that a series won't break something major.  After
>>> that, I'm not sure how to notify various alternate test infrastructures
>>> how to kick off their own tests using the patched sources.
>>>
>>> My goal is to get really early feedback on patch series.  I've sent this
>>> out to the folks I know are involved in testing and test discussions in
>>> the hopes that we can talk about how best to get more CI happening.  The
>>> open questions:
>>>
>>> 1. How can we notify various downstream consumers of OvS of these
>>>    0-day builds?  Should we just rely on people rolling their own?
>>>    Should there be a more formalized framework?  How will these other
>>>    test frameworks report any kind of failures?
>>>
>>> 2. What kinds of additional testing do we want to see the robot
>>>    include?
>>
>> First of all thanks for the 0-day robot, I really like the idea…

+1, great work on this.

>> One thing I feel would really benefit is some basic performance
>> testing, like a PVP test for the kernel/dpdk datapath. This will help
>> easily identifying performance impacting patches as they happen…
>> Rather than people figuring out after a release why their performance
>> has dropped.

To date I've been using vsperf to conduct p2p, pvp, pvvp, vxlan tests
etc. The framework for a lot of these are already in place. It supports
a number of traffic gens also such as t-rex, moongen etc. as well as the
commercial usual suspects.

The vsperf CI also published a large number of tests with both OVS DPDK
and OVS kernel. Not sure if it is still running however, I'll look into
it as the graphs in the link below seem out of date.

https://wiki.opnfv.org/display/vsperf/VSPERF+CI+Results#VSPERFCIResults-OVSwithDPDK

Currently it uses DPDK 17.08 and OVS 2.9 by default but I have it
working with DPDK 17.11 and OVS master on my own system easily enough.

> Yes - I hope to pull in the work you've done for ovs_perf to have some
> kind of baselines.
>
> For this to make sense, I think it also needs to have a bunch of
> hardware that we can benchmark (hint hint to some of the folks in the CC
> list :).  Not for absolute numbers, but at least to detect significant
> changes.

Working on it :). It leads to another discussion though, if we have
hardware ready to ship then where should it go? Where's the best place
to host and maintain the CI system.

> I'm also not sure how to measure a 'problem.'  Do we run a test
> pre-series, and then run it post-series?  In that case, we could slowly
> degrade performance over time without any noticing.  Do we take it from
> the previous release, and compare?  Might make more sense, but I don't
> know if it has other problems associated.  What are the thresholds we
> use for saying something is a regression?  How do we report it to
> developers?

It's a good point, we typically run perf tests nightly in order to gauge
any degradation on OVS master. Possibly this could help in comparison as
long as the HW is the same. I would be anxious not to overburden the
robot test system from the get go however. It's primary purpose
initially would be to provide feedback on patch series so I'd like to
avoid having it tied up with performance checking what has been
upstreamed already.

In this case maybe it would make sense to run a baseline performance
test once a week and use this against incoming patch series for
comparison?

Ian

>>>    Should the test results be made available in general on some kind of
>>>    public facing site?  Should it just stay as a "bleep bloop -
>>>    failure!" marker?
>>>
>>> 3. What other concerns should be addressed?

Re: [ovs-dev] [RFC] Federating the 0-day robot, and improving the testing

2018-09-11 Thread Aaron Conole
"Eelco Chaudron"  writes:

> On 6 Sep 2018, at 10:56, Aaron Conole wrote:
>
>> As of June, the 0-day robot has tested over 450 patch series.
>> Occasionally it spams the list (apologies for that), but for the
>> majority of the time it has caught issues before they made it to the
>> tree - so it's accomplishing the initial goal just fine.
>>
>> I see lots of ways it can improve.  Currently, the bot runs on a light
>> system.  It takes ~20 minutes to complete a set of tests, including all
>> the checkpatch and rebuild runs.  That's not a big issue.  BUT, it does
>> mean that the machine isn't able to perform all the kinds of regression
>> tests that we would want.  I want to improve this in a way that various
>> contributors can bring their own hardware and regression tests to the
>> party.  In that way, various projects can detect potential issues before
>> they would ever land on the tree and it could flag functional changes
>> earlier in the process.
>>
>> I'm not sure the best way to do that.  One thing I'll be doing is
>> updating the bot to push a series that successfully builds and passes
>> checkpatch to a special branch on a github repository to kick off travis
>> builds.  That will give us a more complete regression coverage, and we
>> could be confident that a series won't break something major.  After
>> that, I'm not sure how to notify various alternate test infrastructures
>> how to kick off their own tests using the patched sources.
>>
>> My goal is to get really early feedback on patch series.  I've sent this
>> out to the folks I know are involved in testing and test discussions in
>> the hopes that we can talk about how best to get more CI happening.  The
>> open questions:
>>
>> 1. How can we notify various downstream consumers of OvS of these
>>0-day builds?  Should we just rely on people rolling their own?
>>Should there be a more formalized framework?  How will these other
>>test frameworks report any kind of failures?
>>
>> 2. What kinds of additional testing do we want to see the robot
>> include?
>
> First of all thanks for the 0-day robot, I really like the idea…
>
> One thing I feel would really benefit is some basic performance
> testing, like a PVP test for the kernel/dpdk datapath. This will help
> easily identifying performance impacting patches as they happen…
> Rather than people figuring out after a release why their performance
> has dropped.

Yes - I hope to pull in the work you've done for ovs_perf to have some
kind of baselines.

For this to make sense, I think it also needs to have a bunch of
hardware that we can benchmark (hint hint to some of the folks in the CC
list :).  Not for absolute numbers, but at least to detect significant
changes.

I'm also not sure how to measure a 'problem.'  Do we run a test
pre-series, and then run it post-series?  In that case, we could slowly
degrade performance over time without any noticing.  Do we take it from
the previous release, and compare?  Might make more sense, but I don't
know if it has other problems associated.  What are the thresholds we
use for saying something is a regression?  How do we report it to
developers?

>>    Should the test results be made available in general on some kind of
>>    public facing site?  Should it just stay as a "bleep bloop -
>>    failure!" marker?
>>
>> 3. What other concerns should be addressed?


Re: [ovs-dev] [RFC] Federating the 0-day robot, and improving the testing

2018-09-11 Thread Aaron Conole
Ben Pfaff  writes:

> On Thu, Sep 06, 2018 at 04:56:18AM -0400, Aaron Conole wrote:
>> As of June, the 0-day robot has tested over 450 patch series.
>> Occasionally it spams the list (apologies for that), but for the
>> majority of the time it has caught issues before they made it to the
>> tree - so it's accomplishing the initial goal just fine.
>> 
>> I see lots of ways it can improve.  Currently, the bot runs on a light
>> system.  It takes ~20 minutes to complete a set of tests, including all
>> the checkpatch and rebuild runs.  That's not a big issue.  BUT, it does
>> mean that the machine isn't able to perform all the kinds of regression
>> tests that we would want.  I want to improve this in a way that various
>> contributors can bring their own hardware and regression tests to the
>> party.  In that way, various projects can detect potential issues before
>> they would ever land on the tree and it could flag functional changes
>> earlier in the process.
>> 
>> I'm not sure the best way to do that.  One thing I'll be doing is
>> updating the bot to push a series that successfully builds and passes
>> checkpatch to a special branch on a github repository to kick off travis
>> builds.  That will give us a more complete regression coverage, and we
>> could be confident that a series won't break something major.  After
>> that, I'm not sure how to notify various alternate test infrastructures
>> how to kick off their own tests using the patched sources.
>
> That's pretty exciting.
>
> Don't forget about appveyor, either.  Hardly any of us builds on
> Windows, so appveyor is likely to catch things that we won't.

:)  I did forget it, but it's true.

I'm working on some scripts to poll the status.  That way the bot can
bundle up the emails for a series together.
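
One possible shape for the polling (a sketch, not what's deployed): both
travis and appveyor report commit statuses back to github, so the combined
status API for a series branch should give a single answer.  The repo URL and
token handling below are illustrative:

    """Poll github's combined commit status for a series branch."""
    import time
    import requests

    API = "https://api.github.com/repos/ovsrobot/ovs"

    def combined_status(branch, token=None):
        """Return the combined commit status for the tip of `branch`."""
        headers = {"Authorization": "token %s" % token} if token else {}
        r = requests.get("%s/commits/%s/status" % (API, branch), headers=headers)
        r.raise_for_status()
        return r.json()   # "state" is "pending", "success" or "failure"

    def wait_for_ci(branch, poll_seconds=300):
        """Poll until every CI provider has finished, then return the result."""
        while True:
            status = combined_status(branch)
            if status["state"] != "pending":
                return status
            time.sleep(poll_seconds)

    if __name__ == "__main__":
        result = wait_for_ci("series_12345")    # hypothetical series branch
        for s in result["statuses"]:
            print(s["context"], s["state"])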

>> My goal is to get really early feedback on patch series.  I've sent this
>> out to the folks I know are involved in testing and test discussions in
>> the hopes that we can talk about how best to get more CI happening.  The
>> open questions:
>> 
>> 1. How can we notify various downstream consumers of OvS of these
>>0-day builds?  Should we just rely on people rolling their own?
>>Should there be a more formalized framework?  How will these other
>>test frameworks report any kind of failures?
>
> Do you mean notify of successes or failures?  I assumed that the robot's
> email would notify us of that.

I will keep the emails.  I was thinking of some kind of public
dashboard, or even just using the patchwork 'checks' API to report
status of various tests that the robot runs.
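
For the patchwork route, assuming a Patchwork 2.x instance with the REST
'checks' endpoint enabled, the report could be as small as the sketch below
(the instance URL, token, patch id and context name are placeholders):

    """Report a result through patchwork's per-patch 'checks' endpoint."""
    import requests

    PATCHWORK = "https://patchwork.ozlabs.org/api/1.1"   # assumed instance/version
    TOKEN = "REPLACE_ME"

    def post_check(patch_id, context, state, url="", description=""):
        """Attach a check (pending/success/warning/fail) to a patch."""
        r = requests.post(
            "%s/patches/%d/checks/" % (PATCHWORK, patch_id),
            headers={"Authorization": "Token %s" % TOKEN},
            json={
                "context": context,        # e.g. "0day-robot-build"
                "state": state,            # "pending", "success", "warning" or "fail"
                "target_url": url,
                "description": description,
            },
        )
        r.raise_for_status()
        return r.json()

    if __name__ == "__main__":
        post_check(12345, "0day-robot-build", "success",
                   url="https://travis-ci.org/ovsrobot/ovs",
                   description="apply/checkpatch/build passed")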

> Do you mean actually provide the builds?  I don't know a good way to do
> that.

I didn't know if anyone would find it useful to have something like a
PPA / COPR or other kind of repo available.  That way, they can just
update their package manager configuration to point at the appropriate
place and install a pre-applied series.

But I guess apart from .deb,.rpm for the various distros that are
in-tree, it's difficult to provide something.  Maybe a tarball of the
sources with the series applied, and a tarball of the binaries that were
spit out (but the configurations can be quite varied, so it probably
wouldn't make sense).
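
That said, if a plain source tarball of the applied series were ever wanted,
it could be as simple as running git archive on the series branch (purely
illustrative, not something the bot does today):

    """Produce a source tarball of the tree with the series applied."""
    import subprocess

    def source_tarball(branch, outfile):
        subprocess.run(
            ["git", "archive", "--format=tar.gz",
             "--prefix=%s/" % branch, "-o", outfile, branch],
            check=True)

    if __name__ == "__main__":
        source_tarball("series_12345", "ovs-series_12345.tar.gz")   # hypothetical branch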

>> 2. What kinds of additional testing do we want to see the robot include?
>>Should the test results be made available in general on some kind of
>>public facing site?  Should it just stay as a "bleep bloop -
>>failure!" marker?
>
> It would be super awesome if we could run the various additional
> testsuites that we have: check-system-userspace, check-kernel, etc.  We
> can't run them easily on travis because they require superuser (and bugs
> sometimes crash the system).

I agree.  I'm hoping to take advantage of the poc sub-system to do
various builds, giving us a superuser environment that's insulated from
the host system.

>> 3. What other concerns should be addressed?

Thanks for the input, Ben!


Re: [ovs-dev] [RFC] Federating the 0-day robot, and improving the testing

2018-09-07 Thread Ben Pfaff
On Thu, Sep 06, 2018 at 04:56:18AM -0400, Aaron Conole wrote:
> As of June, the 0-day robot has tested over 450 patch series.
> Occasionally it spams the list (apologies for that), but for the
> majority of the time it has caught issues before they made it to the
> tree - so it's accomplishing the initial goal just fine.
> 
> I see lots of ways it can improve.  Currently, the bot runs on a light
> system.  It takes ~20 minutes to complete a set of tests, including all
> the checkpatch and rebuild runs.  That's not a big issue.  BUT, it does
> mean that the machine isn't able to perform all the kinds of regression
> tests that we would want.  I want to improve this in a way that various
> contributors can bring their own hardware and regression tests to the
> party.  In that way, various projects can detect potential issues before
> they would ever land on the tree and it could flag functional changes
> earlier in the process.
> 
> I'm not sure the best way to do that.  One thing I'll be doing is
> updating the bot to push a series that successfully builds and passes
> checkpatch to a special branch on a github repository to kick off travis
> builds.  That will give us a more complete regression coverage, and we
> could be confident that a series won't break something major.  After
> that, I'm not sure how to notify various alternate test infrastructures
> how to kick off their own tests using the patched sources.

That's pretty exciting.

Don't forget about appveyor, either.  Hardly any of us builds on
Windows, so appveyor is likely to catch things that we won't.

> My goal is to get really early feedback on patch series.  I've sent this
> out to the folks I know are involved in testing and test discussions in
> the hopes that we can talk about how best to get more CI happening.  The
> open questions:
> 
> 1. How can we notify various downstream consumers of OvS of these
>0-day builds?  Should we just rely on people rolling their own?
>Should there be a more formalized framework?  How will these other
>test frameworks report any kind of failures?

Do you mean notify of successes or failures?  I assumed that the robot's
email would notify us of that.

Do you mean actually provide the builds?  I don't know a good way to do
that.

> 2. What kinds of additional testing do we want to see the robot include?
>Should the test results be made available in general on some kind of
>public facing site?  Should it just stay as a "bleep bloop -
>failure!" marker?

It would be super awesome if we could run the various additional
testsuites that we have: check-system-userspace, check-kernel, etc.  We
can't run them easily on travis because they require superuser (and bugs
sometimes crash the system).

> 3. What other concerns should be addressed?


Re: [ovs-dev] [RFC] Federating the 0-day robot, and improving the testing

2018-09-06 Thread Eelco Chaudron



On 6 Sep 2018, at 10:56, Aaron Conole wrote:

> As of June, the 0-day robot has tested over 450 patch series.
> Occasionally it spams the list (apologies for that), but for the
> majority of the time it has caught issues before they made it to the
> tree - so it's accomplishing the initial goal just fine.
>
> I see lots of ways it can improve.  Currently, the bot runs on a light
> system.  It takes ~20 minutes to complete a set of tests, including all
> the checkpatch and rebuild runs.  That's not a big issue.  BUT, it does
> mean that the machine isn't able to perform all the kinds of regression
> tests that we would want.  I want to improve this in a way that various
> contributors can bring their own hardware and regression tests to the
> party.  In that way, various projects can detect potential issues before
> they would ever land on the tree and it could flag functional changes
> earlier in the process.
>
> I'm not sure the best way to do that.  One thing I'll be doing is
> updating the bot to push a series that successfully builds and passes
> checkpatch to a special branch on a github repository to kick off travis
> builds.  That will give us a more complete regression coverage, and we
> could be confident that a series won't break something major.  After
> that, I'm not sure how to notify various alternate test infrastructures
> how to kick off their own tests using the patched sources.
>
> My goal is to get really early feedback on patch series.  I've sent this
> out to the folks I know are involved in testing and test discussions in
> the hopes that we can talk about how best to get more CI happening.  The
> open questions:
>
> 1. How can we notify various downstream consumers of OvS of these
>    0-day builds?  Should we just rely on people rolling their own?
>    Should there be a more formalized framework?  How will these other
>    test frameworks report any kind of failures?
>
> 2. What kinds of additional testing do we want to see the robot include?

First of all thanks for the 0-day robot, I really like the idea…

One thing I feel would really benefit is some basic performance
testing, like a PVP test for the kernel/dpdk datapath. This will help
easily identifying performance impacting patches as they happen…
Rather than people figuring out after a release why their performance
has dropped.

>    Should the test results be made available in general on some kind of
>    public facing site?  Should it just stay as a "bleep bloop -
>    failure!" marker?
>
> 3. What other concerns should be addressed?


[ovs-dev] [RFC] Federating the 0-day robot, and improving the testing

2018-09-06 Thread Aaron Conole
As of June, the 0-day robot has tested over 450 patch series.
Occasionally it spams the list (apologies for that), but for the
majority of the time it has caught issues before they made it to the
tree - so it's accomplishing the initial goal just fine.

I see lots of ways it can improve.  Currently, the bot runs on a light
system.  It takes ~20 minutes to complete a set of tests, including all
the checkpatch and rebuild runs.  That's not a big issue.  BUT, it does
mean that the machine isn't able to perform all the kinds of regression
tests that we would want.  I want to improve this in a way that various
contributors can bring their own hardware and regression tests to the
party.  In that way, various projects can detect potential issues before
they would ever land on the tree and it could flag functional changes
earlier in the process.

I'm not sure the best way to do that.  One thing I'll be doing is
updating the bot to push a series that successfully builds and passes
checkpatch to a special branch on a github repository to kick off travis
builds.  That will give us a more complete regression coverage, and we
could be confident that a series won't break something major.  After
that, I'm not sure how to notify various alternate test infrastructures
how to kick off their own tests using the patched sources.

My goal is to get really early feedback on patch series.  I've sent this
out to the folks I know are involved in testing and test discussions in
the hopes that we can talk about how best to get more CI happening.  The
open questions:

1. How can we notify various downstream consumers of OvS of these
   0-day builds?  Should we just rely on people rolling their own?
   Should there be a more formalized framework?  How will these other
   test frameworks report any kind of failures?

2. What kinds of additional testing do we want to see the robot include?
   Should the test results be made available in general on some kind of
   public facing site?  Should it just stay as a "bleep bloop -
   failure!" marker?

3. What other concerns should be addressed?