Re: [Gluster-infra] [Gluster-devel] NetBSD tests not running to completion.
- Original Message -
> From: "Pranith Kumar Karampuri"
> To: "Emmanuel Dreyfus", "Ravishankar N"
> Cc: "Gluster Devel", "gluster-infra"
> Sent: Friday, January 8, 2016 11:45:20 AM
> Subject: Re: [Gluster-devel] NetBSD tests not running to completion.
>
> On 01/07/2016 02:39 PM, Emmanuel Dreyfus wrote:
> > On Wed, Jan 06, 2016 at 05:49:04PM +0530, Ravishankar N wrote:
> >> I re-triggered NetBSD regressions for
> >> http://review.gluster.org/#/c/13041/3
> >> but they are being run in silent mode and are not completing. Can someone
> >> from the infra-team take a look? The last 22 tests in
> >> https://build.gluster.org/job/rackspace-netbsd7-regression-triggered/ have
> >> failed. Highly unlikely that something is wrong with all those patches.
> >
> > I note your latest test completed with an error in mount-nfs-auth.t:
> > https://build.gluster.org/job/rackspace-netbsd7-regression-triggered/13260/consoleFull
> >
> > Would you have the jenkins build that did not complete so that I can have a
> > look at it?
> >
> > Generally speaking, I have to point out that NetBSD regression does shed
> > light on generic bugs; we had a recent example with quota-nfs.t. For now
> > there are no other well-supported platforms, but if you want glusterfs to
> > be really portable, removing mandatory NetBSD regression is not a good
> > idea: portability bugs will crop up.
> >
> > Even a daily or weekly regression run seems a bad idea to me. If you do not
> > prevent integration of patches that break NetBSD regression, they will get
> > in, and tests will break one by one over time. I have first-hand
> > experience of this situation, from when I was actually trying to catch up
> > with NetBSD regression. Many times I reached something reliable enough to
> > become mandatory, and got broken by a new patch before it actually became
> > mandatory.
> >
> > IMO, relaxing the NetBSD regression requirement means the project drops
> > the goal of being portable.
>
> hi Emmanuel,
> This Sunday I have some time I can spend helping to make
> tests better for NetBSD. I have seen bugs that are caught only by NetBSD
> regression just recently, so I see value in making NetBSD more reliable.
> Please let me know what are the things we can work on. It would help if
> you give me something specific to glusterfs to make it more valuable in
> the short term. Over time I would like to learn enough to share the load
> with you, however little it may be (please bear with me, I sometimes go
> quiet). Here are the initial things I would like to know to begin with:

Please count me in too!
-Krutika

> 1) How to set up NetBSD VMs on my laptop of the exact version as
> the ones that are run on the build systems.
> 2) How to prevent NetBSD machines from hanging when things crash (at least
> I used to see the machines hang when fuse crashed; not sure if this is
> still the case). This failure needs manual intervention at the moment on
> NetBSD regressions; if we make it report failures and pick the next job,
> that would be the best way forward.
> 3) We should come up with a list of known problems and how to
> troubleshoot them when things are not going smoothly on NetBSD.
> Again, we really need to make things automatic; this should be a last
> resort. Our top goal should be to make NetBSD machines report failures
> and go on to execute the next job.
> 4) How can we make debugging better on NetBSD? In the worst case we can
> make all tests execute in trace/debug mode on NetBSD.
>
> I really want to appreciate the fine job you have done so far in making
> sure glusterfs is stable on NetBSD.
>
> Infra team,
> I think we need to make some improvements to our infra. We need
> to get information about the health of Linux and NetBSD regression builds.
> 1) Something like: in the last 100 builds, how many builds succeeded on
> Linux, and how many succeeded on NetBSD.
> 2) What are the tests that failed in the last 100 builds, and how many
> times, on both Linux and NetBSD. (I actually wrote this in some form, but
> the whole command output has changed, making my scripts stale.)
> Any other ideas you guys have?
> 3) Which components have the highest number of spurious failures.
> 4) How many builds did not complete / were manually aborted, etc.
>
> Once we start measuring these things, the next steps are to set up a
> process to get the health of the project stable and keep it that way.
> Please let me know if anyone wants to volunteer to make things better in
> this infra part. Most of the code will be in python.
>
> Pranith
>
> ___
> Gluster-devel mailing list
> gluster-de...@gluster.org
> http://www.gluster.org/mailman/listinfo/gluster-devel

___
Gluster-infra mailing list
Gluster-infra@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-infra
Re: [Gluster-infra] [Gluster-devel] NetBSD tests not running to completion.
On 01/07/2016 02:39 PM, Emmanuel Dreyfus wrote:
> On Wed, Jan 06, 2016 at 05:49:04PM +0530, Ravishankar N wrote:
> > I re-triggered NetBSD regressions for
> > http://review.gluster.org/#/c/13041/3 but they are being run in silent
> > mode and are not completing. Can someone from the infra-team take a look?
> > The last 22 tests in
> > https://build.gluster.org/job/rackspace-netbsd7-regression-triggered/
> > have failed. Highly unlikely that something is wrong with all those
> > patches.
>
> I note your latest test completed with an error in mount-nfs-auth.t:
> https://build.gluster.org/job/rackspace-netbsd7-regression-triggered/13260/consoleFull
>
> Would you have the jenkins build that did not complete so that I can have a
> look at it?
>
> Generally speaking, I have to point out that NetBSD regression does shed
> light on generic bugs; we had a recent example with quota-nfs.t. For now
> there are no other well-supported platforms, but if you want glusterfs to
> be really portable, removing mandatory NetBSD regression is not a good
> idea: portability bugs will crop up.
>
> Even a daily or weekly regression run seems a bad idea to me. If you do not
> prevent integration of patches that break NetBSD regression, they will get
> in, and tests will break one by one over time. I have first-hand experience
> of this situation, from when I was actually trying to catch up with NetBSD
> regression. Many times I reached something reliable enough to become
> mandatory, and got broken by a new patch before it actually became
> mandatory.
>
> IMO, relaxing the NetBSD regression requirement means the project drops the
> goal of being portable.

hi Emmanuel,
This Sunday I have some time I can spend helping to make tests better for
NetBSD. I have seen bugs that are caught only by NetBSD regression just
recently, so I see value in making NetBSD more reliable. Please let me know
what are the things we can work on. It would help if you give me something
specific to glusterfs to make it more valuable in the short term. Over time I
would like to learn enough to share the load with you, however little it may
be (please bear with me, I sometimes go quiet). Here are the initial things I
would like to know to begin with:
1) How to set up NetBSD VMs on my laptop of the exact version as the ones
that are run on the build systems.
2) How to prevent NetBSD machines from hanging when things crash (at least I
used to see the machines hang when fuse crashed; not sure if this is still
the case). This failure needs manual intervention at the moment on NetBSD
regressions; if we make it report failures and pick the next job, that would
be the best way forward.
3) We should come up with a list of known problems and how to troubleshoot
them when things are not going smoothly on NetBSD. Again, we really need to
make things automatic; this should be a last resort. Our top goal should be
to make NetBSD machines report failures and go on to execute the next job.
4) How can we make debugging better on NetBSD? In the worst case we can make
all tests execute in trace/debug mode on NetBSD.

I really want to appreciate the fine job you have done so far in making sure
glusterfs is stable on NetBSD.

Infra team,
I think we need to make some improvements to our infra. We need to get
information about the health of Linux and NetBSD regression builds.
1) Something like: in the last 100 builds, how many builds succeeded on
Linux, and how many succeeded on NetBSD.
2) What are the tests that failed in the last 100 builds, and how many times,
on both Linux and NetBSD. (I actually wrote this in some form, but the whole
command output has changed, making my scripts stale.)
Any other ideas you guys have?
3) Which components have the highest number of spurious failures.
4) How many builds did not complete / were manually aborted, etc.

Once we start measuring these things, the next steps are to set up a process
to get the health of the project stable and keep it that way. Please let me
know if anyone wants to volunteer to make things better in this infra part.
Most of the code will be in python.

Pranith
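The build-health metrics Pranith asks for (success counts and failure counts
over the last 100 builds) map naturally onto the Jenkins JSON API, which
exposes a `result` field per build. The following is a minimal Python sketch
of just the counting logic; it runs against an embedded sample payload rather
than a live server, and the build numbers in the sample are made up for
illustration:

```python
from collections import Counter

def summarize_builds(builds):
    """Count build outcomes from a Jenkins JSON API payload.

    `builds` is the list found under the "builds" key of e.g.
    /job/<job-name>/api/json?tree=builds[number,result]{0,100}
    (endpoint shape per the Jenkins JSON API; the job name is a placeholder).
    """
    # In-progress builds report result == None; bucket them separately.
    return Counter(b["result"] or "RUNNING" for b in builds)

# Sample payload standing in for a live fetch from a Jenkins server.
sample = [
    {"number": 13260, "result": "FAILURE"},
    {"number": 13259, "result": "SUCCESS"},
    {"number": 13258, "result": "SUCCESS"},
    {"number": 13257, "result": "ABORTED"},
    {"number": 13256, "result": None},
]

if __name__ == "__main__":
    print(dict(summarize_builds(sample)))
```

Running the same summary over the Linux and NetBSD jobs side by side would
give the per-platform health numbers the thread asks for.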
Re: [Gluster-infra] [Gluster-devel] NetBSD tests not running to completion.
Ravishankar N wrote:
> > I am a bit disturbed by the fact that people raise the
> > "NetBSD regression ruins my life" issue without doing the work of
> > listing the actual issues encountered.
> I already did earlier - the lack of infrastructure to even find out what
> caused the issue in the first place.

I meant: what test exhibited a spurious failure or hang? You can see that
from the regression test run. Previous experience makes me suspect we will
narrow the problem down to a few tests that can be disabled.

-- 
Emmanuel Dreyfus
http://hcpnet.free.fr/pubz
m...@netbsd.org
Re: [Gluster-infra] [Gluster-devel] NetBSD tests not running to completion.
Jeff Darcy wrote:
> > Now what is the policy on post-merge regression failure? What happens
> > if the original submitter is not willing to investigate?
> Then regressions will continue to fail on NetBSD, as they do now, but
> without impacting work on other platforms.

Well, from previous experience of maintaining NetBSD support without
mandatory regression, I am almost certain that it will quickly break. The
only relief in the post-merge regression scheme is that we will have a
precise idea of the change that caused the regression.

In my opinion the best way forward would be to identify what tests cause
frequent NetBSD spurious failures and disable them for NetBSD regression.

I am a bit disturbed by the fact that people raise the "NetBSD regression
ruins my life" issue without doing the work of listing the actual issues
encountered.

-- 
Emmanuel Dreyfus
http://hcpnet.free.fr/pubz
m...@netbsd.org
Re: [Gluster-infra] [Gluster-devel] NetBSD tests not running to completion.
> > Good idea. Once per merge is still less than one per submission (what
> > we have today), and better than nightly/weekly when it comes to
> > identifying the source of a regression. Seems like a good compromise.
>
> Now what is the policy on post-merge regression failure? What happens
> if the original submitter is not willing to investigate?

Then regressions will continue to fail on NetBSD, as they do now, but
without impacting work on other platforms. Someone who is concerned about
NetBSD can investigate themselves, or try to get the bug marked as a blocker,
or add it to the NetBSD "bad tests" list. None of these options is ideal, but
we live in an imperfect world. At least this way some human judgement can be
applied, instead of always going with the "stop everything" option.
Re: [Gluster-infra] [Gluster-devel] NetBSD tests not running to completion.
Avra Sengupta wrote:
> Agree with your point. If we are ready to make exceptions, then we might
> as well not block all the patches. As Jeff suggested, triaging the
> nightly/weekly results manually and making any serious issue a blocker
> should suffice.

How are you going to make a serious issue a blocker? The serious issue will
be related to multiple patches, and it will be impossible to tell which one
is the offender. If we go that way, we need to run a regression for each
merged patch, which will be much less load than today.

-- 
Emmanuel Dreyfus
http://hcpnet.free.fr/pubz
m...@netbsd.org
Re: [Gluster-infra] Switching from salt to ansible ?
On Monday, January 4, 2016, at 17:27 +0100, Michael Scherer wrote:
> - salt in epel is still using an old version (for dependency reasons).
> While this is working well enough, it makes contributing quite
> difficult, and prevents using some new features that are needed.

Of course, the fun is that FreeBSD is shipping the new version, and it
breaks:

# salt -t 60 'freebsd0.cloud.gluster.org' test.ping
freebsd0.cloud.gluster.org:
    'test' __virtual__ returned False
ERROR: Minions returned with non-zero exit code

So yeah, keeping minion and server at the same level is a bit annoying unless
we ship our own packages. And while I could do that, I am not sure I want to
become an expert in all possible package managers...

-- 
Michael Scherer
Sysadmin, Community Infrastructure and Platform, OSAS
Re: [Gluster-infra] Timeout for jenkins builds
On Thu, Jan 7, 2016 at 4:26 PM, Niels de Vos wrote:
> On Thu, Jan 07, 2016 at 03:32:57PM +0530, Raghavendra Talur wrote:
> > Hi,
> >
> > I have enabled timeouts for jenkins builds. Regression on CentOS and
> > NetBSD now both have a 300-minute (5-hour) timeout, and tests will be
> > marked "FAILED" if they do not complete by then.
>
> Do you know who is investigating these hangs?

This is already done. Thanks to Manikandan and Raghavendra Gowdappa for
getting debug info and providing the fix, respectively. Here is the fix:
http://review.gluster.org/#/c/13177/ and it is already merged. Rebase your
patches to get them to pass on NetBSD.

> > This is keeping in view the recent hangs which ran tests for more than
> > 10-14 hours.
> >
> > Also, the script used on NetBSD does not do any cleanup between runs
> > like the CentOS script does. I did not modify it for now, but we should.
> > Let me know if someone wants to take it up. I will have a go at it next
> > week if there are no replies.
>
> It would be nice to have an extremely simple Jenkins job that only
> executes a script located on disk. Could we get such a script into
> https://github.com/gluster/glusterfs-patch-acceptance-tests so that we
> can easily start running regression tests in the CentOS CI as well?

+10

Some questions:
1. Currently our images have all the packages baked in; what would be the
implication if the slaves are fresh CentOS images and need to download the
packages as part of an ansible playbook?
2. Instead of regression.sh (in glusterfs-patch-acceptance-tests),
run-tests.sh (in glusterfs) is the place where most of the intelligence
should reside. This would make creating jenkins jobs much simpler. Thoughts?

> I would also like to have the Jenkins jobs exported as XML and saved in
> the same git repository. That should make it much easier to import the
> jobs in other Jenkins environments. Exporting and importing can be done
> with the Jenkins CLI, see http://build.gluster.org/cli for details.

Awesome idea!!!

> Thanks,
> Niels
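The timeout behaviour described above (abort the run and mark it FAILED once
a fixed budget is exhausted, instead of hanging for 10-14 hours) can be
sketched outside Jenkins as well. This is an illustrative Python wrapper, not
the actual Jenkins build-timeout configuration; the command and limit are
placeholders:

```python
import subprocess

def run_with_timeout(cmd, timeout_s):
    """Run a test command, reporting FAILED if it exceeds the budget.

    Mirrors the Jenkins build-timeout idea: the child process is killed
    once timeout_s seconds elapse, and the run is marked FAILED rather
    than left hanging.
    """
    try:
        proc = subprocess.run(cmd, timeout=timeout_s)
        return "SUCCESS" if proc.returncode == 0 else "FAILED"
    except subprocess.TimeoutExpired:
        return "FAILED"

if __name__ == "__main__":
    # The real jobs use a 300-minute budget; a tiny one here for illustration.
    print(run_with_timeout(["sleep", "5"], timeout_s=1))  # FAILED after ~1s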
Re: [Gluster-infra] [Gluster-devel] NetBSD tests not running to completion.
On 01/07/2016 03:15 PM, Jeff Darcy wrote:
> > If you do not prevent integration of patches that break NetBSD
> > regression, that will get in, and tests will break one by one over time.
>
> On the other hand, if patch A starts blocking all merges because of NetBSD
> failures, then all platforms - including NetBSD - are denied the benefit of
> fixes in patches B through Z. The real problem is that our tests are so
> non-deterministic that they pass once and a patch gets merged, but then
> fail every time after that. The signal-to-noise ratio is really low, and
> for some reason this problem seems even worse on NetBSD than on Linux. The
> cost of mandatory NetBSD tests is unbounded: sometimes small enough to be a
> good investment (of our time), but sometimes totally out of proportion to
> the benefit. That's not just frustrating for developers; it's also a
> disservice to the vast majority of our users who might be waiting on fixes.
> I'd prefer a "defined level of effort" approach which *might* reduce the
> benefit we derive from NetBSD testing but *definitely* keeps the cost under
> control.

Can we have closure on this, this time around? We all seem to be open to
discussing these approaches, which clearly shows that this problem is faced
by almost everyone who is sending patches. So can we have a decision taken
this time around, and have it implemented. Off the top of my head, we can use
either of the following to take a call on this:
1. Have the component maintainers vote on one of the approaches and then
decide.
2. Have the architects take a call based on the pros and cons of each
approach and then decide.
Either way, let's address this once and for all, in a way that addresses the
pain points, instead of coming back to it every other month.
Re: [Gluster-infra] [Gluster-devel] NetBSD tests not running to completion.
> If you do not prevent integration of patches that break NetBSD regression,
> that will get in, and tests will break one by one over time.

On the other hand, if patch A starts blocking all merges because of NetBSD
failures, then all platforms - including NetBSD - are denied the benefit of
fixes in patches B through Z.

The real problem is that our tests are so non-deterministic that they pass
once and a patch gets merged, but then fail every time after that. The
signal-to-noise ratio is really low, and for some reason this problem seems
even worse on NetBSD than on Linux. The cost of mandatory NetBSD tests is
unbounded: sometimes small enough to be a good investment (of our time), but
sometimes totally out of proportion to the benefit. That's not just
frustrating for developers; it's also a disservice to the vast majority of
our users who might be waiting on fixes.

I'd prefer a "defined level of effort" approach which *might* reduce the
benefit we derive from NetBSD testing but *definitely* keeps the cost under
control.
Re: [Gluster-infra] [Gluster-devel] NetBSD tests not running to completion.
On Thu, Jan 07, 2016 at 03:01:41PM +0530, Avra Sengupta wrote:
> Why is this a bad idea?

Because each week you will have multiple regressions introduced. Few people
will be willing to investigate whether they pushed a patch that caused a
regression, and those few people will have to deal with the other regressions
they did not cause. Being in the situation of fixing a regression test will
be a rare event, and the whole thing will quickly rot.

-- 
Emmanuel Dreyfus
m...@netbsd.org
Re: [Gluster-infra] [Gluster-devel] NetBSD tests not running to completion.
I agree.

Regards,
Nithya

- Original Message -
> From: "Atin Mukherjee"
> To: "Joseph Fernandes", "Avra Sengupta"
> Cc: "Gluster Devel", "gluster-infra"
> Sent: Thursday, January 7, 2016 1:53:47 PM
> Subject: Re: [Gluster-devel] [Gluster-infra] NetBSD tests not running to
> completion.
>
> I have always been with this approach right from the beginning. We can
> definitely have nightly, if not weekly, NetBSD regressions to sanitize the
> changes. With that model we wouldn't need to eliminate BSD support, but we
> can avoid this hard dependency in patch acceptance which has haunted us
> *multiple* times now.
>
> Thanks,
> Atin
>
> On 01/07/2016 12:38 PM, Joseph Fernandes wrote:
> > +2 Avra
> >
> > - Original Message -
> > From: "Avra Sengupta"
> > To: "Gluster Devel", "gluster-infra"
> > Sent: Thursday, January 7, 2016 11:51:51 AM
> > Subject: Re: [Gluster-infra] [Gluster-devel] NetBSD tests not running to
> > completion.
> >
> > The same issue keeps coming up every few months, where all patch
> > acceptance comes to a grinding halt with a dependency on NetBSD
> > regressions. I have been re-triggering my patches too, and they are not
> > completing. Not to mention the long wait queue for them to run in the
> > first place, and then having them not complete.
> >
> > I know this issue has been discussed many times before, and every time
> > we have arrived at the conclusion that we need to have more stable tests
> > or a more robust infrastructure, but there is more to it than that.
> > Here is a list of a few of the things I have observed:
> > 1. Not many people are well versed with debugging the issues that result
> > in failures in NetBSD regression suites, simply because not many of us
> > are familiar with the nuances of the platform.
> > 2. If I am a developer interested in being a part of the gluster
> > community and contributing code to it, the patches I send will have a
> > dependency on NetBSD regressions. When people who have been contributing
> > for years find it cumbersome to have the NetBSD regressions pass for
> > their patches, imagine the impression and the impact on motivation it
> > will have on a new developer. We need to ask ourselves how this is
> > impacting the patch acceptance process.
> >
> > We can at least try different approaches to tackle this problem instead
> > of just waiting for the test suite to stabilize or the infrastructure to
> > get better:
> > 1. We can have NetBSD as a separate port, and not have patches sent to
> > the master branch be dependent on its regression.
> > 2. We can also have a nightly NetBSD regression run, instead of running
> > it per patch. If a particular regression test fails, the owner of the
> > test looks into it, and we debug the issue. One might say it's just
> > delaying the problem, but at least we will not have all patch
> > acceptances blocked.
> > 3. We really need to trigger regressions only on the patches that have
> > been reviewed and have gotten a +2. This will substantially bring down
> > the wait time. I remember Atin bringing this up a few months back, but
> > it still hasn't been implemented. Can we please have this ASAP.
> >
> > Regards,
> > Avra
> >
> > On 01/06/2016 05:49 PM, Ravishankar N wrote:
> >> I re-triggered NetBSD regressions for
> >> http://review.gluster.org/#/c/13041/3 but they are being run in silent
> >> mode and are not completing. Can someone from the infra-team take a
> >> look? The last 22 tests in
> >> https://build.gluster.org/job/rackspace-netbsd7-regression-triggered/
> >> have failed. Highly unlikely that something is wrong with all those
> >> patches.
> >>
> >> Thanks,
> >> Ravi
Re: [Gluster-infra] [Gluster-devel] NetBSD tests not running to completion.
I have always been with this approach right from the beginning. We can
definitely have nightly, if not weekly, NetBSD regressions to sanitize the
changes. With that model we wouldn't need to eliminate BSD support, but we
can avoid this hard dependency in patch acceptance which has haunted us
*multiple* times now.

Thanks,
Atin

On 01/07/2016 12:38 PM, Joseph Fernandes wrote:
> +2 Avra
>
> - Original Message -
> From: "Avra Sengupta"
> To: "Gluster Devel", "gluster-infra"
> Sent: Thursday, January 7, 2016 11:51:51 AM
> Subject: Re: [Gluster-infra] [Gluster-devel] NetBSD tests not running to
> completion.
>
> The same issue keeps coming up every few months, where all patch
> acceptance comes to a grinding halt with a dependency on NetBSD
> regressions. I have been re-triggering my patches too, and they are not
> completing. Not to mention the long wait queue for them to run in the
> first place, and then having them not complete.
>
> I know this issue has been discussed many times before, and every time we
> have arrived at the conclusion that we need to have more stable tests or a
> more robust infrastructure, but there is more to it than that. Here is a
> list of a few of the things I have observed:
> 1. Not many people are well versed with debugging the issues that result
> in failures in NetBSD regression suites, simply because not many of us are
> familiar with the nuances of the platform.
> 2. If I am a developer interested in being a part of the gluster community
> and contributing code to it, the patches I send will have a dependency on
> NetBSD regressions. When people who have been contributing for years find
> it cumbersome to have the NetBSD regressions pass for their patches,
> imagine the impression and the impact on motivation it will have on a new
> developer. We need to ask ourselves how this is impacting the patch
> acceptance process.
>
> We can at least try different approaches to tackle this problem instead of
> just waiting for the test suite to stabilize or the infrastructure to get
> better:
> 1. We can have NetBSD as a separate port, and not have patches sent to the
> master branch be dependent on its regression.
> 2. We can also have a nightly NetBSD regression run, instead of running it
> per patch. If a particular regression test fails, the owner of the test
> looks into it, and we debug the issue. One might say it's just delaying
> the problem, but at least we will not have all patch acceptances blocked.
> 3. We really need to trigger regressions only on the patches that have
> been reviewed and have gotten a +2. This will substantially bring down the
> wait time. I remember Atin bringing this up a few months back, but it
> still hasn't been implemented. Can we please have this ASAP.
>
> Regards,
> Avra
>
> On 01/06/2016 05:49 PM, Ravishankar N wrote:
>> I re-triggered NetBSD regressions for
>> http://review.gluster.org/#/c/13041/3 but they are being run in silent
>> mode and are not completing. Can someone from the infra-team take a
>> look? The last 22 tests in
>> https://build.gluster.org/job/rackspace-netbsd7-regression-triggered/
>> have failed. Highly unlikely that something is wrong with all those
>> patches.
>>
>> Thanks,
>> Ravi