Re: [Gluster-infra] [Gluster-devel] Reduce regression runs wait time - New gerrit/review work flow
On 16-Jun-2015 19:22, "Aravinda" wrote:
> +1 for running regressions on a need basis.
>
> Also we need to make the tests more intelligent so that the same tests run
> differently when triggered nightly/regular.
>
> function test_some_functionality
> {
>     if env.NIGHTLY {
>         # run nightly tests
>         # time consuming tests
>         # test with more data
>         run_nightly_tests();
>     }
>     # Regular basic tests
>     run_basic_tests();
> }
>
> TEST test_some_functionality;
>
> This way we can maintain all tests in a single place. Set the env/config
> variable NIGHTLY whenever we need to run nightly tests.
>

I've discussed this idea with different people before. But the major concern was how to identify the minimal set of basic tests that would do a good job of catching most regressions. Considering that the regression suite will keep growing and will take longer to complete in the future, nightly full regression runs would be nice to have.

> --
> regards
> Aravinda
>
> On 06/15/2015 06:49 AM, Kaushal M wrote:
>
>> Hi all,
>>
>> The recent rush of reviews being sent due to the release of 3.7 was a
>> cause of frustration for many of us because of the regression tests
>> (gerrit troubles themselves are another thing).
>>
>> W.R.T. regression, the 3 main sources of frustration were:
>> 1. Spurious test failures
>> 2. Long wait times
>> 3. Regression slave troubles
>>
>> We've already tackled the spurious failure issue and are quite stable
>> now. The trouble with the slave vms is related to the gerrit issues,
>> and is mainly due to the network issues we are having between the
>> data-centers hosting the slaves and gerrit/jenkins. People have been
>> looking into this, but we haven't had much success. This leaves the
>> issue of the long wait times.
>>
>> The long wait times are because of the long queues of pending jobs,
>> some of which take days to get scheduled. Two things cause the long
>> queues:
>> 1. Automatic regression job triggering for all submissions to gerrit
>> 2. Long run time for regression (~2h)
>>
>> The long queues, coupled with the spurious failures and network
>> problems, meant that jobs would fail for no reason after a long wait,
>> and would have to be added to the back of the queue to be re-run. This
>> meant that developers would have to wait days for their changes to get
>> merged, and was one of the causes for the delay in the release of 3.7.
>>
>> The solution is to reduce wait times for regression runs. To reduce wait
>> times we should:
>> 1. Trigger runs only when required
>> 2. Reduce regression run time.
>>
>> Raghavendra Talur (rtalur/RaSTar) will soon send out a mail with his
>> findings on the regression run times, and we can continue discussion
>> on it on that thread.
>>
>> Earlier, the regression runs used to be manually triggered by the
>> maintainers once they had determined that a change was ready for
>> submission. But as there were only two maintainers before (Vijay and
>> Avati) auto triggering was brought in to reduce their load. Auto
>> triggering worked fine when we had a lower volume of changes being
>> submitted to gerrit. But now, with the large volumes we see during the
>> release freeze dates, auto triggering just adds to problems.
>>
>> I propose that we move back to the old model of starting regression
>> runs only once the maintainers are ready to merge. But instead of the
>> maintainers manually triggering the runs, we could automate it.
>>
>> We can model our new workflow on those of OpenStack[1] and
>> Wikimedia[2]. The existing Gerrit plugin for Jenkins doesn't provide
>> the features necessary to enable selective triggering based on Gerrit
>> flags. Both OpenStack and Wikimedia use a project gating tool called
>> Zuul[3], which provides a much better integration with Jenkins and
>> Gerrit and more features on top.
>>
>> I propose the following work flow:
>>
>> - Developer pushes change to Gerrit.
>> - Zuul is notified by Gerrit of new change
>> - Zuul runs pre-review checks on Jenkins. This will be the current smoke
>>   tests.
>> - Zuul reports back status of the checks to Gerrit.
>> - If checks fail, developer will need to resend the change after
>>   the required fixes. The process starts once more.
>> - If the checks pass, the change is now ready for review
>> - The change is now reviewed by other developers and maintainers.
>>   Non-maintainers will be able to give only a +1 review.
>> - On a negative review, the developer will need to rework the change
>>   and resend it. The process starts once more.
>> - The maintainer gives a +2 review once he/she is satisfied. The
>>   maintainer's work is done here.
>> - Zuul is notified of the +2 review
>> - Zuul runs the regression runs and reports back the status.
>> - If the regression runs fail, the process starts over again.
>> - If the runs pass, the change is ready for acceptance.
>> - Zuul will pick the change into the repository.
>> - If the pick fails, Zuul will re
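Aravinda's suggestion of a single test body that does extra work when run nightly can be sketched as follows. This is only an illustration in Python under stated assumptions: `NIGHTLY` is treated as an environment variable, and the two callables stand in for `run_basic_tests()` / `run_nightly_tests()` from the pseudocode. None of these names exist in the current regression framework.

```python
import os
from typing import Callable, Mapping, Optional


def is_nightly(env: Optional[Mapping[str, str]] = None) -> bool:
    """True when the (hypothetical) NIGHTLY variable is set to 1/true/yes."""
    env = os.environ if env is None else env
    return env.get("NIGHTLY", "").strip().lower() in {"1", "true", "yes"}


def check_some_functionality(
    run_basic_tests: Callable[[], bool],
    run_nightly_tests: Callable[[], bool],
    env: Optional[Mapping[str, str]] = None,
) -> bool:
    """One test body, two depths: the expensive stage runs only in nightly mode."""
    ok = True
    if is_nightly(env):
        # Time-consuming tests, or the same tests with more data.
        ok = run_nightly_tests() and ok
    # Regular basic tests run on every trigger, even if the nightly stage failed.
    ok = run_basic_tests() and ok
    return ok
```

Called with `env={"NIGHTLY": "1"}`, both stages run (nightly first, as in the pseudocode); with an empty environment only the basic stage runs. Keeping the switch in one helper means every test shares the same definition of "nightly".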
Re: [Gluster-infra] Fedora 19 VM's in Rackspace
Sounds good. :)

On 17 Jun 2015, at 17:58, Thiago da Silva wrote:
> Hello,
> I just created two VMs in Rackspace to replace the VMs mentioned below.
> I created:
> gluster-swift-f22-1
> gluster-swift-el7-1
>
> These VMs will be used for libgfapi-python and swift-on-file builds.
> Once we have the Jenkins builds set up correctly to build from these new
> VMs, I will delete the older VMs.
>
> Please let me know if there are any issues with the plan.
>
> Thanks,
>
> Thiago

--
GlusterFS - http://www.gluster.org

An open source, distributed file system scaling to several
petabytes, and handling thousands of clients.

My personal twitter: twitter.com/realjustinclift

___
Gluster-infra mailing list
Gluster-infra@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-infra
Re: [Gluster-infra] [Gluster-devel] NetBSD regressions not being triggered for patches
On 17 Jun 2015, at 20:14, Niels de Vos wrote:
> On Wed, Jun 17, 2015 at 03:14:31PM +0200, Michael Scherer wrote:
>> On Wednesday 17 June 2015 at 11:58 +0100, Justin Clift wrote:
>>> On 17 Jun 2015, at 10:53, Michael Scherer wrote:
>>>> On Wednesday 17 June 2015 at 11:48 +0200, Michael Scherer wrote:
>>>>> On Wednesday 17 June 2015 at 08:20 +0200, Emmanuel Dreyfus wrote:
>>>>>> Venky Shankar wrote:
>>>>>>
>>>>>>> If that's the case, then I'll vote for this even if it takes some time
>>>>>>> to get things in workable state.
>>>>>>
>>>>>> See my other mail about this: you enter a new slave VM in the DNS and it
>>>>>> does not resolve, or sometimes you get 20s delays. I am convinced this
>>>>>> is the reason why Jenkins bugs.
>>>>>
>>>>> But cloud.gluster.org is handled by rackspace, not sure how much control
>>>>> we have for it (not sure even where to start there).
>>>>
>>>> So I cannot change the DNS destination.
>>>>
>>>> What I can do is to create a new dns zone, and then, we can delegate as
>>>> we want. And migrate some slaves and not others, and see how it goes?
>>>>
>>>> slaves.gluster.org would be ok for everybody?
>>>
>>> Try it out, and see if it works. :)
>>>
>>> On the "scaling the infrastructure" side of things, are the two OSAS servers
>>> for Gluster still available?
>>
>> They are online.
>> $ ssh r...@ci.gluster.org uptime
>> 09:13:37 up 33 days, 16:34, 0 users, load average: 0,00, 0,01, 0,05
>
> Can it run some Jenkins Slave VMs too?

There are two boxes. A pretty beefy one for running Jenkins slave VMs (probably about 40 VMs simultaneously), and a slightly less beefy one for running Jenkins/Gerrit/whatever.

+ Justin
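The symptom described in this thread (a freshly added slave name that does not resolve from the build host, or resolves only after a ~20 s delay) can be measured directly before blaming Jenkins. This is a minimal Python sketch, assuming Python 3 on build.gluster.org; `probe_resolution`, the 5-second budget and port 22 are illustrative choices, not existing tooling. `socket.getaddrinfo()` has no timeout argument, so the lookup runs in a worker thread and the caller simply stops waiting.

```python
import socket
import time
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout
from typing import Callable, Dict, Iterable, Optional, Tuple

Result = Tuple[Optional[str], float, str]  # (address, seconds waited, status)


def system_resolver(host: str) -> str:
    """Return the first address the system resolver gives for host."""
    infos = socket.getaddrinfo(host, 22, proto=socket.IPPROTO_TCP)
    return infos[0][4][0]


def probe_resolution(
    hosts: Iterable[str],
    timeout: float = 5.0,
    resolver: Callable[[str], str] = system_resolver,
) -> Dict[str, Result]:
    """Resolve each host, but give up waiting after `timeout` seconds."""
    results: Dict[str, Result] = {}
    pool = ThreadPoolExecutor(max_workers=4)
    try:
        for host in hosts:
            start = time.monotonic()
            future = pool.submit(resolver, host)
            try:
                addr = future.result(timeout=timeout)
                results[host] = (addr, time.monotonic() - start, "ok")
            except FutureTimeout:
                # The lookup itself cannot be cancelled; we only stop waiting for it.
                results[host] = (None, timeout, "timeout")
            except OSError as exc:  # socket.gaierror is an OSError subclass
                results[host] = (None, time.monotonic() - start, f"error: {exc}")
    finally:
        pool.shutdown(wait=False)
    return results
```

Running the same probe on build.gluster.org and on another machine shows whether the delay is specific to the build host's resolver.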
Re: [Gluster-infra] Fedora 19 VM's in Rackspace
Hello,
I just created two VMs in Rackspace to replace the VMs mentioned below.
I created:
gluster-swift-f22-1
gluster-swift-el7-1

These VMs will be used for libgfapi-python and swift-on-file builds.
Once we have the Jenkins builds set up correctly to build from these new
VMs, I will delete the older VMs.

Please let me know if there are any issues with the plan.

Thanks,

Thiago

On Mon, 2015-04-20 at 19:48 +0100, Justin Clift wrote:
> Either way is good. Completely up to you. :)
>
> + Justin
>
>
> On 20 Apr 2015, at 18:24, Thiago da Silva wrote:
> > Hi Justin,
> > We are currently using these VMs so they should not be deleted yet,
> > but I agree we need to upgrade them to newer OS versions. Probably the
> > same is true about our CentOS machine. What's better? Can we just create
> > two new VMs and then nuke these once we are done setting up the new
> > ones, or upgrade these machines to Fedora 21?
> >
> > Thanks,
> >
> > Thiago
> >
> > On Sun, 2015-04-19 at 10:44 +0100, Justin Clift wrote:
> > > Hi Thiago,
> > >
> > > Can we nuke these two VMs in Rackspace?
> > >
> > > * g4s-rackspace-f19-1
> > > * g4s-rackspace-f19-3
> > >
> > > They're running Fedora 19, which is no longer receiving any kind
> > > of package updates. So... they'll become a security problem at
> > > some point, if they're not already.
> > >
> > > ?
> > >
> > > Regards and best wishes,
> > >
> > > Justin Clift
Re: [Gluster-infra] [Gluster-devel] NetBSD regressions not being triggered for patches
On Wednesday 17 June 2015 04:23 PM, Niels de Vos wrote:
> On Wed, Jun 17, 2015 at 09:56:32PM +0200, Emmanuel Dreyfus wrote:
>> Niels de Vos wrote:
>>
>>> Maybe, but I hope those issues stay masked when resolving the hostnames
>>> is more stable. When we have the other servers up and running, we would
>>> have a better understanding and options to investigate issues like this.
>>
>> But Jenkins is still unable to launch an agent on e.g. nbslave75.
>> Perhaps it needs to be restarted?
>
> Yes, a Jenkins restart might be good. But, I do not know how it gets
> stopped safely, or started.

The only downside of a Jenkins restart is that we would need to manually re-trigger all existing jobs. Shall we just do that now?

-Vijay
Re: [Gluster-infra] [Gluster-devel] NetBSD regressions not being triggered for patches
On Wed, Jun 17, 2015 at 09:56:32PM +0200, Emmanuel Dreyfus wrote:
> Niels de Vos wrote:
>
> > Maybe, but I hope those issues stay masked when resolving the hostnames
> > is more stable. When we have the other servers up and running, we would
> > have a better understanding and options to investigate issues like this.
>
> But Jenkins is still unable to launch an agent on e.g. nbslave75.
> Perhaps it needs to be restarted?

Yes, a Jenkins restart might be good. But, I do not know how it gets stopped safely, or started.

Niels
Re: [Gluster-infra] Reduce regression runs wait time - New gerrit/review work flow
On Mon, Jun 15, 2015 at 04:19:14PM +0530, Kaushal M wrote:
> Hi all,
...
> I propose that we move back to the old model of starting regression
> runs only once the maintainers are ready to merge. But instead of the
> maintainers manually triggering the runs, we could automate it.

I think auto triggering regression tests is good. We should ask the developers to run regression tests before posting complex changes. If the parallelisation of regression tests is done, the wait time should reduce too.

As a maintainer who spends quite some time reviewing patches, I prefer to see a +1 verified before I start to review something. With that, I at least have some confidence that there are no obvious mistakes I need to point out. If developers have to wait on me before regression testing gets started, I feel more like a roadblock than a help to them.

There really are *many* patches that get a FAILED result because of a genuine problem in the code. Developers should get a response about that as soon as possible, and waiting for a maintainer to start the regression tests does not help. I also had to ask maintainers to trigger regression tests for my first patches; it is not a nice experience. Anything we can do to improve the experience for (new) developers should be done; delaying (automated) feedback isn't a step in the right direction.

> We can model our new workflow on those of OpenStack[1] and
> Wikimedia[2]. The existing Gerrit plugin for Jenkins doesn't provide
> the features necessary to enable selective triggering based on Gerrit
> flags. Both OpenStack and Wikimedia use a project gating tool called
> Zuul[3], which provides a much better integration with Jenkins and
> Gerrit and more features on top.

More intelligent triggering would be helpful. Unfortunately we have a stack of xlators and it is difficult to say if there are unintended side-effects in different, untouched pieces of the code.

> I propose the following work flow,
>
> - Developer pushes change to Gerrit.
> - Zuul is notified by Gerrit of new change
> - Zuul runs pre-review checks on Jenkins. This will be the current smoke
>   tests.
> - Zuul reports back status of the checks to Gerrit.
> - If checks fail, developer will need to resend the change after
>   the required fixes. The process starts once more.
> - If the checks pass, the change is now ready for review
> - The change is now reviewed by other developers and maintainers.
>   Non-maintainers will be able to give only a +1 review.
> - On a negative review, the developer will need to rework the change
>   and resend it. The process starts once more.
> - The maintainer gives a +2 review once he/she is satisfied. The
>   maintainer's work is done here.
> - Zuul is notified of the +2 review
> - Zuul runs the regression runs and reports back the status.
> - If the regression runs fail, the process starts over again.
> - If the runs pass, the change is ready for acceptance.
> - Zuul will pick the change into the repository.
> - If the pick fails, Zuul will report back the failure, and the
>   process starts once again.

It would be nice if Zuul, in its last step, could pick the change on top of the latest HEAD, run the build/smoke test again, and only push the change when all is OK. We have seen patch/merge races where a function/define was changed, and another patch used that function/define. These caused many issues when the branch failed to compile. Being able to prevent that would be very good.

> Following this flow should,
> 1. Reduce regression wait time

"Wait time" for what, or for whom? The merging of the patch would still only happen after all tests are done. If something fails the last test, more people (reviewers and maintainer) need to spend additional time.

> 2. Improve change acceptance time
> 3. Reduce unnecessary wastage of infra resources

We could, and should, optimize that through our parallel testing and by educating developers to only re-run regressions when needed. Splitting up the regression tests also makes it possible to only re-run a small part of the tests.

> 4. Improve infra stability.

I am not sure that adding another component and (complex?) configuration helps to "Improve infra stability". It would be nice to have a very minimal set of tools, and many people understanding them. With the current Gerrit and Jenkins configuration, we already seem to be very limited in the number of people who can investigate issues.

> It also brings in drawbacks that we need to maintain one other piece
> of infra (Zuul). This would be an additional maintenance overhead on
> top of Gerrit, Jenkins and the current slaves. But I feel the
> reduction in the upkeep efforts of the slaves would be enough to
> offset this.
>
> tl;dr
> Current auto-triggering of regression runs is stupid and a waste of
> time and resources. Bring in a project gating system, Zuul, which can
> do a much more intelligent jobs triggering, and use it to
> automatically trigger regression only for changes with Reviewe
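Niels's last-step suggestion (re-apply the approved change on top of the latest HEAD, rebuild, and push only if that still passes) is what prevents the patch/merge race he describes. The toy model below is a hypothetical illustration, not how Zuul is implemented: a tree is reduced to the set of symbols it defines and the set it calls, and a "build" succeeds when every called symbol is still defined.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Tuple


@dataclass(frozen=True)
class Tree:
    """Toy source tree: symbols it defines and symbols its code calls."""
    defines: FrozenSet[str]
    uses: FrozenSet[str]


@dataclass(frozen=True)
class Change:
    name: str
    add_defs: FrozenSet[str] = frozenset()
    remove_defs: FrozenSet[str] = frozenset()
    add_uses: FrozenSet[str] = frozenset()


def apply(tree: Tree, change: Change) -> Tree:
    defines = (tree.defines - change.remove_defs) | change.add_defs
    return Tree(defines, tree.uses | change.add_uses)


def builds(tree: Tree) -> bool:
    """'Compiles' only if every called symbol is still defined."""
    return tree.uses <= tree.defines


def gate(head: Tree, approved: List[Change]) -> Tuple[Tree, List[str], List[str]]:
    """Re-apply each approved change to the *current* HEAD and build before merging."""
    merged: List[str] = []
    rejected: List[str] = []
    for change in approved:
        candidate = apply(head, change)
        if builds(candidate):
            head = candidate  # the next change is tested on top of this one
            merged.append(change.name)
        else:
            rejected.append(change.name)  # reported back; author rebases and resends
    return head, merged, rejected
```

With a base that defines `old_fn`, a change renaming it to `new_fn` and another change adding a call to `old_fn` each build fine against the base, so testing each one only against the HEAD it was written on would accept both. The gate merges the rename, then rebuilds the second change on the new HEAD and rejects it before it can break the branch.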
Re: [Gluster-infra] [Gluster-devel] NetBSD regressions not being triggered for patches
Niels de Vos wrote:

> Maybe, but I hope those issues stay masked when resolving the hostnames
> is more stable. When we have the other servers up and running, we would
> have a better understanding and options to investigate issues like this.

But Jenkins is still unable to launch an agent on e.g. nbslave75. Perhaps it needs to be restarted?

--
Emmanuel Dreyfus
http://hcpnet.free.fr/pubz
m...@netbsd.org
Re: [Gluster-infra] [Gluster-devel] NetBSD regressions not being triggered for patches
On Wed, Jun 17, 2015 at 03:14:31PM +0200, Michael Scherer wrote:
> On Wednesday 17 June 2015 at 11:58 +0100, Justin Clift wrote:
> > On 17 Jun 2015, at 10:53, Michael Scherer wrote:
> > > On Wednesday 17 June 2015 at 11:48 +0200, Michael Scherer wrote:
> > >> On Wednesday 17 June 2015 at 08:20 +0200, Emmanuel Dreyfus wrote:
> > >>> Venky Shankar wrote:
> > >>>
> > >>>> If that's the case, then I'll vote for this even if it takes some time
> > >>>> to get things in workable state.
> > >>>
> > >>> See my other mail about this: you enter a new slave VM in the DNS and it
> > >>> does not resolve, or sometimes you get 20s delays. I am convinced this
> > >>> is the reason why Jenkins bugs.
> > >>
> > >> But cloud.gluster.org is handled by rackspace, not sure how much control
> > >> we have for it (not sure even where to start there).
> > >
> > > So I cannot change the DNS destination.
> > >
> > > What I can do is to create a new dns zone, and then, we can delegate as
> > > we want. And migrate some slaves and not others, and see how it goes?
> > >
> > > slaves.gluster.org would be ok for everybody?
> >
> > Try it out, and see if it works. :)
> >
> > On the "scaling the infrastructure" side of things, are the two OSAS servers
> > for Gluster still available?
>
> They are online.
> $ ssh r...@ci.gluster.org uptime
> 09:13:37 up 33 days, 16:34, 0 users, load average: 0,00, 0,01, 0,05

Can it run some Jenkins Slave VMs too?

Thanks,
Niels
Re: [Gluster-infra] [Gluster-devel] NetBSD regressions not being triggered for patches
On Wed, Jun 17, 2015 at 11:48:46AM +0200, Michael Scherer wrote:
> On Wednesday 17 June 2015 at 08:20 +0200, Emmanuel Dreyfus wrote:
> > Venky Shankar wrote:
> >
> > > If that's the case, then I'll vote for this even if it takes some time
> > > to get things in workable state.
> >
> > See my other mail about this: you enter a new slave VM in the DNS and it
> > does not resolve, or sometimes you get 20s delays. I am convinced this
> > is the reason why Jenkins bugs.
>
> But cloud.gluster.org is handled by rackspace, not sure how much control
> we have for it (not sure even where to start there).

On build.gluster.org there now is a /usr/local/bin/get-hosts.py script (it needs to be executed through sudo). This pulls down the DNS records from our cloud.gluster.org domain in Rackspace and provides /etc/hosts formatted output. /etc/hosts on build.gluster.org contains all the current entries. We could automatically update it with a cron job or something, if needed. New VMs should get added to /etc/hosts too, either manually or by executing the script (sudo vim /etc/hosts, :r!/usr/local/bin/get-hosts.py).

> And I think the DNS issues are just a symptom of a bigger network issue,
> having local DNS might just mask the problem and which would then be non
> DNS related (like tcp connection not working).

Maybe, but I hope those issues stay masked when resolving the hostnames is more stable. When we have the other servers up and running, we would have a better understanding and options to investigate issues like this.

HTH,
Niels
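The get-hosts.py workflow described here (turn the cloud.gluster.org DNS records into /etc/hosts lines, then refresh them, possibly from cron) could look roughly like the sketch below. The real script is not shown in the thread, so everything here is an assumption: fetching records from Rackspace is left out, records are taken as (hostname, address) pairs, and the generated lines are kept between hypothetical marker comments so a re-run replaces only its own block and leaves hand-edited entries alone.

```python
import ipaddress
from typing import Iterable, List, Tuple

DOMAIN = "cloud.gluster.org"
BEGIN = f"# BEGIN {DOMAIN} (generated)"
END = f"# END {DOMAIN} (generated)"


def hosts_lines(records: Iterable[Tuple[str, str]], domain: str = DOMAIN) -> List[str]:
    """Format (name, address) records as /etc/hosts lines: address, FQDN, short name."""
    suffix = "." + domain
    lines = []
    for name, addr in sorted(records):
        ipaddress.ip_address(addr)  # raises ValueError on a malformed address
        fqdn = name if name.endswith(suffix) else name + suffix
        lines.append(f"{addr}\t{fqdn} {fqdn[: -len(suffix)]}")
    return lines


def merge_hosts(existing: str, generated: List[str]) -> str:
    """Swap the marked block in an /etc/hosts body for `generated` (append if absent)."""
    out: List[str] = []
    inside = replaced = False
    for line in existing.splitlines():
        if line == BEGIN:
            out += [BEGIN, *generated, END]
            inside = replaced = True
        elif line == END:
            inside = False
        elif not inside:
            out.append(line)
    if not replaced:
        out += [BEGIN, *generated, END]
    return "\n".join(out) + "\n"
```

A cron wrapper would write the merged text to a temporary file on the same filesystem and rename it over /etc/hosts, so readers never see a half-written file; because the merge is idempotent, running it repeatedly is harmless.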
Re: [Gluster-infra] [Gluster-devel] NetBSD regressions not being triggered for patches
On Wed, Jun 17, 2015 at 12:13:46PM +, Emmanuel Dreyfus wrote:
> On Wed, Jun 17, 2015 at 07:44:14AM -0400, Vijay Bellur wrote:
> > Do we still have the NFS crash that was causing tests to hang?
>
> Do we still have it on rebased patchsets?

Yes, the fixes depend on the refcounting change, which does not seem as trivial as I hoped. http://review.gluster.org/11022 for the interested. http://review.gluster.org/11023 is the fix that should solve the segfaults in the NFS-server.

Niels
Re: [Gluster-infra] Reduce regression runs wait time - New gerrit/review work flow
On Monday 15 June 2015 at 16:19 +0530, Kaushal M wrote:
> Hi all,
>
> The recent rush of reviews being sent due to the release of 3.7 was a
> cause of frustration for many of us because of the regression tests
> (gerrit troubles themselves are another thing).
>
> W.R.T. regression, the 3 main sources of frustration were:
> 1. Spurious test failures
> 2. Long wait times
> 3. Regression slave troubles
>
> We've already tackled the spurious failure issue and are quite stable
> now. The trouble with the slave vms is related to the gerrit issues,
> and is mainly due to the network issues we are having between the
> data-centers hosting the slaves and gerrit/jenkins. People have been
> looking into this, but we haven't had much success. This leaves the
> issue of the long wait times.
>
> The long wait times are because of the long queues of pending jobs,
> some of which take days to get scheduled. Two things cause the long
> queues:
> 1. Automatic regression job triggering for all submissions to gerrit
> 2. Long run time for regression (~2h)
>
> The long queues, coupled with the spurious failures and network
> problems, meant that jobs would fail for no reason after a long wait,
> and would have to be added to the back of the queue to be re-run. This
> meant that developers would have to wait days for their changes to get
> merged, and was one of the causes for the delay in the release of 3.7.
>
> The solution is to reduce wait times for regression runs. To reduce wait
> times we should:
> 1. Trigger runs only when required
> 2. Reduce regression run time.
>
> Raghavendra Talur (rtalur/RaSTar) will soon send out a mail with his
> findings on the regression run times, and we can continue discussion
> on it on that thread.
>
> Earlier, the regression runs used to be manually triggered by the
> maintainers once they had determined that a change was ready for
> submission. But as there were only two maintainers before (Vijay and
> Avati) auto triggering was brought in to reduce their load. Auto
> triggering worked fine when we had a lower volume of changes being
> submitted to gerrit. But now, with the large volumes we see during the
> release freeze dates, auto triggering just adds to problems.
>
> I propose that we move back to the old model of starting regression
> runs only once the maintainers are ready to merge. But instead of the
> maintainers manually triggering the runs, we could automate it.
>
> We can model our new workflow on those of OpenStack[1] and
> Wikimedia[2]. The existing Gerrit plugin for Jenkins doesn't provide
> the features necessary to enable selective triggering based on Gerrit
> flags. Both OpenStack and Wikimedia use a project gating tool called
> Zuul[3], which provides a much better integration with Jenkins and
> Gerrit and more features on top.
>
> I propose the following work flow:
>
> - Developer pushes change to Gerrit.
> - Zuul is notified by Gerrit of new change
> - Zuul runs pre-review checks on Jenkins. This will be the current smoke
>   tests.
> - Zuul reports back status of the checks to Gerrit.
> - If checks fail, developer will need to resend the change after
>   the required fixes. The process starts once more.
> - If the checks pass, the change is now ready for review
> - The change is now reviewed by other developers and maintainers.
>   Non-maintainers will be able to give only a +1 review.
> - On a negative review, the developer will need to rework the change
>   and resend it. The process starts once more.
> - The maintainer gives a +2 review once he/she is satisfied. The
>   maintainer's work is done here.
> - Zuul is notified of the +2 review
> - Zuul runs the regression runs and reports back the status.
> - If the regression runs fail, the process starts over again.
> - If the runs pass, the change is ready for acceptance.
> - Zuul will pick the change into the repository.
> - If the pick fails, Zuul will report back the failure, and the
>   process starts once again.
>
> Following this flow should:
> 1. Reduce regression wait time
> 2. Improve change acceptance time
> 3. Reduce unnecessary wastage of infra resources
> 4. Improve infra stability.
>
> It also brings in drawbacks that we need to maintain one other piece
> of infra (Zuul). This would be an additional maintenance overhead on
> top of Gerrit, Jenkins and the current slaves. But I feel the
> reduction in the upkeep efforts of the slaves would be enough to
> offset this.
>
> tl;dr
> Current auto-triggering of regression runs is stupid and a waste of
> time and resources. Bring in a project gating system, Zuul, which can
> do a much more intelligent jobs triggering, and use it to
> automatically trigger regression only for changes with Reviewed+2 and
> automatically merge ones that pass.
>
> What does the community think of this?

Zuul is being packaged for Fedora/EPEL, so it would greatly help to have it packaged rather than a non-sustainable self-installation like we had in the past.

--
Micha
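The flow Kaushal lists is in effect a small state machine, and encoding it makes the transitions explicit. The sketch below is a Python illustration only; the state and event names are hypothetical labels chosen here, not Zuul pipeline configuration. "The process starts once more/over again" is modelled as returning to `NEEDS_REWORK` until a new patch set is pushed.

```python
from enum import Enum, auto
from typing import Dict, Iterable, Tuple


class State(Enum):
    DRAFT = auto()         # change not yet pushed
    SMOKE = auto()         # Zuul runs the pre-review (smoke) checks
    IN_REVIEW = auto()     # smoke passed; waiting for reviewers
    REGRESSION = auto()    # maintainer gave +2; Zuul runs regression
    MERGING = auto()       # regression passed; Zuul picks the change
    NEEDS_REWORK = auto()  # any failure: developer must resend
    MERGED = auto()


TRANSITIONS: Dict[Tuple[State, str], State] = {
    (State.DRAFT, "push"): State.SMOKE,
    (State.NEEDS_REWORK, "push"): State.SMOKE,
    (State.SMOKE, "smoke_fail"): State.NEEDS_REWORK,
    (State.SMOKE, "smoke_pass"): State.IN_REVIEW,
    (State.IN_REVIEW, "plus1"): State.IN_REVIEW,  # non-maintainers can only +1
    (State.IN_REVIEW, "negative_review"): State.NEEDS_REWORK,
    (State.IN_REVIEW, "maintainer_plus2"): State.REGRESSION,
    (State.REGRESSION, "regression_fail"): State.NEEDS_REWORK,
    (State.REGRESSION, "regression_pass"): State.MERGING,
    (State.MERGING, "pick_fail"): State.NEEDS_REWORK,
    (State.MERGING, "pick_ok"): State.MERGED,
}


def run(events: Iterable[str], state: State = State.DRAFT) -> State:
    """Replay a sequence of events, rejecting any the workflow does not allow."""
    for event in events:
        try:
            state = TRANSITIONS[(state, event)]
        except KeyError:
            raise ValueError(f"event {event!r} not allowed in state {state.name}") from None
    return state
```

In this encoding regression jobs are reachable only through a maintainer's +2, and every failure path leads back to `NEEDS_REWORK`; that is the cost Niels points out elsewhere in this thread, since a late regression failure sends the change back through review.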
Re: [Gluster-infra] Status of nbslave7x
On Wed, Jun 17, 2015 at 03:00:29PM +, Emmanuel Dreyfus wrote:
> Oh no, it did, but nuked them all almost instantly (see below). I
> disabled it again. Basically we have broken jenkins setups, and DNS
> trouble prevents us from adding new VMs. What a mess.

I retriggered most of the jobs, but at some point the web UI refreshed and I lost track of which jobs I had already retriggered. I left it as is.

--
Emmanuel Dreyfus
m...@netbsd.org
Re: [Gluster-infra] Status of nbslave7x
On Wed, Jun 17, 2015 at 08:34:06PM +0530, Kaushal M wrote:
> Would restarting jenkins once help? It might help it pick up the newly
> added entries to the hosts file.

Won't it break all running jobs?

--
Emmanuel Dreyfus
m...@netbsd.org
Re: [Gluster-infra] Status of nbslave7x
Would restarting jenkins once help? It might help it pick up the newly added entries to the hosts file.

On Wed, Jun 17, 2015 at 8:30 PM, Emmanuel Dreyfus wrote:
> On Wed, Jun 17, 2015 at 02:57:28PM +, Emmanuel Dreyfus wrote:
>> I re-enabled it and it went online, but it does not seem to pick up a job.
>
> Oh no, it did, but nuked them all almost instantly (see below). I
> disabled it again. Basically we have broken jenkins setups, and DNS
> trouble prevents us from adding new VMs. What a mess.
>
> Triggered by Gerrit: http://review.gluster.org/11264 in silent mode.
> Building remotely on nbslave71.cloud.gluster.org (netbsd7_regression) in
> workspace /home/jenkins/root/workspace/rackspace-netbsd7-regression-triggered
> java.io.IOException: remote file operation failed:
> /home/jenkins/root/workspace/rackspace-netbsd7-regression-triggered at
> hudson.remoting.Channel@1f76c8cf:nbslave71.cloud.gluster.org:
> hudson.remoting.ChannelClosedException: channel is already closed
>     at hudson.FilePath.act(FilePath.java:987)
>     at hudson.FilePath.act(FilePath.java:969)
>     at hudson.FilePath.mkdirs(FilePath.java:1152)
>     at hudson.model.AbstractProject.checkout(AbstractProject.java:1269)
>     at hudson.model.AbstractBuild$AbstractBuildExecution.defaultCheckout(AbstractBuild.java:610)
>     at jenkins.scm.SCMCheckoutStrategy.checkout(SCMCheckoutStrategy.java:86)
>     at hudson.model.AbstractBuild$AbstractBuildExecution.run(AbstractBuild.java:532)
>     at hudson.model.Run.execute(Run.java:1744)
>     at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:43)
>     at hudson.model.ResourceController.execute(ResourceController.java:98)
>     at hudson.model.Executor.run(Executor.java:374)
> Caused by: hudson.remoting.ChannelClosedException: channel is already closed
>     at hudson.remoting.Channel.send(Channel.java:550)
>     at hudson.remoting.Request.call(Request.java:129)
>     at hudson.remoting.Channel.call(Channel.java:752)
>     at hudson.FilePath.act(FilePath.java:980)
>     ... 10 more
> Caused by: java.io.IOException
>     at hudson.remoting.Channel.close(Channel.java:1110)
>     at hudson.slaves.ChannelPinger$1.onDead(ChannelPinger.java:118)
>     at hudson.remoting.PingThread.ping(PingThread.java:126)
>     at hudson.remoting.PingThread.run(PingThread.java:85)
> Caused by: java.util.concurrent.TimeoutException: Ping started at
> 1433860950328 hasn't completed by 1433861190328
>     ... 2 more
> Finished: FAILURE
>
>
> --
> Emmanuel Dreyfus
> m...@netbsd.org
Re: [Gluster-infra] Status of nbslave7x
On Wed, Jun 17, 2015 at 02:57:28PM +, Emmanuel Dreyfus wrote:
> I re-enabled it and it went online, but it does not seem to pick up a job.

Oh no, it did, but nuked them all almost instantly (see below). I
disabled it again. Basically we have broken jenkins setups, and DNS
trouble prevents us from adding new VMs. What a mess.

Triggered by Gerrit: http://review.gluster.org/11264 in silent mode.
Building remotely on nbslave71.cloud.gluster.org (netbsd7_regression) in
workspace /home/jenkins/root/workspace/rackspace-netbsd7-regression-triggered
java.io.IOException: remote file operation failed:
/home/jenkins/root/workspace/rackspace-netbsd7-regression-triggered at
hudson.remoting.Channel@1f76c8cf:nbslave71.cloud.gluster.org:
hudson.remoting.ChannelClosedException: channel is already closed
    at hudson.FilePath.act(FilePath.java:987)
    at hudson.FilePath.act(FilePath.java:969)
    at hudson.FilePath.mkdirs(FilePath.java:1152)
    at hudson.model.AbstractProject.checkout(AbstractProject.java:1269)
    at hudson.model.AbstractBuild$AbstractBuildExecution.defaultCheckout(AbstractBuild.java:610)
    at jenkins.scm.SCMCheckoutStrategy.checkout(SCMCheckoutStrategy.java:86)
    at hudson.model.AbstractBuild$AbstractBuildExecution.run(AbstractBuild.java:532)
    at hudson.model.Run.execute(Run.java:1744)
    at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:43)
    at hudson.model.ResourceController.execute(ResourceController.java:98)
    at hudson.model.Executor.run(Executor.java:374)
Caused by: hudson.remoting.ChannelClosedException: channel is already closed
    at hudson.remoting.Channel.send(Channel.java:550)
    at hudson.remoting.Request.call(Request.java:129)
    at hudson.remoting.Channel.call(Channel.java:752)
    at hudson.FilePath.act(FilePath.java:980)
    ... 10 more
Caused by: java.io.IOException
    at hudson.remoting.Channel.close(Channel.java:1110)
    at hudson.slaves.ChannelPinger$1.onDead(ChannelPinger.java:118)
    at hudson.remoting.PingThread.ping(PingThread.java:126)
    at hudson.remoting.PingThread.run(PingThread.java:85)
Caused by: java.util.concurrent.TimeoutException: Ping started at
1433860950328 hasn't completed by 1433861190328
    ... 2 more
Finished: FAILURE

--
Emmanuel Dreyfus
m...@netbsd.org
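The two numbers in the final `TimeoutException` appear to be milliseconds since the Unix epoch (an assumption, but they decode to plausible dates). Converting them shows how long the agent channel to nbslave71 stayed silent before Jenkins gave up on it:

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def from_epoch_millis(ms: int) -> datetime:
    """Milliseconds since the Unix epoch -> aware UTC datetime (integer math, no float rounding)."""
    return EPOCH + timedelta(milliseconds=ms)


started = from_epoch_millis(1433860950328)
deadline = from_epoch_millis(1433861190328)
silence = deadline - started

print(started.isoformat())      # 2015-06-09T14:42:30.328000+00:00
print(silence.total_seconds())  # 240.0
```

The gap is exactly 240,000 ms (4 minutes), and the start time decodes to 9 June 2015, 14:42:30 UTC.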
Re: [Gluster-infra] Status of nbslave7x
On Wed, Jun 17, 2015 at 07:39:06PM +0530, Kaushal M wrote: > nbslave7{d..f} were the entries created by Vijay last week, which were > resolving to nbslave71; there were no actual vms on rackspace. I had > disabled nbslave71 at that point in time to reboot it, but I think I > forgot to re-enable it. I re-enabled it and it went online, but it does not seem to pick up any job. -- Emmanuel Dreyfus m...@netbsd.org
Re: [Gluster-infra] Status of nbslave7x
nbslave7{d..f} were the entries created by Vijay last week, which were resolving to nbslave71; there were no actual VMs on Rackspace. I had disabled nbslave71 at that point in time to reboot it, but I think I forgot to re-enable it. ~kaushal On Wed, Jun 17, 2015 at 7:21 PM, Emmanuel Dreyfus wrote: > Status of NetBSD slave VMs: > > 1 booked: nbslave71 > It is marked as disconnected by amarts. Is usage over? > > 3 removed from Rackspace but still in Jenkins: nbslave7d, nbslave7e, nbslave7f > > 6 active: nbslave72, nbslave77, nbslave7c, nbslave7g, nbslave7i, nbslave7j > > 3 offline: nbslave74 nbslave75 nbslave79 > The 3 DNS records do not resolve (timeout) from build.gluster.org, > while they do from my own machine. Adding them to /etc/hosts helps a lot on the > command line, and it becomes possible to connect to port 22. > But Jenkins is still unable to connect and launch the agent. > tcpdump on build.gluster.org shows it does not even try. > > Perhaps there is a name cache in Jenkins and it needs to be restarted? > I am leaving the /etc/hosts file loaded with nbslave74 nbslave75 nbslave79 > > -- > Emmanuel Dreyfus > m...@netbsd.org
[Gluster-infra] Status of nbslave7x
Status of NetBSD slave VMs: 1 booked: nbslave71 It is marked as disconnected by amarts. Is usage over? 3 removed from Rackspace but still in Jenkins: nbslave7d, nbslave7e, nbslave7f 6 active: nbslave72, nbslave77, nbslave7c, nbslave7g, nbslave7i, nbslave7j 3 offline: nbslave74 nbslave75 nbslave79 The 3 DNS records do not resolve (timeout) from build.gluster.org, while they do from my own machine. Adding them to /etc/hosts helps a lot on the command line, and it becomes possible to connect to port 22. But Jenkins is still unable to connect and launch the agent. tcpdump on build.gluster.org shows it does not even try. Perhaps there is a name cache in Jenkins and it needs to be restarted? I am leaving the /etc/hosts file loaded with nbslave74 nbslave75 nbslave79 -- Emmanuel Dreyfus m...@netbsd.org
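The sketch below reproduces the "records do not resolve (timeout)" check one host at a time by timing each lookup through the system resolver. The helpers `time_lookup` and `report` are hypothetical. `socket.gethostbyname` goes through the C library resolver, so on typical setups it consults /etc/hosts as well as DNS. That makes it easy to compare results before and after the /etc/hosts workaround. The script sets no timeout of its own, so a hung lookup waits as long as the resolver configuration allows.

```python
import socket
import time

def time_lookup(name: str):
    """Resolve `name` via the system resolver; return (IPv4 or None, seconds)."""
    start = time.monotonic()
    try:
        addr = socket.gethostbyname(name)
    except OSError:  # socket.gaierror / socket.herror are OSError subclasses
        addr = None
    return addr, time.monotonic() - start

def report(names, slow_after_s: float = 1.0):
    """Print one status line per host: FAIL, SLOW (over threshold) or ok."""
    for name in names:
        addr, elapsed = time_lookup(name)
        if addr is None:
            status = "FAIL"
        elif elapsed > slow_after_s:
            status = "SLOW"
        else:
            status = "ok"
        print(f"{status:4} {name:32} {addr or '-':15} {elapsed:6.2f}s")
```

For example, running `report(["nbslave74.cloud.gluster.org", "nbslave75.cloud.gluster.org", "nbslave79.cloud.gluster.org"])` both on build.gluster.org and on another machine should show the difference described above.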
Re: [Gluster-infra] [Gluster-devel] NetBSD regressions not being triggered for patches
On Wednesday 17 June 2015 at 11:58 +0100, Justin Clift wrote: > On 17 Jun 2015, at 10:53, Michael Scherer wrote: > > On Wednesday 17 June 2015 at 11:48 +0200, Michael Scherer wrote: > >> On Wednesday 17 June 2015 at 08:20 +0200, Emmanuel Dreyfus wrote: > >>> Venky Shankar wrote: > >>> > If that's the case, then I'll vote for this even if it takes some time > to get things in workable state. > >>> > >>> See my other mail about this: you enter a new slave VM in the DNS and it > >>> does not resolve, or sometimes you get 20s delays. I am convinced this > >>> is the reason why Jenkins misbehaves. > >> > >> But cloud.gluster.org is handled by rackspace, not sure how much control > >> we have for it ( not sure even where to start there ). > > > > So I cannot change the DNS destination. > > > > What I can do is to create a new dns zone, and then, we can delegate as > > we want. And migrate some slaves and not others, and see how it goes ? > > > > slaves.gluster.org would be ok for everybody ? > > Try it out, and see if it works. :) > > On the "scaling the infrastructure" side of things, are the two OSAS servers > for Gluster still available? They are online. $ ssh r...@ci.gluster.org uptime 09:13:37 up 33 days, 16:34, 0 users, load average: 0,00, 0,01, 0,05 > If so, we should get them online ASAP, as that will give us ~40 new VMs > + get us out of iWeb (which I suspect is the problem). I suspect so too. But that would mean migrating Jenkins and everything, and I would prefer a quick fix. I am looking at the DNS solution. -- Michael Scherer Sysadmin, Community Infrastructure and Platform, OSAS
Re: [Gluster-infra] [Gluster-devel] NetBSD regressions not being triggered for patches
On Wednesday 17 June 2015 08:13 AM, Emmanuel Dreyfus wrote: > On Wed, Jun 17, 2015 at 07:44:14AM -0400, Vijay Bellur wrote: >> Do we still have the NFS crash that was causing tests to hang? > Do we still have it on rebased patchsets? I am not certain. I am still trying to come to terms with my email backlog, hence I am seeking a quick opinion here to see if we need to address it ASAP. -Vijay
Re: [Gluster-infra] [Gluster-devel] NetBSD regressions not being triggered for patches
On Wed, Jun 17, 2015 at 07:44:14AM -0400, Vijay Bellur wrote: > Do we still have the NFS crash that was causing tests to hang? Do we still have it on rebased patchsets? -- Emmanuel Dreyfus m...@netbsd.org
Re: [Gluster-infra] [Gluster-devel] NetBSD regressions not being triggered for patches
On Wednesday 17 June 2015 05:20 AM, Emmanuel Dreyfus wrote: > On Wed, Jun 17, 2015 at 11:05:38AM +0200, Niels de Vos wrote: >> I've already scripted the reboot-vm job to use the Rackspace API; requesting the DNS records and formatting the results into some file can't be that difficult. Let me know if an /etc/hosts format would do, or if you expect something else. > Perhaps an /etc/hosts file would do it: Jenkins launches the ssh command, and ssh should use /etc/hosts before the DNS. Why don't we try this out while we find an alternate solution? Given that there are plenty of patches awaiting NetBSD regression, anything that we can do to alleviate the situation would be more than welcome! Do we still have the NFS crash that was causing tests to hang? Thanks, Vijay
Re: [Gluster-infra] [Gluster-devel] NetBSD regressions not being triggered for patches
Just moving Gerrit and Jenkins out of iWeb should help a lot. On Wed, Jun 17, 2015 at 4:28 PM, Justin Clift wrote: > On 17 Jun 2015, at 10:53, Michael Scherer wrote: >> On Wednesday 17 June 2015 at 11:48 +0200, Michael Scherer wrote: >>> On Wednesday 17 June 2015 at 08:20 +0200, Emmanuel Dreyfus wrote: Venky Shankar wrote: > If that's the case, then I'll vote for this even if it takes some time > to get things in workable state. See my other mail about this: you enter a new slave VM in the DNS and it does not resolve, or sometimes you get 20s delays. I am convinced this is the reason why Jenkins misbehaves. >>> >>> But cloud.gluster.org is handled by rackspace, not sure how much control >>> we have for it ( not sure even where to start there ). >> >> So I cannot change the DNS destination. >> >> What I can do is to create a new dns zone, and then, we can delegate as >> we want. And migrate some slaves and not others, and see how it goes ? >> >> slaves.gluster.org would be ok for everybody ? > > Try it out, and see if it works. :) > > On the "scaling the infrastructure" side of things, are the two OSAS servers > for Gluster still available? > > If so, we should get them online ASAP, as that will give us ~40 new VMs > + get us out of iWeb (which I suspect is the problem). > > Regards and best wishes, > > Justin Clift > > -- > GlusterFS - http://www.gluster.org > > An open source, distributed file system scaling to several > petabytes, and handling thousands of clients. > > My personal twitter: twitter.com/realjustinclift
Re: [Gluster-infra] [Gluster-devel] NetBSD regressions not being triggered for patches
On 17 Jun 2015, at 10:53, Michael Scherer wrote: > On Wednesday 17 June 2015 at 11:48 +0200, Michael Scherer wrote: >> On Wednesday 17 June 2015 at 08:20 +0200, Emmanuel Dreyfus wrote: >>> Venky Shankar wrote: >>> If that's the case, then I'll vote for this even if it takes some time to get things in workable state. >>> >>> See my other mail about this: you enter a new slave VM in the DNS and it >>> does not resolve, or sometimes you get 20s delays. I am convinced this >>> is the reason why Jenkins misbehaves. >> >> But cloud.gluster.org is handled by rackspace, not sure how much control >> we have for it ( not sure even where to start there ). > > So I cannot change the DNS destination. > > What I can do is to create a new dns zone, and then, we can delegate as > we want. And migrate some slaves and not others, and see how it goes ? > > slaves.gluster.org would be ok for everybody ? Try it out, and see if it works. :) On the "scaling the infrastructure" side of things, are the two OSAS servers for Gluster still available? If so, we should get them online ASAP, as that will give us ~40 new VMs + get us out of iWeb (which I suspect is the problem). Regards and best wishes, Justin Clift -- GlusterFS - http://www.gluster.org An open source, distributed file system scaling to several petabytes, and handling thousands of clients. My personal twitter: twitter.com/realjustinclift
Re: [Gluster-infra] [Gluster-devel] NetBSD regressions not being triggered for patches
On 17 Jun 2015, at 07:29, Kaushal M wrote: > cloud.gluster.org is served by Rackspace Cloud DNS. AFAICT, there is > no readily available option to do zone transfers from it. We might > have to contact the Rackspace support to find out if they can do it as > a special request. Contacting Rackspace support is very easy, and they're normally very responsive. They have an online support ticket submission thing in the Rackspace UI. Often they get back to us with meaningful responses in less than 15-20 minutes. Please go ahead and submit a ticket. :) (Btw - I suspect the DNS issue is likely related to the hardware firewall in the iWeb infrastructure. It's probably acting up. :<). Regards and best wishes, Justin Clift -- GlusterFS - http://www.gluster.org An open source, distributed file system scaling to several petabytes, and handling thousands of clients. My personal twitter: twitter.com/realjustinclift
Re: [Gluster-infra] [Gluster-devel] NetBSD regressions not being triggered for patches
On Wed, Jun 17, 2015 at 9:50 AM, Atin Mukherjee wrote: > > > On 06/11/2015 08:04 PM, Emmanuel Dreyfus wrote: >> On Thu, Jun 11, 2015 at 04:04:44PM +0200, Niels de Vos wrote: >>> Michael installed and configured dnsmasq on build.gluster.org yesterday. >>> If that does not help today, we need other ideas... >> >> Just to confirm the problem: >> >> [manu@build ~]$ time nslookup nbslave7i.cloud.gluster.org >> ;; connection timed out; trying next origin >> ;; connection timed out; no servers could be reached >> >> >> real 0m20.013s >> user 0m0.002s >> sys 0m0.012s >> >> Having a local cache does not help because the upstream DNS service is >> weak. Without the local cache, individual processes crave a reply, >> and with the local server, it is the local server itself that craves >> a reply. >> >> And here upstream DNS is really at fault: from my own machine I get a reply in >> 0.29s. >> >> We need to configure a local authoritative secondary DNS for the zone, >> so that the answer is always available locally without having to rely >> on outside infrastructure. > I am not sure whether we have any improvements on this front. I still > see patches waiting for ages to get their turn for the regression > run, hence delaying merges and affecting the release process. > > I still feel we don't need to wait for NetBSD's vote for merging patches > on a temporary basis till we fix the infrastructure problem. This is the > only quick solution which I can think of now. That *might* result in lots of NetBSD regression failures later on and we may end up with another round of fixups. I can't think of a quick solution either. > > Thoughts? > > ~Atin >> > > -- > ~Atin > ___ > Gluster-devel mailing list > gluster-de...@gluster.org > http://www.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-infra] [Gluster-devel] NetBSD regressions not being triggered for patches
On Wed, Jun 17, 2015 at 10:13 AM, Emmanuel Dreyfus wrote: > Atin Mukherjee wrote: > >> > That *might* result in lots of NetBSD regression failures later on and >> > we may end up with another round of fixups. >> Agreed, that's the known risk but we don't have any other alternatives atm. > > I strongly disagree: we have a good alternative. Configure a secondary > DNS on build.gluster.org for the cloud.gluster.org zone. I could do the > local configuration, but someone with administrative access will have to > touch the primary configuration to allow zone transfers (and enable > notifications). If that's the case, then I'll vote for this even if it takes some time to get things in a workable state. I think Kaushal/Niels/Justin could surely help here. > > The current situation is that we have 14 NetBSD VMs online and only 5 are > capable of running jobs because of various infrastructure configuration > problems, broken DNS being the first offender. > > Another issue is the hanging NFS mounts (ps -axl shows dd stuck in wchan > tstile), for which I had a change merged that should fix the problem, > but only for rebased changes. > > > -- > Emmanuel Dreyfus > http://hcpnet.free.fr/pubz > m...@netbsd.org
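The proposal above is a standard BIND secondary (slave) zone. The Python sketch below only renders the two named.conf fragments involved. The primary and secondary addresses, the zone file path, and the helper names are placeholders, not the real Gluster or Rackspace setup. The sketch also assumes the primary is willing to serve zone transfers at all.

```python
def secondary_zone_stanza(zone: str, primary_ips, zone_file: str) -> str:
    """named.conf stanza for the secondary (e.g. build.gluster.org):
    pull `zone` by zone transfer from the listed primaries."""
    masters = " ".join(f"{ip};" for ip in primary_ips)
    return (
        f'zone "{zone}" {{\n'
        "    type slave;\n"
        f"    masters {{ {masters} }};\n"
        f'    file "{zone_file}";\n'
        "};\n"
    )

def primary_zone_options(secondary_ips) -> str:
    """Options the primary's zone stanza needs so that it allows the
    transfer and sends NOTIFY messages when the zone changes."""
    ips = " ".join(f"{ip};" for ip in secondary_ips)
    return (
        f"    allow-transfer {{ {ips} }};\n"
        f"    also-notify {{ {ips} }};\n"
        "    notify yes;\n"
    )

# Placeholder addresses from the documentation ranges (RFC 5737).
print(secondary_zone_stanza("cloud.gluster.org", ["192.0.2.10"],
                            "slaves/cloud.gluster.org.zone"))
print(primary_zone_options(["198.51.100.20"]))
```

Recent BIND releases also accept the synonyms `type secondary;` and `primaries` for the secondary side.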
Re: [Gluster-infra] [Gluster-devel] NetBSD regressions not being triggered for patches
- Original Message - > From: "Kaushal M" > To: "Emmanuel Dreyfus" > Cc: "Gluster Devel" , "gluster-infra" > > Sent: Wednesday, 17 June, 2015 11:59:22 AM > Subject: Re: [Gluster-devel] [Gluster-infra] NetBSD regressions not being > triggered for patches > > cloud.gluster.org is served by Rackspace Cloud DNS. AFAICT, there is > no readily available option to do zone transfers from it. We might > have to contact the Rackspace support to find out if they can do it as > a special request. > If this is going to take time, then I would prefer not to block patches on NetBSD regressions. We can address any NetBSD regression caused by patches as a separate bug. Otherwise our regression queue will continue to grow. > > On Wed, Jun 17, 2015 at 11:50 AM, Emmanuel Dreyfus wrote: > > Venky Shankar wrote: > > > >> If that's the case, then I'll vote for this even if it takes some time > >> to get things in workable state. > > > > See my other mail about this: you enter a new slave VM in the DNS and it > > does not resolve, or sometimes you get 20s delays. I am convinced this > > is the reason why Jenkins misbehaves. > > > > -- > > Emmanuel Dreyfus > > http://hcpnet.free.fr/pubz > > m...@netbsd.org
Re: [Gluster-infra] [Gluster-devel] NetBSD regressions not being triggered for patches
- Original Message - > From: "Avra Sengupta" > To: "Rajesh Joseph" , "Kaushal M" > Cc: "Gluster Devel" , "gluster-infra" > > Sent: Wednesday, June 17, 2015 1:42:25 PM > Subject: Re: [Gluster-devel] [Gluster-infra] NetBSD regressions not being > triggered for patches > > On 06/17/2015 12:12 PM, Rajesh Joseph wrote: > > > > - Original Message - > >> From: "Kaushal M" > >> To: "Emmanuel Dreyfus" > >> Cc: "Gluster Devel" , "gluster-infra" > >> > >> Sent: Wednesday, 17 June, 2015 11:59:22 AM > >> Subject: Re: [Gluster-devel] [Gluster-infra] NetBSD regressions not being > >> triggered for patches > >> > >> cloud.gluster.org is served by Rackspace Cloud DNS. AFAICT, there is > >> no readily available option to do zone transfers from it. We might > >> have to contact the Rackspace support to find out if they can do it as > >> a special request. > >> > > If this is going to take time, then I would prefer not to block patches on > > NetBSD regressions. We can address > > any NetBSD regression caused by patches as a separate bug. Otherwise our > > regression queue will > > continue to grow. > +1 for this. We shouldn't be blocking patches on NetBSD regression till > the infra scales enough to handle the kind of load we are throwing at > it. Once the regression framework is scalable enough, we can fix any > regressions (if any) introduced. This will bring down the turnaround > time for patch acceptance. +1 > > > >> On Wed, Jun 17, 2015 at 11:50 AM, Emmanuel Dreyfus > >> wrote: > >>> Venky Shankar wrote: > >>> > If that's the case, then I'll vote for this even if it takes some time > to get things in workable state. > >>> See my other mail about this: you enter a new slave VM in the DNS and it > >>> does not resolve, or sometimes you get 20s delays. I am convinced this > >>> is the reason why Jenkins misbehaves. > >>> > >>> -- > >>> Emmanuel Dreyfus > >>> http://hcpnet.free.fr/pubz > >>> m...@netbsd.org
Re: [Gluster-infra] [Gluster-devel] NetBSD regressions not being triggered for patches
On 06/17/2015 09:57 AM, Venky Shankar wrote: > On Wed, Jun 17, 2015 at 9:50 AM, Atin Mukherjee wrote: >> >> >> On 06/11/2015 08:04 PM, Emmanuel Dreyfus wrote: >>> On Thu, Jun 11, 2015 at 04:04:44PM +0200, Niels de Vos wrote: Michael installed and configured dnsmasq on build.gluster.org yesterday. If that does not help today, we need other ideas... >>> >>> Just to confirm the problem: >>> >>> [manu@build ~]$ time nslookup nbslave7i.cloud.gluster.org >>> ;; connection timed out; trying next origin >>> ;; connection timed out; no servers could be reached >>> >>> >>> real 0m20.013s >>> user 0m0.002s >>> sys 0m0.012s >>> >>> Having a local cache does not help because the upstream DNS service is >>> weak. Without the local cache, individual processes crave a reply, >>> and with the local server, it is the local server itself that craves >>> a reply. >>> >>> And here upstream DNS is really at fault: from my own machine I get a reply in >>> 0.29s. >>> >>> We need to configure a local authoritative secondary DNS for the zone, >>> so that the answer is always available locally without having to rely >>> on outside infrastructure. >> I am not sure whether we have any improvements on this front. I still >> see patches waiting for ages to get their turn for the regression >> run, hence delaying merges and affecting the release process. >> >> I still feel we don't need to wait for NetBSD's vote for merging patches >> on a temporary basis till we fix the infrastructure problem. This is the >> only quick solution which I can think of now. > > That *might* result in lots of NetBSD regression failures later on and > we may end up with another round of fixups. Agreed, that's the known risk, but we don't have any other alternatives atm. > > I can't think of a quick solution either. > >> >> Thoughts? >> >> ~Atin >>> >> >> -- >> ~Atin -- ~Atin
Re: [Gluster-infra] [Gluster-devel] NetBSD regressions not being triggered for patches
On 06/11/2015 08:04 PM, Emmanuel Dreyfus wrote: > On Thu, Jun 11, 2015 at 04:04:44PM +0200, Niels de Vos wrote: >> Michael installed and configured dnsmasq on build.gluster.org yesterday. >> If that does not help today, we need other ideas... > > Just to confirm the problem: > > [manu@build ~]$ time nslookup nbslave7i.cloud.gluster.org > ;; connection timed out; trying next origin > ;; connection timed out; no servers could be reached > > > real 0m20.013s > user 0m0.002s > sys 0m0.012s > > Having a local cache does not help because the upstream DNS service is > weak. Without the local cache, individual processes crave a reply, > and with the local server, it is the local server itself that craves > a reply. > > And here upstream DNS is really at fault: from my own machine I get a reply in > 0.29s. > > We need to configure a local authoritative secondary DNS for the zone, > so that the answer is always available locally without having to rely > on outside infrastructure. I am not sure whether we have any improvements on this front. I still see patches waiting for ages to get their turn for the regression run, hence delaying merges and affecting the release process. I still feel we don't need to wait for NetBSD's vote for merging patches on a temporary basis till we fix the infrastructure problem. This is the only quick solution which I can think of now. Thoughts? ~Atin > -- ~Atin
Re: [Gluster-infra] [Gluster-devel] NetBSD regressions not being triggered for patches
On Wednesday 17 June 2015 at 11:48 +0200, Michael Scherer wrote: > On Wednesday 17 June 2015 at 08:20 +0200, Emmanuel Dreyfus wrote: > > Venky Shankar wrote: > > > > > If that's the case, then I'll vote for this even if it takes some time > > > to get things in workable state. > > > > See my other mail about this: you enter a new slave VM in the DNS and it > > does not resolve, or sometimes you get 20s delays. I am convinced this > > is the reason why Jenkins misbehaves. > > But cloud.gluster.org is handled by rackspace, not sure how much control > we have for it ( not sure even where to start there ). So I cannot change the DNS destination. What I can do is create a new DNS zone, and then we can delegate as we want, migrate some slaves and not others, and see how it goes. Would slaves.gluster.org be OK for everybody? -- Michael Scherer Sysadmin, Community Infrastructure and Platform, OSAS
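Delegating a subzone such as slaves.gluster.org means adding NS records for it in the parent gluster.org zone. When the nameservers themselves live inside the subzone, the parent also needs glue A records for them. The Python sketch below renders those master-file lines. The nameserver names, addresses, and TTL are hypothetical.

```python
def delegation_records(subzone: str, nameservers: dict, ttl: int = 3600) -> str:
    """Render parent-zone lines delegating `subzone` to `nameservers`
    (hostname -> IPv4). Glue A records are emitted only for nameservers
    inside the delegated subzone, since resolvers could not find their
    addresses otherwise."""
    sub = subzone.rstrip(".") + "."
    lines = []
    for ns in nameservers:
        lines.append(f"{sub}\t{ttl}\tIN\tNS\t{ns.rstrip('.')}.")
    for ns, ip in nameservers.items():
        fqdn = ns.rstrip(".") + "."
        if fqdn.endswith("." + sub):  # in-bailiwick: glue required
            lines.append(f"{fqdn}\t{ttl}\tIN\tA\t{ip}")
    return "\n".join(lines)

print(delegation_records(
    "slaves.gluster.org",
    {"ns1.slaves.gluster.org": "192.0.2.53", "ns.example.net": "198.51.100.7"},
))
```

Because the delegation only covers the new subzone, slaves can be moved over one at a time while the rest stay under cloud.gluster.org.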
Re: [Gluster-infra] [Gluster-devel] NetBSD regressions not being triggered for patches
On Wed, Jun 17, 2015 at 11:48:46AM +0200, Michael Scherer wrote: > And I think the DNS issues are just a symptom of a bigger network issue, > having local DNS might just mask the problem and which would then be non > DNS related ( like tcp connection not working ). Well, if it is lost packets, TCP is more resilient, and if it is an overloaded DNS server, the problem only affects DNS. -- Emmanuel Dreyfus m...@netbsd.org
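A back-of-the-envelope model makes the distinction above concrete. It assumes each packet is dropped independently with probability p. A single UDP DNS attempt then succeeds only if both the query and the reply get through, so all n attempts fail with probability (1 - (1 - p)^2)^n. TCP, by contrast, keeps retransmitting lost segments. The function below simply implements that formula. The loss rates are illustrative, not measurements from build.gluster.org.

```python
def dns_udp_failure_probability(loss: float, tries: int) -> float:
    """P(all `tries` UDP DNS attempts fail) under independent packet loss.
    An attempt succeeds only if the query AND the reply are delivered."""
    attempt_ok = (1.0 - loss) ** 2
    return (1.0 - attempt_ok) ** tries

for loss in (0.01, 0.1, 0.3):
    print(f"loss={loss:.0%}: all 2 tries fail with p="
          f"{dns_udp_failure_probability(loss, 2):.4f}")
```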
Re: [Gluster-infra] [Gluster-devel] NetBSD regressions not being triggered for patches
On Wednesday 17 June 2015 at 08:20 +0200, Emmanuel Dreyfus wrote: > Venky Shankar wrote: > > > If that's the case, then I'll vote for this even if it takes some time > > to get things in workable state. > > See my other mail about this: you enter a new slave VM in the DNS and it > does not resolve, or sometimes you get 20s delays. I am convinced this > is the reason why Jenkins misbehaves. But cloud.gluster.org is handled by Rackspace; I am not sure how much control we have over it (not even sure where to start there). And I think the DNS issues are just a symptom of a bigger network issue. Having a local DNS might just mask the problem, which would then show up as something not DNS-related (like TCP connections not working). -- Michael Scherer Sysadmin, Community Infrastructure and Platform, OSAS
Re: [Gluster-infra] [Gluster-devel] NetBSD regressions not being triggered for patches
On Wed, Jun 17, 2015 at 11:05:38AM +0200, Niels de Vos wrote: > I've already scripted the reboot-vm job to use the Rackspace API; requesting > the DNS records and formatting the results into some file can't be that > difficult. Let me know if an /etc/hosts format would do, or if you expect > something else. Perhaps an /etc/hosts file would do it: Jenkins launches the ssh command, and ssh should use /etc/hosts before the DNS. -- Emmanuel Dreyfus m...@netbsd.org
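Whether /etc/hosts is consulted before DNS is decided by the resolver configuration, not by ssh itself. On glibc-based Linux systems this is the `hosts:` line of /etc/nsswitch.conf, commonly `files dns`. As an illustration only, the Python sketch below mimics that "files, then DNS" order. `parse_hosts` and `resolve` are hypothetical helpers, and the DNS step is passed in as a function so it can be stubbed.

```python
def parse_hosts(text: str) -> dict:
    """Parse /etc/hosts-format text into {hostname: address}.
    The first entry for a name wins; comments and blank lines are skipped."""
    table = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        addr, *names = line.split()
        for name in names:
            table.setdefault(name.lower(), addr)
    return table

def resolve(name: str, hosts_table: dict, dns_lookup):
    """Mimic a 'hosts: files dns' order: hosts table first, then DNS."""
    addr = hosts_table.get(name.lower())
    if addr is not None:
        return addr, "files"
    return dns_lookup(name), "dns"
```

With the workaround in place, names listed in /etc/hosts never reach the slow upstream DNS; only unlisted names do.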
Re: [Gluster-infra] [Gluster-devel] NetBSD regressions not being triggered for patches
On Wed, Jun 17, 2015 at 11:59:22AM +0530, Kaushal M wrote: > cloud.gluster.org is served by Rackspace Cloud DNS. AFAICT, there is > no readily available option to do zone transfers from it. We might > have to contact the Rackspace support to find out if they can do it as > a special request. Not sure about zone transfers, but we can request the DNS records through the Rackspace DNS API: http://docs.rackspace.com/cdns/api/v1.0/cdns-getting-started/content/List_Domain_Details.html The IP addresses of the VMs do not change often, so fetching the records regularly would be sufficient. We could even have a Jenkins job that downloads an updated /etc/hosts to a slave. I've already scripted the reboot-vm job to use the Rackspace API; requesting the DNS records and formatting the results into some file can't be that difficult. Let me know if an /etc/hosts format would do, or if you expect something else. Thanks, Niels > > > On Wed, Jun 17, 2015 at 11:50 AM, Emmanuel Dreyfus wrote: > > Venky Shankar wrote: > > > >> If that's the case, then I'll vote for this even if it takes some time > >> to get things in workable state. > > > > See my other mail about this: you enter a new slave VM in the DNS and it > > does not resolve, or sometimes you get 20s delays. I am convinced this > > is the reason why Jenkins misbehaves. > > > > -- > > Emmanuel Dreyfus > > http://hcpnet.free.fr/pubz > > m...@netbsd.org
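The Python sketch below covers only the formatting step described above. It assumes the List Domain Details call was made with record listing enabled. It also assumes the JSON carries records under `recordsList.records`, each with `name`, `type` and `data` fields; check that shape against the linked API documentation. Authentication and the HTTP request itself are left out, and `records_to_hosts` is a hypothetical name.

```python
import json

def records_to_hosts(domain_details_json: str, zone: str) -> str:
    """Convert a Cloud DNS 'domain details' JSON document into /etc/hosts
    lines for the A records inside `zone`, adding the short hostname as an
    alias (e.g. nbslave74.cloud.gluster.org -> nbslave74)."""
    doc = json.loads(domain_details_json)
    # Assumed response shape: {"recordsList": {"records": [...]}}
    records = doc.get("recordsList", {}).get("records", [])
    lines = []
    for rec in sorted(records, key=lambda r: r.get("name", "")):
        name = rec.get("name", "")
        if rec.get("type") != "A":
            continue  # skip NS, MX, CNAME, ... records
        if name != zone and not name.endswith("." + zone):
            continue  # not part of the zone we care about
        short = name[: -len(zone)].rstrip(".")
        alias = f" {short}" if short else ""
        lines.append(f"{rec['data']}\t{name}{alias}")
    return "".join(line + "\n" for line in lines)
```

Regenerating the whole file on each run, rather than appending to it, keeps entries for removed VMs (such as nbslave7d..f) from lingering.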
Re: [Gluster-infra] [Gluster-devel] NetBSD regressions not being triggered for patches
On 06/17/2015 12:12 PM, Rajesh Joseph wrote: > - Original Message - >> From: "Kaushal M" >> To: "Emmanuel Dreyfus" >> Cc: "Gluster Devel" , "gluster-infra" >> Sent: Wednesday, 17 June, 2015 11:59:22 AM >> Subject: Re: [Gluster-devel] [Gluster-infra] NetBSD regressions not being triggered for patches >> >> cloud.gluster.org is served by Rackspace Cloud DNS. AFAICT, there is no readily available option to do zone transfers from it. We might have to contact the Rackspace support to find out if they can do it as a special request. >> > If this is going to take time, then I would prefer not to block patches on NetBSD regressions. We can address any NetBSD regression caused by patches as a separate bug. Otherwise our regression queue will continue to grow. +1 for this. We shouldn't be blocking patches on NetBSD regression till the infra scales enough to handle the kind of load we are throwing at it. Once the regression framework is scalable enough, we can fix any regressions (if any) introduced. This will bring down the turnaround time for patch acceptance. > >> On Wed, Jun 17, 2015 at 11:50 AM, Emmanuel Dreyfus wrote: >>> Venky Shankar wrote: >>>> If that's the case, then I'll vote for this even if it takes some time to get things in workable state. >>> See my other mail about this: you enter a new slave VM in the DNS and it does not resolve, or sometimes you get 20s delays. I am convinced this is the reason why Jenkins misbehaves. >>> -- >>> Emmanuel Dreyfus >>> http://hcpnet.free.fr/pubz >>> m...@netbsd.org
Re: [Gluster-infra] [Gluster-devel] NetBSD regressions not being triggered for patches
On Wed, Jun 17, 2015 at 11:59:22AM +0530, Kaushal M wrote: > cloud.gluster.org is served by Rackspace Cloud DNS Perhaps we can change that and set up a DNS server for the zone? -- Emmanuel Dreyfus m...@netbsd.org