Re: [Gluster-infra] Jenkins switched over to new builders for regression

2019-02-08 Thread Nigel Babu
in the logs. On Fri, Feb 8, 2019 at 7:49 AM Nigel Babu wrote: > Hello, > > We've reached the half way mark in the migration and half our builders > today are now running on AWS. I've turned off the RAX builders and have > them try to be online only if the AWS builders cannot ha

[Gluster-infra] Jenkins switched over to new builders for regression

2019-02-07 Thread Nigel Babu
Hello, We've reached the half way mark in the migration and half our builders today are now running on AWS. I've turned off the RAX builders and have them try to be online only if the AWS builders cannot handle the number of jobs running at any given point. The new builders are named

[Gluster-infra] Regression logs issue

2019-02-07 Thread Nigel Babu
Hello folks, In the last week, if you have had a regression job that failed, you will not find a log for it. This is due to a mistake I made while deleting code. Rather than deleting only the code for the push to an internal HTTP server, I also deleted a line which handled the log creation. Apologies

[Gluster-infra] Please do not upgrade the cppcheck Jenkins plugin

2019-01-10 Thread Nigel Babu
Hello folks, This is a note to myself and everyone else. Please do not upgrade cppcheck from 1.22. The plugin seems to have changed in a backwards incompatible manner. For now we'll stick to the 1.22 version until we have to figure out how to make it work with the latest version. -- nigelb

[Gluster-infra] Infra Update for Nov and Dec

2018-12-19 Thread Nigel Babu
Hello folks, The infra team has not been sending regular updates recently because we’ve been caught up in several different pieces of work that were running into longer than 2 week sprint cycles. This is a summary of what we’ve done so far since the last update. * The bugzilla updates are done

[Gluster-infra] Github notifications for spec reviews

2018-11-30 Thread Nigel Babu
Hello folks, We've had a bug to automate adding a comment on Github when there is a new spec patch. I'm going to deny this request. * The glusterfs-specs repo does not have an issue tracker and does not seem to ever need an issue tracker. We currently limit pre-merge commenting on Github to the

[Gluster-infra] Short review.gluster.org outage in the next 15 mins

2018-11-05 Thread Nigel Babu
Hello folks, Going to restart gerrit on review.gluster.org for a quick config change in the next 15 mins. Estimate outage of 5 mins. I'll update this thread when we're back online -- nigelb ___ Gluster-infra mailing list Gluster-infra@gluster.org

Re: [Gluster-infra] Centos CI automation Retrospective

2018-11-02 Thread Nigel Babu
Oops, missed finishing a line. Please avoid making any changes directly via the Jenkins UI going forward. Any configuration changes need to be made from the repo so the config drives Jenkins. On Fri, Nov 2, 2018 at 11:32 AM Nigel Babu wrote: > Hello folks, > > On Monday,

[Gluster-infra] Centos CI automation Retrospective

2018-11-02 Thread Nigel Babu
Hello folks, On Monday, I merged in the changes that allowed all the jobs in Centos CI to be handled in an automated fashion. In the past, it depended on Infra team members to review, merge, and apply the changes on Centos CI. I've now changed that so that the individual job owners can do their

Re: [Gluster-infra] Maintaining gluster/centosci repo

2018-10-29 Thread Nigel Babu
On Mon, Oct 29, 2018 at 3:00 PM Michael Adam wrote: > > > On Mon, Oct 29, 2018 at 10:09 AM Nigel Babu wrote: > >> This patch was merged today. >> > > Sorry, but what does "This" refer to? > > The .travis.yml change with a test and deployment script

[Gluster-infra] Gluster Infra Update

2018-10-18 Thread Nigel Babu
Hello folks, Here's the update from the last 2 weeks from the Infra team. * Created an architecture document for Automated Upgrade Testing. This is now done and is undergoing reviews. It is scheduled to be published on the devel list as soon as we have a decent PoC. * Finished part of the

Re: [Gluster-infra] Reducing the number of builders in the cage

2018-10-15 Thread Nigel Babu
I think it might be worth pulling out some utilization numbers to see how many to pull. If we can get the freebsd builder working, that would eliminate the need to run it on rackspace and having two of them would increase the speed at which we process the smoke queue. On Mon, Oct 15, 2018 at 5:32

[Gluster-infra] Maintaining gluster/centosci repo

2018-10-12 Thread Nigel Babu
Hello folks, The centosci repo keeps falling behind in terms of reviews and merges + delay in applying the merges on ci.centos.org. I'd like to propose the following to change that. This change will impact everyone who runs a job on Centos CI. * As soon as you merge a patch into that repo, we

[Gluster-infra] Infra Update for the last 2 weeks

2018-10-03 Thread Nigel Babu
Hello folks, I meant to send this out on Monday, but it's been a busy few days. * The infra pieces of distributed regression are now complete. A big shout out to Deepshikha for driving this and Ramky for his help in getting this to completion. * The GD2 containers and CSI container builds work now.

Re: [Gluster-infra] Repository needed - cockpit-gluster

2018-10-02 Thread Nigel Babu
n, Aug 20, 2018 at 4:21 PM, Nigel Babu wrote: > >> Yep. Done. >> >> On Mon, Aug 20, 2018 at 4:05 PM Sahina Bose wrote: >> >>> Hi Nigel, >>> >>> I've raised a bug for repository creation - >>> https://bugzilla.redhat

[Gluster-infra] Unplanned Jenkins maintenance

2018-09-28 Thread Nigel Babu
Hello folks, I did a quick unplanned Jenkins maintenance today to upgrade 3 plugins with security issues in them. This is now complete. There was a brief period where we did not start new jobs until Jenkins restarted. There should have been no interruption of existing jobs or any jobs canceled.

Re: [Gluster-infra] [Gluster-devel] Freebsd builder upgrade to 10.4, maybe 11

2018-09-11 Thread Nigel Babu
On Tue, Sep 11, 2018 at 7:06 PM Michael Scherer wrote: > And... rescue mode is not working. So the server is down until > Rackspace fix it. > > Can someone disable the freebsd smoke test, as I think our 2nd builder > is not yet building fine ? > Disabled. Please do not merge any JJB review

[Gluster-infra] Urgent Gerrit reboot today

2018-08-23 Thread Nigel Babu
Hello folks, We're going to do an urgent reboot of the Gerrit server in the next 1h or so. For some reason, hot-adding RAM on this machine isn't working, so we're going to do a reboot to get this working. This is needed to prevent the OOM Kill problems we've been running into since last night.

Re: [Gluster-infra] Reboot policy for the infra

2018-08-22 Thread Nigel Babu
One more piece that's missing is when we'll restart the physical servers. That seems to be entirely missing. The rest looks good to me and I'm happy to add an item to next sprint to automate the node rebooting. On Tue, Aug 21, 2018 at 9:56 PM Michael Scherer wrote: > Hi, > > so that's kernel

Re: [Gluster-infra] Repository needed - cockpit-gluster

2018-08-20 Thread Nigel Babu
Yep. Done. On Mon, Aug 20, 2018 at 4:05 PM Sahina Bose wrote: > Hi Nigel, > > I've raised a bug for repository creation - > https://bugzilla.redhat.com/show_bug.cgi?id=1619205 > > Could you help? > > thanks > sahina > -- nigelb

Re: [Gluster-infra] Portmortem for gluster jenkins disk full outage on the 15th of August

2018-08-15 Thread Nigel Babu
On Wed, Aug 15, 2018 at 2:41 PM Michael Scherer wrote: > Hi folks, > > So Gluster jenkins disk was full today (cause outages do not respect > public holiday in India (Independance day) and France(Assumption)), > here is the post mortem for your reading pleasure > > Date: 15/08/2018 > > Service

Re: [Gluster-infra] Looks like glusterfs's smoke job is not running for the patches posted

2018-08-15 Thread Nigel Babu
This is something I've highlighted in the past. If you trigger regression and smoke at the same time, smoke will only vote after regression job is done. That's Jenkins optimizing the communication with Gerrit so it needs to do the voting only once. This is a feature and not a bug. On Wed, Aug 15,

[Gluster-infra] Setting up machines from softserve in under 5 mins

2018-08-13 Thread Nigel Babu
Hello folks, Deepshikha did the work a while ago to make it faster to loan a machine and run your regressions on it. I've tested them a few times today to confirm it works as expected. In the past, Softserve[1] machines would be a clean Centos 7 image. Now, we have an image with all the

[Gluster-infra] Post-upgrade issues

2018-08-08 Thread Nigel Babu
Hello folks, We have two post-upgrade issues 1. Jenkins jobs are failing because git clones fail. This is now fixed. 2. git.gluster.org shows no repos at the moment. I'm currently debugging this. -- nigelb

Re: [Gluster-infra] [Gluster-devel] Fwd: Gerrit downtime on Aug 8, 2016

2018-08-08 Thread Nigel Babu
On Wed, Aug 8, 2018 at 4:59 PM Yaniv Kaul wrote: > > Nice, thanks! > I'm trying out the new UI. Needs getting used to, I guess. > Have we upgraded to NotesDB? > Yep! Account information is now completely in NoteDB and not in ReviewDB (which is backed by postgresql for us) anymore.

[Gluster-infra] Fwd: Gerrit downtime on Aug 8, 2016

2018-08-07 Thread Nigel Babu
Reminder, this upgrade is tomorrow. -- Forwarded message - From: Nigel Babu Date: Fri, Jul 27, 2018 at 5:28 PM Subject: Gerrit downtime on Aug 8, 2016 To: gluster-devel Cc: gluster-infra , < automated-test...@gluster.org> Hello, It's been a while since we upgraded Gerr

[Gluster-infra] Coverity on build nodes

2018-08-06 Thread Nigel Babu
I've just added two nodes for Coverity. The tarball is on build.gluster.org, but the process of setting it up on a node is pretty manual at the moment. Ideally, I'd like an internal only server from which we can download private binaries that we can distribute. The tar has been extracted to /opt

[Gluster-infra] Master branch is closed

2018-08-05 Thread Nigel Babu
Hello folks, Master branch is now closed. Only a few people have commit access now and it's to be exclusively used to merge fixes to make master stable again. -- nigelb

Re: [Gluster-infra] Gerrit downtime on Aug 8, 2016

2018-07-28 Thread Nigel Babu
will get overwritten by Ansible tonight :) On Fri, Jul 27, 2018 at 5:28 PM Nigel Babu wrote: > Hello, > > It's been a while since we upgraded Gerrit. We plan to do a full upgrade > and move to 2.15.3. Among other changes, this brings in the new PolyGerrit > interface which brings signi

Re: [Gluster-infra] [automated-testing] Gerrit downtime on Aug 8, 2016

2018-07-27 Thread Nigel Babu
> The staging URL seems to be missing from the note > > On Fri, Jul 27, 2018 at 5:28 PM, Nigel Babu wrote: > > Hello, > > > > It's been a while since we upgraded Gerrit. We plan to do a full upgrade > and > > move to 2.15.3. Among other changes, this brings in the n

[Gluster-infra] Gerrit downtime on Aug 8, 2016

2018-07-27 Thread Nigel Babu
Hello, It's been a while since we upgraded Gerrit. We plan to do a full upgrade and move to 2.15.3. Among other changes, this brings in the new PolyGerrit interface which brings significant frontend changes. You can take a look at how this would look on the staging site[1]. ## Outage Window 0330

Re: [Gluster-infra] [Gluster-devel] Github teams/repo cleanup

2018-07-25 Thread Nigel Babu
On Wed, Jul 25, 2018 at 6:51 PM Niels de Vos wrote: > We had someone working on starting/stopping Jenkins slaves in Rackspace > on-demand. He since has left Red Hat and I do not think the infra team > had a great interest in this either (with the move out of Rackspace). > > It can be deleted

Re: [Gluster-infra] [Gluster-devel] Github teams/repo cleanup

2018-07-25 Thread Nigel Babu
> So while cleaning thing up, I wonder if we can remove this one: > https://github.com/gluster/jenkins-ssh-slaves-plugin > > We have just a fork, lagging from upstream and I am sure we do not use > it. > Safe to delete. We're not using it for sure. > > The same goes for: >

Re: [Gluster-infra] [Gluster-devel] Github teams/repo cleanup

2018-07-25 Thread Nigel Babu
I think our team structure on Github has become unruly. I prefer that we use teams only when we can demonstrate that there is a strong need. At the moment, the gluster-maintainers and the glusterd2 projects have teams that have a strong need. If any other repo has a strong need for teams, please

[Gluster-infra] Postmortem for Jenkins Outage on 20/07/18

2018-07-20 Thread Nigel Babu
Hello folks, I had to take down Jenkins for some time today. The server ran out of space and was silently ignoring Gerrit requests for new jobs. If you think one of your jobs needed a smoke or regression run and it wasn't triggered, this is the root cause. Please retrigger your jobs. ## Summary
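
The preview is cut off before the postmortem's own summary and action items, so the following is not taken from it. It is only a minimal sketch of the kind of free-space check that could flag a filling disk before a server starts dropping work. The function name, the 10% threshold, and the use of `/` as the monitored path are all assumptions made for illustration.

```python
import shutil


def is_low_on_space(path, min_free_fraction=0.10):
    """Return True when the filesystem holding `path` has less than
    `min_free_fraction` of its total capacity free."""
    usage = shutil.disk_usage(path)
    return usage.free < min_free_fraction * usage.total


if __name__ == "__main__":
    # "/" is a stand-in; the real Jenkins data directory is not named
    # in the announcement.
    if is_low_on_space("/"):
        print("WARNING: less than 10% of disk space is free")
    else:
        print("Disk space OK")
```

A check like this would typically run from cron or a monitoring agent rather than by hand.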

[Gluster-infra] Gerrit and postgresql replica

2018-06-30 Thread Nigel Babu
Hello, I think the various pieces around infra have stabilized enough for us to think about this. I suggest that we think about having a Gerrit replica in the cloud (whichever clouds the CI consumes). This gives us a fall back option in case the cage has problems. It also gives us a good way to

[Gluster-infra] Fedora builds and rawhide builds

2018-06-19 Thread Nigel Babu
Hello, We ran into a problem where builds for F28 and above will not build on CentOS7 chroots. We caught this when F28 was rawhide but deemed it not yet important enough to fix. However, recent developments have forced us to make the switch. Our Fedora builds will also switch to using F28. We

Re: [Gluster-infra] 'Clone with commit-msg hook' produces wrong scp command

2018-06-18 Thread Nigel Babu
Hi Yaniv, This was because we forward port 22 to port 29418. I just changed the sshd.advertisedAddress to say review.gluster.org:22. That did the trick. Thanks for bringing this to our attention. On Mon, Jun 18, 2018 at 9:01 PM, Yaniv Kaul wrote: > When I choose 'clone with a commit-msg hook'

[Gluster-infra] Reminder OUTAGE Today 0800 EDT / 1200 UTC / 1730 IST

2018-05-14 Thread Nigel Babu
Hello, This is a reminder that we have an outage today during the community cage outage window. The switches and routers will be getting updated and rebooted. This will cause an outage for a short period of time. -- nigelb

[Gluster-infra] Fwd: Planned Network Outage in Community Cage on May 15

2018-05-11 Thread Nigel Babu
Hello folks, There is a 15-minute cage outage on Tuesday 15th May. Jenkins and Gerrit will be affected by this outage. On Tuesday May 15th, there will be a brief (~15 minutes) network > outage in the Community Cage to allow for software upgrades on our > network equipment. The outage will

[Gluster-infra] Unplanned Jenkins restart

2018-04-16 Thread Nigel Babu
Hello folks, I've just restarted Jenkins for a security update to a plugin. There was one running centos-regression job that I had to cancel. -- nigelb

[Gluster-infra] Jenkins upgrade today

2018-04-10 Thread Nigel Babu
Hello folks, There's a Jenkins security fix scheduled to be released today. This will most likely happen in the morning EDT. The Jenkins team has not specified a time. When we're ready for an upgrade, I'll cancel all running jobs and re-trigger them at the end of the upgrade. The downtime should

[Gluster-infra] Jenkins restart on Tuesday (27)

2018-03-21 Thread Nigel Babu
Hello folks, On Tuesday morning IST, I'll be upgrading and restarting build.gluster.org for an upcoming Jenkins plugin security issue related upgrade. -- nigelb

Re: [Gluster-infra] [Gluster-devel] Announcing Softserve- serve yourself a VM

2018-03-20 Thread Nigel Babu
a machine is expired, one has to configure the machine and all > other stuff from the beginning. > > Thanks, > Sanju > > On Tue, Mar 13, 2018 at 12:37 PM, Nigel Babu <nig...@redhat.com> wrote: > >> >> We’ve enabled certain limits for this application: >>

[Gluster-infra] Distributed Testing and Memory issues

2018-03-17 Thread Nigel Babu
Hey Karthik, Deepshikha has been working on testing the distributed test framework that you contributed (thank you!). Instead of writing our own code to chunk the tests, we've decided to just consume what you've written so we can work on making it run both at FB and upstream. We're running into

[Gluster-infra] Fwd: Query regarding coverity scan

2018-03-15 Thread Nigel Babu
Hi Kaleb, Do you know what's going wrong with Coverity jobs? -- Forwarded message -- From: Varsha Rao <va...@redhat.com> Date: Thu, Mar 15, 2018 at 10:15 AM Subject: Query regarding coverity scan To: Nigel Babu <nb...@redhat.com> Hello Nigel, I have been observing

[Gluster-infra] gluster-ant is now admin on synced repos

2018-03-15 Thread Nigel Babu
Hello, If there's a repo that's synced from Gerrit to Github, gluster-ant is now admin on those repos. This is so that when issues are closed via commit message, they are closed by the right user (the bot), rather than the Infra person who set that repo up. As always, please file a bug if you

[Gluster-infra] Please help test Gerrit 2.14

2018-03-04 Thread Nigel Babu
Hello, It's that time again. We need to move up a Gerrit release. Staging has now been upgraded to the latest version. Please help test it and give us feedback on any issues you notice: https://gerrit-stage.rht.gluster.org/ -- nigelb

Re: [Gluster-infra] Continuous tests failure on Fedora RPM builds

2018-03-02 Thread Nigel Babu
This is now fixed. Shyam found the root cause. After a mock upgrade, mock would wait for user confirmation that DNF wasn't installed on the system. Given this was a centos machine, DNF wasn't readily available. I set the config option dnf_warning=False and that fixed the failures. All previously
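
For reference, the option named in the message could look like the fragment below. mock configuration files are evaluated as Python with a `config_opts` dictionary already defined, so this is a config fragment rather than a standalone script. Which file the change was actually made in is not stated in the message; `/etc/mock/site-defaults.cfg` is an assumption.

```python
# Fragment for a mock config file, for example /etc/mock/site-defaults.cfg
# (the exact file used on the builders is an assumption).
# mock supplies `config_opts`; do not define it yourself.

# Do not pause for confirmation when DNF is not installed on the host.
config_opts['dnf_warning'] = False
```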

[Gluster-infra] Infra machines update

2018-02-19 Thread Nigel Babu
Hello folks, We're all out of Centos 6 nodes from today. I've just deleted the last of them. We now run exclusively on Centos 7 nodes. We've not received any negative feedback about plans to move NetBSD, so I've disabled and removed all the NetBSD jobs and nodes as well. -- nigelb

Re: [Gluster-infra] [Gluster-devel] Jenkins Issues this weekend and how we're solving them

2018-02-19 Thread Nigel Babu
On Mon, Feb 19, 2018 at 5:58 PM, Nithya Balachandran <nbala...@redhat.com> wrote: > > > On 19 February 2018 at 13:12, Atin Mukherjee <amukh...@redhat.com> wrote: > >> >> >> On Mon, Feb 19, 2018 at 8:53 AM, Nigel Babu <nig...@redhat.com> wrote: >>

[Gluster-infra] Jenkins Issues this weekend and how we're solving them

2018-02-18 Thread Nigel Babu
Hello, As you all most likely know, we store the tarball of the binaries and core if there's a core during regression. Occasionally, we've introduced a bug in Gluster and this tar can take up a lot of space. This has happened recently with brick multiplex tests. The build-install tar takes up

Re: [Gluster-infra] build.gluster.org in shutdown mode

2018-02-14 Thread Nigel Babu
This upgrade is now complete and we're now running the latest version of Jenkins. On Thu, Feb 15, 2018 at 9:53 AM, Nigel Babu <nig...@redhat.com> wrote: > Hello, > > I've just placed Jenkins in shutdown mode. No new jobs will be started for > about an hour from now. I intend

[Gluster-infra] build.gluster.org in shutdown mode

2018-02-14 Thread Nigel Babu
Hello, I've just placed Jenkins in shutdown mode. No new jobs will be started for about an hour from now. I intend to upgrade Jenkins to pull in the latest security fixes. -- nigelb

[Gluster-infra] Planned Outage: supercolony.gluster.org on 2018-02-21

2018-01-31 Thread Nigel Babu
Hello folks, We're going to be resizing the supercolony.gluster.org on our cloud provider. This will definitely lead to a small outage for 5 mins. In the event that something goes wrong in this process, we're taking a 2-hour window for this outage. Date: Feb 21 Server: supercolony.gluster.org

[Gluster-infra] Gerrit: Maintainers can now edit review topic

2018-01-30 Thread Nigel Babu
Hello, Anoop pointed out that he couldn't edit the topic on a patch submitted by an external contributor. To improve our drive-by contribution, I've enabled Edit Topic permission for maintainers. This means you can fix topic problems when an external contributor submits a patch. Let me know if

Re: [Gluster-infra] Infra-related Regression Failures and What We're Doing

2018-01-22 Thread Nigel Babu
Update: All the nodes that had problems with geo-rep are now fixed. Waiting on the patch to be merged before we switch over to Centos 7. If things go well, we'll replace nodes one by one as soon as we have one green on Centos 7. On Mon, Jan 22, 2018 at 12:21 PM, Nigel Babu <nig...@redhat.

[Gluster-infra] Infra-related Regression Failures and What We're Doing

2018-01-21 Thread Nigel Babu
Hello folks, As you may have noticed, we've had a lot of centos6-regression failures lately. The geo-replication failures are the new ones which particularly concern me. These failures have nothing to do with the test. The tests are exposing a problem in our infrastructure that we've carried

[Gluster-infra] Please file a bug if you take a machine offline

2018-01-10 Thread Nigel Babu
Hello folks, If you take a machine offline, please file a bug so that the machine can be debugged and return to the pool. -- nigelb ___ Gluster-infra mailing list Gluster-infra@gluster.org http://lists.gluster.org/mailman/listinfo/gluster-infra

[Gluster-infra] Shutting down cloud machines

2018-01-07 Thread Nigel Babu
Hello folks, In an effort to cut down machines that we don't use, I plan to shut down the following machines in the following days. Please let me know if for some reason I should not be shutting them down: salt-master.gluster.org webbuilder.gluster.org nbslave70.cloud.gluster.org

Re: [Gluster-infra] r.g.o returns a 503 error

2017-12-25 Thread Nigel Babu
Unplanned. I fixed this yesterday. Going to apply a more permanent fix today. The server restarted and we haven't implemented a way to start the service when the machine restarts. We're testing a systemd config file in staging and I'll look at applying that to production today. On Mon, Dec 25,

[Gluster-infra] Moving Regressions to Centos 7

2017-12-20 Thread Nigel Babu
Hello folks, We've been using Centos 6 for our regressions for a long time. I believe it's time that we moved to Centos 7. It's causing us minor issues. For example, tests run fine on the regression boxes but don't work on local machines or vice-versa. Moving up gives us the ability to use newer

[Gluster-infra] Changes in handling logs from (centos) regressions and smoke

2017-11-20 Thread Nigel Babu
Hello folks, We're making some changes in how we handle logs from Centos regression and smoke tests. Instead of having them available via HTTP access to the node itself, it will be available via the Jenkins job as artifacts. For example: Smoke job:

[Gluster-infra] Unplanned Jenkins restart

2017-11-19 Thread Nigel Babu
I noticed that Jenkins wasn't loading up this morning. Further debugging showed a java heap size problem. I tried to debug it, but eventually just restarted Jenkins. This means any running job or any job triggered was stopped. Please re-trigger your jobs. -- nigelb

Re: [Gluster-infra] gluster-zeroconf has been moved to the Gluster GitHub organisation

2017-11-16 Thread Nigel Babu
Please create a team and add non-org admins as team maintainers. On Thu, Nov 16, 2017 at 7:27 PM, Niels de Vos wrote: > Hi all, > > I have moved the gluster-zeroconf repository from my personal github > account to the Gluster organisation one. Dustin Black and me are the >

[Gluster-infra] Unplanned Jenkins restart this morning

2017-11-08 Thread Nigel Babu
Hello folks, I had to do a quick Jenkins upgrade and restart this morning for an urgent security fix. A few of our periodic jobs were cancelled, I'll re-trigger them now. -- nigelb

[Gluster-infra] Unplanned Gerrit Outage yesterday

2017-11-02 Thread Nigel Babu
Hello folks, Yesterday, we had an unplanned Gerrit outage. We have now determined that the machine rebooted for some reason. Michael is continuing to debug what led to this issue. Gerrit does not start automatically when the VM restarts at this point. We are currently testing a

[Gluster-infra] Postmortem of emails from gerrit-stage.rht.gluster.org

2017-11-01 Thread Nigel Babu
Hello folks, Some of you may have gotten a large number of emails that your Gerrit filters did not catch. This is because we added a config to Gerrit Stage yesterday to start closing old reviews. This is a conversation we've had for a while and it needed testing. In anticipation of a large number

[Gluster-infra] Jenkins OS Upgrade Complete

2017-11-01 Thread Nigel Babu
Hello folks, The downtime window is now complete and here's a report on what's happened today. ## Jenkins We've moved build.gluster.org to a new server that runs Centos 7. This server is managed in Ansible, though we do not yet manage Jenkins in ansible. The old server is now called

[Gluster-infra] Jenkins outage on 1st Nov

2017-10-30 Thread Nigel Babu
Hello folks, We'll have a Jenkins outage on 1st Nov. This outage will be from 0900 to 1700 UTC. EDT: 0500 to 1300 CET: 1000 to 1800 IST: 1430 to 2230 Given that Michael and I are co-located and this is a holiday in India, it's a good opportunity for us to fix a lot of security
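
The local times listed above can be reproduced from the UTC window with a short script. This is a minimal sketch, not part of the announcement: the year (2017) comes from the message date, the helper name `local_window` is made up, and the IANA zone chosen for each abbreviation is an assumption, since the announcement only gives the abbreviations.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo  # Python 3.9+

# Outage window from the announcement: 1st Nov 2017, 0900 to 1700 UTC.
WINDOW_START = datetime(2017, 11, 1, 9, 0, tzinfo=timezone.utc)
WINDOW_END = datetime(2017, 11, 1, 17, 0, tzinfo=timezone.utc)

# Representative IANA zones for each abbreviation (assumed).
ZONES = {
    "EDT": "America/New_York",
    "CET": "Europe/Paris",
    "IST": "Asia/Kolkata",
}


def local_window(start, end, zone_name):
    """Return the window as a pair of 'HHMM' strings in the given zone."""
    tz = ZoneInfo(zone_name)
    return (
        start.astimezone(tz).strftime("%H%M"),
        end.astimezone(tz).strftime("%H%M"),
    )


for label, zone in ZONES.items():
    begin, finish = local_window(WINDOW_START, WINDOW_END, zone)
    print(f"{label}: {begin} to {finish}")
```

Using named zones rather than fixed offsets lets the library handle daylight saving, which matters for a date this close to the autumn clock change.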

[Gluster-infra] Quarterly Infra Updates

2017-10-30 Thread Nigel Babu
It's been a while since I posted an update. We're shifting to a quarterly update system from this time onwards. Here's what's kept us busy last quarter: * We've moved our long-term planning from "bugs" to a (currently private) Trello board. This helps us plan for long-term projects and scheduling

Re: [Gluster-infra] Upgrading Gerrit at 1400 UTC

2017-10-30 Thread Nigel Babu
Hello, This upgrade is now complete. We've tested a push and merge. Please let us know if you run into any troubles. On Mon, Oct 30, 2017 at 10:09 AM, Nigel Babu <nig...@redhat.com> wrote: > Hello, > > We've been running Gerrit staging on the latest version of Gerrit. We've >

[Gluster-infra] Upgrading Gerrit at 1400 UTC

2017-10-30 Thread Nigel Babu
Hello, We've been running Gerrit staging on the latest version of Gerrit. We've planned several times to do an upgrade, but the timing hasn't worked out. Given that most people are traveling back from Gluster Summit, I'll be working on doing an upgrade today. The downtime window will be from

Re: [Gluster-infra] Jenkins Nodes changes

2017-10-11 Thread Nigel Babu
> Regards, > Amar > > On 11-Oct-2017 10:14 AM, "Nigel Babu" <nig...@redhat.com> wrote: > >> Hello folks, >> >> I've just gotten back after a week away. I've made a couple of changes to >> Jenkins nodes: >> >> * All smoke jobs now run o

[Gluster-infra] review.gluster.org outage today

2017-09-25 Thread Nigel Babu
Hello folks, We had a brief outage today of review.gluster.org. # Timeline of Events (Times in IST) 1311: I receive a notification from monitoring that review.gluster.org is throwing 503 errors. I logged into the machine and noticed that Gerrit wasn't running at all. I started the service and it

[Gluster-infra] review.gluster.org outage for the next 5 mins

2017-09-19 Thread Nigel Babu
Hello folks, We need to do a restart of Gerrit thanks to a Java security update. We'll be back in ~5 mins or so. -- nigelb

[Gluster-infra] Postmortem for yesterday's outage

2017-09-18 Thread Nigel Babu
Hello folks, We had a brief outage yesterday that misc and I were working on fixing. We're committed to doing a formal post-mortem of outages whether it affects everyone or not, as a habit. Here's a post-mortem of yesterday's event. ## Affected Servers * salt-master.rax.gluster.org *

[Gluster-infra] Postmortem for Thursday's outage

2017-09-18 Thread Nigel Babu
Hello folks, We had a brief outage yesterday that misc and I were working on fixing. We're committed to doing a formal post-mortem of outages whether it affects everyone or not, as a habit. Here's a post-mortem of yesterday's event. ## Affected Servers * salt-master.rax.gluster.org *

Re: [Gluster-infra] Jenkins Restart at 1830 IST

2017-08-29 Thread Nigel Babu
The restart and upgrade is complete. Any jobs that were triggered during the quiet period are starting up now. On Tue, Aug 29, 2017 at 4:05 PM, Nigel Babu <nig...@redhat.com> wrote: > Hello folks, > > We need to fix a networking problem on Jenkins, upgrade, and apply a few >

Re: [Gluster-infra] [Gluster-devel] Migration from gerrit bugzilla hook to a jenkins job

2017-08-17 Thread Nigel Babu
On Thu, Aug 17, 2017 at 5:36 PM, Mohammed Rafi K C <rkavu...@redhat.com> wrote: > > > On 08/17/2017 05:07 PM, Nigel Babu wrote: > > This change is taking the first step towards implementing those ideas. One > of the major blockers to implementing them was that it was d

Re: [Gluster-infra] [Gluster-devel] Migration from gerrit bugzilla hook to a jenkins job

2017-08-17 Thread Nigel Babu
This change is taking the first step towards implementing those ideas. One of the major blockers to implementing them was that it was difficult to grant easy access to change the hook. Granting production access to Gerrit is next to impossible unless you really know what you're doing. We'll not

Re: [Gluster-infra] Jenkins and Gerrit issues today

2017-07-17 Thread Nigel Babu
We just started using the Jenkins pipeline and its associated plugins: https://build.gluster.org/job/nightly-master/ On Mon, Jul 17, 2017 at 6:08 PM, Michael Scherer <msche...@redhat.com> wrote: > Le vendredi 14 juillet 2017 à 13:00 +0530, Nigel Babu a écrit : > > Hello, > >

Re: [Gluster-infra] Did we exhausted the disk space?

2017-07-17 Thread Nigel Babu
Please file a bug: https://bugzilla.redhat.com/enter_bug.cgi?product=GlusterFS&component=project-infrastructure On Mon, Jul 17, 2017 at 6:17 PM, Karthik Subrahmanya wrote: > Hi, > > One of my patch[1] just failed smoke test with error: > > Disk Requirements:At least 21MB more space

Re: [Gluster-infra] Jenkins and Gerrit issues today

2017-07-14 Thread Nigel Babu
Note: I've retriggered smoke/regression where appropriate for all patches posted since the issue started. On Fri, Jul 14, 2017 at 1:00 PM, Nigel Babu <nig...@redhat.com> wrote: > Hello, > > ## Highlights > * If you pushed a patch today or did "recheck centos", pleas

[Gluster-infra] Jenkins and Gerrit issues today

2017-07-14 Thread Nigel Babu
Hello, ## Highlights * If you pushed a patch today or did "recheck centos", please do a recheck. Those jobs were not triggered. * Please actually verify that the jobs for your patches have started. You can do that by visiting https://build.gluster.org/job/smoke/ (for smoke) or

Re: [Gluster-infra] Missing information - netbsd regression.

2017-07-05 Thread Nigel Babu
We no longer run netbsd regression on a per patch basis. See: http://lists.gluster.org/pipermail/gluster-devel/2017-June/053080.html On Tue, Jul 4, 2017 at 6:33 AM, Ravishankar N wrote: > Hi, > > https://build.gluster.org/job/netbsd7-regression/ used to show the patch >

[Gluster-infra] Gerrit and Jenkins status

2017-06-26 Thread Nigel Babu
Hello folks, We had downtimes today for both Gerrit and Jenkins related upgrades. The Gerrit upgrade went very smoothly. We will need to figure out a date in the short-term where we'll upgrade to the next major release. The Jenkins upgrade caused some issues, which led to some downtime. The latest

[Gluster-infra] Jenkins outage on Jun 26

2017-06-22 Thread Nigel Babu
Hello folks, We'll also have a short Jenkins outage on 26 June 2017, for a Jenkins plugin installation and upgrade. Date: 26th June 2017 Time: 0230 UTC (2230 EDT / 0430 CEST / 0800 IST) Duration: 1h Jenkins will be in a quiet time from 1h before the outage where no new builds will be allowed to

Re: [Gluster-infra] Cleaning up Jenkins

2017-06-20 Thread Nigel Babu
On Thu, Apr 20, 2017 at 10:57:53AM +0530, Nigel Babu wrote: > Hello folks, > > As I was testing the Jenkins upgrade, I realized we store quite a lot of old > builds on Jenkins that don't seem to be useful. I'm going to start cleaning > them slowly in anticipation of movi

Re: [Gluster-infra] build.gluster.org downtime on 20th June

2017-06-19 Thread Nigel Babu
Other than Gerrit, the rest can be rebooted ruthlessly. On 19-Jun-2017 23:16, "Michael Scherer" <msche...@redhat.com> wrote: > On Monday 19 June 2017 at 17:55 +0200, Michael Scherer wrote: > > On Monday 19 June 2017 at 17:12 +0530, Nigel Babu wrote: > >

[Gluster-infra] build.gluster.org downtime on 20th June

2017-06-19 Thread Nigel Babu
Hello folks, We'll be having a short downtime for build.gluster.org on 20th June 2017 (tomorrow). Date: 20th June 2017 Time: 0230 UTC (2230 EDT / 0430 CEST / 0800 IST) Duration: 1h Jenkins will be in a quiet time from 1h before the outage where no new builds will be allowed to start. This

Re: [Gluster-infra] Weird branches in the glusterfs repository on GitHub

2017-05-01 Thread Nigel Babu
The first one is a Gerrit configuration branch and it *should* be there. The other two, well, I recommend asking around to see who created those branches. There's no "sync script". We sync all of what's in Gerrit over to Github. If it's there, it means someone created it on Gerrit. On Sun, Apr

[Gluster-infra] Jenkins authentication with Github

2017-04-27 Thread Nigel Babu
Hello folks, In testing the Jenkins upgrade, I learned that we allow Jenkins read access to /etc/shadow to allow Unix authentication. In addition to this, our authentication was open to brute-force attacks, and the user list was hard to keep updated. To ease these pains, we've switched Jenkins

Re: [Gluster-infra] Jenkins Upgrade

2017-04-27 Thread Nigel Babu
Hello, The upgrade is now complete and we should be good to go. Please let me know if there are any problems. -- nigelb On Thu, Apr 27, 2017 at 11:34:05AM +0530, Nigel Babu wrote: > Hello folks, > > The first part of the Jenkins upgrade has now begun. The Jenkins server is now > o

[Gluster-infra] Jenkins Upgrade

2017-04-27 Thread Nigel Babu
Hello folks, The first part of the Jenkins upgrade has now begun. The Jenkins server is now in quiet mode. No new builds will be scheduled. I will be shutting down Jenkins in the next 1h to begin the upgrade. -- nigelb

Re: [Gluster-infra] Planned Jenkins Outage on 27th Apr (Thu)

2017-04-26 Thread Nigel Babu
Hello, A reminder that this is today. On Tue, Apr 18, 2017 at 04:06:25PM +0530, Nigel Babu wrote: > Hello folks, > > We're announcing a Jenkins outage window on 27th Apr 2017 during the following > times: > > 0300 - 0700 EDT > 0700 - 1100 UTC > 0800 - 120

[Gluster-infra] Cleaning up Jenkins

2017-04-19 Thread Nigel Babu
Hello folks, As I was testing the Jenkins upgrade, I realized we store quite a lot of old builds on Jenkins that don't seem to be useful. I'm going to start cleaning them slowly in anticipation of moving Jenkins over to a CentOS 7 server in the not-so-distant future. * Old and disabled jobs

[Gluster-infra] Planned Jenkins Outage on 27th Apr (Thu)

2017-04-18 Thread Nigel Babu
Hello folks, We're announcing a Jenkins outage window on 27th Apr 2017 during the following times: 0300 - 0700 EDT 0700 - 1100 UTC 0800 - 1200 CEST 1230 - 1630 IST This is purposely in the middle of the working day so we can identify any issues that might come up post-upgrade and fix them

Re: [Gluster-infra] [Gluster-devel] Is anyone else having trouble authenticating with review.gluster.org over ssh?

2017-04-16 Thread Nigel Babu
This should be fixed now: https://bugzilla.redhat.com/show_bug.cgi?id=1442672 Vijay, can you link me to your failed Jenkins job? Jenkins should have been able to clone since it uses the git protocol and not SSH. On Sun, Apr 16, 2017 at 9:25 PM, Vijay Bellur wrote: > > > On

[Gluster-infra] compare-bug-version-and-git-branch now runs on bugziller.rht.gluster.org

2017-04-05 Thread Nigel Babu
Hello folks, I've been slowly working on moving jobs off our master machine on build.gluster.org. This will let us upgrade to a new Jenkins server without too much pain. In this regard, I've moved compare-bug-version-and-git-branch off master onto bugziller.rht.gluster.org. This machine has
