On Thu, Aug 23, 2018 at 10:49 AM, Michael Scherer <msche...@redhat.com> wrote:
> On Thursday, 23 August 2018 at 11:21 +0530, Nigel Babu wrote:
> > One more piece that's missing is when we'll restart the physical
> > servers. That seems to be entirely missing. The rest looks good to
> > me and I'm happy to add an item to next sprint to automate the node
> > rebooting.
>
> That's covered by "as critical as the services that depend on them".
>
> Now, the problem I do have is that one server (myrmicinae, to name it)
> takes 30 minutes to reboot, and I can't diagnose or fix that without
> taking hours. This is the one running gerrit/jenkins, so it's not
> possible to spend time on this kind of test.
>

You'd imagine people would have moved to kexec reboots for VMs by now.
Not sure why it's not catching on.
(BTW, is it taking time to shut down or to come back up?)
Y.

>
> > On Tue, Aug 21, 2018 at 9:56 PM Michael Scherer <msche...@redhat.com>
> > wrote:
> >
> > > Hi,
> > >
> > > so it's kernel reboot time again, this time courtesy of Intel
> > > (again). I do not consider the issue to be "OMG the sky is
> > > falling", but it is serious enough to take the time to streamline
> > > our reboot process.
> > >
> > > Currently, we do not have a policy or anything, and I think the
> > > negotiation around each reboot is cumbersome:
> > > - we need to reach people, which takes time and adds latency (that
> > >   would be bad for an urgent issue, and likely adds undue stress
> > >   while waiting)
> > >
> > > - we need to keep track of what was supposed to be done, which is
> > >   also cumbersome
> > >
> > > That would not be a problem if I had only gluster to deal with,
> > > but my team of 3 has to deal with a few more projects than 1, and
> > > orchestrating choices for a dozen groups is time consuming (just
> > > think of the last time you had to go to a restaurant after a
> > > conference to see how hard it is to reach agreement).
> > >
> > > So I would propose that we simplify that with the following policy:
> > >
> > > - Jenkins builders would be rebooted by Jenkins on a regular basis.
> > >   I do not know how we can do that, but given that we have enough
> > >   nodes to sustain builds, it shouldn't impact developers in a big
> > >   way. The only exception is the freebsd builder, since we only
> > >   have 1 functional at the moment. But once the 2nd is working, it
> > >   should be treated like the others.
> > >
> > > - services in HA (firewall, reverse proxy, internal squid/DNS)
> > >   would be rebooted during the day without notice. Since HA is
> > >   working, that does not impact users. In fact, that's already
> > >   what I do.
> > >
> > > - services not in HA should be pushed towards HA (gerrit might get
> > >   there one day, no way for jenkins :/, need to see for postgres
> > >   and so fstat/softserve, and maybe try to get something for
> > >   download.gluster.org)
> > >
> > > - reboots of services that are critical and not in HA should be
> > >   announced in advance. Critical means the services listed here:
> > >   https://gluster-infra-docs.readthedocs.io/emergency.html
> > >
> > > - services not visible to end users (backup servers, ansible
> > >   deployment, etc.) can be rebooted at will
> > >
> > > Then the only question is what to do about stuff not in the
> > > previous categories, like softserve and fstat.
> > >
> > > Also, all dependencies are as critical as the most critical service
> > > that depends on them. So hypervisors hosting gerrit/jenkins are
> > > critical (until we find a way to avoid an outage), the ones for
> > > builders are not.
> > >
> > > Thoughts, ideas?
> > >
> > > --
> > > Michael Scherer
> > > Sysadmin, Community Infrastructure and Platform, OSAS
> > >
> > > _______________________________________________
> > > Gluster-infra mailing list
> > > Gluster-infra@gluster.org
> > > https://lists.gluster.org/mailman/listinfo/gluster-infra
> >
> --
> Michael Scherer
> Sysadmin, Community Infrastructure and Platform, OSAS
>
> _______________________________________________
> Gluster-devel mailing list
> gluster-de...@gluster.org
> https://lists.gluster.org/mailman/listinfo/gluster-devel
>
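
For illustration only, the dependency rule in the proposal (a host is as critical as the most critical thing that relies on it) could be sketched in Python roughly as below. Everything named here is an assumption made for the example: the `RebootPolicy` values and especially their ordering, the `effective_policy` helper, and the hosts `hv-critical` and `hv-builders`. None of it is taken from the actual Gluster infrastructure.

```python
from enum import IntEnum


class RebootPolicy(IntEnum):
    """Reboot classes from the proposal; the numeric ordering is an assumption."""
    AT_WILL = 0            # not visible to end users (backups, ansible deployment)
    SCHEDULED = 1          # Jenkins builders, rebooted on a regular basis
    DAYTIME_NO_NOTICE = 2  # services in HA
    ANNOUNCE_FIRST = 3     # critical and not in HA


def effective_policy(name, own, dependents):
    """Return the strictest policy among `name` and everything that depends on it.

    own:        dict mapping a host/service name to its own RebootPolicy
    dependents: dict mapping a name to the names that directly depend on it
    """
    seen = {name}
    stack = [name]
    result = own[name]
    while stack:
        current = stack.pop()
        result = max(result, own[current])
        # Walk to whatever relies on `current`, visiting each name only once
        # so that shared or circular dependencies cannot loop forever.
        for dependent in dependents.get(current, ()):
            if dependent not in seen:
                seen.add(dependent)
                stack.append(dependent)
    return result


# Hypothetical inventory: one hypervisor hosts gerrit and jenkins, another
# hosts only a builder.
OWN = {
    "gerrit": RebootPolicy.ANNOUNCE_FIRST,
    "jenkins": RebootPolicy.ANNOUNCE_FIRST,
    "builder1": RebootPolicy.SCHEDULED,
    "hv-critical": RebootPolicy.AT_WILL,
    "hv-builders": RebootPolicy.AT_WILL,
}
DEPENDENTS = {
    "hv-critical": ["gerrit", "jenkins"],
    "hv-builders": ["builder1"],
}

for host in ("hv-critical", "hv-builders"):
    print(host, effective_policy(host, OWN, DEPENDENTS).name)
```

With this inventory the hypervisor under gerrit/jenkins inherits the "announce first" class, while the one that only hosts a builder stays in the builders' class. Services the thread leaves uncategorised (softserve, fstat) are deliberately absent, since their class is still an open question.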