Hi, I'd like to see us move off a single host ASAP. I wonder if a sensible approach would be to migrate services one at a time to a more traditional yet modern setup, starting with git and web. If we stick to the same tools then it should be possible, with a few DNS changes, to have it running at 90% pretty fast. Who controls DNS?
As of mid-June I will have nothing better to do, so I could look into doing a bunch of this and documenting as I go. TBH I'd prefer Debian over anything else for servers, but I'm a little out of touch with how to get cheap hosting - and I don't have a spare box in my rack to offer...

Andy

On Fri, 2 Jun 2017 at 05:23, Jonathan Aquilina <jaquil...@eagleeyet.net> wrote:

> Hi Guys,
>
> beber is not the only one in charge. I have volunteered, but for some
> reason he is waiting for me to give him a pgp key. Not sure what
> difference having a pgp key will make. I can easily provide him with
> an ssh key if needed, no problem.
>
> Another thing which worries me about even touching the server is the
> complexity of the setup without any documentation. I could offer
> another server that we can run kvm on, migrate the vm's to temporarily,
> and rebuild the entire server without having any downtime on the
> current setup. Also I do not think it's kvm, so I am not sure how
> switching to containers will help either. I do agree though that moving
> away from gentoo might be better, or at least a not so complex setup.
>
> ---
> Regards,
>
> Jonathan Aquilina
>
> On 2017-06-02 02:37, Carsten Haitzler wrote:
>
> > On Thu, 1 Jun 2017 09:48:27 -0500 Derek Foreman <der...@osg.samsung.com> said:
> >
> >> Hi,
> >>
> >> git's still down, phab's still greeting me with the ridiculously
> >> rude/embarrassing message that seems to imply that it's the fault of the
> >> person trying to connect that something's wrong. (Gee, a spanking, thanks!)
> >
> > i left it down for beber to debug (he implied he was going to). i hope he did.
> > i know 100% for sure this has nothing to do with jenkins. jenkins itself fails
> > sometimes due to its vm ram running out etc. as it runs too many builds in
> > it or something... but it's not jenkins' fault. every time this happens the vm
> > processes are mostly idle (0-10% cpu or so) and maybe only 1 or 2 hit 10%. it's
> > deeper down. i've explained before.
> > what i've seen is i/o simply stalling until
> > some kind of timeout. restarting the vm's themselves doesn't help. only
> > rebooting the host seems to help.
> >
> >> Our server problems are a continually annoying barrier to work, and the
> >> thought process around debugging them seems fairly baffling - even if
> >> disabling jenkins makes things stumble along for a while it's surely not
> >> a real fix for anything?
> >
> > beber has said he plans to move to containers. that was a while ago he said
> > that. so either we wait for beber or someone just takes over and tears down our
> > setup and rebuilds it without him. tbh all the vm's run gentoo and this just
> > makes that exceedingly painful as anything that needs installation needs
> > compilation (even more fun upgrading a kernel in gentoo).
> >
> > if i were going to do this, i'd just want to reinstall the whole host with
> > something i can set up easily and quickly (that is certainly not gentoo). to do
> > this i have to get openvpn to work with osuosl properly to get a text console
> > working to do a re-install anyway.
> >
> >> Jenkins runs in userspace, userspace problems shouldn't be able to
> >> cripple a server. And whatever we may think jenkins is doing to cause a
> >> problem can be done by any other userspace program. Other threads seem
> >> to blame a mysterious kernel bug nobody else can trigger - do we think
> >> Jenkins is aggravating that for us and us alone?
> >
> > i actually suspect it's some paranoid switch beber has enabled. he's pretty
> > paranoid about the servers and security etc. and maybe has done something
> > unusual. our server setup is highly unusual in my experience... so i suspect
> > it's this... what it is - i don't know, but i'd be a fan of "don't be so
> > paranoid about security and get the damned thing to work stably". that served
> > us pretty well for many years on e1/e2... i'm not advocating no security...
> > just keep things as simple as necessary to work.
> >
> >> Turning it off appears to provide zero information as to what the real
> >> problem is and costs us useful functionality.
> >>
> >> At what point do we consider, say, applying for freedesktop.org hosting?
> >> I'm sure I'm going to see arguments that they don't provide something
> >> we need, however right now we have exactly no functional services. No
> >> web, no phab, no git. Our mailing list continues to work because
> >> someone else hosts it for us.
> >
> > Then we abandon all review tools, abandon all filed bugs/tickets, wiki etc.
> > etc. ? Not going to do that. Just moving git means having to reconfigure
> > everything that manages and controls it. it means now having to ask fdo admins
> > all the time for stuff (like git hooks to trigger jenkins builds). This is not
> > a good solution.
> >
> >> At least having a stable git repository is critical to getting work done
> >> and allowing users/packagers/whoever to get the software - we can still
> >> run less important services ourselves and have them fail without
> >> blocking everyone.
> >
> > The real core issue is that beber is solely in charge of our server setup and
> > he doesn't have the time to fix it. I'm going to leave it to him, as the
> > alternative is me just jumping in and doing it and that means a full system
> > rebuild in a simpler way which is actually impossible for me to do any time
> > except weekends assuming all the IPMI console stuff was working and I could
> > boot a new OS installer and start again (after making a full system backup of
> > everything that is there). It'd likely result in an e.org that is down for
> > multiple weeks as I likely can't rebuild it in a single weekend. And this even
> > assumes I can figure out how it's all put together... I don't know how network
> > is set up and a lot more... how packets route to where they end up and so on...
> >
> > Thanks,
> > Derek
> >
> > (And could someone *please* change that stupid "SPANK SPANK SPANK"
> > message? Is that really what we want potential users to see during our
> > perpetual server outages?)
> >
> > On 31/05/17 06:14 PM, Carsten Haitzler (The Rasterman) wrote:
> > On Wed, 31 May 2017 21:08:05 +0100 Bertrand Jacquin <bertr...@jacquin.bzh> said:
> >
> > On 31/05/2017 13:21, Stefan Schmidt wrote:
> > Hello.
> >
> > On 05/29/2017 12:41 PM, Stefan Schmidt wrote:
> > Hello.
> >
> > Once again I had problems with the E server being over capacity and
> > returning 503 when accessing phab.
> >
> > I often heard claims that the load of Jenkins jobs triggers the bug we
> > see here. To actually see if the load due to Jenkins jobs is related
> > to this I now disabled all build triggers on Jenkins. That means no
> > Jenkins jobs are going to run until I enable these again. Be more
> > careful with your commits.
> >
> > Right now the last builds are being performed and after that we should
> > have no new ones.
> >
> > I will go and re-enable them again if we either see the same server
> > instabilities without Jenkins running (which proves in my book that
> > Jenkins is not the culprit) or in something like 2 or 3 weeks even if
> > we have no server problems. In the latter case we will have to see what
> > to do.
> >
> > And phab 503 again. Without any Jenkins jobs running at all. The last
> > ones ran two days ago.
> >
> > Beber, you still think Jenkins is to blame?
> >
> > I will leave it disabled for a few more days to see how that goes.
> >
> > Yes, this is still my main belief as of today, due to the qemu layer. We
> > won't be able to draw any conclusion before I've been able to reboot the
> > physical host after Jenkins builds were disabled; this will be done by
> > tomorrow.
> >
> > Thanks for disabling the job, that will definitely help to understand
> > what to blame.
> > I may be wrong of course, but at least we'll know.
> >
> > everything is down now... no phab. no e.org.
> > no git. i literally can't do
> > anything. i'm relying on phab for some patch review discussions (gl thread).
> > :( i'm not touching e5 so you can poke around and look...
> >
> > Cheers,
> > Bertrand
> >
> > --
> > Bertrand

--
http://andywilliams.me
http://ajwillia.ms

------------------------------------------------------------------------------
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
_______________________________________________
enlightenment-devel mailing list
enlightenment-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/enlightenment-devel