On Thu, 2009-08-20 at 10:06 +1000, Martin Pool wrote:
> When I was trying to use it in Taipei for demos, staging seemed to be
> down quite a lot over Friday and the weekend. Was that a known issue?

Eventually... The thread in question is "make not working in devel".

The failure over the weekend, as best I could determine, was a follow-on
from the original. Essentially, the staging restore process left the app
services in gaga land. "The process listing was a mess" would be another
description. :-/ There were excess mailman processes that only responded
to a hard kill. The app (etc.) server itself recovered after a straight
stop/start cycle.

> I couldn't find anything on the list in a brief scan. What's the
> right escalation for such an outage?

I'd suggest there isn't one and shouldn't be one, at least outside core
hours. Within core hours the process is well established: we get alerts
and respond based on priority need.

Treating it as a production system would entail needing another system
that could trial run these updates - and break - in the current
automated fashion, i.e. staging-production vs staging-staging. Or bring
back the rarely updated demo system.

Following on from this particular staging issue, we have added an RT
task against ourselves to add a "die! Die! DIE!!!!" loop to the init.d
scripts for all the launchpad services. It's excessive, but appears
necessary.

Cheers!
- Steve
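
For illustration only, an escalating "die! Die! DIE!!!!" loop could look
something like the sketch below. It is written in Python rather than as
the actual init.d shell change, and the function names, the signal
sequence (two polite SIGTERMs, then SIGKILL) and the wait times are all
assumptions made for the example, not what the RT task specifies.

```python
import os
import signal
import time


def is_running(pid):
    """Return True if a process with this pid still exists."""
    try:
        # If the process happens to be our own child, reap it so that a
        # zombie is not mistaken for a live process.
        reaped, _ = os.waitpid(pid, os.WNOHANG)
        if reaped == pid:
            return False
    except ChildProcessError:
        pass  # not our child, which is the usual case for a daemon
    try:
        os.kill(pid, 0)  # signal 0 only checks for existence
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # exists, but is owned by someone else
    return True


def stop_with_escalation(
    pid,
    signals=(signal.SIGTERM, signal.SIGTERM, signal.SIGKILL),
    wait=5.0,
    poll=0.1,
):
    """Send each signal in turn, waiting up to `wait` seconds after each.

    Returns True once the process is gone, False if it survived them all.
    """
    for sig in signals:
        if not is_running(pid):
            return True
        try:
            os.kill(pid, sig)
        except ProcessLookupError:
            return True  # it exited between the check and the signal
        deadline = time.monotonic() + wait
        while time.monotonic() < deadline:
            if not is_running(pid):
                return True
            time.sleep(poll)
    return not is_running(pid)
```

A stop action would call something like stop_with_escalation(pid) with
the pid read from the service's pidfile, and report a failure if it
returns False. Ending the sequence with SIGKILL covers processes that,
like the stuck mailman ones, only respond to a hard kill.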

