On Wed, Nov 5, 2014 at 5:10 PM, Ryan Kaldari <[email protected]> wrote:
> On Wed, Oct 29, 2014 at 11:31 AM, Greg Grossmeier <[email protected]>
> wrote:
>>
>> I fear much of the worry about Beta Cluster is due to the rocky
>> transition to HHVM (which was less than ideal). We are better
>> equipped/able to deal with such changes in the future right now (and
>> we are no longer experiencing HHVM-related issues, afaict).
>
>
> We've been told this many times, i.e. "Don't worry, the problems are all in
> the past." Yet Beta Labs keep having serious outages on a weekly basis. Just
> a few days after you sent this email, mobile Beta Labs had a nearly full-day
> outage which caused serious headaches for both the mobile and VE teams.[1] I
> would totally love to stay on the existing Beta Labs cluster, but we just
> keep having these outages week after week, despite assurances that things
> would stabilize. This is a major pain point that we need to have addressed
> in some way or another. Would spinning up an Alpha Labs cluster (for
> experimental features) be a reasonable solution?
>
> 1. https://bugzilla.wikimedia.org/show_bug.cgi?id=72997
That outage was caused by https://gerrit.wikimedia.org/r/#/c/171055/ , which
was reviewed and merged by members of the mobile team. I'm not sure how this
is the fault of beta. Actually, I'd say that it is exactly what beta is for.
The real problem here was that nobody investigated the cause by logging in to
beta and looking at the logs. I'm pretty sure that this particular error could
have been reproduced in any development environment by checking out the
current git HEAD.

I'm not trying to pick a fight here; I really am trying to get a handle on the
general expectations of the teams that are making heavy use of beta in their
daily workflow.

If we had a two-stage integration environment (alpha & beta in the current
local vernacular), this error would have appeared in the alpha environment
first. Depending on the test coverage, it may or may not have been caught
there before the gating process advanced the code to the slightly more stable
beta environment. There are some advantages to this sort of system, but they
come at a cost. Typically that cost is a slower pace of production change.
Depending on where you come down on the "ship it and see what happens"
spectrum, you may see that as a good or a bad thing.

Bryan
--
Bryan Davis              Wikimedia Foundation    <[email protected]>
[[m:User:BDavis_(WMF)]]  Sr Software Engineer            Boise, ID USA
irc: bd808                                        v:415.839.6885 x6855

_______________________________________________
QA mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/qa
