Re: [Foundation-l] Scheduled intermittent downtime on all Wikimedia projects on May 24
Hi, The maintenance was scheduled on Monday, for the day after that. We had only a few hours to plan for it and communicate about it, and I think we did a pretty good job given the time we had. The maintenance banner was up for a few hours (not a day) prior to the maintenance window to give readers and editors a heads-up. The notice was also posted to social media channels (identica, twitter, facebook) as well as on the most relevant lists. I think that amount of communication is reasonable for a planned maintenance operation that shouldn't result in long downtime. As was already mentioned in this thread, database errors weren't expected during this network maintenance. It's always possible that unplanned issues arise, and this is why the error page shouldn't be too specific: if we plan for an issue and we end up encountering another one, the error page may display incorrect information about the cause or, more importantly, the severity of the issue. About more ways to communicate on outages: I have a few items on my todo list about this as well, so I'm glad that they were brought up in this thread. The status.wikimedia.org page could certainly be designed in a way that emphasizes the main information; I'm also investigating whether we can use an API to display information in other places, e.g. the Wikimedia blog (assuming the blog isn't down too). I also agree the WMF error page could be improved. As a matter of fact, I started thinking about how to improve it a few weeks ago. If you're interested in this, I would welcome your help. Thanks, -- Guillaume Paumier ___ foundation-l mailing list foundation-l@lists.wikimedia.org Unsubscribe: https://lists.wikimedia.org/mailman/listinfo/foundation-l
Re: [Foundation-l] Scheduled intermittent downtime on all Wikimedia projects on May 24
On 05/25/2011 01:12 PM, Tim Starling wrote: On 25/05/11 18:14, Thomas Morton wrote: IRC was flooded with people who didn't understand what was going on. And many didn't believe/understand that it was maintenance... so this is definitely an area worth improving. Maybe we can replace the IRC link in the Squid error message with a link to the WatchMouse page (status.wikimedia.org). That would reduce the IRC flood. Site notice for a week before the maintenance would be useful, too. We communicate with our users via web site, not via emails.
Re: [Foundation-l] Scheduled intermittent downtime on all Wikimedia projects on May 24
Milos Rancic, 26/05/2011 09:57: Site notice for a week before the maintenance would be useful, too. We communicate with our users via web site, not via emails. A week of pain to signal (and not avoid) an hour of pain? Doesn't look like a gain. Nemo
Re: [Foundation-l] Scheduled intermittent downtime on all Wikimedia projects on May 24
I'm pretty sure there was a site notice; I recall seeing one anyway :) Tom
Re: [Foundation-l] Scheduled intermittent downtime on all Wikimedia projects on May 24
On 05/26/2011 10:09 AM, Federico Leva (Nemo) wrote: Milos Rancic, 26/05/2011 09:57: Site notice for a week before the maintenance would be useful, too. We communicate with our users via web site, not via emails. A week of pain to signal (and not avoid) an hour of pain? Doesn't look like a gain. A small site notice? Not shown after dismissal? :) I mean, there are always ways to make site notices less intrusive. It is now common to get a notice ~2 weeks before maintenance. And most of our users are not getting foundation-l or announcement-l emails.
Re: [Foundation-l] Scheduled intermittent downtime on all Wikimedia projects on May 24
There was, it ran for a day (http://meta.wikimedia.org/wiki/Special:CentralNotice) - generic maintenance notice. Theo
Re: [Foundation-l] Scheduled intermittent downtime on all Wikimedia projects on May 24
Thomas Morton, 26/05/2011 10:11: I'm pretty sure there was a site notice; I recall seeing one anyway :) For a day: http://meta.wikimedia.org/wiki/Special:CentralNotice Nemo
Re: [Foundation-l] Scheduled intermittent downtime on all Wikimedia projects on May 24
On 05/26/2011 10:18 AM, Theo10011 wrote: There was, it ran for a day (http://meta.wikimedia.org/wiki/Special:CentralNotice) - generic maintenance notice. So, then it should just last a bit longer (maybe three days, if not a week?) and we would avoid most of the complaints.
Re: [Foundation-l] Scheduled intermittent downtime on all Wikimedia projects on May 24
On 26/05/11 17:57, Milos Rancic wrote: On 05/25/2011 01:12 PM, Tim Starling wrote: On 25/05/11 18:14, Thomas Morton wrote: IRC was flooded with people who didn't understand what was going on. And many didn't believe/understand that it was maintenance... so this is definitely an area worth improving. Maybe we can replace the IRC link in the Squid error message with a link to the WatchMouse page (status.wikimedia.org). That would reduce the IRC flood. Site notice for a week before the maintenance would be useful, too. We communicate with our users via web site, not via emails. I think a banner for a week would be excessive to advertise 2 minutes of downtime. I think a banner for a day was excessive. Some people don't care if Wikipedia is down for 2 minutes. If it wasn't my job to care, I would be one of them. -- Tim Starling
Re: [Foundation-l] Scheduled intermittent downtime on all Wikimedia projects on May 24
We already get spammed enough with notices, which is one of the reasons many people hide them permanently via CSS so they never intrude again. That would make them pointless for the more established users, and they are also overkill for what was meant to be (from my understanding) only a few minutes of downtime. Just because we can send something via notices doesn't mean we should; doing so can devalue, and has devalued, their importance to people. As for the earlier comments about changing the IRC channel, how about we point it to the #wikimedia-status (or whatever it's called) channel that is designed to hand out info in downtimes, rather than the general #wikipedia, which I believe it currently points to. -Peachey
Re: [Foundation-l] Scheduled intermittent downtime on all Wikimedia projects on May 24
On 24/05/11 23:32, Thomas Morton wrote: So, just a quick thought for future reference - during maintenance is it possible in future to update the error message to explain that maintenance is ongoing? Seeing as how widely WMF projects are used by a non-technical project the current MySQL connection error I am seeing on Commons is just going to cause confusion :) And the standard error page Wikipedia was showing a minute ago is not particularly helpful/explanatory in this specific situation. Database connection errors were not an anticipated consequence of the scheduled router upgrades. The people who might have been able to change the error message were busy diagnosing and fixing the problem. When we have a lengthy period of downtime, more sysadmins arrive online, and a wider perspective on the problem develops, including attention to community impact and communication. But since the downtime in this case was only half an hour, there was not enough time for this to happen. -- Tim Starling
Re: [Foundation-l] Scheduled intermittent downtime on all Wikimedia projects on May 24
I don't get this. Would it be possible in future, if the sites are unresponsive, or will be unresponsive due to planned maintenance, to establish a fallback that simply displays an explanatory status message to the public? FT2 On Wed, May 25, 2011 at 8:15 AM, Tim Starling tstarl...@wikimedia.org wrote: (snip) The people who might have been able to change the error message were busy diagnosing and fixing the problem. When we have a lengthy period of downtime, more sysadmins arrive online, and a wider perspective on the problem develops, including attention to community impact and communication. But since the downtime in this case was only half an hour, there was not enough time for this to happen.
Re: [Foundation-l] Scheduled intermittent downtime on all Wikimedia projects on May 24
On 25/05/11 17:32, FT2 wrote: I don't get this. Would it be possible in future, if the sites are unresponsive, or will be unresponsive due to planned maintenance, to establish a fallback that simply displays an explanatory status message to the public? You mean replace the entire site with an error page? But only part of the site was down. More and more things became accessible as each database server was fixed. I'm not sure how this could work. Even if we did prepare an error message saying "Wikipedia will be down for 2 minutes while a router restarts", I don't think that could be called explanatory if it were displayed for half an hour. Writing informative error messages and displaying them in appropriate places is necessarily a low-priority task during downtime, the higher priority task being to get the site working again. Maybe at some time in the future, we will have enough 24/7 sysadmin manpower that we can respond to any unplanned downtime in the way you suggest. But we don't have that capability just yet. -- Tim Starling
Re: [Foundation-l] Scheduled intermittent downtime on all Wikimedia projects on May 24
I think it's reasonable (and indeed standard) to deploy some sort of downtime maintenance error message. If that requires improving the error handling code to catch a wider variety of errors and push people to the error message page then I understand the time issues :). If the short term solution is that the error page that kept appearing gets tweaked (before the maintenance is started) to explain what is happening, then that seems fine. IRC was flooded with people who didn't understand what was going on. And many didn't believe/understand that it was maintenance... so this is definitely an area worth improving. I'm not trying to criticise; just passing on some ideas based on the issues raised. Tom
Re: [Foundation-l] Scheduled intermittent downtime on all Wikimedia projects on May 24
priority task being to get the site working again. Maybe at some time in the future, we will have enough 24/7 sysadmin manpower that we can respond to any unplanned downtime in the way you suggest. But we don't have that capability just yet. In future we will have five nines availability and no downtimes will happen. Domas
Re: [Foundation-l] Scheduled intermittent downtime on all Wikimedia projects on May 24
In future can I have vanilla and strawberry with that? :) FT2 On Wed, May 25, 2011 at 9:16 AM, Domas Mituzas midom.li...@gmail.com wrote: In future we will have five nines availability and no downtimes will happen.
Re: [Foundation-l] Scheduled intermittent downtime on all Wikimedia projects on May 24
On Wed, May 25, 2011 at 9:32 AM, FT2 ft2.w...@gmail.com wrote: I don't get this. Would it be possible in future, if the sites are unresponsive, or will be unresponsive due to planned maintenance, to establish a fallback that simply displays an explanatory status message to the public? Would it have changed anything for you? I tried to load Wikipedia a few times during the downtime, and a maintenance error actually did appear most of the time. I did get a few database errors, but I assumed that I wasn't the first to notice and that someone was diligently working on it. Regardless, my action was the same as it would have been in any case: try back later. Austin
Re: [Foundation-l] Scheduled intermittent downtime on all Wikimedia projects on May 24
Austin, That's interesting, what was the wording for the maintenance message? I only ever saw the default "our servers are experiencing a technical problem" error page. Tom
Re: [Foundation-l] Scheduled intermittent downtime on all Wikimedia projects on May 24
On Wed, May 25, 2011 at 11:57 AM, Thomas Morton morton.tho...@googlemail.com wrote: That's interesting, what was the wording for the maintenance message? I only ever saw the default "our servers are experiencing a technical problem" error page. I could be misremembering, because I honestly didn't care that much, but I do believe I saw the word "maintenance" in there somewhere. Either way, it was as informative as any message could be under the circumstances, unless, as Tim already addressed, you wanted a developer assigned to updating the message in real time. Austin
Re: [Foundation-l] Scheduled intermittent downtime on all Wikimedia projects on May 24
unless, as Tim already addressed, you wanted a developer assigned to updating the message in real time. No, definitely not what was being suggested. This is the error message that appeared for me (and apparently others): http://nomulous.com/blog/wp-content/uploads/2009/09/wikipedia_error.png As you can see it refers to some unknown error. In this case the maintenance was known and *pre-planned* for several days. A lot of people were confused by the outage and the error page was unhelpful to them. This could have been mitigated simply by editing that page temporarily to say "Our servers are undergoing scheduled maintenance, which has resulted in some downtime. This should be concluded by 14:00 UTC, please be patient whilst the maintenance progresses." And this is the extent of my suggestion to improve our communication with readers. Tom
Re: [Foundation-l] Scheduled intermittent downtime on all Wikimedia projects on May 24
As you can see it refers to some unknown error. In this case the maintenance was known and *pre-planned* for several days. technically this was unknown problem :) A lot of people were confused by the outage and the error page was unhelpful to them. This could have been mitigated simply by editing that page temporarily to say "Our servers are undergoing scheduled maintenance, which has resulted in some downtime. This should be concluded by 14:00 UTC, please be patient whilst the maintenance progresses." We did not really know when we will fix it :) And this is the extent of my suggestion to improve our communication with readers. IMO we're discussing completely wrong things here. Site was down, doesn't really matter in what way ;-) I'm sure we'd look much more professional if our downtime message would always say "planned maintenance in process!" ;-) Domas
Re: [Foundation-l] Scheduled intermittent downtime on all Wikimedia projects on May 24
Huh? The downtime was expected between 13:00 and 14:00 UTC, or at least there was an email warning of such things the day before... hardly unplanned or unknown. Tom
Re: [Foundation-l] Scheduled intermittent downtime on all Wikimedia projects on May 24
On Wed, May 25, 2011 at 12:09 PM, Thomas Morton morton.tho...@googlemail.com wrote: This is the error message that appeared for me (and apparently others): http://nomulous.com/blog/wp-content/uploads/2009/09/wikipedia_error.png I won't continue arguing about whether or not it should say "planned", but I do have to say that I love "probably temporary". (That, or Wikipedia has gone offline FOREVER.) Austin
Re: [Foundation-l] Scheduled intermittent downtime on all Wikimedia projects on May 24
It might be more worthwhile to put downtime status updates on status.wikimedia.org, as the logical page to display the status of the servers, and link to it from the default error messages. Given that status.wm.org is an external service, it would hopefully not be affected by any outages. The Watchmouse service probably should have the functionality to host informational messages like this and explanations for outages (like appstatus.google.com does), even if only after the fact, when the ops team has time to write down what is happening. Best regards, Bence
Re: [Foundation-l] Scheduled intermittent downtime on all Wikimedia projects on May 24
Hi! Huh? The downtime was expected during 13:00 and 14:00 UTC, or at least there was an email warning of such things the day before... hardly unplanned or unknown. there's a bit of a difference between maintenance window and expected downtime during it. Domas
Re: [Foundation-l] Scheduled intermittent downtime on all Wikimedia projects on May 24
The maintenance was planned, downtime was noted as possible. An error message that reflects that seems, frankly, a good idea. The response to what I thought to be a helpful suggestion in improving communication with readership has been... incredibly disappointing. I wish I hadn't bothered. :( I was just passing on comments from people who came to IRC and basically said "oh, well why didn't the site just say that then". Of course; if we are ignoring our readers' concerns now, then fine. Tom
Re: [Foundation-l] Scheduled intermittent downtime on all Wikimedia projects on May 24
Hi! The maintenance was planned, downtime was noted as possible. An error message that reflects that seems, frankly, a good idea. There're lots of great ideas around the world, feeding the hungry and curing the cancer among them. The response to what I thought to be a helpful suggestion in improving communication with readership has been... incredibly disappointing. Well, you were complaining about confusion at first, probably we indeed should not show any technical details about anything. "Site is down, bye!" might be better choice, I guess. I wish I hadn't bothered. :( I was just passing on comments from people who came to IRC and basically said "oh, well why didn't the site just say that then". If we knew what would fail to put an appropriate error message there, we'd probably fix the problem beforehand. :-) Of course; if we are ignoring our readers' concerns now, then fine. Nobody is ignoring any concerns, they are carefully weighed, hehehe, unlike your negativism. Cheers, Domas
Re: [Foundation-l] Scheduled intermittent downtime on all Wikimedia projects on May 24
If we knew what would fail to put an appropriate error message there, we'd probably fix the problem beforehand. :-) That's... completely missing the point. Yes the specific errors faced were unexpected or unforeseen, BUT they were a *direct result* of the maintenance between 13:00 and 14:00. I am simply passing on the feeling of our readership; which was that the situation was badly communicated to them. I am trying to share my experience here as a sysadmin and website operator; users hate downtime/maintenance, and will complain about it endlessly. Improving our communication of planned maintenance is definitely a good idea. Nobody is ignoring any concerns, they are carefully weighted, hehehe, unlike your negativism. I'm trying to be positive, but it seems to simply be dismissed (incorrectly) as "well we didn't know what was going to happen". Tom
Re: [Foundation-l] Scheduled intermittent downtime on all Wikimedia projects on May 24
Tim, When I originally wrote: "during maintenance is it possible in future to update the error message to explain that maintenance is ongoing?" That was a bit of a silly moment from me :) I see how that implies in-maintenance updates. In fact my suggestion was to update the error message to mention the planned maintenance and the timeframes. Sorry for the confusion! Tom
Re: [Foundation-l] Scheduled intermittent downtime on all Wikimedia projects on May 24
Domas, what are you trying to achieve with your comments on Tom's suggestions? He just said that if we know that maintenance is done and could cause outages we should put up an error message that informs the reader about the maintenance work and tells him not to worry. That's obviously a good thing. The sensible reaction (from a person who is involved in the maintenance) would be: "Oh, sorry, we were so much occupied with making the maintenance work as smooth and uninterruptive as possible that we totally didn't think about that. We will integrate it into our flow charts so we won't forget it the next time we need to do maintenance that could cause outages." Everything else is not very goal-oriented. Marcus Buck User:Slomox
Re: [Foundation-l] Scheduled intermittent downtime on all Wikimedia projects on May 24
Hi! That's... completely missing the point. Yes the specific errors faced were unexpected or unforeseen, BUT they were a *direct result* of the maintenance between 13:00 and 14:00. I am simply passing on the feeling of our readership; which was that the situation was badly communicated to them. As majority of our users are anons, who visit us once a day or two, we should probably have started a communication campaign at least two months before the maintenance. We practice a lot during fundraisers :-) OTOH, if there's no downtime, maybe we're causing quite some frustration with superfluous communication? :-) I am trying to share my experience here as a sysadmin and website operator; Oh, finally we got some sysadmins and website operators here. As a sysadmin you sure understand that in larger distributed systems which are not all built on a set of SPOFs there can be various failure modes, happening at various layers and various fuzziness. As a website operator you sure know that it is lots of effort to prepare boilerplates for every possible situation :-) users hate downtime/maintenance, and will complain about it endlessly. You have some annoying users, our users are awesome and don't complain endlessly! Improving our communication of planned maintenance is definitely a good idea. So is curing cancer. Marcus Buck wrote: Domas, what are you trying to achieve with your comments on Tom's suggestions? Put some clue in? The sensible reaction (from a person who is involved in the maintenance) would be: I know nobody likes this, but sensible reaction is to work on good operation rather than standing in front of a mirror and trying five hundred different "I'm sorry" phrases. You look too much from that single position, that communication is good, without weighing costs or other options. Cheers, Domas
Re: [Foundation-l] Scheduled intermittent downtime on all Wikimedia projects on May 24
On 25/05/11 18:14, Thomas Morton wrote:
> IRC was flooded with people who didn't understand what was going on. And many didn't believe/understand that it was maintenance... so this is definitely an area worth improving.

Maybe we can replace the IRC link in the Squid error message with a link to the WatchMouse page (status.wikimedia.org). That would reduce the IRC flood.

-- Tim Starling
Re: [Foundation-l] Scheduled intermittent downtime on all Wikimedia projects on May 24
Tim Starling wrote:
> Maybe we can replace the IRC link in the Squid error message with a link to the WatchMouse page (status.wikimedia.org). That would reduce the IRC flood.

* https://bugzilla.wikimedia.org/show_bug.cgi?id=16043
* https://bugzilla.wikimedia.org/show_bug.cgi?id=20079

MZMcBride
Re: [Foundation-l] Scheduled intermittent downtime on all Wikimedia projects on May 24
m...@marcusbuck.org wrote:
> The sensible reaction (from a person who is involved in the maintenance) would be: Oh, sorry, we were so much occupied with making the maintenance work as smooth and uninterruptive as possible that we totally didn't think about that. We will integrate it into our flow charts so we won't forget it the next time we need to do maintenance that could cause outages.

I'm kind of surprised that you think Wikimedia has flow charts for this kind of thing.

MZMcBride
Re: [Foundation-l] Scheduled intermittent downtime on all Wikimedia projects on May 24
> Maybe we can replace the IRC link in the Squid error message with a link to the WatchMouse page

@Tim: that seems a good idea.

@Domas: I'm afraid you don't seem to have understood the premise of my suggestion, which is fine. But one fallacy is worth responding to:

> You have some annoying users, our users are awesome and don't complain endlessly!

The first rule of a website people use regularly is: users will complain endlessly.

One of my business mentors has a good maxim about this: Just because you can't see them complaining, don't simply assume they are not. Because they are.

Twitter, Facebook, IRC and all sorts of other websites had people complaining about the down time. That is just a fact of life :)

To wit: If that static error page cannot easily be changed prior to a maintenance, then fine :) no worries

Tom

On 25 May 2011 12:10, Domas Mituzas midom.li...@gmail.com wrote:
> [...]
Re: [Foundation-l] Scheduled intermittent downtime on all Wikimedia projects on May 24
On Wed, May 25, 2011 at 4:40 PM, Domas Mituzas midom.li...@gmail.com wrote:
> [...]

I have no idea what Domas is trying to say. I agree with Thomas that there should be a better option to communicate with users about downtime and possible performance issues. I don't know how one would expect a user to discern between a planned downtime for maintenance and actual performance issues. There have been several issues earlier this year with performance and even temporary outages, not to mention there might have been more pronounced performance issues in certain locations.

Instead of diverting users to IRC, how about an outage/error page with a twitter/identi.ca feed with updates from the tech team, or at least a page with a customized message in case of a previously planned outage? Most of the tech staff already use Twitter/Identi.ca to update users; maybe we can look for a way to incorporate that feed in the outage page itself or point them to it.

How is someone who is not on any of the mailing lists, or has suppressed the banners, supposed to find out about the difference between these issues?

Theo
User:Theo10011
Re: [Foundation-l] Scheduled intermittent downtime on all Wikimedia projects on May 24
Theo10011 wrote:
> Instead of diverting users to IRC, how about an outage/error page with a twitter/identi.ca feed with updates from the tech team, or at least a page with customized message in case of previously planned outage. Most of the tech staff already use Twitter/Identi.ca to update users, maybe we can look for a way to incorporate that feed in the outage page itself or point them to it.

Is it so much to ask that you read the mailing list thread before replying?

Nobody's asking you to memorize every word, but having some general idea of what has been discussed would make your replies less redundant and/or seemingly obtuse.

MZMcBride
Re: [Foundation-l] Scheduled intermittent downtime on all Wikimedia projects on May 24
On Wed, May 25, 2011 at 5:31 PM, MZMcBride z...@mzmcbride.com wrote:
> Theo10011 wrote:
> > Instead of diverting users to IRC, how about an outage/error page with a twitter/identi.ca feed with updates from the tech team, or at least a page with customized message in case of previously planned outage. Most of the tech staff already use Twitter/Identi.ca to update users, maybe we can look for a way to incorporate that feed in the outage page itself or point them to it.
>
> Is it so much to ask that you read the mailing list thread before replying?

Yes! heh... you expect me to read Bugzilla?

> Nobody's asking you to memorize every word, but having some general idea of what has been discussed would make your replies less redundant and/or seemingly obtuse.

More Noise.

> MZMcBride
Re: [Foundation-l] Scheduled intermittent downtime on all Wikimedia projects on May 24
On Wed, May 25, 2011 at 10:09 PM, Theo10011 de10...@gmail.com wrote:
> On Wed, May 25, 2011 at 5:31 PM, MZMcBride z...@mzmcbride.com wrote:
> > Theo10011 wrote:
> > > Instead of diverting users to IRC, how about an outage/error page with a twitter/identi.ca feed with updates from the tech team, or at least a page with customized message in case of previously planned outage. Most of the tech staff already use Twitter/Identi.ca to update users, maybe we can look for a way to incorporate that feed in the outage page itself or point them to it.
> >
> > Is it so much to ask that you read the mailing list thread before replying?
>
> Yes! heh... you expect me to read Bugzilla?

Where did MZ ever suggest reading Bugzilla? I only see mention of reading what was already suggested in this email thread.

-Peachey
Re: [Foundation-l] Scheduled intermittent downtime on all Wikimedia projects on May 24
What I understood from this thread is: if you have a planned maintenance window between 13 and 14 GMT, it would be appreciated if you could:
- create a simple page that says: "We are working on our servers between 13 and 14 GMT and Wikipedia might be unavailable during that time"
- replace the usual error message with the newly created page as close as possible to 12:59
- reinstate the usual error message at 14:01 (or whenever the maintenance ends)

Nobody (of the millions of anonymous users) really cares about whether a certain db server is down or up at 13:49, or some router is rebooting at 13:23. They just wanna know when they can come back to read about spark plugs (sic!).

AFAIK, this is the way big websites like Yahoo do it. It seems like a simple thing to do, so perhaps you could explain calmly and without irony where the difficulty is?

Strainu

2011/5/25 Domas Mituzas midom.li...@gmail.com:
> [...]
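The three steps Strainu lists amount to choosing an error page by time of day. A minimal Python sketch of that selection follows; the page texts, the constant names, and the function `error_page_for` are all hypothetical, and the window is the 13:00 to 14:00 GMT slot mentioned in this thread. It only models a single place where the page is chosen, which is not how the real error paths are laid out.

```python
from datetime import datetime, timezone

# Hypothetical values: the window is the one discussed in this thread,
# the page texts are placeholders and not the real Wikimedia error pages.
WINDOW_START = datetime(2011, 5, 24, 13, 0, tzinfo=timezone.utc)
WINDOW_END = datetime(2011, 5, 24, 14, 0, tzinfo=timezone.utc)

MAINTENANCE_PAGE = (
    "We are working on our servers between 13 and 14 GMT "
    "and Wikipedia might be unavailable during that time"
)
USUAL_ERROR_PAGE = "The usual error message"


def error_page_for(now):
    """Return the maintenance notice inside the window, the usual page otherwise."""
    if WINDOW_START <= now < WINDOW_END:
        return MAINTENANCE_PAGE
    return USUAL_ERROR_PAGE
```

A sketch like this says nothing about maintenance that overruns its window; the end time would have to be moved by hand in that case.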
Re: [Foundation-l] Scheduled intermittent downtime on all Wikimedia projects on May 24
On 25/05/11 21:19, MZMcBride wrote:
> Tim Starling wrote:
> > Maybe we can replace the IRC link in the Squid error message with a link to the WatchMouse page (status.wikimedia.org). That would reduce the IRC flood.
>
> * https://bugzilla.wikimedia.org/show_bug.cgi?id=16043
> * https://bugzilla.wikimedia.org/show_bug.cgi?id=20079

Maybe it's time to completely rewrite it. I noticed Austin joking about the awkward "probably temporary" phrasing, and the idea that we need to buy new hardware to avoid downtime is a bit dated.

The source is in Subversion, at /trunk/debs/squid/debian/errors. Maybe if someone proposed some new text on meta.wikimedia.org, I could see that it gets included in the next Squid update.

-- Tim Starling
Re: [Foundation-l] Scheduled intermittent downtime on all Wikimedia projects on May 24
Tim,

Great, thanks for that. Seeing as it was me that raised this ;) I guess it's only right I take up the gauntlet, so I will try and find time later to propose something.

Tom

On 25 May 2011 13:48, Tim Starling tstarl...@wikimedia.org wrote:
> On 25/05/11 21:19, MZMcBride wrote:
> > Tim Starling wrote:
> > > Maybe we can replace the IRC link in the Squid error message with a link to the WatchMouse page (status.wikimedia.org). That would reduce the IRC flood.
> >
> > * https://bugzilla.wikimedia.org/show_bug.cgi?id=16043
> > * https://bugzilla.wikimedia.org/show_bug.cgi?id=20079
>
> Maybe it's time to completely rewrite it. I noticed Austin joking about the awkward probably temporary phrasing, and the idea that we need to buy new hardware to avoid downtime is a bit dated.
>
> The source is in Subversion, at /trunk/debs/squid/debian/errors. Maybe if someone proposed some new text on meta.wikimedia.org, I could see that it gets included in the next Squid update.
>
> -- Tim Starling
Re: [Foundation-l] Scheduled intermittent downtime on all Wikimedia projects on May 24
On 25/05/11 22:27, Strainu wrote:
> What I understood from this thread is: if you have a planned maintenance windows between 13 and 14 GMT, it would be appreciated if you could:
> - create a simple page that says: We are working on our servers between 13 and 14 GMT and Wikipedia might be unavailable during that time
> - replace the usual error message with the newly created page as close as possible to 12:59
> - reinstate the usual error message at 14:01 (or whenever the maintenance ends)

There are dozens of places where error messages are generated. It's not trivial to replace them all. Some of them are hard-coded in compiled binaries, some are on the client side.

The error message in question comes from DBConnectionError in Database.php in the MediaWiki source. It's hard-coded and the source would have had to have been patched. Since no database problems were anticipated, even if we had tried to implement your plan, we wouldn't have thought to patch Database.php, and the result would have been the same.

> Nobody (of the millions of anonymous users) really cares about whether a certain db server is down or up at 13:49, or some router is rebooting at 13:23. They just wanna know when they can come back to read about spark plugs (sic!).

There was no way to tell when the site was going to be back up, except perhaps after the problem was isolated and the fix was halfway through being implemented. But by that time there was only a few minutes of downtime left.

The maintenance window was 13:00 to 14:00, but after things went wrong, there was no guarantee that all problems would be fixed by 14:00. Indeed, if it wasn't for Domas's help as a volunteer sysadmin, the problem may have lasted much longer. Then there would have been plenty of time for messaging and maybe we wouldn't be having this conversation.

-- Tim Starling
Re: [Foundation-l] Scheduled intermittent downtime on all Wikimedia projects on May 24
Me - no. Readers who didn't know - yes.

Wikipedia going down without a temporary explanation page is roughly of the same scale as apple.com going down with no explanation, google.com going down with no explanation, microsoft.com going down with no explanation, and so on.

Top 5 website means we have that kind of use, perception, stature -- and a similar scale of response within the general public if it suddenly doesn't work. Most members of the public do not have the insight you or I would.

FT2

On Wed, May 25, 2011 at 10:53 AM, Austin Hair adh...@gmail.com wrote:
> On Wed, May 25, 2011 at 9:32 AM, FT2 ft2.w...@gmail.com wrote:
> > I don't get this. Would it be possible in future, if the sites are unresponsive, or will be unresponsive due to planned maintenance, to establish a fallback that simply displays an explanatory status message to the public?
>
> Would it have changed anything for you?
Re: [Foundation-l] Scheduled intermittent downtime on all Wikimedia projects on May 24
2011/5/25 Tim Starling tstarl...@wikimedia.org:
> On 25/05/11 22:27, Strainu wrote:
> > What I understood from this thread is: if you have a planned maintenance windows between 13 and 14 GMT, it would be appreciated if you could:
> > - create a simple page that says: We are working on our servers between 13 and 14 GMT and Wikipedia might be unavailable during that time
> > - replace the usual error message with the newly created page as close as possible to 12:59
> > - reinstate the usual error message at 14:01 (or whenever the maintenance ends)
>
> There are dozens of places where error messages are generated. It's not trivial to replace them all. Some of them are hard-coded in compiled binaries, some are on the client side. [...]

I kind of anticipated that response, but it's nice to have it written somewhere. I think it is now clear for everybody why there is a need for more sysadmins and/or developers to handle such issues.

I believe much of this thread could have been avoided (and this means less time wasted writing emails for you guys) if you had stated that in your first email.

Strainu
Re: [Foundation-l] Scheduled intermittent downtime on all Wikimedia projects on May 24
As a non-tech, don't all reads (at least) pass through the squids, so we can identify and report in a nice way a lot of connection errors at that point?

/ignoreifnaive

FT2

On Wed, May 25, 2011 at 2:18 PM, Tim Starling tstarl...@wikimedia.org wrote:
> There are dozens of places where error messages are generated. It's not trivial to replace them all. Some of them are hard-coded in compiled binaries, some are on the client side.
>
> The error message in question comes from DBConnectionError in Database.php in the MediaWiki source. It's hard-coded and the source would have had to have been patched. Since no database problems were anticipated, even if we had tried to implement your plan, we wouldn't have thought to patch Database.php, and the result would have been the same.
Re: [Foundation-l] Scheduled intermittent downtime on all Wikimedia projects on May 24
Just conceptualising...

I haven't played with Squid for a while (so am rusty) but the simplest solution would probably be to catch all PHP errors somewhere in the MediaWiki code and return a 500 status error code. Then get Squid to map that to the static error page.

On the other hand, throwing a "catch any sort of error" handler into an application isn't good practice. As Tim points out, errors can be generated from all over the place and it is better to catch them explicitly. So that would be a non-trivial process.

Tom

On 25 May 2011 14:41, FT2 ft2.w...@gmail.com wrote:
> As a non-tech, don't all reads (at least) pass through the squids, so we can identify and report in a nice way a lot of connection errors at that point?
>
> /ignoreifnaive
>
> FT2
>
> On Wed, May 25, 2011 at 2:18 PM, Tim Starling tstarl...@wikimedia.org wrote:
> > There are dozens of places where error messages are generated. It's not trivial to replace them all. Some of them are hard-coded in compiled binaries, some are on the client side.
> >
> > The error message in question comes from DBConnectionError in Database.php in the MediaWiki source. It's hard-coded and the source would have had to have been patched. Since no database problems were anticipated, even if we had tried to implement your plan, we wouldn't have thought to patch Database.php, and the result would have been the same.
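Tom's "catch everything and return a 500" idea can be illustrated outside MediaWiki. The sketch below uses Python and the standard WSGI calling convention as a stand-in for the PHP application; the wrapper name `catch_all` is hypothetical and nothing here is MediaWiki or Squid code. It assumes the front-end proxy would then map the 500 status to a static page, which is a separate piece of configuration not shown.

```python
def catch_all(app):
    """Wrap a WSGI app so that an uncaught exception becomes a bare 500 response.

    Only exceptions raised while the app is being called are caught here;
    errors raised later, while the response body is iterated, are not.
    """
    def wrapped(environ, start_response):
        try:
            return app(environ, start_response)
        except Exception:
            # Deliberately generic: the point of the idea is that the proxy,
            # not the application, decides what the reader finally sees.
            start_response(
                "500 Internal Server Error",
                [("Content-Type", "text/plain")],
            )
            return [b"Internal error"]
    return wrapped
```

As the message itself notes, a blanket handler like this hides the distinction between failure modes, which is the argument for catching errors explicitly instead.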
Re: [Foundation-l] Scheduled intermittent downtime on all Wikimedia projects on May 24
> Wikipedia going down without a temporary explanation page is roughly of the same scale as apple.com going down with no explanation, google.com going down with no explanation, microsoft.com going down with no explanation, and so on.

WHOAH THERE IS QUITE SOME SELF ENTITLEMENT THERE.

Microsoft revenue: $62B (though you should look at their internet division losses)
Google revenue: $29B
Apple revenue: $62B
Wikimedia revenue: ???

Tech staffing and such is somewhat proportional :)

Oh, by the way, I don't know where you look, but I somewhat missed communication about maintenance events ongoing in Google or Microsoft or Apple - you think they have none? Did you get lots of clarification why your gmail was unreachable? Did you get explanation/information why search index was outdated? Do they use site-wide sitenotices for that or what?

> Top 5 website means we have that kind of use, perception, stature -- and a similar scale of response within the general public if it suddenly doesn't work. Most members of the public do not have the insight you or I would.

*shrug*, would be interesting if anyone would actually explain policies of other website incident handling.

Domas
Re: [Foundation-l] Scheduled intermittent downtime on all Wikimedia projects on May 24
On 25/05/11 23:41, FT2 wrote:
> As a non-tech, don't all reads (at least) pass through the squids, so we can identify and report in a nice way a lot of connection errors at that point?
>
> /ignoreifnaive

Maybe it would be possible to identify error messages by their HTTP response code, and replace the body with some other text, presumably with the original text embedded somehow for debugging purposes. But I don't think Squid has such a feature, and we have very little development time to spend on this sort of thing.

-- Tim Starling
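The body-replacement idea in this message can be sketched as a plain function, independent of Squid (which, as noted, may have no such feature). Everything below is hypothetical: the function name `rewrite_error_response`, the friendly text, and the choice to keep the original error in a hidden, HTML-escaped `<pre>` element for debugging.

```python
import html

# Placeholder wording, not a proposed official message.
FRIENDLY_BODY = (
    "<p>Wikimedia sites are having technical problems. "
    "Please try again in a few minutes.</p>"
)


def rewrite_error_response(status_code, body):
    """Return the body to send to the reader.

    Non-5xx responses pass through unchanged. For 5xx responses the body is
    replaced by a friendly message, with the original text appended in a
    hidden, escaped block so that it stays available for debugging.
    """
    if not 500 <= status_code <= 599:
        return body
    original = html.escape(body)
    return FRIENDLY_BODY + '\n<pre style="display:none">' + original + "</pre>"
```

Keying on the status code only works if every error path actually sets a 5xx status, which the thread suggests cannot be taken for granted.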
Re: [Foundation-l] Scheduled intermittent downtime on all Wikimedia projects on May 24
Domas, why so defensive? No one accused you of anything or blamed you for the downtime. The comments suggesting more finely-tuned error messages weren't critical of you or Tim or the developers in general, they were just (reasonable) suggestions. Maybe adjusting all the various error messages in anticipation of possible downtime is totally unfeasible because of the work involved, but you can probably say that without all the combative snark.
Re: [Foundation-l] Scheduled intermittent downtime on all Wikimedia projects on May 24
Is the Squid configuration the foundation employs available publicly somewhere (I'm scanning the SVN and not seeing it..)?

Because I don't mind having a look and filing a specific bugzilla correction with various bits of code changes. It's about time I refreshed my Squid knowledge :)

Tom

On 25 May 2011 14:58, Tim Starling tstarl...@wikimedia.org wrote:
> On 25/05/11 23:41, FT2 wrote:
> > As a non-tech, don't all reads (at least) pass through the squids, so we can identify and report in a nice way a lot of connection errors at that point?
> >
> > /ignoreifnaive
>
> Maybe it would be possible to identify error messages by their HTTP response code, and replace the body with some other text, presumably with the original text embedded somehow for debugging purposes. But I don't think Squid has such a feature, and we have very little development time to spend on this sort of thing.
>
> -- Tim Starling
Re: [Foundation-l] Scheduled intermittent downtime on all Wikimedia projects on May 24
On 25 May 2011 09:50, Domas Mituzas midom.li...@gmail.com wrote:
> Oh, by the way, I don't know where you look, but I somewhat missed communication about maintenance events ongoing in Google or Microsoft or Apple - you think they have none? Did you get lots of clarification why your gmail was unreachable? Did you get explanation/information why search index was outdated? Do they use site-wide sitenotices for that or what?

Umm... yes, actually. My Gmail produces an error code or gives me advance notice when there is scheduled maintenance, as does my hotmail (Microsoft), and Google fairly frequently explains its technical problems (though sometimes one has to look for it). Apple - I know nothing. And I'm realistic enough not to expect that level of service from Wikimedia; there's simply not the personnel to do it.

I think we all appreciate, Domas, that notifying customers is not the #1 priority when our excellent team of paid and volunteer developers are fighting a pitched battle with wayward squids - all of us know getting the system working is the top priority, and anyone who's sat back and watched wikimedia-tech during a serious problem knows how incredibly diligent and focused you all are. Wiki(p)(m)edians who forget what collaborative work means should watch you folks when you're taking care of the serious business for a free lesson.

It would be worthwhile, however, during a relatively quiet period to tweak the error messages (perhaps make them more generic and all purpose?). There are some useful ideas, particularly Tim's, in this thread, and it appears Thomas has volunteered to do much of the heavy lifting on it.

Thanks to you and to all of the team who worked to address this situation yesterday, you did a good job. I know we don't say that nearly often enough.

Risker/Anne
Re: [Foundation-l] Scheduled intermittent downtime on all Wikimedia projects on May 24
Hi!

> Domas, why so defensive?

I'm contrarian in this case :)

> unfeasible because of the work involved, but you can probably say that without all the combative snark.

Well, as with every downtime, there are way more issues* that end up uncovered and have to be looked at, and yet largest email threads are about nicer error messages :-)

This will be my constructive contribution to the thread:

FAIL WHALE!

W W W WW W W '. W .--._ \ \.--| / -..__) .-' | _ / \'-.__, .__.,' `''._\--' V

Domas

* buggy forcedeth behavior, european DNS server was hanging before maintenance started, loadbalancer likes to throw errors on first slave failure and doesn't go to others, no auto-fallback to read-only mode, too long connect timeouts, etc
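Two items in that footnote, a load balancer that errors out on the first slave failure instead of trying the others, and connect timeouts that are too long, describe a pattern that can be sketched generically. The Python below is not MediaWiki's load balancer; `first_available`, its parameters, and the one-second default are hypothetical, and `connect` is any caller-supplied function that raises `OSError` when a replica cannot be reached.

```python
def first_available(replicas, connect, timeout=1.0):
    """Try each replica in order and return the first successful connection.

    `connect(replica, timeout)` is supplied by the caller and is expected to
    raise OSError on failure. A short timeout keeps one dead replica from
    stalling the whole request. Returns None if every replica fails, leaving
    it to the caller to decide what to do next (for example, degrade to a
    read-only or cached response).
    """
    for replica in replicas:
        try:
            return connect(replica, timeout)
        except OSError:
            # Move on to the next replica instead of surfacing the error.
            continue
    return None
```

With the standard library, `connect` could be a thin wrapper around `socket.create_connection((host, port), timeout=timeout)`.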
Re: [Foundation-l] Scheduled intermittent downtime on all Wikimedia projects on May 24
@Tim: Understood, I'll make sure I know this will work first so as not to generate work for you. My initial idea might not be so workable given the architecture used (and how Squid handles error codes). I'll roll up some servers here at work and run some tests.

@Domas: echoing what Risker said... The intent wasn't to criticise your work :) just to try to constructively suggest improvements to something of minor importance. Sorry if that intent got lost somewhere in my messages. As you say, more critical issues appear to have cropped up internally in the ops team. Don't hold our ignorance of these things against us! We're just trying to contribute where we can. :)

Tom

On 25 May 2011 15:26, Tim Starling tstarl...@wikimedia.org wrote:
> On 26/05/11 00:05, Thomas Morton wrote:
> > Is the Squid configuration the foundation employs available publicly somewhere (I'm scanning the SVN and not seeing it..)?
> >
> > Because I don't mind having a look and filing a specific bugzilla correction with various bits of code changes. It's about time I refreshed my Squid knowledge :)
>
> No, it's private. I can give it to you if you want it, just give me your assurance offlist that it'll be worth my time to clean any private data out of it and tar it up.
>
> -- Tim Starling
Re: [Foundation-l] Scheduled intermittent downtime on all Wikimedia projects on May 24
Quoting MZMcBride z...@mzmcbride.com:

> m...@marcusbuck.org wrote:
>> The sensible reaction (from a person who is involved in the maintenance) would be: Oh, sorry, we were so occupied with making the maintenance work as smooth and uninterruptive as possible that we totally didn't think about that. We will integrate it into our flow charts so we won't forget it the next time we need to do maintenance that could cause outages.
>
> I'm kind of surprised that you think Wikimedia has flow charts for this kind of thing.

I'm not a native speaker of English and I don't know whether it was the right word. I meant documentation that tells you which steps need to be taken before, during, and after you touch a critical system.

Marcus Buck
User:Slomox
Re: [Foundation-l] Scheduled intermittent downtime on all Wikimedia projects on May 24
On Tue, May 24, 2011 at 6:32 AM, Thomas Morton morton.tho...@googlemail.com wrote:

> So, just a quick thought for future reference - during maintenance is it possible in future to update the error message to explain that maintenance is ongoing?

I work with lots of (library) databases, and standard practice for these services is to display messages across the top or some other visible space warning of scheduled maintenance ahead of time. I guess we could do that with centralnotice, but /goes back to reading centralnotice thread :-)

-- phoebe
Re: [Foundation-l] Scheduled intermittent downtime on all Wikimedia projects on May 24
Domas Mituzas wrote:

> FAIL WHALE!
>
> [ASCII-art fail whale]

http://en.wikipedia.org/wiki/User:MZMcBride/Blame_wheel

<3

MZMcBride
Re: [Foundation-l] Scheduled intermittent downtime on all Wikimedia projects on May 24
So, just a quick thought for future reference - during maintenance is it possible in future to update the error message to explain that maintenance is ongoing?

Seeing as how widely WMF projects are used by a non-technical audience, the current MySQL connection error I am seeing on Commons is just going to cause confusion :) And the standard error page Wikipedia was showing a minute ago is not particularly helpful/explanatory in this specific situation.

Indeed, given the near certainty of downtime from this maintenance, would it not just be best to bite the bullet (in such cases) and take the affected sites off-line with a useful maintenance message? It's essentially the same end point.

What's the best way of addressing this suggestion to the right people? (i.e. the ops team?)

Tom

On 23 May 2011 21:19, Guillaume Paumier gpaum...@wikimedia.org wrote:

> Dear all,
>
> The Wikimedia Foundation will be performing network maintenance on Tuesday, May 24 between 13:00 and 14:00 (UTC) (see other timezones on timeanddate.com: http://ur1.ca/49cl2 ).
>
> During the maintenance period, you may experience intermittent connection issues to Wikimedia Foundation websites, including wikipedia.org.
>
> We have been experiencing router networking issues (and as a direct result, latency issues) since last week. After much investigation, and temporary fixes, the Operations team decided to update the router software and tune the configuration.
>
> We apologize for the inconvenience.
>
> -- Guillaume Paumier
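The suggestion of serving a deliberate maintenance message during a known window could look roughly like the following sketch. It is not how the Wikimedia Squid layer works; the function name `maintenance_response` and the message wording are assumptions. The only standard pieces are the HTTP 503 status and the `Retry-After` header, which tell clients the outage is temporary.

```python
from datetime import datetime, timezone
from typing import Dict, Optional, Tuple


def maintenance_response(
    now: datetime, start: datetime, end: datetime
) -> Optional[Tuple[int, Dict[str, str], str]]:
    """Return (status, headers, body) inside the window, else None."""
    if not (start <= now < end):
        return None  # outside the window: serve the site normally

    # Seconds until the announced end of the maintenance window.
    retry_after = int((end - now).total_seconds())
    headers = {
        "Retry-After": str(retry_after),
        "Content-Type": "text/plain; charset=utf-8",
    }
    body = (
        "Scheduled maintenance is in progress until "
        "{:%H:%M} UTC. Please try again later.".format(end)
    )
    return 503, headers, body


# The window announced in this thread: May 24, 2011, 13:00 to 14:00 UTC.
WINDOW_START = datetime(2011, 5, 24, 13, 0, tzinfo=timezone.utc)
WINDOW_END = datetime(2011, 5, 24, 14, 0, tzinfo=timezone.utc)
```

As noted elsewhere in the thread, an error page that is too specific can mislead readers if an unplanned issue occurs, so a message like this only makes sense when it is tied strictly to the announced window.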
Re: [Foundation-l] Scheduled intermittent downtime on all Wikimedia projects on May 24
I totally agree with Thomas.

On Tue, May 24, 2011 at 4:32 PM, Thomas Morton morton.tho...@googlemail.com wrote:

> So, just a quick thought for future reference - during maintenance is it possible in future to update the error message to explain that maintenance is ongoing?
Re: [Foundation-l] Scheduled intermittent downtime on all Wikimedia projects on May 24
Speaking of WP downtime, you might be particularly interested in today's XKCD: http://xkcd.com/903/

wittylama.com/blog
Peace, love & metadata

On 24 May 2011 21:35, Itzik Edri it...@infra.co.il wrote:

> I totally agree with Thomas.