Re: [Foundation-l] Scheduled intermittent downtime on all Wikimedia projects on May 24

2011-05-30 Thread Guillaume Paumier
Hi,

The maintenance was scheduled on Monday, for the day after that. We
had only a few hours to plan for it and communicate about it, and I
think we did a pretty good job given the time we had.

The maintenance banner was up for a few hours (not a day) prior to the
maintenance window to give readers and editors a heads-up. The notice
was also posted to social media channels (identica, twitter, facebook)
as well as on the most relevant lists.

I think that amount of communication is reasonable for a planned
maintenance operation that shouldn't result in long downtime.

As it was already mentioned in this thread, database errors weren't
expected during this network maintenance. It's always possible that
unplanned issues arise, and this is why the error page shouldn't be
too specific: if we plan for an issue and we end up encountering
another one, the error page may display incorrect information about
the cause or, more importantly, the severity of the issue.

About more ways to communicate on outages: I have a few items on my
todo list about this as well, so I'm glad that they were brought up in
this thread. The status.wikimedia.org page could certainly be designed
in a way that emphasizes the main information; I'm also investigating
whether we can use an API to display information on other places, e.g.
the Wikimedia blog (assuming the blog isn't down too).

I also agree the WMF error page could be improved. As a matter of
fact, I started thinking about how to improve it a few weeks ago. If
you're interested in this, I would welcome your help.

Thanks,

-- 
Guillaume Paumier

___
foundation-l mailing list
foundation-l@lists.wikimedia.org
Unsubscribe: https://lists.wikimedia.org/mailman/listinfo/foundation-l


Re: [Foundation-l] Scheduled intermittent downtime on all Wikimedia projects on May 24

2011-05-26 Thread Milos Rancic
On 05/25/2011 01:12 PM, Tim Starling wrote:
> On 25/05/11 18:14, Thomas Morton wrote:
>> IRC was flooded with people who didn't understand what was going on. And
>> many didn't believe/understand that it was maintenance... so this is
>> definitely an area worth improving.
>
> Maybe we can replace the IRC link in the Squid error message with a
> link to the WatchMouse page (status.wikimedia.org). That would reduce
> the IRC flood.

Site notice for a week before the maintenance would be useful, too. We
communicate with our users via web site, not via emails.



Re: [Foundation-l] Scheduled intermittent downtime on all Wikimedia projects on May 24

2011-05-26 Thread Federico Leva (Nemo)
Milos Rancic, 26/05/2011 09:57:
> Site notice for a week before the maintenance would be useful, too. We
> communicate with our users via web site, not via emails.

A week of pain to signal (and not avoid) an hour of pain? Doesn't look 
like a gain.

Nemo



Re: [Foundation-l] Scheduled intermittent downtime on all Wikimedia projects on May 24

2011-05-26 Thread Thomas Morton
I'm pretty sure there was a site notice; I recall seeing one anyway :)

Tom



Re: [Foundation-l] Scheduled intermittent downtime on all Wikimedia projects on May 24

2011-05-26 Thread Milos Rancic
On 05/26/2011 10:09 AM, Federico Leva (Nemo) wrote:
> Milos Rancic, 26/05/2011 09:57:
>> Site notice for a week before the maintenance would be useful, too. We
>> communicate with our users via web site, not via emails.
>
> A week of pain to signal (and not avoid) an hour of pain? Doesn't look
> like a gain.

A small site notice? Not shown after dismissal? :) I mean, there are
always ways to make site notices less intrusive.

It is now common to get notice ~2 weeks before maintenance. And most
of our users are not getting foundation-l or announcement-l emails.



Re: [Foundation-l] Scheduled intermittent downtime on all Wikimedia projects on May 24

2011-05-26 Thread Theo10011
There was; it ran for a day
(http://meta.wikimedia.org/wiki/Special:CentralNotice). Generic maintenance
notice.

Theo




Re: [Foundation-l] Scheduled intermittent downtime on all Wikimedia projects on May 24

2011-05-26 Thread Federico Leva (Nemo)
Thomas Morton, 26/05/2011 10:11:
> I'm pretty sure there was a site notice; I recall seeing one anyway :)

For a day: http://meta.wikimedia.org/wiki/Special:CentralNotice

Nemo



Re: [Foundation-l] Scheduled intermittent downtime on all Wikimedia projects on May 24

2011-05-26 Thread Milos Rancic
On 05/26/2011 10:18 AM, Theo10011 wrote:
> There was, it ran for a day.
> (http://meta.wikimedia.org/wiki/Special:CentralNotice) - Generic maintenance
> notice.

So, then it should just last a bit longer (maybe three days if not a
week?) and we would avoid most of the complaints.



Re: [Foundation-l] Scheduled intermittent downtime on all Wikimedia projects on May 24

2011-05-26 Thread Tim Starling
On 26/05/11 17:57, Milos Rancic wrote:
> On 05/25/2011 01:12 PM, Tim Starling wrote:
>> On 25/05/11 18:14, Thomas Morton wrote:
>>> IRC was flooded with people who didn't understand what was going on. And
>>> many didn't believe/understand that it was maintenance... so this is
>>> definitely an area worth improving.
>>
>> Maybe we can replace the IRC link in the Squid error message with a
>> link to the WatchMouse page (status.wikimedia.org). That would reduce
>> the IRC flood.
>
> Site notice for a week before the maintenance would be useful, too. We
> communicate with our users via web site, not via emails.

I think a banner for a week would be excessive to advertise 2 minutes
of downtime. I think a banner for a day was excessive.

Some people don't care if Wikipedia is down for 2 minutes. If it
wasn't my job to care, I would be one of them.

-- Tim Starling




Re: [Foundation-l] Scheduled intermittent downtime on all Wikimedia projects on May 24

2011-05-26 Thread K. Peachey
We already get spammed enough with notices, which is one of the
reasons many people hide them permanently via CSS so they never
intrude again. That makes them pointless for the more established
users. A notice would also have been overkill for what was meant to be
(from my understanding) only a few minutes of downtime. Just because we
can send something via notices doesn't mean we should; overuse can
devalue, and has devalued, their importance to people.

As for the earlier comments about changing the IRC channel, how about
we point it to the #wikimedia-status (or whatever it's called) channel,
which is designed to hand out info during downtime, rather than the
general #wikipedia channel, which I believe it currently points to?
-Peachey



Re: [Foundation-l] Scheduled intermittent downtime on all Wikimedia projects on May 24

2011-05-25 Thread Tim Starling
On 24/05/11 23:32, Thomas Morton wrote:
> So, just a quick thought for future reference - during maintenance is it
> possible in future to update the error message to explain that maintenance
> is ongoing?
>
> Seeing as how widely WMF projects are used by a non-technical project the
> current MySQL connection error I am seeing on Commons is just going to cause
> confusion :) And the standard error page Wikipedia was showing a minute ago
> is not particularly helpful/explanatory in this specific situation.

Database connection errors were not an anticipated consequence of the
scheduled router upgrades. The people who might have been able to
change the error message were busy diagnosing and fixing the problem.

When we have a lengthy period of downtime, more sysadmins arrive
online, and a wider perspective on the problem develops, including
attention to community impact and communication. But since the
downtime in this case was only half an hour, there was not enough time
for this to happen.

-- Tim Starling




Re: [Foundation-l] Scheduled intermittent downtime on all Wikimedia projects on May 24

2011-05-25 Thread FT2
I don't get this.

Would it be possible in future, if the sites are unresponsive, or will be
unresponsive due to planned maintenance, to establish a fallback that simply
displays an explanatory status message to the public?

FT2


On Wed, May 25, 2011 at 8:15 AM, Tim Starling tstarl...@wikimedia.org wrote:

> (snip)
> The people who might have been able to
> change the error message were busy diagnosing and fixing the problem.
>
> When we have a lengthy period of downtime, more sysadmins arrive
> online, and a wider perspective on the problem develops, including
> attention to community impact and communication. But since the
> downtime in this case was only half an hour, there was not enough time
> for this to happen.




Re: [Foundation-l] Scheduled intermittent downtime on all Wikimedia projects on May 24

2011-05-25 Thread Tim Starling
On 25/05/11 17:32, FT2 wrote:
> I don't get this.
>
> Would it be possible in future, if the sites are unresponsive, or will be
> unresponsive due to planned maintenance, to establish a fallback that simply
> displays an explanatory status message to the public?

You mean replace the entire site with an error page? But only part of
the site was down. More and more things became accessible as each
database server was fixed. I'm not sure how this could work.

Even if we did prepare an error message saying "Wikipedia will be down
for 2 minutes while a router restarts", I don't think that could be
called explanatory if it were displayed for half an hour.

Writing informative error messages and displaying them in appropriate
places is necessarily a low-priority task during downtime, the higher
priority task being to get the site working again. Maybe at some time
in the future, we will have enough 24/7 sysadmin manpower that we can
respond to any unplanned downtime in the way you suggest. But we don't
have that capability just yet.

-- Tim Starling




Re: [Foundation-l] Scheduled intermittent downtime on all Wikimedia projects on May 24

2011-05-25 Thread Thomas Morton
I think it's reasonable (and indeed standard) to deploy some sort of
downtime maintenance error message.

If that requires improving the error handling code to catch a wider variety
of errors and push people to the error message page then I understand the
time issues :).

If the short term solution is that the error page that kept appearing gets
tweaked (before the maintenance is started) to explain what is happening,
then that seems fine.

IRC was flooded with people who didn't understand what was going on. And
many didn't believe/understand that it was maintenance... so this is
definitely an area worth improving.

I'm not trying to criticise; just passing on some ideas based on the issues
raised.

Tom



Re: [Foundation-l] Scheduled intermittent downtime on all Wikimedia projects on May 24

2011-05-25 Thread Domas Mituzas

> priority task being to get the site working again. Maybe at some time
> in the future, we will have enough 24/7 sysadmin manpower that we can
> respond to any unplanned downtime in the way you suggest. But we don't
> have that capability just yet.

In future we will have five nines availability and no downtimes will happen.

Domas



Re: [Foundation-l] Scheduled intermittent downtime on all Wikimedia projects on May 24

2011-05-25 Thread FT2
In future can I have vanilla and strawberry with that? :)

FT2


On Wed, May 25, 2011 at 9:16 AM, Domas Mituzas midom.li...@gmail.com wrote:

> In future we will have five nines availability and no downtimes will
> happen.




Re: [Foundation-l] Scheduled intermittent downtime on all Wikimedia projects on May 24

2011-05-25 Thread Austin Hair
On Wed, May 25, 2011 at 9:32 AM, FT2 ft2.w...@gmail.com wrote:
> I don't get this.
>
> Would it be possible in future, if the sites are unresponsive, or will be
> unresponsive due to planned maintenance, to establish a fallback that simply
> displays an explanatory status message to the public?

Would it have changed anything for you?

I tried to load Wikipedia a few times during the downtime, and a
maintenance error actually did appear most of the time. I did get a
few database errors, but I assumed that I wasn't the first to notice
and that someone was diligently working on it.

Regardless, my action was the same as it would have been in any case:
try back later.

Austin



Re: [Foundation-l] Scheduled intermittent downtime on all Wikimedia projects on May 24

2011-05-25 Thread Thomas Morton
Austin,

That's interesting, what was the wording for the maintenance message? I only
ever saw the default "our servers are experiencing a technical problem"
error page.

Tom



Re: [Foundation-l] Scheduled intermittent downtime on all Wikimedia projects on May 24

2011-05-25 Thread Austin Hair
On Wed, May 25, 2011 at 11:57 AM, Thomas Morton
morton.tho...@googlemail.com wrote:
> That's interesting, what was the wording for the maintenance message? I only
> ever saw the default "our servers are experiencing a technical problem"
> error page.

I could be misremembering, because I honestly didn't care that much,
but I do believe I saw the word "maintenance" in there somewhere.

Either way, it was as informative as any message could be under the
circumstances—unless, as Tim already addressed, you wanted a developer
assigned to updating the message in real time.

Austin



Re: [Foundation-l] Scheduled intermittent downtime on all Wikimedia projects on May 24

2011-05-25 Thread Thomas Morton
> unless, as Tim already addressed, you wanted a developer
> assigned to updating the message in real time.

No, definitely not what was being suggested.

This is the error message that appeared for me (and apparently others):
http://nomulous.com/blog/wp-content/uploads/2009/09/wikipedia_error.png

As you can see it refers to some unknown error. In this case the
maintenance was known and *pre-planned* for several days.

A lot of people were confused by the outage and the error page was unhelpful
to them. This could have been mitigated simply by editing that
page temporarily to say "Our servers are undergoing scheduled maintenance,
which has resulted in some downtime. This should be concluded by 14:00 UTC,
please be patient whilst the maintenance progresses."

And this is the extent of my suggestion to improve our communication with
readers.

Tom



Re: [Foundation-l] Scheduled intermittent downtime on all Wikimedia projects on May 24

2011-05-25 Thread Domas Mituzas
 
> As you can see it refers to some unknown error. In this case the
> maintenance was known and *pre-planned* for several days.

technically this was unknown problem :)

> A lot of people were confused by the outage and the error page was unhelpful
> to them. This could have been mitigated simply by editing that
> page temporarily to say "Our servers are undergoing scheduled maintenance,
> which has resulted in some downtime. This should be concluded by 14:00 UTC,
> please be patient whilst the maintenance progresses."

We did not really know when we will fix it :)

> And this is the extent of my suggestion to improve our communication with
> readers.

IMO we're discussing completely wrong things here. Site was down, doesn't
really matter in what way ;-)
I'm sure we'd look much more professional if our downtime message would always
say "planned maintenance in process!" ;-)

Domas


Re: [Foundation-l] Scheduled intermittent downtime on all Wikimedia projects on May 24

2011-05-25 Thread Thomas Morton
Huh? The downtime was expected during 13:00 and 14:00 UTC, or at least there
was an email warning of such things the day before... hardly unplanned or
unknown.

Tom



Re: [Foundation-l] Scheduled intermittent downtime on all Wikimedia projects on May 24

2011-05-25 Thread Austin Hair
On Wed, May 25, 2011 at 12:09 PM, Thomas Morton
morton.tho...@googlemail.com wrote:
> This is the error message that appeared for me (and apparently others):
> http://nomulous.com/blog/wp-content/uploads/2009/09/wikipedia_error.png

I won't continue arguing about whether or not it should say "planned",
but I do have to say that I love "probably temporary".

(That, or Wikipedia has gone offline FOREVER.)

Austin



Re: [Foundation-l] Scheduled intermittent downtime on all Wikimedia projects on May 24

2011-05-25 Thread Bence Damokos
It might be more worthwhile to put downtime status updates on
status.wikimedia.org as a logical page to display the status of the servers,
and link to it from the default error messages.

Given that status.wm.org is an external service, it would hopefully not be
affected by any outages and the Watchmouse service probably should have the
functionality to host informational messages like this and explanations for
outages (like appstatus.google.com does) even if only after the fact when
the ops team has time to write down what is happening.


Best regards,
Bence


Re: [Foundation-l] Scheduled intermittent downtime on all Wikimedia projects on May 24

2011-05-25 Thread Domas Mituzas
Hi!

> Huh? The downtime was expected during 13:00 and 14:00 UTC, or at least there
> was an email warning of such things the day before... hardly unplanned or
> unknown.

there's a bit of a difference between maintenance window and expected downtime 
during it.

Domas


Re: [Foundation-l] Scheduled intermittent downtime on all Wikimedia projects on May 24

2011-05-25 Thread Thomas Morton
The maintenance was planned, downtime was noted as possible. An error
message that reflects that seems, frankly, a good idea.

The response to what I thought to be a helpful suggestion in improving
communication with readership has been... incredibly disappointing. I wish I
hadn't bothered. :( I was just passing on comments from people who came to
IRC and basically said "oh, well why didn't the site just say that then".

Of course; if we are ignoring our readers' concerns now, then fine.

Tom



Re: [Foundation-l] Scheduled intermittent downtime on all Wikimedia projects on May 24

2011-05-25 Thread Domas Mituzas
Hi!

> The maintenance was planned, downtime was noted as possible. An error
> message that reflects that seems, frankly, a good idea.

There're lots of great ideas around the world, feeding the hungry and curing
the cancer among them.

> The response to what I thought to be a helpful suggestion in improving
> communication with readership has been... incredibly disappointing.

Well, you were complaining about confusion at first, probably we indeed should
not show any technical details about anything.
"Site is down, bye!" might be better choice, I guess.

> I wish I hadn't bothered. :( I was just passing on comments from people who
> came to IRC and basically said "oh, well why didn't the site just say that
> then".

If we knew what would fail to put an appropriate error message there, we'd
probably fix the problem beforehand. :-)

> Of course; if we are ignoring our readers' concerns now, then fine.

Nobody is ignoring any concerns, they are carefully weighed, hehehe, unlike
your negativism.

Cheers,
Domas


Re: [Foundation-l] Scheduled intermittent downtime on all Wikimedia projects on May 24

2011-05-25 Thread Thomas Morton
> If we knew what would fail to put an appropriate error message there, we'd
> probably fix the problem beforehand. :-)

That's... completely missing the point. Yes the specific errors faced were
unexpected or unforeseen, BUT they were a *direct result* of the maintenance
between 13:00 and 14:00. I am simply passing on the feeling of our
readership, which was that the situation was badly communicated to them.

I am trying to share my experience here as a sysadmin and website operator;
users hate downtime/maintenance, and will complain about it endlessly.
Improving our communication of planned maintenance is definitely a good
idea.

> Nobody is ignoring any concerns, they are carefully weighted, hehehe,
> unlike your negativism.

I'm trying to be positive, but it seems to simply be dismissed (incorrectly)
as "well we didn't know what was going to happen".

Tom



Re: [Foundation-l] Scheduled intermittent downtime on all Wikimedia projects on May 24

2011-05-25 Thread Thomas Morton
Tim,

When I originally wrote:

> during maintenance is it possible in future to update the error message to
> explain that maintenance is ongoing?

That was a bit of a silly moment from me :) I see how that implies
in-maintenance updates.

In fact my suggestion was to update the error message to mention the planned
maintenance and the timeframes.

Sorry for the confusion!

Tom



Re: [Foundation-l] Scheduled intermittent downtime on all Wikimedia projects on May 24

2011-05-25 Thread me
Domas, what are you trying to achieve with your comments on Tom's  
suggestions? He just said that if we know that maintenance is done and  
could cause outages we should put up an error message that informs the  
reader about the maintenance work and tells him not to worry. That's  
obviously a good thing.

The sensible reaction (from a person who is involved in the  
maintenance) would be:
"Oh, sorry, we were so occupied with making the maintenance work
as smooth and uninterrupted as possible that we totally didn't think
about that. We will integrate it into our flow charts so we won't
forget it the next time we need to do maintenance that could cause
outages."

Everything else is not very goal-oriented.

Marcus Buck
User:Slomox





Re: [Foundation-l] Scheduled intermittent downtime on all Wikimedia projects on May 24

2011-05-25 Thread Domas Mituzas
Hi!

 That's... completely missing the point. Yes the specific errors faced were
 unexpected or unforeseen, BUT they were a *direct result* of the maintenance
 between 13:00 and 14:00. I am simply passing on the feeling of our
 readership; which was that the situation was badly communicated to them.

As the majority of our users are anons, who visit us once every day or two, we
should probably have started a communication campaign at least two months
before the maintenance.
We practice a lot during fundraisers :-) 

OTOH, if there's no downtime, maybe we're causing quite some frustration with 
superfluous communication? :-) 

 I am trying to share my experience here as a sysadmin and website operator;

Oh, finally we got some sysadmins and website operators here.
As a sysadmin you surely understand that in larger distributed systems which are
not all built on a set of SPOFs there can be various failure modes, happening
at various layers and with varying fuzziness.
As a website operator you surely know that it takes a lot of effort to prepare
boilerplates for every possible situation :-)

 users hate downtime/maintenance, and will complain about it endlessly.

You have some annoying users, our users are awesome and don't complain 
endlessly!

 Improving our communication of planned maintenance is definitely a good idea.

So is curing cancer. 

Marcus Buck wrote:
 Domas, what are you trying to achieve with your comments on Tom's  
 suggestions? 


Put some clue in? 

 The sensible reaction (from a person who is involved in the maintenance) 
 would be:

I know nobody likes this, but the sensible reaction is to work on good operation
rather than standing in front of a mirror and trying five hundred different
"I'm sorry" phrases.
You look at this too much from the single position that communication is good,
without weighing costs or other options.

Cheers,
Domas


Re: [Foundation-l] Scheduled intermittent downtime on all Wikimedia projects on May 24

2011-05-25 Thread Tim Starling
On 25/05/11 18:14, Thomas Morton wrote:
 IRC was flooded with people who didn't understand what was going on. And
 many didn't believe/understand that it was maintenance... so this is
 definitely an area worth improving.

Maybe we can replace the IRC link in the Squid error message with a
link to the WatchMouse page (status.wikimedia.org). That would reduce
the IRC flood.

-- Tim Starling




Re: [Foundation-l] Scheduled intermittent downtime on all Wikimedia projects on May 24

2011-05-25 Thread MZMcBride
Tim Starling wrote:
 Maybe we can replace the IRC link in the Squid error message with a
 link to the WatchMouse page (status.wikimedia.org). That would reduce
 the IRC flood.

* https://bugzilla.wikimedia.org/show_bug.cgi?id=16043
* https://bugzilla.wikimedia.org/show_bug.cgi?id=20079

MZMcBride





Re: [Foundation-l] Scheduled intermittent downtime on all Wikimedia projects on May 24

2011-05-25 Thread MZMcBride
m...@marcusbuck.org wrote:
 The sensible reaction (from a person who is involved in the
 maintenance) would be:
 Oh, sorry, we were so much occupied with making the maintenance work
 as smooth and uninterruptive as possible that we totally didn't think
 about that. We will integrate it into our flow charts so we won't
 forget it the next time we need to do maintenance that could cause
 outages.

I'm kind of surprised that you think Wikimedia has flow charts for this kind
of thing.

MZMcBride





Re: [Foundation-l] Scheduled intermittent downtime on all Wikimedia projects on May 24

2011-05-25 Thread Thomas Morton
 Maybe we can replace the IRC link in the Squid error message with a
 link to the WatchMouse page

@Tim: that seems a good idea.

@Domas, I'm afraid you don't seem to have understood the premise of my
suggestion... which is fine. But one fallacy is worth responding to:

 You have some annoying users, our users are awesome and don't complain
 endlessly!

The first rule of a website people use regularly is: users will complain
endlessly.

One of my business mentors has a good maxim about this: "Just because you
can't see them complaining, don't simply assume they are not. Because they
are."

Twitter, Facebook, IRC and all sorts of other websites had people
complaining about the down time. That is just a fact of life :)

To wit: if that static error page cannot easily be changed prior to
maintenance, then fine :) no worries

Tom



Re: [Foundation-l] Scheduled intermittent downtime on all Wikimedia projects on May 24

2011-05-25 Thread Theo10011


I have no idea what Domas is trying to say.

I agree with Thomas that there should be a better option to communicate with
users about downtime and possible performance issues. I don't know how one
would expect a user to discern between a planned downtime for maintenance
and actual performance issues. There have been several issues earlier this
year with performance and even temporary outages, not to mention there might
have been more pronounced performance issues in certain locations.

Instead of diverting users to IRC, how about an outage/error page with a
twitter/identi.ca feed with updates from the tech team, or at least a page
with a customized message in case of a previously planned outage? Most of the
tech staff already use Twitter/Identi.ca to update users, so maybe we can look
for a way to incorporate that feed in the outage page itself or point them
to it.

How is someone who is not on any of the mailing lists, or who has suppressed
the banners, supposed to find out about the difference between these issues?


Theo
User:Theo10011


Re: [Foundation-l] Scheduled intermittent downtime on all Wikimedia projects on May 24

2011-05-25 Thread MZMcBride
Theo10011 wrote:
 Instead of diverting users to IRC, how about an outage/error page with a
 twitter/identi.ca feed with updates from the tech team, or at least a page
 with customized message in case of previously planned outage. Most of the
 tech staff already use Twitter/Identi.ca to update users, maybe we can look
 for a way to incorporate that feed in the outage page itself or point them
 to it.

Is it so much to ask that you read the mailing list thread before replying?
Nobody's asking you to memorize every word, but having some general idea of
what has been discussed would make your replies less redundant and/or
seemingly obtuse.

MZMcBride





Re: [Foundation-l] Scheduled intermittent downtime on all Wikimedia projects on May 24

2011-05-25 Thread Theo10011
On Wed, May 25, 2011 at 5:31 PM, MZMcBride z...@mzmcbride.com wrote:

 Theo10011 wrote:
  Instead of diverting users to IRC, how about an outage/error page with a
  twitter/identi.ca feed with updates from the tech team, or at least a
 page
  with customized message in case of previously planned outage. Most of the
  tech staff already use Twitter/Identi.ca to update users, maybe we can
 look
  for a way to incorporate that feed in the outage page itself or point
 them
  to it.

 Is it so much to ask that you read the mailing list thread before replying?


Yes! heh... you expect me to read Bugzilla?


 Nobody's asking you to memorize every word, but having some general idea of
 what has been discussed would make your replies less redundant and/or
 seemingly obtuse.


More Noise.



 MZMcBride





Re: [Foundation-l] Scheduled intermittent downtime on all Wikimedia projects on May 24

2011-05-25 Thread K. Peachey
On Wed, May 25, 2011 at 10:09 PM, Theo10011 de10...@gmail.com wrote:
 On Wed, May 25, 2011 at 5:31 PM, MZMcBride z...@mzmcbride.com wrote:

 Theo10011 wrote:
  Instead of diverting users to IRC, how about an outage/error page with a
  twitter/identi.ca feed with updates from the tech team, or at least a
 page
  with customized message in case of previously planned outage. Most of the
  tech staff already use Twitter/Identi.ca to update users, maybe we can
 look
  for a way to incorporate that feed in the outage page itself or point
 them
  to it.

 Is it so much to ask that you read the mailing list thread before replying?


 Yes! heh... you expect me to read Bugzilla?

Where did MZ ever suggest reading BZ? I only see mention of
reading what was already suggested in this email thread.
-Peachey



Re: [Foundation-l] Scheduled intermittent downtime on all Wikimedia projects on May 24

2011-05-25 Thread Strainu
What I understood from this thread is: if you have a planned
maintenance window between 13 and 14 GMT, it would be appreciated if
you could:
- create a simple page that says: "We are working on our servers
between 13 and 14 GMT and Wikipedia might be unavailable during that
time"
- replace the usual error message with the newly created page as close
as possible to 12:59
- reinstate the usual error message at 14:01 (or whenever the maintenance ends)
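
The three steps above could be sketched roughly like this, assuming there is a
single place where the error page is chosen (which may well not hold for the
real sites). The names pick_error_page, USUAL_ERROR_PAGE and MAINTENANCE_PAGE
are hypothetical and the page texts are placeholders.

```python
from datetime import datetime, timezone

# Hypothetical placeholder texts; neither is the real Wikimedia error page.
USUAL_ERROR_PAGE = "Our servers are currently experiencing a technical problem."
MAINTENANCE_PAGE = (
    "We are working on our servers between 13 and 14 GMT "
    "and Wikipedia might be unavailable during that time."
)


def pick_error_page(now, window_start, window_end):
    """Return the maintenance notice inside the window, else the usual page."""
    if window_start <= now < window_end:
        return MAINTENANCE_PAGE
    return USUAL_ERROR_PAGE


# The window discussed in this thread: 24 May 2011, 13:00 to 14:00 GMT.
start = datetime(2011, 5, 24, 13, 0, tzinfo=timezone.utc)
end = datetime(2011, 5, 24, 14, 0, tzinfo=timezone.utc)
```

Choosing the page by clock time means nobody has to remember to reinstate the
usual message once the window has passed.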

Nobody (of the millions of anonymous users) really cares about whether
a certain db server is down or up at 13:49, or some router is
rebooting at 13:23. They just wanna know when they can come back to
read about spark plugs (sic!).

AFAIK, this is the way big websites like Yahoo do it.

It seems like a simple thing to do, so perhaps you could explain
calmly and without irony where the difficulty is?

Strainu



Re: [Foundation-l] Scheduled intermittent downtime on all Wikimedia projects on May 24

2011-05-25 Thread Tim Starling
On 25/05/11 21:19, MZMcBride wrote:
 Tim Starling wrote:
 Maybe we can replace the IRC link in the Squid error message with a
 link to the WatchMouse page (status.wikimedia.org). That would reduce
 the IRC flood.
 
 * https://bugzilla.wikimedia.org/show_bug.cgi?id=16043
 * https://bugzilla.wikimedia.org/show_bug.cgi?id=20079

Maybe it's time to completely rewrite it. I noticed Austin joking
about the awkward "probably temporary" phrasing, and the idea that we
need to buy new hardware to avoid downtime is a bit dated.

The source is in Subversion, at /trunk/debs/squid/debian/errors. Maybe
if someone proposed some new text on meta.wikimedia.org, I could see
that it gets included in the next Squid update.

-- Tim Starling




Re: [Foundation-l] Scheduled intermittent downtime on all Wikimedia projects on May 24

2011-05-25 Thread Thomas Morton
Tim,

Great, thanks for that. Seeing as it was me that raised this ;) I guess it's
only right I take up the gauntlet, so will try and find time later to
propose something.

Tom


On 25 May 2011 13:48, Tim Starling tstarl...@wikimedia.org wrote:

 On 25/05/11 21:19, MZMcBride wrote:
  Tim Starling wrote:
  Maybe we can replace the IRC link in the Squid error message with a
  link to the WatchMouse page (status.wikimedia.org). That would reduce
  the IRC flood.
 
  * https://bugzilla.wikimedia.org/show_bug.cgi?id=16043
  * https://bugzilla.wikimedia.org/show_bug.cgi?id=20079

 Maybe it's time to completely rewrite it. I noticed Austin joking
 about the awkward probably temporary phrasing, and the idea that we
 need to buy new hardware to avoid downtime is a bit dated.

 The source is in Subversion, at /trunk/debs/squid/debian/errors. Maybe
 if someone proposed some new text on meta.wikimedia.org, I could see
 that it gets included in the next Squid update.

 -- Tim Starling




Re: [Foundation-l] Scheduled intermittent downtime on all Wikimedia projects on May 24

2011-05-25 Thread Tim Starling
On 25/05/11 22:27, Strainu wrote:
 What I understood from this thread is: if you have a planned
 maintenance windows between 13 and 14 GMT, it would be appreciated if
 you could:
 - create a simple page that says: We are working on our servers
 between 13 and 14 GMT and Wikipedia might be unavailable during that
 time
 - replace the usual error message with the newly created page as close
 as possible to 12:59
 - reinstate the usual error message at 14:01 (or whenever the maintenance 
 ends)

There are dozens of places where error messages are generated. It's
not trivial to replace them all. Some of them are hard-coded in
compiled binaries, some are on the client side.

The error message in question comes from DBConnectionError in
Database.php in the MediaWiki source. It's hard-coded and the source
would have had to have been patched. Since no database problems were
anticipated, even if we had tried to implement your plan, we wouldn't
have thought to patch Database.php, and the result would have been the
same.

 Nobody (of the millions of anonymous users) really cares about whether
 a certain db server is down or up at 13:49, or some router is
 rebooting at 13:23. They just wanna know when they can come back to
 read about spark plugs (sic!).

There was no way to tell when the site was going to be back up, except
perhaps after the problem was isolated and the fix was halfway through
being implemented. But by that time there was only a few minutes of
downtime left. The maintenance window was 13:00 to 14:00, but after
things went wrong, there was no guarantee that all problems would be
fixed by 14:00.

Indeed, if it wasn't for Domas's help as a volunteer sysadmin, the
problem may have lasted much longer. Then there would have been plenty
of time for messaging and maybe we wouldn't be having this conversation.

-- Tim Starling




Re: [Foundation-l] Scheduled intermittent downtime on all Wikimedia projects on May 24

2011-05-25 Thread FT2
Me - no.
Readers who didn't know - yes.

Wikipedia going down without a temporary explanation page is roughly of the
same scale as apple.com going down with no explanation, google.com going
down with no explanation, microsoft.com going down with no explanation, and
so on.

Top 5 website means we have that kind of use, perception, stature -- and a
similar scale of response within the general public if it suddenly doesn't
work.  Most members of the public do not have the insight you or I would.

FT2


On Wed, May 25, 2011 at 10:53 AM, Austin Hair adh...@gmail.com wrote:

 On Wed, May 25, 2011 at 9:32 AM, FT2 ft2.w...@gmail.com wrote:
  I don't get this.
 
  Would it be possible in future, if the sites are unresponsive, or will be
  unresponsive due to planned maintenance, to establish a fallback that
 simply
  displays an explanatory status message to the public?

 Would it have changed anything for you?



Re: [Foundation-l] Scheduled intermittent downtime on all Wikimedia projects on May 24

2011-05-25 Thread Strainu
2011/5/25 Tim Starling tstarl...@wikimedia.org:
 On 25/05/11 22:27, Strainu wrote:
 What I understood from this thread is: if you have a planned
 maintenance windows between 13 and 14 GMT, it would be appreciated if
 you could:
 - create a simple page that says: We are working on our servers
 between 13 and 14 GMT and Wikipedia might be unavailable during that
 time
 - replace the usual error message with the newly created page as close
 as possible to 12:59
 - reinstate the usual error message at 14:01 (or whenever the maintenance 
 ends)

 There are dozens of places where error messages are generated. It's
 not trivial to replace them all. Some of them are hard-coded in
 compiled binaries, some are on the client side.

[...]

I kind of anticipated that response, but it's nice to have it written
somewhere. I think it is now clear for everybody why there is a need
for more sysadmins and/or developers to handle such issues.

I believe much of this thread could have been avoided (and this means
less time wasted writing emails for you guys) if you had stated that
in your first email.

Strainu



Re: [Foundation-l] Scheduled intermittent downtime on all Wikimedia projects on May 24

2011-05-25 Thread FT2
Speaking as a non-tech: don't all reads (at least) pass through the squids, so
we can identify and report in a nice way a lot of connection errors at that point?
/ignoreifnaive

FT2


On Wed, May 25, 2011 at 2:18 PM, Tim Starling tstarl...@wikimedia.org wrote:

  There are dozens of places where error messages are generated. It's
 not trivial to replace them all. Some of them are hard-coded in
 compiled binaries, some are on the client side.

 The error message in question comes from DBConnectionError in
 Database.php in the MediaWiki source. It's hard-coded and the source
 would have had to have been patched. Since no database problems were
 anticipated, even if we had tried to implement your plan, we wouldn't
 have thought to patch Database.php, and the result would have been the
 same.



Re: [Foundation-l] Scheduled intermittent downtime on all Wikimedia projects on May 24

2011-05-25 Thread Thomas Morton
Just conceptualising...

I haven't played with Squid for a while (so am rusty) but the simplest
solution would probably be to catch all PHP errors somewhere in the
MediaWiki code and return a 500 status code.

Then get Squid to map that to the static error page.

On the other hand, throwing a "catch any sort of error" handler into an
application isn't good practice. As Tim points out, errors can originate from
all over the place and it is better to catch them explicitly. So that would be
a non-trivial process.
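
To make the first half of that idea concrete, here is a minimal sketch of a
catch-all wrapper that turns any uncaught exception into a plain 500 response.
It is written in Python against the WSGI interface purely as a stand-in for the
PHP side; catch_all and GENERIC_BODY are hypothetical names, and the step where
a proxy maps the 500 to a static page is not shown.

```python
import sys

# Hypothetical placeholder body; a front-end cache would swap in the static page.
GENERIC_BODY = b"Internal error"


def catch_all(app):
    """Wrap a WSGI application so an uncaught exception becomes a plain 500."""
    def wrapper(environ, start_response):
        try:
            return app(environ, start_response)
        except Exception:
            # Deliberately broad: this is the catch-all pattern cautioned against above.
            headers = [("Content-Type", "text/plain"),
                       ("Content-Length", str(len(GENERIC_BODY)))]
            # Passing exc_info lets the server replace headers the app already set.
            start_response("500 Internal Server Error", headers, sys.exc_info())
            return [GENERIC_BODY]
    return wrapper
```

This only catches exceptions raised while the application is being called, not
ones raised later while a lazily generated body is iterated, which is one more
reason explicit handling is the better long-term route.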

Tom

On 25 May 2011 14:41, FT2 ft2.w...@gmail.com wrote:

 As a non-tech, don't all reads (at least) pass through the squids, so we
 can
 identify and report in a nice way a lot of connection errors at that point?
 /ignoreifnaive

 FT2


 On Wed, May 25, 2011 at 2:18 PM, Tim Starling tstarl...@wikimedia.org
 wrote:

   There are dozens of places where error messages are generated. It's
  not trivial to replace them all. Some of them are hard-coded in
  compiled binaries, some are on the client side.
 
  The error message in question comes from DBConnectionError in
  Database.php in the MediaWiki source. It's hard-coded and the source
  would have had to have been patched. Since no database problems were
  anticipated, even if we had tried to implement your plan, we wouldn't
  have thought to patch Database.php, and the result would have been the
  same.
 


Re: [Foundation-l] Scheduled intermittent downtime on all Wikimedia projects on May 24

2011-05-25 Thread Domas Mituzas
 Wikipedia going down without a temporary explanation page is roughly of the
 same scale as apple.com going down with no explanation, google.com going
 down with no explanation, microsoft.com going down with no explanation, and
 so on.

WHOAH THERE IS QUITE SOME SELF ENTITLEMENT THERE.

Microsoft revenue: $62B (though you should look at their internet division 
losses) 
Google revenue: $29B
Apple revenue: $62B
Wikimedia revenue: ???

Tech staffing and such is somewhat proportional :) 

Oh, by the way, I don't know where you look, but I somewhat missed 
communication about maintenance events ongoing in Google or Microsoft or Apple 
- you think they have none? 
Did you get lots of clarification why your gmail was unreachable? 
Did you get explanation/information why search index was outdated? 
Do they use site-wide sitenotices for that or what? 

 Top 5 website means we have that kind of use, perception, stature -- and a
 similar scale of response within the general public if it suddenly doesn't
 work.  Most members of the public do not have the insight you or I would.

*shrug*, it would be interesting if anyone actually explained the
incident-handling policies of other websites.

Domas


Re: [Foundation-l] Scheduled intermittent downtime on all Wikimedia projects on May 24

2011-05-25 Thread Tim Starling
On 25/05/11 23:41, FT2 wrote:
 As a non-tech, don't all reads (at least) pass through the squids, so we can
 identify and report in a nice way a lot of connection errors at that point?
 /ignoreifnaive

Maybe it would be possible to identify error messages by their HTTP
response code, and replace the body with some other text, presumably
with the original text embedded somehow for debugging purposes. But I
don't think Squid has such a feature, and we have very little
development time to spend on this sort of thing.
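
For illustration only, the rewriting step described above might look like the
sketch below, independent of whether any proxy can actually do it. The function
rewrite_error and the FRIENDLY_PAGE text are hypothetical; the original message
is kept in an HTML comment for debugging.

```python
import html

# Hypothetical replacement page; {debug} receives the original error text.
FRIENDLY_PAGE = (
    "<html><body><p>Sorry, the site is having problems. "
    "Please try again in a few minutes.</p>{debug}</body></html>"
)


def rewrite_error(status_code, body):
    """Replace the body of 5xx responses, embedding the original for debugging."""
    if status_code < 500:
        return body
    # Escaping '>' means the original text cannot contain '-->' and so
    # cannot close the surrounding HTML comment early.
    safe = html.escape(body)
    return FRIENDLY_PAGE.format(debug="<!-- original error: " + safe + " -->")
```

Responses below 500 pass through untouched, so ordinary pages and client-side
errors such as 404 are not affected.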

-- Tim Starling




Re: [Foundation-l] Scheduled intermittent downtime on all Wikimedia projects on May 24

2011-05-25 Thread Nathan
Domas, why so defensive? No one accused you of anything or blamed you
for the downtime. The comments suggesting more finely-tuned error
messages weren't critical of you or Tim or the developers in general,
they were just (reasonable) suggestions. Maybe adjusting all the
various error messages in anticipation of possible downtime is totally
unfeasible because of the work involved, but you can probably say that
without all the combative snark.



Re: [Foundation-l] Scheduled intermittent downtime on all Wikimedia projects on May 24

2011-05-25 Thread Thomas Morton
Is the Squid configuration the Foundation employs available publicly
somewhere (I'm scanning the SVN and not seeing it...)? Because I don't mind
having a look and filing a specific Bugzilla correction with various bits of
code changes.

It's about time I refreshed my Squid knowledge :)

Tom

On 25 May 2011 14:58, Tim Starling tstarl...@wikimedia.org wrote:

 On 25/05/11 23:41, FT2 wrote:
  As a non-tech, don't all reads (at least) pass through the squids, so we
 can
  identify and report in a nice way a lot of connection errors at that
 point?
  /ignoreifnaive

 Maybe it would be possible to identify error messages by their HTTP
 response code, and replace the body with some other text, presumably
 with the original text embedded somehow for debugging purposes. But I
 don't think Squid has such a feature, and we have very little
 development time to spend on this sort of thing.

 -- Tim Starling




Re: [Foundation-l] Scheduled intermittent downtime on all Wikimedia projects on May 24

2011-05-25 Thread Risker
On 25 May 2011 09:50, Domas Mituzas midom.li...@gmail.com wrote:

 Oh, by the way, I don't know where you look, but I somewhat missed
 communication about maintenance events ongoing in Google or Microsoft or
 Apple - you think they have none?
 Did you get lots of clarification why your gmail was unreachable?
 Did you get explanation/information why search index was outdated?
 Do they use site-wide sitenotices for that or what?




Umm... yes, actually. My Gmail produces an error code or gives me advance
notice when there is scheduled maintenance, as does my hotmail (Microsoft),
and Google fairly frequently explains its technical problems (though
sometimes one has to look for it). Apple - I know nothing. And I'm realistic
enough not to expect that level of service from Wikimedia; there's simply
not the personnel to do it.

I think we all appreciate, Domas, that notifying customers is not the #1
priority when our excellent team of paid and volunteer developers are
fighting a pitched battle with wayward squids - all of us know getting the
system working is the top priority, and anyone who's sat back and watched
wikimedia-tech during a serious problem knows how incredibly diligent and
focused you all are. Wiki(p)(m)edians who forget what collaborative work
means should watch you folks when you're taking care of the serious business
for a free lesson.

It would be worthwhile, however, during a relatively quiet period to tweak
the error messages (perhaps make them more generic and all purpose?). There
are some useful ideas, particularly Tim's, in this thread, and it appears
Thomas has volunteered to do much of the heavy lifting on it.

Thanks to you and to all of the team who worked to address this situation
yesterday, you did a good job. I know we don't say that nearly often enough.

Risker/Anne


Re: [Foundation-l] Scheduled intermittent downtime on all Wikimedia projects on May 24

2011-05-25 Thread Domas Mituzas
Hi!

 Domas, why so defensive?

I'm contrarian in this case :) 

 unfeasible because of the work involved, but you can probably say that
 without all the combative snark.

Well, as with every downtime, there are way more issues* that end up uncovered
and have to be looked at, and yet the largest email threads are about nicer error
messages :-)
This will be my constructive contribution to the thread:

 FAIL WHALE!

W W  W
WW  W W
  '.  W  
  .--._ \ \.--|  
 /   -..__) .-'   
| _ /  
\'-.__,   .__.,'   
 `''._\--'  
V

Domas

* buggy forcedeth behavior, european DNS server was hanging before maintenance 
started, loadbalancer likes to throw errors on first slave failure and doesn't 
go to others, no auto-fallback to read-only mode, too long connect timeouts, etc
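
To make a few of the footnote's items concrete (stopping at the first slave failure, no automatic fallback to read-only mode, long connect timeouts), here is a minimal Python sketch of the opposite behaviour. It is not MediaWiki's load balancer; `pick_server`, `choose_mode`, the one-second timeout and the host names in the usage note are all assumptions made for the example.

```python
import socket


def pick_server(servers, connect=socket.create_connection, timeout=1.0):
    """Return the first (host, port) that accepts a connection, else None.

    A short connect timeout keeps one dead server from stalling the
    request, and a failure moves on to the next candidate instead of
    raising on the first one.
    """
    for address in servers:
        try:
            connect(address, timeout=timeout).close()
            return address
        except OSError:  # refused, unreachable, or timed out
            continue
    return None


def choose_mode(master, slaves, connect=socket.create_connection, timeout=1.0):
    """Decide how the site should run given which databases are reachable.

    Returns ("read-write", master) when the master answers,
    ("read-only", slave) when only a slave answers, and
    ("unavailable", None) when nothing answers.
    """
    if pick_server([master], connect, timeout) is not None:
        return "read-write", master
    slave = pick_server(slaves, connect, timeout)
    if slave is not None:
        return "read-only", slave
    return "unavailable", None
```

A call such as `choose_mode(("db-master", 3306), [("db-slave1", 3306), ("db-slave2", 3306)])` would try each host in turn; the `connect` parameter exists only so the logic can be exercised without real servers.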


Re: [Foundation-l] Scheduled intermittent downtime on all Wikimedia projects on May 24

2011-05-25 Thread Thomas Morton
@Tim: Understood, I'll make sure I know this will work first so as not to
generate work for you. My initial idea might not be so workable given the
architecture used (and how Squid handles error codes). I'll roll up some
servers here at work and run some tests.

@Domas: echoing what Risker said... The intent wasn't to criticise your work
:) just to try to constructively suggest improvements to something of minor
importance. Sorry if that intent got lost somewhere in my messages.

As you say; more critical issues appear to have cropped up internally in the
ops team. Don't hold our ignorance of these things against us! We're just
trying to contribute where we can. :)

Tom

On 25 May 2011 15:26, Tim Starling tstarl...@wikimedia.org wrote:

 On 26/05/11 00:05, Thomas Morton wrote:
  Is the Squid configuration the foundation employs available publicly
  somewhere (I'm scanning the SVN and not seeing it..)? Because I don't mind
  having a look and filing a specific bugzilla correction with various bits of
  code & changes.
 
  It's about time I refreshed my Squid knowledge :)

 No, it's private. I can give it to you if you want it, just give me
 your assurance offlist that it'll be worth my time to clean any
 private data out of it and tar it up.

 -- Tim Starling




Re: [Foundation-l] Scheduled intermittent downtime on all Wikimedia projects on May 24

2011-05-25 Thread me

Quoting MZMcBride z...@mzmcbride.com:

 m...@marcusbuck.org wrote:
 The sensible reaction (from a person who is involved in the
 maintenance) would be:
 Oh, sorry, we were so much occupied with making the maintenance work
 as smooth and uninterruptive as possible that we totally didn't think
 about that. We will integrate it into our flow charts so we won't
 forget it the next time we need to do maintenance that could cause
 outages.

 I'm kind of surprised that you think Wikimedia has flow charts for this kind
 of thing.

I'm not a native speaker of English and I don't know whether it was
the right word. I meant documentation that tells you which steps
need to be taken before, during and after you touch a critical system.

Marcus Buck
User:Slomox





Re: [Foundation-l] Scheduled intermittent downtime on all Wikimedia projects on May 24

2011-05-25 Thread phoebe ayers
On Tue, May 24, 2011 at 6:32 AM, Thomas Morton
morton.tho...@googlemail.com wrote:
 So, just a quick thought for future reference - during maintenance is it
 possible in future to update the error message to explain that maintenance
 is ongoing?

I work with lots of (library) databases, and standard practice for
these services is to display messages across the top or some other
visible space warning of scheduled maintenance ahead of time. I guess
we could do that with centralnotice, but /goes back to reading
centralnotice thread  :-)

-- phoebe



Re: [Foundation-l] Scheduled intermittent downtime on all Wikimedia projects on May 24

2011-05-25 Thread MZMcBride
Domas Mituzas wrote:
  FAIL WHALE!
 
 W W  W   
 WW  W W
   '.  W
   .--._ \ \.--|
  /   -..__) .-'
 | _ /
 \'-.__,   .__.,' 
  `''._\--'   
 V

http://en.wikipedia.org/wiki/User:MZMcBride/Blame_wheel <3

MZMcBride





Re: [Foundation-l] Scheduled intermittent downtime on all Wikimedia projects on May 24

2011-05-24 Thread Thomas Morton
So, just a quick thought for future reference - during maintenance is it
possible in future to update the error message to explain that maintenance
is ongoing?

Seeing as how widely WMF projects are used by a non-technical audience, the
current MySQL connection error I am seeing on Commons is just going to cause
confusion :) And the standard error page Wikipedia was showing a minute ago
is not particularly helpful/explanatory in this specific situation.

Indeed, given the almost certainty of downtime from this maintenance, would
it not just be best to bite the bullet (in such cases) and take the affected
sites off-line with a useful maintenance message? It's essentially the same
end point.

What's the best way of addressing this suggestion to the right people? (i.e.
the ops team?)

Tom

On 23 May 2011 21:19, Guillaume Paumier gpaum...@wikimedia.org wrote:

 Dear all,

 The Wikimedia Foundation will be performing network maintenance on
 Tuesday, May 24 between 13:00 and 14:00 (UTC) (see other timezones on
 timeanddate.com: http://ur1.ca/49cl2 ).

 During the maintenance period, you may experience intermittent
 connection issues to Wikimedia Foundation websites, including
 wikipedia.org.

 We have been experiencing router networking issues (and as a direct
 result, latency issues) since last week. After much investigation, and
 temporary fixes, the Operations team decided to update the router
 software and tune the configuration.

 We apologize for the inconvenience.

 --
 Guillaume Paumier
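
For what the suggestion at the top of this message could look like in practice, here is a minimal Python sketch that switches the error text while a scheduled window is open. The window is the one announced above (May 24, 2011, 13:00 to 14:00 UTC); the function name `error_message` and both message texts are assumptions made for the example, not anything Wikimedia runs.

```python
from datetime import datetime, timezone

# The maintenance window from the announcement quoted above.
WINDOW_START = datetime(2011, 5, 24, 13, 0, tzinfo=timezone.utc)
WINDOW_END = datetime(2011, 5, 24, 14, 0, tzinfo=timezone.utc)

# Illustrative texts only.
GENERIC = "The site is having technical problems. Please try again shortly."
MAINTENANCE = (
    "Scheduled network maintenance is in progress until 14:00 UTC. "
    "You may see intermittent connection problems until then."
)


def error_message(now):
    """Pick the error text for a timezone-aware `now`.

    Inside the window the reader is told that maintenance is under way;
    outside it the generic text is shown, because the cause is unknown.
    """
    if WINDOW_START <= now < WINDOW_END:
        return MAINTENANCE
    return GENERIC
```

In real use the caller would pass `datetime.now(timezone.utc)`. The wording is kept loose on purpose: a problem that shows up during the window is not necessarily caused by the maintenance itself.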



Re: [Foundation-l] Scheduled intermittent downtime on all Wikimedia projects on May 24

2011-05-24 Thread Itzik Edri
I totally agree with Thomas.



Re: [Foundation-l] Scheduled intermittent downtime on all Wikimedia projects on May 24

2011-05-24 Thread Liam Wyatt
Speaking of WP downtime, you might be particularly interested in today's
XKCD:
http://xkcd.com/903/


wittylama.com/blog
Peace, love & metadata

