Re: [LEAPSECS] Leapseconds, more evidence

Rob Seaman Mon, 02 Jul 2012 13:40:30 -0700

Interesting.  A few questions spring to mind.  Let me preface them by stating 
that this is far from my area of expertise and I'd be delighted to be educated 
here.

1) What are the units on the y-axis?

2) This shows the surge continuing for at least half a day after the event with 
no significant downward jumps from servers being rebooted, fixes being applied 
(as described in previous emails), or other apparent attempts at addressing the 
situation.  Was nobody monitoring the data center?  Weren't alerts sent out to 
offsite staff?  Did they attempt to implement any fixes?

2a) Do we know if any of the datacenter customers had prepared their 
workflows/systems in advance of the leap second?  That is, do we know whether 
the left hand side might be lower than normal?

3) There is a shallow decreasing trend after the event.  Is this what one would 
expect from affected servers left on their own?  Do the problems resolve 
themselves eventually or is operator interaction required?

3a) What happened after this plot was made?  Perhaps staff finally arrived and 
gave it a kick?  Did the power ramp back down to normal?  Judging from the 
trend in the plot, the power usage will otherwise return to normal around 
midday tomorrow.

4) The upward jump is very rapid.  Hard to tell from the scale, but it appears 
to be a facility-wide 15% jump in, say, one minute or less.  Is this 
supportable by the power infrastructure onsite or on the local grid (assuming 
it is on the grid)?  Is the plot smoothed in any way?

5) This is a complex figure-of-merit and only distantly related to server 
processing load.  Might one expect the power usage to exhibit ringing behaviors 
from the rapid jump?  What is the typical mix of power consumption in a data 
center among CPUs, disks, cooling, etc.?  Wouldn't these each have 
different time constants that a naive viewer (i.e., me) might think would be 
visible?

6) The small-scale structure remains similar before and after the event.  
Wouldn't one expect the system loading to interact in some complex way with the 
actual workflows the data center is in business to serve?  One might think the 
small-scale structure would become either more variable or perhaps even flatten 
out as CPUs pegged.  As the CPUs pegged, wouldn't disk I/O decrease (or at 
least change in some fashion)?

7) This is a 24 hour plot.  I guess the hourly peaks would be some sort of 
housekeeping workflows, and there surely would be load-balancing to squeeze the 
most out of the datacenter - but is it normal to otherwise see no diurnal 
variation?  And if there is load-balancing, is that the 2-3 day ramp we're 
seeing after the event?  One would think the time-constant would be much more 
rapid.

8) Is a typical day flat (outside small scale structure) at just about 910000 
units?  Is there usually a difference between a Saturday night and a Sunday 
morning?

Provenance would be appreciated.  Whatever our positions on the issues, they'll 
be strengthened by something more reliable than "I'm told that".  For instance, 
what's the mix of host OSes - is there otherwise reason to believe the 
datacenter was a candidate for the various issues described to date?

I don't suppose this particular datacenter is used by Amadeus?  :-)

Rob
--

On Jul 2, 2012, at 11:53 AM, Poul-Henning Kamp wrote:

> I'm told that this is the power usage from one of Hetzner's data centers over 
> the leap-second:
> 
>       http://imgur.com/a/ykoup

_______________________________________________
LEAPSECS mailing list
LEAPSECS@leapsecond.com
http://six.pairlist.net/mailman/listinfo/leapsecs
