date:20120702

[LEAPSECS] Leapseconds, more evidence

2012-07-02 Thread Poul-Henning Kamp


I'm told that this is the power usage from one of Hetzner's datacenters
over the leap-second:

http://imgur.com/a/ykoup

-- 
Poul-Henning Kamp   | UNIX since Zilog Zeus 3.20
p...@freebsd.org | TCP/IP since RFC 956
FreeBSD committer   | BSD since 4.3-tahoe
Never attribute to malice what can adequately be explained by incompetence.
___
LEAPSECS mailing list
LEAPSECS@leapsecond.com
http://six.pairlist.net/mailman/listinfo/leapsecs

Re: [LEAPSECS] Leapseconds, more evidence

2012-07-02 Thread Rob Seaman

Interesting.  A few questions spring to mind.  Let me preface them by stating 
that this is far from my area of expertise and I'd be delighted to be educated 
here.

1) What are the units on the y-axis?

2) This shows the surge continuing for at least half a day after the event with 
no significant downward jumps from servers being rebooted, fixes being applied 
(as described in previous emails), or other apparent attempts at addressing the 
situation.  Was nobody monitoring the data center?  Weren't alerts sent out to 
offsite staff?  Did they attempt to implement any fixes?

2a) Do we know if any of the datacenter customers had prepared their 
workflows/systems in advance of the leap second?  That is, do we know whether 
the left hand side might be lower than normal?

3) There is a shallow decreasing trend after the event.  Is this what one would 
expect from affected servers left on their own?  Do the problems resolve 
themselves eventually or is operator interaction required?

3a) What happened after this plot was made?  Perhaps staff finally arrived and 
gave it a kick?  Did the power ramp back down to normal?  Judging from the 
trend in the plot, they will otherwise return to normal around midday tomorrow.

4) The upward jump is very rapid.  Hard to tell from the scale, but it appears 
to be a facility-wide 15% jump in, say, one minute or less.  Is this 
supportable by the power infrastructure onsite or on the local grid (assuming 
it is on the grid)?  Is the plot smoothed in any way?

5) This is a complex figure-of-merit and only distantly related to server 
processing load.  Might one expect the power usage to exhibit ringing behaviors 
from the rapid jump?  What is the typical mix of power consumption in a data 
center between CPUs, disks, and cooling, etc?  Wouldn't these each have 
different time constants that a naive viewer (i.e., me) might think would be 
visible?

6) The small scale structure remains similar before-and-after the event.  
Wouldn't one expect the system loading to interact in some complex way with the 
actual workflows the data center is in business to serve?  One might think the 
small scale would become either more variable or perhaps even flatten out as 
CPUs pegged.  As the CPUs pegged wouldn't disk I/O decrease (or at least change 
in some fashion)?

7) This is a 24 hour plot.  I guess the hourly peaks would be some sort of 
housekeeping workflows, and there surely would be load-balancing to squeeze the 
most out of the datacenter - but is it normal to otherwise see no diurnal 
variation?  And if there is load-balancing, is that the 2-3 day ramp we're 
seeing after the event?  One would think the time-constant would be much more 
rapid.

8) Is a typical day flat (outside small scale structure) at just about 91 
units?  Is there usually a difference between a Saturday night and a Sunday 
morning?

Provenance would be appreciated.  Whatever our positions on the issues, they'll 
be strengthened by something more reliable than "I'm told that".  For instance, 
what's the mix of host OSes - is there otherwise reason to believe the 
datacenter was a candidate for the various issues described to date?

I don't suppose this particular datacenter is used by Amadeus?  :-)

Rob
--

On Jul 2, 2012, at 11:53 AM, Poul-Henning Kamp wrote:

 I'm told that this is the power usage from one of Hetzner's data centers over 
 the leap-second:
 
   http://imgur.com/a/ykoup


Re: [LEAPSECS] Leapseconds, more evidence

2012-07-02 Thread Poul-Henning Kamp

In message eeebea0e-71cf-490d-98e5-3d01ab131...@noao.edu, Rob Seaman writes:

 Interesting.  A few questions spring to mind.  Let me preface them
 by stating that this is far from my area of expertise and I'd be
 delighted to be educated here.

The original source may be this:

https://plus.google.com/117024231055768477646/posts/2pkWbDiEDQG


 1) What are the units on the y-axis?

Watt.

 4) The upward jump is very rapid.  Hard to tell from the scale,
 but it appears to be a facility-wide 15% jump in, say, one minute
 or less.  Is this supportable by the power infrastructure onsite
 or on the local grid (assuming it is on the grid)?  Is the plot
 smoothed in any way?

It's only 135 kW, not really a big deal for a data-center, much
less of a deal for the power-grid.

 I don't suppose this particular datacenter is used by Amadeus?  :-)

Well, Kristian Köhntopp works for booking.com...

-- 
Poul-Henning Kamp   | UNIX since Zilog Zeus 3.20
p...@freebsd.org | TCP/IP since RFC 956
FreeBSD committer   | BSD since 4.3-tahoe
Never attribute to malice what can adequately be explained by incompetence.

Re: [LEAPSECS] Leapseconds, more evidence

2012-07-02 Thread Zefram

Rob Seaman wrote:
 7) This is a 24 hour plot.  I guess the hourly peaks would be some sort
 of housekeeping workflows,

Hourly cron jobs.  Most people are idiots and schedule them all for the
top of the hour.  The graph also shows discernible regular peaks for
aligned cycles of 30, 15, 10, and 5 minutes.  Of course, for the same
reason, daily cron jobs are most often scheduled for midnight, so it's
expected that there be an especially big peak at midnight.  I'd like
to see a normal midnight's graph as well, because that's a confounding
effect that complicates interpretation of this graph.
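
The alignment effect described above can be sketched in a few lines of Python. This is only an illustration: the list of periods is taken from the cycles named in the message, and the assumption that every job starts at phase 0 is mine, not something read off the actual graph.

```python
from collections import Counter

# Job periods in minutes (the cycles mentioned above), all assumed to be
# aligned to phase 0, i.e. the top of the hour.
PERIODS = [60, 30, 15, 10, 5]

def jobs_per_minute(periods, phase=0):
    """Count how many periodic jobs fire at each minute of one hour."""
    counts = Counter()
    for period in periods:
        for minute in range(phase % period, 60, period):
            counts[minute] += 1
    return counts

counts = jobs_per_minute(PERIODS)
for minute in sorted(counts):
    # One '#' per job firing in that minute.
    print(f"minute {minute:2d}: {'#' * counts[minute]}")
```

With everything aligned, minute 0 collects a job from every period, minute 30 from all but the hourly one, and the remaining multiples of 5 collect fewer, which is the kind of nested peak structure the message describes.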

The downward trend seen in the hours following midnight rather resembles
the smaller downward trend seen within each hour.  Particularly obvious
in the last three hours before UT midnight, but visible in nearly every
hour, there's a long-lived jump at the top of each hour, at the same
time as (but distinct from) the short-lived hourly peak.  Is this the
normal structure at that scale?

-zefram

Re: [LEAPSECS] Leapseconds, more evidence

2012-07-02 Thread Rob Seaman

On Jul 2, 2012, at 1:47 PM, Poul-Henning Kamp wrote:

 The original source may be this:
 
   https://plus.google.com/117024231055768477646/posts/2pkWbDiEDQG

You'd think google+ would let you run google translate…

 1) What are the units on the y-axis?
 
 Watt.

…

 It's only 135 kW, not really a big deal for a data-center, much
 less of a deal for the power-grid.

Ok - from the Hetzner web page I was imagining a much larger operation.  The 
other questions about the waveform remain in play.

Thanks!

Rob


Re: [LEAPSECS] Leapseconds, more evidence

2012-07-02 Thread Rob Seaman

Zefram wrote:

 Hourly cron jobs.  Most people are idiots and schedule them all for the
 top of the hour.  The graph also shows discernible regular peaks for
 aligned cycles of 30, 15, 10, and 5 minutes.  Of course, for the same
 reason, daily cron jobs are most often scheduled for midnight, so it's
 expected that there be an especially big peak at midnight.

Good point.  Though at an observatory the midnight issue often translates to 
noon.

Interesting notion about dithering crontab and init scheduling in general.  
Grabbing a crontab from a random server here - one that likely several folks 
have added to - many entries are on even cycles, but there are some interesting 
choices like 47 minutes after the hour, or activities in the middle of the 
afternoon.

I was taken by the 17 minute cycle quoted for one of the issues in several 
reports (though perhaps borrowing from the same source).  Anybody know what 
that was about?  Something that was programmed in, or emergent behavior like 
locusts?

 I'd like to see a normal midnight's graph as well, because that's a
 confounding effect that complicates interpretation of this graph.

Yes!

 The downward trend seen in the hours following midnight rather resembles
 the smaller downward trend seen within each hour.  Particularly obvious
 in the last three hours before UT midnight, but visible in nearly every
 hour, there's a long-lived jump at the top of each hour, at the same
 time as (but distinct from) the short-lived hourly peak.  Is this the
 normal structure at that scale?

The version on google+ is a bit more readable.  Not sure how much analysis is 
useful without more context.

Rob


Re: [LEAPSECS] Leapseconds, more evidence

2012-07-02 Thread Zefram

Rob Seaman wrote:
 but there are some interesting choices like 47 minutes after the hour,

I make a point of randomising the phase of all my cron jobs.  Cron makes
it easy to schedule a job for the same time every hour or every day, so
I tend to use hourly or daily cycles but with a static random selection
of phase.  E.g., my job to check for a new version of the Olson timezone
database runs at 7 minutes past each hour, because 7 is what came out
of echo $((RANDOM%60)) (zsh) when I set it up.  Another technique
I've seen is to put a random sleep first thing in the job's commands,
so that it's dithered per run.
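
Both techniques can be sketched in Python. This is a minimal illustration, not anyone's actual setup: the function names and the example command path are hypothetical, and the crontab line is just the standard five-field hourly form with a randomly chosen minute.

```python
import random

def static_phase_entry(command, rng=random):
    """Pick a minute once, at setup time, and bake it into an hourly crontab line."""
    minute = rng.randrange(60)  # same idea as echo $((RANDOM%60))
    return f"{minute} * * * * {command}"

def dither_delay(max_seconds, rng=random):
    """Per-run dithering: a delay in [0, max_seconds) to sleep before the real work."""
    return rng.random() * max_seconds

if __name__ == "__main__":
    # Hypothetical command path, for illustration only.
    print(static_phase_entry("/usr/local/bin/check-tzdata"))
    print(f"would sleep {dither_delay(300):.1f} s before starting")
```

The static variant keeps each job's run time predictable from one hour to the next, while the per-run sleep spreads load even among machines that share an identical crontab.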

 I was taken by the 17 minute cycle quoted for one of the issues
 in several reports (though perhaps borrowing from the same source).
 Anybody know what that was about?

That's ntpd.  It uses power-of-two numbers of seconds for the pauses
between polls of each peer, and so tends to update tracking parameters
on that cycle.  (It doesn't maintain a strict cycle time; you get an
approximate cycle dominated by that pause time.)  It goes up to 1024 s,
which is 17 min + 4 s.  ntpq -c pe shows the selected pause time for
each peer in the poll column.
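
The arithmetic behind the "17 minute" figure is easy to check. The sketch below just converts power-of-two second counts to minutes and seconds; the range of exponents shown (6 through 10) is an assumption for illustration, chosen so that it ends at the 1024 s maximum mentioned above.

```python
def poll_interval(exponent):
    """Return (seconds, minutes, leftover_seconds) for a poll exponent."""
    seconds = 2 ** exponent
    minutes, leftover = divmod(seconds, 60)
    return seconds, minutes, leftover

# Exponents 6..10 are an assumed range for illustration (64 s up to 1024 s).
for exponent in range(6, 11):
    seconds, minutes, leftover = poll_interval(exponent)
    print(f"2^{exponent} = {seconds:4d} s = {minutes:2d} min {leftover:2d} s")
```

The last line of output corresponds to 2^10 = 1024 s, i.e. 17 min 4 s.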

-zefram

[LEAPSECS] Good description of Linux kernel bugs related to leap seconds

2012-07-02 Thread Hal Murray

From Steven Bellovin, on NANOG:
  See http://landslidecoding.blogspot.com/2012/07/linuxs-leap-second-deadlocks.html

I've been hoping somebody would post a good summary like that.  It covers 5 
different bugs.

I'd split them into 3 clumps.  The first clump is kernel deadlocks.  As often 
happens with hard problems, fixing it here breaks it over there.  That's the 
first 3 bugs.

The 4th bug explains the CPU load spikes.  A kernel bug broke futex-es.  They 
are typically used in spin-lock like loops from user code.  I don't 
understand the details.  Setting the time (to any value) fixes the kernel 
problem so everything starts to work normally again.  (No need to reboot.)
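
For illustration, "setting the time" can be as simple as writing the clock's current value back to itself. The Python sketch below assumes that this is enough to count as setting the time for the purposes of the workaround; I have not verified it against an affected kernel. The function name is hypothetical, and actually changing the clock needs root on a Unix system, so the sketch defaults to a dry run.

```python
import time

def reset_clock_to_now(dry_run=True):
    """Set the realtime clock to the value it already has.

    With dry_run=True (the default) nothing is changed and the timestamp
    that would be written is returned.  Setting the clock for real needs
    root privileges on a Unix system.
    """
    now = time.time()
    if not dry_run:
        # time.clock_settime is only available on Unix.
        time.clock_settime(time.CLOCK_REALTIME, now)
    return now

if __name__ == "__main__":
    print(f"would set CLOCK_REALTIME to {reset_clock_to_now():.3f}")
```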

The 5th bug is still a mystery.


Also from NANOG:
 If folks have not read it, I would suggest reading Normal Accidents by
 Charles Perrow.

I agree.  This is a good excuse to find my copy and read it again.  Yes, it's 
very good, but not directly related to leap seconds.



-- 
These are my opinions.  I hate spam.




[LEAPSECS] the next future of utc

2012-07-02 Thread Steve Allen

In the aftermath of the 2012 leap second we announce the next meeting

http://futureofutc.org/

Requirements for UTC and Civil Timekeeping on Earth

A Colloquium Addressing a Continuous Time Standard
to be held at the University of Virginia, Charlottesville, VA
May 29-31, 2013.

--
Steve Allen                 s...@ucolick.org                WGS-84 (GPS)
UCO/Lick Observatory--ISB   Natural Sciences II, Room 165   Lat  +36.99855
1156 High Street            Voice: +1 831 459 3046          Lng -122.06015
Santa Cruz, CA 95064        http://www.ucolick.org/~sla/    Hgt +250 m
