Re: [blfs-dev] NTP

2012-02-16 Thread Andrew Benton
On Wed, 15 Feb 2012 18:47:37 -0800
Qrux qrux@gmail.com wrote:

   * So, I propose turning -x off.

I agree, I run ntpd -g
However, I also think the ntpd bootscript will work fine for most
people and for those (like me) who think it should be done differently
it's trivial to edit the bootscript; your distro, your rules and all
that ;)

Andy
-- 
http://linuxfromscratch.org/mailman/listinfo/blfs-dev
FAQ: http://www.linuxfromscratch.org/blfs/faq.html
Unsubscribe: See the above information page


Re: [blfs-dev] NTP

2012-02-16 Thread Matthew Burgess
On Thu, 16 Feb 2012 11:16:12 +, Andrew Benton b3n...@gmail.com wrote:
 On Wed, 15 Feb 2012 18:47:37 -0800
 Qrux qrux@gmail.com wrote:
 
  * So, I propose turning -x off.
 
 I agree, I run ntpd -g
 However, I also think the ntpd bootscript will work fine for most
 people and for those (like me) who think it should be done differently
 it's trivial to edit the bootscript; your distro, your rules and all
 that ;)

It probably doesn't affect many LFSers, but Oracle's RAC installation/
configuration wizard explicitly checks for '-x' in the ntpd options.

It does this because you really want to stop your database server's time
from jumping backwards, and '-x' (or 'tinker step 0' in /etc/ntp.conf)
is the only way to guarantee that won't happen.  Interestingly,
apparently Dovecot doesn't like time going backwards either; I'm sure
there are other servers that prefer a uni-directional arrow of time too.

For more 'normal' setups, I'd agree that calling 'ntpd -g -q' to do an
initial time sync at bootup, followed by ntpd without any other options
would be sufficient; the odds that the ntp pool servers most people use
are going to jump backwards are so small, I don't think it's worth
guarding against by using the '-x' option by default.

Regards,

Matt.



Re: [blfs-dev] NTP

2012-02-16 Thread Qrux

On Feb 16, 2012, at 4:38 AM, Matthew Burgess wrote:

 On Thu, 16 Feb 2012 11:16:12 +, Andrew Benton b3n...@gmail.com wrote:
 On Wed, 15 Feb 2012 18:47:37 -0800
 Qrux qrux@gmail.com wrote:
 
 * So, I propose turning -x off.
 
 I agree, I run ntpd -g
 However, I also think the ntpd bootscript will work fine for most
 people and for those (like me) who think it should be done differently
 it's trivial to edit the bootscript; your distro, your rules and all
 that ;)
 
 It probably doesn't affect many LFSers, but Oracle's RAC installation/
 configuration wizard explicitly checks for '-x' in the ntpd options.
 
 It does this because you really don't want your database server's time
 from jumping backwards, and '-x' (or 'tinker step 0' in /etc/ntp.conf)
 is the only way to guarantee that won't happen.

Interesting!  Sounds like Oracle...

As for the issue--I still stand by my original position that defaults should be 
sensible, and obey least surprise.  Running NTP by default with -x is 
surprising.  I'll leave the 'why' below the fold.

Q


* * *

Technical details follow, for those who are jumping up and down saying:

My app cares about time!  So, running NTP with -x protects me!

In case anyone has forgotten, NTP gives slewing by default.  The question is 
not whether monotonically increasing time is good.  You get that with OR 
WITHOUT -x.  The issue is, -x doesn't guarantee anything.  Man page:

-x, --slew: Slew up to 600 seconds.  Normally, the time is 
slewed...and stepped if above the threshold.  This option sets the threshold to 
600 s, which is well within the accuracy window to set the clock manually.

It simply raises the threshold to 600 s, from 128 ms.  And, in cases where your 
clock is drifting by more than 10 minutes in the polling interval (and you're 
saying your app cares about time?) then it wants YOU TO MANUALLY ADJUST THE 
TIME, before running NTP again.  I want to see you do that by hand, and keep 
things monotonically increasing, especially if you drifted forward.  I 
know...you'll shut down your production machine until those 600 s have elapsed, 
right?  And, in that same situation where you've drifted beyond 600s, if you 
combine -x with -g, you simply get a big step that doesn't shut down ntpd--but, 
the point is, you get a STEP.  Lacking the -g, ntpd simply stops itself.
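
As a rough illustration of the thresholds discussed above, here is a minimal Python sketch of how ntpd chooses between slewing, stepping, and exiting. It is a simplified model, not ntpd's actual code: the function and parameter names are made up for illustration, and the 1000 s panic threshold is taken from the ntpd man page's description of -g rather than from this thread.

```python
# Simplified model of ntpd's slew/step/exit decision (illustrative only).
STEP_THRESHOLD_DEFAULT_S = 0.128   # default step threshold from the man page
STEP_THRESHOLD_WITH_X_S = 600.0    # step threshold when started with -x
PANIC_THRESHOLD_S = 1000.0         # assumption: default panic threshold (see -g)


def ntpd_action(offset_s, slew_only=False, allow_panic=False):
    """Return 'slew', 'step' or 'exit' for a measured clock offset.

    slew_only   models the -x option (hypothetical parameter name).
    allow_panic models the -g option (hypothetical parameter name); the
                real option only applies to the first correction.
    """
    magnitude = abs(offset_s)
    step_threshold = STEP_THRESHOLD_WITH_X_S if slew_only else STEP_THRESHOLD_DEFAULT_S
    if magnitude < step_threshold:
        return "slew"
    if magnitude > PANIC_THRESHOLD_S and not allow_panic:
        return "exit"
    return "step"


if __name__ == "__main__":
    for offset in (0.05, 5.0, 700.0, 1500.0):
        print(
            offset,
            ntpd_action(offset),
            ntpd_action(offset, slew_only=True),
            ntpd_action(offset, slew_only=True, allow_panic=True),
        )
```

Under this model -x only moves the point at which a step happens; an offset past the threshold is still stepped (or ntpd exits), which is the sense in which -x is not a guarantee.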

I, too, care about time in my apps.  So, I've looked into it.  And, in the 
little I know, -x protects nothing.  People spend all kinds of time worrying 
about various other minutiae (MTBF of hard drives, vibration in their systems 
causing bad feedback on platters, dual-redundant power supplies, etc, etc, etc) 
and they want absolutely order-dependent mission-critical applications to 
depend on the same technology that powers their Timex from 1982?  No.  Real 
apps that *really* care about time go out of their way to make sure their time 
hardware is as good as anything else.  They get crystal clocks enclosed inside 
a temperature-controlled, vibration-dampened enclosure with electronic 
conditioning. And, if they're careful, they use the CO as a *counter*, not as a 
*clock*.  Monotonicity is about counting ticks on a counter, not getting time 
from a clock.

So, -x is not a guarantee.  It's a stop-gap, for when your clock (or the 
environment around your clock) is failing miserably.  If you're in a situation 
where you're drifting for more than 600 seconds in a single polling interval, 
NTP is going to step you anyway, forward or back.  Or, it will simply quit.  
And let you do it.  At which point...What happens in your situation?  You shut 
down your high-volume production machine because you lost access to your 
timesource?

Plus, this is completely missing the point.  It's not about whether or not 
slewing is good.  It's about choosing between:

* (A) slew beyond 128ms drift

* (B) using a kernel discipline

The issue is, if you care about timekeeping (Oracle default installs don't give 
a flying crap), you don't let your clock drift more than (and I'm averaging 
here) 43 seconds/day.  Why 43?  NTP already keeps monotonically increasing 
time by slewing single deltas less than 128 ms--and that all happens without -x.  
43 seconds is simply the aggregate of the total number of 128 ms drifts that 
NTP can correct BY DEFAULT (i.e., without -x) in a given day.  The 
arithmetic: if you accept the fact that the typical Unix slew rate is limited 
to 0.5 ms/s, a 128 ms drift will take 256 seconds to amortize.  So, if you lose 
less than 128 ms every 256 seconds, that's fine, because THE DEFAULT SLEWING WILL 
TAKE CARE OF YOU.  And, 128 ms every 256 seconds totals 43 seconds per day.  
And, up to that amount of drift, the default slew will take care of it.
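
The arithmetic above can be checked with a few lines of Python. This is only a sketch under the thread's stated assumptions (a 0.5 ms/s slew limit and a 128 ms step threshold); the constant and function names are invented for illustration, and integer microseconds are used to avoid floating-point rounding.

```python
# Daily drift budget that default slewing can absorb (illustrative sketch).
SLEW_RATE_US_PER_S = 500        # 0.5 ms/s slew limit, in microseconds per second
STEP_THRESHOLD_US = 128_000     # 128 ms default step threshold
SECONDS_PER_DAY = 86_400

# Time needed to slew away one full 128 ms offset.
amortize_s = STEP_THRESHOLD_US // SLEW_RATE_US_PER_S

# Total correction the slew limit allows over one day, in seconds.
daily_budget_s = SLEW_RATE_US_PER_S * SECONDS_PER_DAY / 1_000_000


def slew_keeps_up(drift_s_per_day):
    """True if slewing alone can absorb this much average drift per day."""
    return drift_s_per_day <= daily_budget_s


if __name__ == "__main__":
    print("seconds to amortize 128 ms:", amortize_s)
    print("correctable drift per day (s):", daily_budget_s)
    print("10 s/day sustainable:", slew_keeps_up(10))
    print("60 s/day sustainable:", slew_keeps_up(60))
```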

There is an exception, which is where you get single drifts in a polling 
interval past 128ms.  The default maximum polling interval is 1024 s.  Which 
means your clock would have to have a stability of less than 1 part in 

Re: [blfs-dev] NTP

2012-02-16 Thread Matt Burgess
On Thu, 2012-02-16 at 14:13 -0800, Qrux wrote:
 On Feb 16, 2012, at 4:38 AM, Matthew Burgess wrote:
 
  On Thu, 16 Feb 2012 11:16:12 +, Andrew Benton b3n...@gmail.com wrote:
  On Wed, 15 Feb 2012 18:47:37 -0800
  Qrux qrux@gmail.com wrote:
  
* So, I propose turning -x off.
  
  I agree, I run ntpd -g
  However, I also think the ntpd bootscript will work fine for most
  people and for those (like me) who think it should be done differently
  it's trivial to edit the bootscript; your distro, your rules and all
  that ;)
  
  It probably doesn't affect many LFSers, but Oracle's RAC installation/
  configuration wizard explicitly checks for '-x' in the ntpd options.
  
  It does this because you really don't want your database server's time
  from jumping backwards, and '-x' (or 'tinker step 0' in /etc/ntp.conf)
  is the only way to guarantee that won't happen.

 
 In case anyone has forgotten, NTP gives slewing by default.  The question is 
 not whether monotonically increasing time is good.  You get that with OR 
 WITHOUT -x.  The issue is, -x doesn't guarantee anything.

Good, I'm glad you said that, because that was my understanding from
reading the man page as well, which made me wonder why Oracle demands
'-x'.  Sometimes though, in order to get past their 1st/2nd line
support, it's easier to just give in and do what they want rather than
what is technically correct :-)

 So, getting back to your RAC system...Sure, it can check for it.  But let's 
 hope your database app doesn't stop operating when you can't find a 
 timesource.

In the particular environment that I was directly involved in, our time
source was the RAC server's default gateway (or more accurately, the NTP
service running on the Cisco switch, which the RAC servers were directly
connected to).  If they lost their time source, we'd have much bigger
issues than their notion of what the correct time was :-)

Regards,

Matt.



Re: [blfs-dev] NTP

2012-02-16 Thread Bruce Dubbs
Qrux wrote:

 It also wasn't the question I was asking.  I run ntpd in daemon mode,
 because I want it to keep correcting my time after boot, and that's
 where the slewing/stepping behavior is relevant.

Yes, daemon mode is the script default.

 * So, I propose turning -x off.

OK, I won't make a special commit for it, but it will be that way with 
the next bootscript commit.

   -- Bruce


[blfs-dev] NTP

2012-02-15 Thread Qrux
Is there a reason ntpd is run with -x?

The big slew is nice, but is there a reason it's preferred over the kernel 
discipline?

Q



Re: [blfs-dev] NTP

2012-02-15 Thread Bruce Dubbs
Qrux wrote:
 Is there a reason ntpd is run with -x?
 
 The big slew is nice, but is there a reason it's preferred over the kernel 
 discipline?

When you are booting, there is probably nothing else really depending on 
timestamps.  We might as well just slew the time to be correct.  In most 
cases, the time offset should be small.

   -- Bruce


Re: [blfs-dev] NTP

2012-02-15 Thread Qrux

On Feb 15, 2012, at 5:00 PM, Bruce Dubbs wrote:

 Qrux wrote:
 Is there a reason ntpd is run with -x?
 
 The big slew is nice, but is there a reason it's preferred over the kernel 
 discipline?
 
 When you are booting, there is probably nothing else really depending on 
 timestamps.

Whether or not things that run at boot time are sensitive to timestamps is 
irrelevant.  Because, if nothing cares, then it doesn't matter whether you step 
or slew.

It also wasn't the question I was asking.  I run ntpd in daemon mode, because I 
want it to keep correcting my time after boot, and that's where the 
slewing/stepping behavior is relevant.  From the man page for ntpd, about -x:


===
Slew up to 600 seconds.

Normally, the time is slewed if the offset is less than the step threshold, 
which is 128 ms by default, and stepped if above the threshold...Since the slew 
rate of typical Unix kernels is limited to 0.5 ms/s, each second of adjustment 
requires an amortization interval of 2000s.
===

If the kernel slew rate is limited to 0.5 ms/s, then your clock had better not 
drift by more than ~43 seconds/day, because no amount of slew will correct 
this.  So, to me, this is kind of silly.  Turning slew up to 600 s is kinda 
meaningless, unless you can also adjust the slew rate (and I don't see any 
mention of a kconfig parameter to change that).  I would bet that a 43 s/d 
drift is rare on reasonably current hardware, and that if you're seeing it, 
you're doing something silly like chaining UPSs or keeping your PCs in a bad 
thermal environment (clock oscillators are very sensitive to temperature).

* Most people probably don't drift by more than 43 s/d.

* If they did, -x isn't their solution; it just hides a bigger issue.

* Kernel discipline (which -x disables) handles leap-seconds better.

* So, I propose turning -x off.

In addition, the BLFS ntpd is also run with -g.  Long story short, it's better 
to step while you can (i.e., before anything time-sensitive starts, like your 
application stack with database servers or network authentication servers like 
LDAP or Kerberos).  In fact, the kernel does it anyway, when it loads the 
reference time from the CMOS RTC.

Last leap-second was in 2008.  The next leap-second was originally scheduled 
for June 2012.  I heard back in Jan that might be postponed.  Either way, I 
think '-x' should not be the default.

Q




* * * Additional Info * * *

Getting back to -x...I guess slewing is fine if you really need a slew of up to 
600 seconds, and you have the kernel support to do it.  But, why choose that as 
the default over kernel discipline?  The situation where -x would benefit you 
would be the most rare of situations where you either have a one-time error and 
could afford a 14d slew, or you see this kind of drift often enough and could 
adjust the kernel slew rate to deal with it.  In fact, if your system needs -x, 
you probably don't care about good time anyway--or shouldn't be depending on it.  
If you need to run ntpd with -x, you probably have bigger fish to fry, first.
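
To put numbers on the "14d slew" figure, here is a small Python sketch that computes how long a given offset takes to slew away. It assumes the 0.5 ms/s limit quoted from the man page; the function name is hypothetical.

```python
# How long a pure slew takes at the 0.5 ms/s limit (illustrative sketch).
SLEW_RATE_US_PER_S = 500   # 0.5 ms/s, in microseconds per second


def amortization_seconds(offset_s):
    """Seconds of wall time needed to slew away offset_s seconds of error."""
    offset_us = round(offset_s * 1_000_000)
    return offset_us / SLEW_RATE_US_PER_S


if __name__ == "__main__":
    for offset in (0.128, 1.0, 600.0):
        seconds = amortization_seconds(offset)
        print(f"{offset:>7} s offset -> {seconds:>9.0f} s ({seconds / 86_400:.2f} days)")
```

A full 600 s offset works out to 1,200,000 s of slewing, a little under 14 days.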

Time discipline is about who gets to discipline the clock, and how.  NTP can do 
it through adjtime()--with microsecond resolution--or through adjtimex() which 
allows much higher precision (at the cost of portability, since, AFAIK, that 
system call is only available on Linux & FreeBSD).  The latter (adjtimex) 
requires kernel support.  In addition to precision (though, at the cost of 
slightly lower accuracy due to less algorithmic sophistication), there is 
another benefit to kernel discipline...

There is a time when kernel discipline is better than always slewing by the 
default slew rate limit...Which is during leap seconds.  During a leap-second, 
using a non-kernel discipline and a slow monotonic slew, time will go forward a 
little bit faster, but very slowly.  Which means, using -x, the full extent of 
that leap second won't be registered until 2000 seconds later.  Practically 
speaking, it comes down to: when the next leap second hits, do you want to be 
off by over half-a-second over a period of 1000 s, or would you rather have 
each timestamp to be off by a few dozen microseconds (arguably the difference 
between the higher-accuracy-NTP discipline, or the kernel's own 
slightly-less-than-fanatically-accurate discipline).  Using the kernel 
discipline, which can overcome the default slew rate, it will be registered 
very-near immediately.
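
The leap-second case can be sketched the same way. The Python below models only the slew-limited path, assuming a one-second correction absorbed at a constant 0.5 ms/s; it does not model the kernel discipline, and the names are invented for illustration.

```python
# Residual error while a one-second leap is slewed away (illustrative sketch).
SLEW_RATE_US_PER_S = 500   # 0.5 ms/s, in microseconds per second
LEAP_US = 1_000_000        # one leap second, in microseconds


def residual_error_s(elapsed_s):
    """Seconds of error still outstanding elapsed_s seconds after the leap."""
    remaining_us = max(0, LEAP_US - SLEW_RATE_US_PER_S * elapsed_s)
    return remaining_us / 1_000_000


if __name__ == "__main__":
    for elapsed in (0, 500, 1000, 1500, 2000):
        print(elapsed, residual_error_s(elapsed))
```

Halfway through, at 1000 s, half a second of error is still outstanding, and the correction completes only at 2000 s.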

I would think leap-second-correctness > 
possibly-absurdly-high-accuracy-that-may-not-matter.  You could reframe my 
original question as: Why is the BLFS default choice to opt for a 
possibly-more-accurate time in place of a more-correct-time?  The NTP slewing 
is maybe more sophisticated than the kernel's.  But, it won't handle 
leap-seconds as well.
