Re: Stratum one autonomy and assumptions about GPS

2016-08-29 Thread Eric S. Raymond
Gary E. Miller :
> > 1. GPS outage length and frequencies are decreasing
> 
> Don't care.  If you need your NTP to work, you need to know it is working.
> Otherwise failures are not noticed.

OK, the test for "know it is working" is: you have lock, or you had lock
less than x seconds ago where x is a worst-case of your drift model to
whatever confidence interval you want to fix.
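
A minimal sketch of that test, assuming a simple linear worst-case drift
model. This is not NTPsec code; the function names, the parameters, and the
model itself are illustrative assumptions.

```python
def max_holdover_seconds(error_budget_s, worst_drift_ppm):
    """Longest time without lock before worst-case drift alone could
    consume the whole error budget (hypothetical linear drift model)."""
    if error_budget_s <= 0 or worst_drift_ppm <= 0:
        raise ValueError("budget and drift rate must be positive")
    # 1 ppm of frequency error accumulates 1 microsecond of offset per second.
    return error_budget_s / (worst_drift_ppm * 1e-6)


def known_working(has_lock, seconds_since_lock, error_budget_s, worst_drift_ppm):
    """True if we have lock, or lost it less than x seconds ago, where x
    is the worst-case holdover time from the drift model."""
    if has_lock:
        return True
    return seconds_since_lock < max_holdover_seconds(error_budget_s, worst_drift_ppm)
```

For example, a 1 ms error budget with a 10 ppm worst-case drift rate gives a
holdover of roughly 100 seconds under this model.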

> > 3. There's a lower bound below which outages don't matter; we may be
> > there.
> 
> I don't agree.  I monitor all my services 24x7, and I do get NTP
> problems in my logs.

And you also said in recent mail that you don't work with the kind of
hardware a serious autonomy-seeker would use.  So *your* NTP problems
are not determinative, though they could be useful input data for
improving error-estimation techniques.

> > Any given fixed accuracy target for deviation from UTC, combined with
> > a maximum crystal drift rate, defines a longest tolerable GPS outage. 
> 
> Not the majority failure mode.

That's an interesting statement.  What *is*, in your experience, the
dominant failure mode?

> > We may already be at a technological place where GPS outages don't
> > bust the tolerable-error budget, even with cheap hardware. If we
> > aren't, we'll probably be there soon. 
> 
> We can't define a single tolerable error budget.  We can provide some
> ranges of options for the user.

And that's exactly what I've been pushing towards - to develop some
statistical modeling on the basis of which we can make estimates to whatever
confidence bound the user wants to set as a parameter.
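
One possible shape for such an estimate, sketched under strong assumptions:
that we have a set of measured drift rates in ppm, and that those samples
are independent and representative (temperature swings may well break that
assumption). The function names and the nearest-rank quantile are
illustrative choices, not an existing NTPsec interface.

```python
import math


def drift_bound_ppm(samples_ppm, confidence):
    """Empirical bound on |drift| at the given confidence (0 < c <= 1),
    using the nearest-rank quantile of the observed samples."""
    if not samples_ppm:
        raise ValueError("need at least one drift sample")
    if not 0 < confidence <= 1:
        raise ValueError("confidence must be in (0, 1]")
    ordered = sorted(abs(s) for s in samples_ppm)
    # Smallest sample with at least `confidence` of the data at or below it.
    rank = math.ceil(confidence * len(ordered))
    return ordered[rank - 1]


def tolerable_outage_s(samples_ppm, confidence, error_budget_s):
    """Longest GPS outage that keeps drift within the error budget,
    to the user-selected confidence, assuming linear drift."""
    bound = drift_bound_ppm(samples_ppm, confidence)
    if bound == 0:
        return math.inf  # no observed drift at this confidence level
    return error_budget_s / (bound * 1e-6)
```

The confidence level and the error budget are the user-settable parameters;
everything else would come from logged measurements.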
-- 
Eric S. Raymond <http://www.catb.org/~esr/>


___
devel mailing list
devel@ntpsec.org
http://lists.ntpsec.org/mailman/listinfo/devel

Re: Stratum one autonomy and assumptions about GPS

2016-08-29 Thread Gary E. Miller
Yo Hal!

On Thu, 25 Aug 2016 15:30:25 -0700
Hal Murray  wrote:

> e...@thyrsus.com said:
> > I have a USB thermometer on order, they're cheap.  Might I suggest
> > you get one and repeat this experiment, actually plotting your
> > temperature variation?   
> 
> Most CPU chips include a temperature sensor.

Which I have found does not correlate with any of my NTP data.

I now have a lot of data, just need to finish up the ntpviz temp
module.

RGDS
GARY
---
Gary E. Miller Rellim 109 NW Wilmington Ave., Suite E, Bend, OR 97703
g...@rellim.com  Tel:+1 541 382 8588



Re: Stratum one autonomy and assumptions about GPS

2016-08-29 Thread Gary E. Miller
Yo Eric!

On Thu, 25 Aug 2016 00:19:46 -0400
"Eric S. Raymond"  wrote:

> This was going to be a note to just Hal originally, but it will do the
> rest of the team no harm to know more about the scenarios and
> assumptions driving some of my design choices.
> 
> Hal objected (off list) to me drawing a conclusion from today's
> offset multiplot that check servers aren't necessary when you have
> a local GPS - a Stratum 1 really can run autonomously. He said,
> correctly of course, that the check servers aren't there to improve
> time accuracy when the GPS has sat lock, but to backstop the GPS when
> it flakes out.
> 
> I shall now discuss three interlocking reasons this possibility does
> not loom as large in my mind as it does in Hal's.
> 
> 1. GPS outage length and frequencies are decreasing

Don't care.  If you need your NTP to work, you need to know it is working.
Otherwise failures are not noticed.

> 2. The autonomy scenarios I think about are not hobbyist-budget
> productions

Yeah, and the big boys REALLY need to know their NTP is right.

> 3. There's a lower bound below which outages don't matter; we may be
> there.

I don't agree.  I monitor all my services 24x7, and I do get NTP
problems in my logs.

> Any given fixed accuracy target for deviation from UTC, combined with
> a maximum crystal drift rate, defines a longest tolerable GPS outage. 

Not the majority failure mode.

> We may already be at a technological place where GPS outages don't
> bust the tolerable-error budget, even with cheap hardware. If we
> aren't, we'll probably be there soon. 

We can't define a single tolerable error budget.  We can provide some
ranges of options for the user.

RGDS
GARY
---
Gary E. Miller Rellim 109 NW Wilmington Ave., Suite E, Bend, OR 97703
g...@rellim.com  Tel:+1 541 382 8588



Re: Use of pool servers reveals unacceptable crash rate in async DNS

2016-08-29 Thread Eric S. Raymond
Processing old mail...

Hal Murray :
> > I believe you're right that these platforms don't have it.  The question is,
> > how important is that fact?  Is the performance hit from synchronous DNS
> > really a showstopper?  I don't know the answer. 
> 
> There are two cases I know of where ntpd does a DNS lookup after it gets 
> started.
> 
> One is the retry when DNS for the normal server case doesn't work during 
> initialization.  It will try again occasionally until it gets an answer. 
> (which might be negative)
> 
> The main one is the pool code trying for a new server.  I think we should be 
> extending this rather than dropping it.  There are several possibilities in this 
> area.  The main one would be to verify that a server you are using is still 
> in the pool.  (There isn't a way to do that yet - the pool doesn't have any 
> DNS support for that.)  The other would be to try replacing the poorest 
> server rather than only replacing dead servers.
> 
> DNS lookups can take a LONG time.  I think I've seen 40 seconds on a failing 
> case.
> 
> If we get the recv time stamp from the OS, I think the DNS delays won't 
> introduce any lies on the normal path.  We could test that by putting a sleep 
> in the main loop.  (There is a filter to reject packets that take too long, 
> but I think that's time-in-flight and excludes time sitting on the server.)
> 
> There are two cases I can think of where a pause in ntpd would cause 
> troubles.  One is that it would mess up refclocks.  The other is that packets 
> will get dropped if too many of them arrive.
> 
> I think that means we could use the pool command on a system without 
> refclocks.  That covers end nodes and maybe lightly loaded servers.
> 
> ---
> 
> It's worth checking out the input buffering side of things.  There may be 
> some code there that we don't need.  I think there is a pool of buffers.  
> Where can a buffer sit other than on the free queue?  Why do we need a pool?

The project has more important priorities than chasing this down.  But: I have
edited this text, adding a few details I have learned since, into a new
section for the internals tour (devel/tour.txt).  That will give somebody
a better-than-nothing place to start if we ever again try something like
the c-ares replacement.
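
To illustrate the concern Hal raises above, that a slow DNS lookup pauses
the main loop, here is a minimal sketch of running the lookup in a worker
thread while the loop keeps ticking. It does not describe ntpd's actual
mechanism; the function name, the `resolver` and `on_tick` hooks, and the
polling interval are all hypothetical.

```python
import socket
import time
from concurrent.futures import ThreadPoolExecutor


def lookup_without_blocking(hostname, resolver=None, tick=0.01, on_tick=None):
    """Resolve `hostname` in a worker thread while the caller's loop runs.

    `resolver` defaults to a blocking getaddrinfo() call on the NTP port;
    `on_tick` stands in for the main loop's real work (refclocks, packets).
    Returns (lookup result, number of loop iterations run while waiting).
    """
    if resolver is None:
        resolver = lambda name: socket.getaddrinfo(name, 123)
    ticks = 0
    with ThreadPoolExecutor(max_workers=1) as pool:
        future = pool.submit(resolver, hostname)
        # The loop stays responsive even if the lookup takes many seconds.
        while not future.done():
            if on_tick is not None:
                on_tick()
            ticks += 1
            time.sleep(tick)
        # result() re-raises any exception the resolver hit, e.g. gaierror.
        return future.result(), ticks
```

Passing a fake `resolver` that sleeps makes it easy to simulate the
40-second failing case without touching the network.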
-- 
Eric S. Raymond <http://www.catb.org/~esr/>