Re: [Pool] Monitoring concerns

Kradorex Xeron Sun, 30 Oct 2011 21:14:01 -0700

Apologies for the belated reply; comments inline:

On 18/10/2011 21:57, Ask Bjørn Hansen <[email protected]> wrote:


<snip>

> Originally I thought this was a problem, but it turns out that at the
> precision we target ("better than 100ms") it's a complete non-issue even
> for monitoring servers on slow and weird internet connections on the other
> side of the planet from here.

I agree. 100ms is a good threshold to target.

> 
> >> How do they cope with issues with routing failures on the internet
> >> between them and the NTP hosts, where such failures may not be
> >> immediately rectified by intermediary ISPs?
> > 
> > They get their ISP to fix it.
> > 
> > If the monitoring network is down, it's also likely that the NTP pool DNS
> > servers are also unreachable or not providing fresh results, but that's
> > hardly the end of the world-- other nameservers across the net would be
> > caching the results for some time, and existing NTP clients would
> > continue to operate without change.
> 
> That's right.  It's basically not as much of a problem as it sounds.
> 

My concern isn't so much at either the monitoring network's ISP's end or the 
NTP servers' ISPs' end, but rather "somewhere in the middle": say, a failed 
route due to a cut cable that hasn't been routed around yet, or a bad core 
router. It can also be difficult if a router is giving you horrible timing 
(I've seen traceroutes on bad routes where a single hop from one router to 
the next jumped a few hundred ms or caused outright stars).

It can be difficult to get an ISP you aren't a customer of to fix their 
routing or to give ETAs on downtime.

<snip>
> 
> ... and when it does the plan is that a failure from any of the monitoring
> nodes will subtract from your score; so it'll mean everyone will be MORE
> likely to have lower scores, not less.  :-)

I'd advocate an internal score, for internal use, on the monitoring nodes 
themselves, i.e. each monitoring node would have a score for its own accuracy. 
For example, if a single monitoring node disagrees with the rest, that node's 
score is reduced. If a monitoring node falls below a certain threshold, its 
results are no longer factored into decisions until it becomes more accurate 
again. This takes into account the fluid nature of routing on the internet and 
how routing can fail between specific hosts without the immediate ISPs at 
either the NTP server or the monitoring node being at fault.
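
To make the idea concrete, here is a rough Python sketch of how such 
per-monitor scoring might look. It is only an illustration, not how the pool 
monitoring actually works: the function names, the score range, the 
penalty/reward values and the trust threshold are all assumptions of mine. 
The only figure taken from this thread is the 100ms tolerance.

```python
from statistics import median

OFFSET_TOLERANCE = 0.100  # seconds; the "better than 100ms" target from this thread
MAX_SCORE = 20.0          # assumed score ceiling (hypothetical)
PENALTY = 2.0             # assumed score lost per disagreement (hypothetical)
REWARD = 1.0              # assumed score regained per agreement (hypothetical)
TRUST_THRESHOLD = 10.0    # assumed cut-off below which a monitor is ignored


def consensus_offset(scores, offsets):
    """Median offset reported by trusted monitors for one NTP server.

    `offsets` maps monitor name -> measured offset in seconds, or None if
    that monitor could not reach the server. Returns None when fewer than
    a strict majority of trusted monitors got an answer.
    """
    trusted = [m for m in offsets if scores.get(m, MAX_SCORE) >= TRUST_THRESHOLD]
    answers = [offsets[m] for m in trusted if offsets[m] is not None]
    if len(answers) * 2 <= len(trusted):
        return None
    return median(answers)


def update_monitor_scores(scores, offsets):
    """Return new monitor scores after one round of checks on one server.

    A monitor that disagrees with the consensus (or cannot reach a server
    that most trusted monitors can reach) loses points. Untrusted monitors
    are still scored, so they can earn their way back.
    """
    consensus = consensus_offset(scores, offsets)
    if consensus is None:
        # No clear majority: the server itself may be down, so blame nobody.
        return dict(scores)
    updated = dict(scores)
    for monitor, offset in offsets.items():
        current = scores.get(monitor, MAX_SCORE)
        agrees = offset is not None and abs(offset - consensus) <= OFFSET_TOLERANCE
        if agrees:
            updated[monitor] = min(MAX_SCORE, current + REWARD)
        else:
            updated[monitor] = max(0.0, current - PENALTY)
    return updated
```

With three monitors where one sits behind a broken route, that monitor's 
score drops each round until it falls under the threshold, at which point 
its results stop influencing the consensus. Once the route recovers and it 
agrees with the others again, its score climbs back and it is readmitted.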

Thanks.

> 
> 
> 
>  - ask
--
Kradorex Xeron <[email protected]>
Founder, Chief Administrative Officer
IP Network Specialist, Systems and Networks Administrator
Digibase Operations, Research and Development <http://digibase.ca/>
_______________________________________________
pool mailing list
[email protected]
http://lists.ntp.org/listinfo/pool
