In your letter dated Mon, 01 Jun 2015 22:06:29 +0200 you wrote:
>> Here are a couple of reasons:
>> - client-side monitoring, certainly with a set of nodes as diverse as the
>>   Atlas probes, may reveal all kinds of things normal monitoring doesn't
>>   see.
>
> I can imagine this. Therefore, I'd be interested in the results.
>
> For how long do you want to do this additional probing?
At first, until after the leap second. From the Atlas side, we have the
resources to continue after that. But of course we should discuss what the
best way to run such a measurement is.

>> - independent reporting is usually appreciated by users of a system
>
> This might lead to too much confidence in our pool system. We have already
> had first-aid services asking for help because they could sometimes not
> synchronise their clocks very well with our pool. Synchronised clocks were
> essential in their system to make only one car (and at least one...) go to
> a crash site. We (at least I ;-) ) definitely don't want people to get the
> idea that this is a highly monitored, highly accurate and highly available
> service.

I would very much like to have a highly accurate and highly available time
service. It is not as if we say about BGP and DNS that they sort of work but
are actually not all that reliable. But I understand your concerns.

>> - a longer term goal for Atlas: finding out if one-way delay measurements
>>   are possible between selected nodes.
>
> This is possible, depending on the relative accuracy of the time between
> the two nodes (they may both have the same unknown offset from absolute
> time). I cannot imagine this is new to you. So, what method do you have in
> mind to measure this one-way delay? I am interested in all kinds of new
> methods to measure things :-)

The main question is whether Atlas probes in interesting locations have good
enough time synchronisation for this measurement to make sense. It doesn't
work if the difference in delay that the operator community is interested in
is smaller than the accuracy you can get from syncing Atlas probes to the
pool.

>>> Personally I don't mind this, as long as you don't have all 8000 probes
>>> start at once but randomly distribute the load over the 15 minutes.
>>> 8000 probes multiplied by 3 packets at once is about 1.7 megabytes and
>>> could easily annoy servers with weaker links.
>>
>> The spread is expected to be 400 seconds.
>
> Each probe is doing a new DNS query every 15 minutes, right? Do you mean it
> spreads out the three packets to the single probed server over 400 seconds
> (200 s on average between packets)?

No. For each probe, the three packets are just round-trip times apart. But
the point at which each probe starts to perform the measurement is spread
out over a 400-second interval.

>> Tim Bray wrote:
>>> In my head, the Atlas probes won't monitor the whole pool.
>>
>> Quite possible. From a client point of view it is usually enough to know
>> what you can expect; this is not meant to replace normal monitoring.
>
> The normal pool returns 'nearby' servers, which are in DNS rotation
> according to their set bandwidth. Therefore, you most likely end up with
> 90% of the measurements for a single probe going to the nearest
> large-bandwidth server. On the other hand, we have fewer than 3000 active
> servers in the IPv4 network. So, on average, each server is probed by 2.8
> probes. For our 1000(!) IPv6 servers the ratio is 8 to 1.

There are roughly 2000 probes that have IPv6, so that is not a big
difference.

> How about adding all 8000 probes as (low-bandwidth) NTP servers to the
> pool? :-D

Part of how we 'sell' probes to probe hosts is that the probes don't have
any ports open. However, we do have more than 100 Atlas anchors. So those,
in places where it makes sense, could probably be added to the pool.

>>> They will just monitor whether you get a good time from the first pool
>>> server that the lookup returns?
>>
>> The probes will perform a DNS lookup each time they perform a measurement,
>> so it depends a bit on what variation the DNS server returns.
>
> Clear, but what do you look at? Is it offset (relative to local time at the
> probe), stratum, delay at server, round-trip delay, reachability, leap
> indication, ...?

Basically we decode the NTP reply and store it as JSON. See
https://atlas.ripe.net/docs/data_struct/#v4660_ntp for the docs.
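For context on what can be derived from such a decoded reply: the stored
timestamps allow the standard NTP delay/offset calculation from RFC 5905.
Here is a minimal sketch; the timestamp naming and the example values are
illustrative assumptions, not the exact Atlas JSON field names (check the
docs linked above for those).

```python
def delay_and_offset(t1, t2, t3, t4):
    """Standard NTP calculation (RFC 5905).

    t1: client transmit time, t2: server receive time,
    t3: server transmit time, t4: client receive time
    (all in seconds, each on its own local clock).
    """
    delay = (t4 - t1) - (t3 - t2)            # round-trip path delay
    offset = ((t2 - t1) + (t3 - t4)) / 2.0   # client clock offset vs server
    return delay, offset

# Made-up example: the client clock runs 0.050 s behind the server and the
# one-way path delay is 0.010 s in each direction.
t1 = 100.000   # client sends
t2 = 100.060   # server receives (t1 + 0.010 delay + 0.050 offset)
t3 = 100.061   # server replies
t4 = 100.021   # client receives (t3 + 0.010 delay - 0.050 offset)

delay, offset = delay_and_offset(t1, t2, t3, t4)
print(round(delay, 6), round(offset, 6))   # 0.02 0.05
```

Note that this yields only the round-trip delay and the relative offset; the
offset formula assumes the path is symmetric. Splitting the delay into its
two one-way components is exactly what needs the independently synchronised
clocks discussed above.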
Then everybody can download the measurement results and analyse them. One
thing to look at is the RTT, to see what the limit of what can be achieved
is. Another is what percentage of servers in the pool is less accurate than
expected or required, etc.

_______________________________________________
pool mailing list
[email protected]
http://lists.ntp.org/listinfo/pool
