On 06/24/2015 19:14, Arnold Schekkerman wrote:
On 24-06-15 23:54, [email protected] wrote:
If you look at the CSV log link from your monitor page, your server's time is
excellent but the monitoring station is not getting a response periodically,
causing your score to decrease by 5 every time because it failed to respond. Is
your server being DDoS'd or using more bandwidth than it's allocated? Your
stress testing could be causing this as well. Can you ping your server
externally to check its reachability continuously and check for intermittent
downtime?
In other words, the problem isn't between your server and torix and your
server's ability to keep accurate time, but reachability between client and your
server is very poor. I would also check dmesg for any Ethernet issues, IP
conflicts, hardware issues, etc. When in doubt, reboot.
With the CSV-log you can see at what times the monitor sent a time request to
your server (or you can extrapolate when to expect the next request). You can
use tcpdump to see whether you receive those time requests and whether your
server is sending a valid response. The monitor uses an IP in the net
207.171.x.x/16, so you can set tcpdump to dump only those NTP packets.
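A rough sketch of the two checks described above, in Python: deciding whether a
captured packet came from the monitor's network, and extrapolating when the
next request is due. The function names are made up for illustration, the
timestamps are assumed to be epoch seconds already extracted from the CSV-log,
and the median spacing is only a guess at the polling interval. The matching
capture filter would presumably be something like
`udp port 123 and src net 207.171.0.0/16`.

```python
import ipaddress
from statistics import median

# Monitor network as given in the message above (207.171.x.x/16).
MONITOR_NET = ipaddress.ip_network("207.171.0.0/16")


def from_monitor(src_ip: str) -> bool:
    """Return True if src_ip lies inside the monitor's network."""
    return ipaddress.ip_address(src_ip) in MONITOR_NET


def next_expected(request_times: list[float]) -> float:
    """Extrapolate the next request time from past request times.

    Assumes at least two timestamps, sorted ascending, and uses the
    median gap between consecutive requests as the expected interval.
    """
    gaps = [b - a for a, b in zip(request_times, request_times[1:])]
    return request_times[-1] + median(gaps)
```

For example, `next_expected([0, 900, 1800])` yields 2700, so that is roughly
when to have tcpdump running.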
If you don't receive a time request at the times the CSV-log indicates, then you
have a firewall issue at your server or with an upstream internet service
provider.
If you get the requests and send valid time responses, then there is most likely
an issue with an upstream ISP as well.
If you get the request, but you don't send a valid time response (no response at
all, a KOD, no-sync (both leap bits set), etc.), then it is something in your
local configuration.
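Whether a reply counts as "valid" can be read straight from the first bytes of
the UDP payload. A minimal sketch, assuming the standard NTP header layout
(leap indicator in the top two bits of the first byte, mode in the low three
bits, stratum in the second byte, reference ID in bytes 12 to 15). The function
name and the returned labels are hypothetical, and the monitor may well apply
further checks that are not modelled here.

```python
def classify_ntp_reply(payload: bytes) -> str:
    """Label an NTP reply as ok, no-sync, a kiss-o'-death, or malformed."""
    if len(payload) < 48:          # a bare NTP header is 48 bytes
        return "malformed"

    leap = payload[0] >> 6         # 3 means both leap bits set (unsynchronized)
    mode = payload[0] & 0x07       # 4 means "server"
    stratum = payload[1]

    if mode != 4:
        return "not-a-server-reply"
    if stratum == 0:
        # Stratum 0 marks a kiss-o'-death; the reference ID carries an
        # ASCII kiss code such as RATE or DENY.
        code = payload[12:16].decode("ascii", "replace")
        return "kod:" + code
    if leap == 3:
        return "no-sync"
    return "ok"
```

The stratum check comes before the leap check because kiss-o'-death packets
usually have both leap bits set as well.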
I hope this helps you to debug the issue further.
Arnold
I did some monitoring over the weekend, and the situation seemed to have
improved. I am not receiving a DoS attack; however, I too see that the
monitoring packets occasionally do not even reach the network border my
time server runs on.
I have also been monitoring my router's memory usage and its connection
sessions, which have never spiked above 10,000. This indicates to me
that this is not an issue with the router either.
I have also run the appropriate tcpdump, and occasionally the
monitoring packets do not arrive at the network.
I have also done substantial pings through the core router over to the
border and saw no loss there, so no hardware failure is in play.
I have run an mtr to one of the NTP pool monitoring servers and seem to
be getting wild latency jumps just before entering Phyber's network though:
    Host                                        Loss%  Snt   Last    Avg   Best   Wrst  StDev
 1. cplexus.unimatrix01.digibase.ca              0.0%  265    1.5    0.8    0.2    1.6    0.0
 2. ???
 3. 233-5-226-24.rev.cgocable.net                0.0%  265   12.1   12.4    8.2   22.2    1.8
 4. 1-6-226-24.rev.cgocable.net                  0.0%  265   12.6   12.4    8.6   21.9    1.8
 5. te0-0-0-7.rcr21.yhm01.atlas.cogentco.com     0.0%  265   12.1   11.7    8.7   19.3    1.1
 6. be2622.ccr22.yyz02.atlas.cogentco.com        0.0%  265   13.6   12.6    9.8   20.8    1.0
 7. be2597.ccr22.cle04.atlas.cogentco.com        0.0%  265   19.1   19.6   17.2   30.4    1.2
 8. be2185.ccr42.ord01.atlas.cogentco.com        0.0%  265   26.6   26.8   24.5   33.9    1.2
 9. be2157.ccr22.mci01.atlas.cogentco.com        0.0%  265   39.1   39.2   35.7   51.3    2.0
10. be2010.ccr22.dfw01.atlas.cogentco.com        0.0%  265   48.7   48.9   45.6   57.9    1.4
11. be2146.ccr22.iah01.atlas.cogentco.com        0.0%  265   54.2   54.6   52.3   58.5    0.9
12. be2066.ccr22.lax01.atlas.cogentco.com        0.0%  265   90.8   89.6   87.1   95.1    0.9
13. be2017.ccr21.lax04.atlas.cogentco.com        0.0%  265   91.2   90.0   87.4  102.5    1.6
14. 38.88.197.82                                 0.0%  265   90.7   99.1   85.9  295.5   37.0
15. te7-4.r02.lax2.phyber.com                    0.0%  264   88.2   98.4   86.8  384.3   34.6
16. ntplax12.ntppool.net                         0.0%  264   90.7   89.2   86.9   99.1    1.3
This seems to indicate congestion on Phyber's network, resulting in
loss of the monitoring packets.
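
To pick out the jumpy hops without eyeballing the report, something along these
lines could be used. It is only a sketch: the sample rows are the last four
hops from the mtr output above, joined onto one line each, and the 10 ms
StDev threshold is an arbitrary value chosen for illustration.

```python
# Last four hops of the mtr report above, one row per line:
# hop, host, Loss%, Snt, Last, Avg, Best, Wrst, StDev
MTR_ROWS = """\
13. be2017.ccr21.lax04.atlas.cogentco.com 0.0% 265 91.2 90.0 87.4 102.5 1.6
14. 38.88.197.82 0.0% 265 90.7 99.1 85.9 295.5 37.0
15. te7-4.r02.lax2.phyber.com 0.0% 264 88.2 98.4 86.8 384.3 34.6
16. ntplax12.ntppool.net 0.0% 264 90.7 89.2 86.9 99.1 1.3
"""


def jittery_hops(report: str, max_stdev_ms: float = 10.0) -> list[str]:
    """Return the hosts whose latency StDev exceeds max_stdev_ms."""
    flagged = []
    for line in report.splitlines():
        fields = line.split()
        if len(fields) != 9:
            continue  # skips the header and silent hops such as "2. ???"
        host, stdev = fields[1], float(fields[8])
        if stdev > max_stdev_ms:
            flagged.append(host)
    return flagged


print(jittery_hops(MTR_ROWS))
```

With the rows above this flags hops 14 and 15 only.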
--
/s/
Kradorex Xeron <[email protected]>
Executive Director,
Digibase Operations, Research and Development <http://digibase.ca>
_______________________________________________
pool mailing list
[email protected]
http://lists.ntp.org/listinfo/pool