Hi Jake, thanks for responding!

So this sounds like a problem of the clock switching around the TAI/UTC 
conversion, and then ptp4l later tries to correct this by maxing the frequency 
slew?

Indeed, that is exactly what it looks like. But then, why does the TAI/UTC 
conversion switch around when other servers are PXE booting? And when ptp4l 
later tries to correct this by maxing the frequency slew, right at the moment 
it is nearly synchronised again, clock check messages appear and PTP never 
recovers.

Clock check messages should only happen if some external process is also 
tuning the clock.

Exactly what I thought, but it makes no sense, because there is no other time 
service running whatsoever. Besides, once the clock check messages start to 
occur, they come very fast. By that I mean every master offset output is 
followed by almost 5 clock check messages before the next one, as if a clock 
check message is emitted every 0.2 seconds. This seems like very weird 
behaviour.
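
To put a number on that rate, a rough sketch like the one below could count how 
many clock check lines appear after each master offset line in a saved ptp4l 
log. It assumes the log lines contain the strings "master offset" and 
"clockcheck:", and the sample lines and values in it are made up for 
illustration, not taken from my system. The function name is hypothetical.

```python
import re
from typing import Iterable, List

# Assumed log wording: adjust these patterns if your ptp4l output differs.
OFFSET_RE = re.compile(r"master offset\s+(-?\d+)")
CLOCKCHECK_RE = re.compile(r"clockcheck:")


def clockchecks_per_offset(lines: Iterable[str]) -> List[int]:
    """For each 'master offset' line, count the clockcheck lines that
    follow it before the next 'master offset' line."""
    counts: List[int] = []
    for line in lines:
        if OFFSET_RE.search(line):
            counts.append(0)  # start a new bucket for this offset report
        elif CLOCKCHECK_RE.search(line) and counts:
            counts[-1] += 1  # attribute the message to the latest report
    return counts


# Invented sample data, only to show the expected input shape.
SAMPLE_LOG = """\
ptp4l[100.000]: master offset 36000000012 s2 freq +100000 path delay 5120
ptp4l[100.200]: clockcheck: clock jumped forward or running faster than expected!
ptp4l[100.400]: clockcheck: clock jumped forward or running faster than expected!
ptp4l[100.600]: clockcheck: clock jumped forward or running faster than expected!
ptp4l[101.000]: master offset 36000000500 s2 freq +100000 path delay 5120
ptp4l[101.200]: clockcheck: clock jumped forward or running faster than expected!
ptp4l[102.000]: master offset 36000001000 s2 freq +100000 path delay 5120
"""

if __name__ == "__main__":
    print(clockchecks_per_offset(SAMPLE_LOG.splitlines()))
```

Running it over the real log (for example by reading the file line by line 
instead of SAMPLE_LOG) would show whether the count per offset report really 
stays close to 5.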

And, like I said before, when I increase the tx_timestamp_timeout value to 
200 ms there is no problem at all, and no clock check messages either. So I am 
not sure what the problem could be there.

Jord

On 2 Aug 2018, at 01:55, Keller, Jacob E <jacob.e.kel...@intel.com> wrote:

-----Original Message-----
From: Jord Pool [mailto:jord.p...@outlook.com]
Sent: Wednesday, August 01, 2018 12:27 AM
To: Richard Cochran <richardcoch...@gmail.com>; Keller, Jacob E
<jacob.e.kel...@intel.com>; Cliff Spradlin <csprad...@waymo.com>; Chris Caudle
<ch...@chriscaudle.org>; Cliff Spradlin via Linuxptp-users
<linuxptp-us...@lists.sourceforge.net>
Subject: PXE Boot PTP Issues

Good morning!

Last week I explained the issues with the PTP slave, which is a PXE boot server
at the same time, where a message appears saying to increase
tx_timestamp_timeout or that the issue is likely a driver bug. I have since
installed the latest SourceForge e1000e driver (version 3.4.1.1), which does
not solve the problem.

Now, as we said, it is not per se a driver bug. Due to the increase in network
traffic (we assumed), the PTP slave instance gets interrupted. However, I am
still left with a question.

At the moment the PTP slave gets interrupted due to the increase in network
traffic being sent from the same server, the PTP slave instance every time
receives an offset of 36 seconds (TAI/UTC conversion?). The PTP instance then
tries to slew this down, but right at the moment it is nearly properly aligned
again, a 'clock check' message occurs and the offset shoots up to 70+ seconds
and won't recover anymore; it returns only clock check messages every second
and offsets which only drift further away.


So this sounds like a problem of the clock switching around the TAI/UTC 
conversion, and then ptp4l later tries to correct this by maxing the frequency 
slew?

This latter behaviour is what bugs us the most: PTP is unable to recover itself
and only drifts further away. How come PTP is interrupted by an increase in
network traffic, gains a 36 second offset, slews down, and then, when it has
nearly recovered, returns a clock check message, shoots up its offset and never
recovers again?

Clock check messages should only happen if some external process is also 
tuning the clock.


If anyone could help me out on this, that would be great! I have already been
working on this for several days and can't find a clue on how to solve it.

Jord

_______________________________________________
Linuxptp-users mailing list
Linuxptp-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linuxptp-users
