I'm unable to update the kernel on my system because of other external
requirements. I am running an out-of-tree i40e driver (v2.14.13, released in
January 2021).

I have been looking for other solutions, and it appears that I have been able
to resolve the issue with two changes.
Please let me know if you have any thoughts, or if it looks like I'm heading
for trouble.

I applied this patch from Miroslav Lichvar, which I found on the mailing list.
It seems to resolve the issue of ports getting stuck in the UNCALIBRATED state:
https://sourceforge.net/p/linuxptp/mailman/message/37293303/

Don't perform the sanity check on receive timestamps from ports in
non-slave states to avoid false positives in the jbod mode, where
the timestamps can be generated by different clocks.

Reviewed-by: Jacob Keller <jacob.e.keller@...>
Signed-off-by: Miroslav Lichvar <mlichvar@...>
---
 port.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/port.c b/port.c
index 10bb9e1..fb420fb 100644
--- a/port.c
+++ b/port.c
@@ -2744,7 +2744,10 @@ static enum fsm_event bc_event(struct port *p, int fd_index)
        }
        if (msg_sots_valid(msg)) {
                ts_add(&msg->hwts.ts, -p->rx_timestamp_offset);
-               clock_check_ts(p->clock, tmv_to_nanoseconds(msg->hwts.ts));
+               if (p->state == PS_SLAVE) {
+                       clock_check_ts(p->clock,
+                                      tmv_to_nanoseconds(msg->hwts.ts));
+               }
        }
 
        switch (msg_type(msg)) {
-- 
2.26.3


I also had to comment out this block in phc2sys to get all of the PTP devices
to update properly when the local time is behind the reference time.
With this code present, phc2sys appears to select the first interface as the
default clock before any port has locked to the grandmaster (GM), and then
synchronizes all the others to it.
This causes a problem when the clock is far behind: by the time the port
properly locks to the GM and steps its clock, phc2sys has already performed
its initial step on the other clocks, so they are left slewing by a massive
offset to catch up.

With the block removed, phc2sys appears to wait until a port actually enters
the SLAVE state before performing the first step on the other clocks.

diff --git a/phc2sys.c b/phc2sys.c
index 15f8d75..f4014d9 100644
--- a/phc2sys.c
+++ b/phc2sys.c
@@ -466,19 +466,19 @@ static void reconfigure(struct node *node)
                }
                last = c;
        }
-       if (dst_cnt > 1 && !src) {
-               if (!rt || rt->dest_only) {
-                       node->master = last;
-                       /* Reset to original state in next reconfiguration. */
-                       node->master->new_state = node->master->state;
-                       node->master->state = PS_SLAVE;
-                       if (rt)
-                               rt->state = PS_SLAVE;
-                       pr_info("no source, selecting %s as the default clock",
-                               last->device);
-                       return;
-               }
-       }
+       // if (dst_cnt > 1 && !src) {
+       //      if (!rt || rt->dest_only) {
+       //              node->master = last;
+       //              /* Reset to original state in next reconfiguration. */
+       //              node->master->new_state = node->master->state;
+       //              node->master->state = PS_SLAVE;
+       //              if (rt)
+       //                      rt->state = PS_SLAVE;
+       //              pr_info("no source, selecting %s as the default clock",
+       //                      last->device);
+       //              return;
+       //      }
+       // }
        if (src_cnt > 1) {
                pr_info("multiple master clocks available, postponing sync...");
                node->master = NULL;


Thanks for your help and feedback,

Cole



From: Cole Walker <ce.wal...@live.com>
Sent: June 14, 2021 1:19 PM
To: Keller, Jacob E <jacob.e.kel...@intel.com>; Richard Cochran <richardcoch...@gmail.com>
Cc: linuxptp-users@lists.sourceforge.net <linuxptp-users@lists.sourceforge.net>
Subject: Re: [Linuxptp-users] ptp4l cannot sync when local time is far behind reference time
 
Thank you for the responses. I will try this out on a newer kernel and report back.

Regards,
Cole

From: Keller, Jacob E <jacob.e.kel...@intel.com>
Sent: June 14, 2021 1:15 PM
To: Richard Cochran <richardcoch...@gmail.com>; Cole Walker <ce.wal...@live.com>
Cc: linuxptp-users@lists.sourceforge.net <linuxptp-users@lists.sourceforge.net>
Subject: RE: [Linuxptp-users] ptp4l cannot sync when local time is far behind reference time
 


> -----Original Message-----
> From: Richard Cochran <richardcoch...@gmail.com>
> Sent: Sunday, June 13, 2021 11:53 PM
> To: Cole Walker <ce.wal...@live.com>
> Cc: linuxptp-users@lists.sourceforge.net
> Subject: Re: [Linuxptp-users] ptp4l cannot sync when local time is far behind reference time
> 
> On Thu, Jun 10, 2021 at 09:59:46PM +0000, Cole Walker wrote:
> 
> > host1:~# cat /etc/centos-release
> > CentOS Linux release 7.6.1810 (Core)
> >
> > host1:~# uname -r
> > 3.10.0-1160.15.2.rt56.1152.el7.tis.4.x86_64
> 
> Pretty old kernel, ...
> 
> > host1:~# ethtool -i ens787f3
> > driver: i40e
> > version: 2.14.13
> 
> and once again i40e raises its head suspiciously.  I guess this is a
> buggy back port.  I would try a more recent kernel that natively
> supports i40e hardware.
> 
> HTH,
> Richard
> 
> 

Please do try a newer kernel. There have been many fixes to i40e in recent
years that likely haven't made it back into this 3.10.0-based kernel.

It is a CentOS kernel, so perhaps you could also try a newer CentOS release
instead (Red Hat does a pretty good job of backporting fixes, so something
newer may include the more recent fixes).

Thanks,
Jake


_______________________________________________
Linuxptp-users mailing list
Linuxptp-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linuxptp-users
