I'm unable to update the kernel on my system due to other external requirements. I am running an out-of-tree driver for the i40e that was released in Jan 2021 (v2.14.13).
I have been looking for other solutions and it appears that I have been able to resolve the issue with two changes. Please let me know if you have any thoughts or if it looks like I'm heading for trouble.

I applied this patch from Miroslav Lichvar that I found on the mailing list, which seems to resolve the issue with the ports getting stuck in UNCALIBRATED:
https://sourceforge.net/p/linuxptp/mailman/message/37293303/

Don't perform the sanity check on receive timestamps from ports in
non-slave states to avoid false positives in the jbod mode, where
the timestamps can be generated by different clocks.

Reviewed-by: Jacob Keller <jacob.e.keller@...>
Signed-off-by: Miroslav Lichvar <mlichvar@...>
---
 port.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/port.c b/port.c
index 10bb9e1..fb420fb 100644
--- a/port.c
+++ b/port.c
@@ -2744,7 +2744,10 @@ static enum fsm_event bc_event(struct port *p, int fd_index)
 	}
 	if (msg_sots_valid(msg)) {
 		ts_add(&msg->hwts.ts, -p->rx_timestamp_offset);
-		clock_check_ts(p->clock, tmv_to_nanoseconds(msg->hwts.ts));
+		if (p->state == PS_SLAVE) {
+			clock_check_ts(p->clock,
+				       tmv_to_nanoseconds(msg->hwts.ts));
+		}
 	}

 	switch (msg_type(msg)) {
--
2.26.3

I also had to comment out this block in phc2sys in order to get all of the ptp devices to update properly when the local time is behind the reference time. It appears that with this code present, phc2sys selects the first interface as the default clock before any port locks to the GM and then synchronizes all the others to it. This causes a problem when the clock is far behind because after the port properly locks to the GM and updates, phc2sys has already performed its first step operation and all the other clocks are left slewing their clocks by a massive amount to catch up. With it removed, phc2sys appears to wait for a port to actually enter the SLAVE state before doing the first step for other clocks.
diff --git a/phc2sys.c b/phc2sys.c
index 15f8d75..f4014d9 100644
--- a/phc2sys.c
+++ b/phc2sys.c
@@ -466,19 +466,19 @@ static void reconfigure(struct node *node)
 		}
 		last = c;
 	}
-	if (dst_cnt > 1 && !src) {
-		if (!rt || rt->dest_only) {
-			node->master = last;
-			/* Reset to original state in next reconfiguration. */
-			node->master->new_state = node->master->state;
-			node->master->state = PS_SLAVE;
-			if (rt)
-				rt->state = PS_SLAVE;
-			pr_info("no source, selecting %s as the default clock",
-				last->device);
-			return;
-		}
-	}
+	// if (dst_cnt > 1 && !src) {
+	// 	if (!rt || rt->dest_only) {
+	// 		node->master = last;
+	// 		/* Reset to original state in next reconfiguration. */
+	// 		node->master->new_state = node->master->state;
+	// 		node->master->state = PS_SLAVE;
+	// 		if (rt)
+	// 			rt->state = PS_SLAVE;
+	// 		pr_info("no source, selecting %s as the default clock",
+	// 			last->device);
+	// 		return;
+	// 	}
+	// }
 	if (src_cnt > 1) {
 		pr_info("multiple master clocks available, postponing sync...");
 		node->master = NULL;

Thanks for your help and feedback,
Cole

From: Cole Walker <ce.wal...@live.com>
Sent: June 14, 2021 1:19 PM
To: Keller, Jacob E <jacob.e.kel...@intel.com>; Richard Cochran <richardcoch...@gmail.com>
Cc: linuxptp-users@lists.sourceforge.net <linuxptp-users@lists.sourceforge.net>
Subject: Re: [Linuxptp-users] ptp4l cannot sync when local time is far behind reference time

Thank you for the responses. I will try this out on a newer kernel and report back.

Regards,
Cole

From: Keller, Jacob E <jacob.e.kel...@intel.com>
Sent: June 14, 2021 1:15 PM
To: Richard Cochran <richardcoch...@gmail.com>; Cole Walker <ce.wal...@live.com>
Cc: linuxptp-users@lists.sourceforge.net <linuxptp-users@lists.sourceforge.net>
Subject: RE: [Linuxptp-users] ptp4l cannot sync when local time is far behind reference time

> -----Original Message-----
> From: Richard Cochran <richardcoch...@gmail.com>
> Sent: Sunday, June 13, 2021 11:53 PM
> To: Cole Walker <ce.wal...@live.com>
> Cc: linuxptp-users@lists.sourceforge.net
> Subject: Re: [Linuxptp-users] ptp4l cannot sync when local time is far behind
> reference time
>
> On Thu, Jun 10, 2021 at 09:59:46PM +0000, Cole Walker wrote:
>
> > host1:~# cat /etc/centos-release
> > CentOS Linux release 7.6.1810 (Core)
> >
> > host1:~# uname -r
> > 3.10.0-1160.15.2.rt56.1152.el7.tis.4.x86_64
>
> Pretty old kernel, ...
>
> > host1:~# ethtool -i ens787f3
> > driver: i40e
> > version: 2.14.13
>
> and once again i40e raises its head suspiciously. I guess this is a
> buggy back port. I would try a more recent kernel that natively
> supports i40e hardware.
>
> HTH,
> Richard
>

Please do try a newer kernel. There have been many fixes to i40e in the last few years that likely haven't made it back into whatever kernel this 3.10.0-based kernel is.

It is a CentOS kernel, so perhaps you could also try a newer release of CentOS instead (Red Hat does a pretty good job backporting fixes, so perhaps something newer has the more recent fixes in it).

Thanks,
Jake

_______________________________________________
Linuxptp-users mailing list
Linuxptp-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linuxptp-users