Problem: pmc incorrectly reports Grandmasters as connected when in fact they are physically disconnected from the LAN. This was only fixed after multiple restarts.
Scenario: I have two PTP clients (RHEL 7), each using two NICs to sync to two Grandmasters (Zyfer Gsyncs) using linuxptp-3.1.1. This has been working fine for a year in all versions of linuxptp. [Why am I running two ptp4l processes? To sync two NIC PHCs, which are used by NTP as refclocks.] I use pmc to check whether I have synchronization to a Grandmaster.

*Yesterday the two Grandmasters were disconnected from the local Ethernet switch.*

On one PTP client, pmc correctly reported the disconnect:

    pmc -i enp10s0f2 "GET TIME_STATUS_NP"
    sending: GET TIME_STATUS_NP
        b49691.fffe.37fe82-1 seq 0 RESPONSE MANAGEMENT TIME_STATUS_NP
            master_offset              0
            ingress_time               0
            cumulativeScaledRateOffset +0.000000000
            scaledLastGmPhaseChange    0
            gmTimeBaseIndicator        0
            lastGmPhaseChange          0x0000'0000000000000000.0000
            gmPresent                  false   <------
            gmIdentity                 b49691.fffe.37fe82   <----------- client

On a second PTP client, *the Grandmaster is reported as still present:*

    pmc -i enp10s0f0 "GET TIME_STATUS_NP"
    sending: GET TIME_STATUS_NP
        b49691.fffe.35c204-1 seq 0 RESPONSE MANAGEMENT TIME_STATUS_NP
            master_offset              52
            ingress_time               0
            cumulativeScaledRateOffset +0.000000000
            scaledLastGmPhaseChange    0
            gmTimeBaseIndicator        0
            lastGmPhaseChange          0x0000'0000000000000000.0000
            gmPresent                  true
            gmIdentity                 0019dd.fffe.002009

    pmc -i enp10s0f2 "GET TIME_STATUS_NP"
    sending: GET TIME_STATUS_NP
        b49691.fffe.35c206-1 seq 0 RESPONSE MANAGEMENT TIME_STATUS_NP
            master_offset              113
            ingress_time               0
            cumulativeScaledRateOffset +0.000000000
            scaledLastGmPhaseChange    0
            gmTimeBaseIndicator        0
            lastGmPhaseChange          0x0000'0000000000000000.0000
            gmPresent                  true   <------
            gmIdentity                 0019dd.fffe.001ffb   <------ Grandmaster

However, "systemctl -l status ptp4l-1.service" and "...ptp4l-2.service" correctly report that the connections are down:

    systemctl -l status ptp4l-2
    ● ptp4l-2.service - Precision Time Protocol (PTP) service second interface
       Loaded: loaded (/etc/systemd/system/ptp4l-2.service; enabled; vendor preset: disabled)
       Active: inactive (dead) since Tue 2022-09-27 20:56:02 UTC; 10s ago
      Process: 2527 ExecStart=/usr/local/linuxptp/sbin/ptp4l $OPTIONS2 (code=exited, status=0/SUCCESS)
     Main PID: 2527 (code=exited, status=0/SUCCESS)

    Sep 27 20:54:55 dc-ntp01.rdte.usno.navy.mil ptp4l[2527]: ptp4l[1618.728]: selected local clock b49691.fffe.37fe82 as best master
    Sep 27 20:55:03 dc-ntp01.rdte.usno.navy.mil ptp4l[2527]: ptp4l[1626.969]: selected local clock b49691.fffe.37fe82 as best master
    Sep 27 20:55:13 dc-ntp01.rdte.usno.navy.mil ptp4l[2527]: ptp4l[1636.869]: selected local clock b49691.fffe.37fe82 as best master
    Sep 27 20:55:23 dc-ntp01.rdte.usno.navy.mil ptp4l[2527]: ptp4l[1646.184]: selected local clock b49691.fffe.37fe82 as best master

So I restart my ptp4l services (several times), but pmc still reports that it sees the two Grandmasters. Next I reboot the server, but it still "sees" the two Grandmasters. Meanwhile, the other server does not see them. I copy the pmc binary from the server that does not see the Grandmasters to the one that still does. Same result: systemctl reports no connection, but pmc still sees the Grandmasters on one of my two clients.
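The mismatch described above (pmc says gmPresent true while the ptp4l log says the local clock was selected as best master) could be detected automatically. The following is only a minimal sketch, not part of linuxptp: the function names are hypothetical, the sample strings are copied from the output quoted above, and it assumes pmc keeps printing fields as "name value" pairs and that the most recent log line containing "best master" reflects the current selection.

```python
import re

# "name value" pairs of interest in a TIME_STATUS_NP response.
FIELD_RE = re.compile(r"\b(master_offset|gmPresent|gmIdentity)\s+(\S+)")
# Clock identity of the responding port, e.g. "b49691.fffe.35c204-1 seq 0 RESPONSE".
PORT_RE = re.compile(
    r"\b([0-9a-f]{6}\.[0-9a-f]{4}\.[0-9a-f]{6})-\d+\s+seq\s+\d+\s+RESPONSE"
)
# ptp4l log message seen in the systemctl output quoted above.
LOCAL_BEST_RE = re.compile(r"selected local clock \S+ as best master")


def parse_time_status(text):
    """Extract a few TIME_STATUS_NP fields from captured pmc output."""
    fields = dict(FIELD_RE.findall(text))
    port = PORT_RE.search(text)
    offset = fields.get("master_offset")
    return {
        "local_clock": port.group(1) if port else None,
        "gm_present": fields.get("gmPresent") == "true",
        "gm_identity": fields.get("gmIdentity"),
        "master_offset": int(offset) if offset is not None else None,
    }


def remote_gm_reported(status):
    """True when pmc claims a grandmaster other than the local clock."""
    return status["gm_present"] and status["gm_identity"] != status["local_clock"]


def log_says_local_best(log_lines):
    """True when the newest 'best master' log line selects the local clock."""
    for line in reversed(list(log_lines)):
        if "best master" in line:
            return LOCAL_BEST_RE.search(line) is not None
    return False


def is_inconsistent(pmc_text, log_lines):
    """True when pmc reports a remote GM but the log selected the local clock."""
    return remote_gm_reported(parse_time_status(pmc_text)) and log_says_local_best(
        log_lines
    )


# Samples taken from the output quoted above (flattened to one line each).
STALE_PMC = (
    "sending: GET TIME_STATUS_NP b49691.fffe.35c204-1 seq 0 RESPONSE MANAGEMENT "
    "TIME_STATUS_NP master_offset 52 ingress_time 0 gmPresent true "
    "gmIdentity 0019dd.fffe.002009"
)
GOOD_PMC = (
    "sending: GET TIME_STATUS_NP b49691.fffe.35c204-1 seq 0 RESPONSE MANAGEMENT "
    "TIME_STATUS_NP master_offset 0 ingress_time 0 gmPresent false "
    "gmIdentity b49691.fffe.35c204"
)
LOG_LINES = [
    "ptp4l[1636.869]: selected local clock b49691.fffe.37fe82 as best master",
    "ptp4l[1646.184]: selected local clock b49691.fffe.37fe82 as best master",
]

if __name__ == "__main__":
    print("stale response inconsistent:", is_inconsistent(STALE_PMC, LOG_LINES))
    print("good response inconsistent:", is_inconsistent(GOOD_PMC, LOG_LINES))
```

In practice the pmc text would come from running the pmc command shown above and the log lines from the service's journal. The check relies on the behaviour visible in the quoted output, where gmIdentity equals the client's own clock identity once no Grandmaster is present.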
Eventually, following multiple stops and starts of ptp4l, it starts reporting correctly:

    -bash-4.2# pmc -i enp10s0f0 "GET TIME_STATUS_NP"
    sending: GET TIME_STATUS_NP
        b49691.fffe.35c204-1 seq 0 RESPONSE MANAGEMENT TIME_STATUS_NP
            master_offset              0
            ingress_time               0
            cumulativeScaledRateOffset +0.000000000
            scaledLastGmPhaseChange    0
            gmTimeBaseIndicator        0
            lastGmPhaseChange          0x0000'0000000000000000.0000
            gmPresent                  false
            gmIdentity                 b49691.fffe.35c204

    -bash-4.2# pmc -i enp10s0f2 "GET TIME_STATUS_NP"
    sending: GET TIME_STATUS_NP
        b49691.fffe.35c206-1 seq 0 RESPONSE MANAGEMENT TIME_STATUS_NP
            master_offset              0
            ingress_time               0
            cumulativeScaledRateOffset +0.000000000
            scaledLastGmPhaseChange    0
            gmTimeBaseIndicator        0
            lastGmPhaseChange          0x0000'0000000000000000.0000
            gmPresent                  false
            gmIdentity                 b49691.fffe.35c206

Baffled,
Richard Schmidt
Precise Time Dept
US Naval Observatory

--
"We learn from history that we learn nothing from history."
George Bernard Shaw

"The ideal subject of totalitarian rule is not the convinced Nazi or the convinced communist, but people for whom the distinction between fact and fiction . . . and the distinction between true and false . . . no longer exist."
Hannah Arendt, "The Origins of Totalitarianism" (1951)