Problem:
pmc incorrectly reports the Grandmasters as present when they are in fact
physically disconnected from the LAN.  This only cleared after multiple
restarts.

Scenario:
I have two PTP clients (RHEL 7), each using two NICs to sync to two
Grandmasters (Zyfer Gsyncs) using linuxptp-3.1.1.  This setup has been
working fine for a year across all versions of linuxptp.
[Why am I running two ptp4l processes?  To sync two NIC PHCs, which are used
by NTP as refclocks.]
I use pmc to check whether I am synchronized to a Grandmaster.

*Yesterday the two Grandmasters were disconnected from the local Ethernet
switch.*

On one PTP slave pmc correctly reported the disconnect:
 pmc -i enp10s0f2 "GET TIME_STATUS_NP"
sending: GET TIME_STATUS_NP
b49691.fffe.37fe82-1 seq 0 RESPONSE MANAGEMENT TIME_STATUS_NP
master_offset              0
ingress_time               0
cumulativeScaledRateOffset +0.000000000
scaledLastGmPhaseChange    0
gmTimeBaseIndicator        0
lastGmPhaseChange          0x0000'0000000000000000.0000
gmPresent                  false                         <------
gmIdentity                 b49691.fffe.37fe82            <------ client
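
The check I do by eye on this output can be sketched in Python. This is only a minimal sketch, assuming the text layout shown above; the function names parse_time_status and has_external_gm are made up for illustration and are not part of linuxptp. It treats a Grandmaster as present only when gmPresent is true and gmIdentity differs from the clock identity of the responding port.

```python
import re

# Matches the response header printed by pmc, e.g.
# "b49691.fffe.37fe82-1 seq 0 RESPONSE MANAGEMENT TIME_STATUS_NP"
HEADER = re.compile(r"^([0-9a-f]{6}\.[0-9a-f]{4}\.[0-9a-f]{6})-\d+ seq \d+ RESPONSE")

# Fields of interest from the TIME_STATUS_NP output shown in this post.
FIELDS = ("master_offset", "gmPresent", "gmIdentity")


def parse_time_status(text):
    """Turn pasted pmc TIME_STATUS_NP output into a small dict (hypothetical helper)."""
    status = {}
    for raw in text.splitlines():
        line = raw.strip()
        header = HEADER.match(line)
        if header:
            # Clock identity of the port that answered the query.
            status["responder"] = header.group(1)
            continue
        parts = line.split(None, 1)
        if len(parts) == 2 and parts[0] in FIELDS:
            # Keep only the first token so trailing "<------" notes are ignored.
            status[parts[0]] = parts[1].split()[0]
    return status


def has_external_gm(status):
    """True only if a GM is reported present and it is not the responder itself."""
    return (
        status.get("gmPresent") == "true"
        and status.get("gmIdentity") != status.get("responder")
    )


SAMPLE = """\
sending: GET TIME_STATUS_NP
b49691.fffe.37fe82-1 seq 0 RESPONSE MANAGEMENT TIME_STATUS_NP
master_offset              0
gmPresent                  false
gmIdentity                 b49691.fffe.37fe82
"""

if __name__ == "__main__":
    print(has_external_gm(parse_time_status(SAMPLE)))
```

Note that this only interprets what pmc prints; it cannot tell whether the reported values are themselves stale, which is the problem described below.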

On a second PTP client, *the Grandmaster is reported as still present:*

pmc -i enp10s0f0 "GET TIME_STATUS_NP"
sending: GET TIME_STATUS_NP
b49691.fffe.35c204-1 seq 0 RESPONSE MANAGEMENT TIME_STATUS_NP
master_offset              52
ingress_time               0
cumulativeScaledRateOffset +0.000000000
scaledLastGmPhaseChange    0
gmTimeBaseIndicator        0
lastGmPhaseChange          0x0000'0000000000000000.0000
gmPresent                  true
gmIdentity                 0019dd.fffe.002009

pmc -i enp10s0f2 "GET TIME_STATUS_NP"
sending: GET TIME_STATUS_NP
b49691.fffe.35c206-1 seq 0 RESPONSE MANAGEMENT TIME_STATUS_NP
master_offset              113
ingress_time               0
cumulativeScaledRateOffset +0.000000000
scaledLastGmPhaseChange    0
gmTimeBaseIndicator        0
lastGmPhaseChange          0x0000'0000000000000000.0000
gmPresent                  true                          <------
gmIdentity                 0019dd.fffe.001ffb            <------ Grandmaster



However, "systemctl -l status" for ptp4l-1.service and ptp4l-2.service
correctly reports that the connections are down:


systemctl -l status ptp4l-2
● ptp4l-2.service - Precision Time Protocol (PTP) service second interface
   Loaded: loaded (/etc/systemd/system/ptp4l-2.service; enabled; vendor preset: disabled)
   Active: inactive (dead) since Tue 2022-09-27 20:56:02 UTC; 10s ago
  Process: 2527 ExecStart=/usr/local/linuxptp/sbin/ptp4l $OPTIONS2 (code=exited, status=0/SUCCESS)
 Main PID: 2527 (code=exited, status=0/SUCCESS)

Sep 27 20:54:55 dc-ntp01.rdte.usno.navy.mil ptp4l[2527]: ptp4l[1618.728]: selected local clock b49691.fffe.37fe82 as best master
Sep 27 20:55:03 dc-ntp01.rdte.usno.navy.mil ptp4l[2527]: ptp4l[1626.969]: selected local clock b49691.fffe.37fe82 as best master
Sep 27 20:55:13 dc-ntp01.rdte.usno.navy.mil ptp4l[2527]: ptp4l[1636.869]: selected local clock b49691.fffe.37fe82 as best master
Sep 27 20:55:23 dc-ntp01.rdte.usno.navy.mil ptp4l[2527]: ptp4l[1646.184]: selected local clock b49691.fffe.37fe82 as best master
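
To cross-check pmc against these log messages without reading them by hand, something like the following could be used. It is a rough sketch that assumes only the "selected local clock ... as best master" wording seen above; the helper name local_clock_selected is hypothetical. It does not look for any later message announcing a foreign master, so it only answers "did ptp4l ever fall back to its own clock in this excerpt, and with which identity".

```python
import re

# Message format taken from the ptp4l log lines quoted in this post.
LOCAL_BEST = re.compile(r"selected local clock ([0-9a-f.]+) as best master")


def local_clock_selected(log_text):
    """Return the clock identity from the last 'selected local clock' message, or None."""
    # Mail clients and journal viewers may wrap long lines, so collapse whitespace first.
    flat = " ".join(log_text.split())
    matches = LOCAL_BEST.findall(flat)
    return matches[-1] if matches else None


LOG = """\
Sep 27 20:54:55 dc-ntp01.rdte.usno.navy.mil ptp4l[2527]: ptp4l[1618.728]:
selected local clock b49691.fffe.37fe82 as best master
Sep 27 20:55:03 dc-ntp01.rdte.usno.navy.mil ptp4l[2527]: ptp4l[1626.969]:
selected local clock b49691.fffe.37fe82 as best master
"""

if __name__ == "__main__":
    print(local_clock_selected(LOG))
```

If this returns the local clock identity while pmc on the same host still shows gmPresent true with a Grandmaster's gmIdentity, the two sources disagree in the way described in this report.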



So I restarted my ptp4l services (several times), but pmc still reported
that it saw the two Grandmasters.

Next I rebooted the server, but it still "saw" the two Grandmasters.

Meanwhile the other server did not see them.


I copied the pmc binary from the server that did not see the Grandmasters to
the one that still did.  Same result: systemctl reported no connection, but
pmc still saw the Grandmasters on one of my two clients.


Eventually, after multiple stops and starts of ptp4l, pmc started
reporting correctly:


pmc -i enp10s0f0 "GET TIME_STATUS_NP"
sending: GET TIME_STATUS_NP
b49691.fffe.35c204-1 seq 0 RESPONSE MANAGEMENT TIME_STATUS_NP
master_offset              0
ingress_time               0
cumulativeScaledRateOffset +0.000000000
scaledLastGmPhaseChange    0
gmTimeBaseIndicator        0
lastGmPhaseChange          0x0000'0000000000000000.0000
gmPresent                  false
gmIdentity                 b49691.fffe.35c204
-bash-4.2# pmc -i enp10s0f2 "GET TIME_STATUS_NP"
sending: GET TIME_STATUS_NP
b49691.fffe.35c206-1 seq 0 RESPONSE MANAGEMENT TIME_STATUS_NP
master_offset              0
ingress_time               0
cumulativeScaledRateOffset +0.000000000
scaledLastGmPhaseChange    0
gmTimeBaseIndicator        0
lastGmPhaseChange          0x0000'0000000000000000.0000
gmPresent                  false
gmIdentity                 b49691.fffe.35c206



Baffled,


Richard Schmidt

Precise Time Dept

US Naval Observatory