Hello,

We see that if a 'certain' process is running, it causes ptp4l to switch
into faulty state at some point. After that, phc2sys never syncs back to
ptp4l.

This certain process is a proprietary application running as
non-root/unprivileged but, at a very high level, it receives UDP packets,
transforms the data, sends TCP packets. It does not perform any high
frequency scheduling or other unusual activity. The CPU load from this
application is 50-70%. We have verified that simply running a CPU stress
test with 100% load does not cause ptp4l to go into this faulty state so we
don't think CPU load is a contributor by itself.

We have also tried to increase `tx_timestamp_timeout` to 10ms but same
behavior was observed. Network driver is `ixgbe` on Linux kernel 5.4.

We are looking for any pointers to help resolve this issue and especially
figure out what's causing this transition to faulty state.

Log:
```
May 27 20:28:08 hostname phc2sys[534]: [1061.490] CLOCK_REALTIME phc offset
      131 s0 freq      +0 delay   1800
May 27 20:28:09 hostname ptp4l[629]: [1062.033] master offset        -68 s3
freq    -423 path delay      1779
May 27 20:28:09 hostname phc2sys[534]: [1062.490] CLOCK_REALTIME phc offset
       59 s0 freq      +0 delay   1807
May 27 20:28:10 hostname ptp4l[629]: [1062.666] master offset        -26 s3
freq    -401 path delay      1779
May 27 20:28:10 hostname ptp4l[629]: [1063.291] master offset         39 s3
freq    -344 path delay      1779
May 27 20:28:10 hostname ptp4l[629]: [1063.296] timed out while polling for
tx timestamp
May 27 20:28:10 hostname ptp4l[629]: [1063.296] increasing
tx_timestamp_timeout may correct this issue, but it is likely caused by a
driver bug
May 27 20:28:10 hostname ptp4l[629]: [1063.296] port 1: send peer delay
response failed
May 27 20:28:10 hostname ptp4l[629]: [1063.296] port 1: SLAVE to FAULTY on
FAULT_DETECTED (FT_UNSPECIFIED)
May 27 20:28:10 hostname phc2sys[534]: [1063.490] port 2046a1.fffe.06da8d-1
changed state
May 27 20:28:10 hostname phc2sys[534]: [1063.490] reconfiguring after port
state change
May 27 20:28:10 hostname phc2sys[534]: [1063.490] selecting eth0 for
synchronization
May 27 20:28:10 hostname phc2sys[534]: [1063.490] nothing to synchronize
May 27 20:28:26 hostname ptp4l[629]: [1079.456] port 1: FAULTY to LISTENING
on INIT_COMPLETE
May 27 20:28:28 hostname ptp4l[629]: [1080.548] timed out while polling for
tx timestamp
May 27 20:28:28 hostname ptp4l[629]: [1080.548] increasing
tx_timestamp_timeout may correct this issue, but it is likely caused by a
driver bug
May 27 20:28:28 hostname ptp4l[629]: [1080.548] port 1: send peer delay
response failed
May 27 20:28:28 hostname ptp4l[629]: [1080.548] port 1: LISTENING to FAULTY
on FAULT_DETECTED (FT_UNSPECIFIED)
May 27 20:28:44 hostname ptp4l[629]: [1096.716] port 1: FAULTY to LISTENING
on INIT_COMPLETE
May 27 20:28:45 hostname ptp4l[629]: [1097.779] port 1: new foreign master
020021.fffe.120104-28
May 27 20:28:47 hostname ptp4l[629]: [1099.822] selected best master clock
26cc35.fffe.82241b
May 27 20:28:47 hostname ptp4l[629]: [1099.822] running in a temporal vortex
May 27 20:28:47 hostname ptp4l[629]: [1099.822] port 1: LISTENING to
UNCALIBRATED on RS_SLAVE
May 27 20:28:47 hostname ptp4l[629]: [1100.322] master offset        232 s3
freq    -139 path delay      1779
May 27 20:28:47 hostname phc2sys[534]: [1100.492] port 2046a1.fffe.06da8d-1
changed state
May 27 20:28:47 hostname phc2sys[534]: [1100.492] reconfiguring after port
state change
May 27 20:28:47 hostname phc2sys[534]: [1100.492] master clock not ready,
waiting...
May 27 20:28:48 hostname ptp4l[629]: [1100.957] master offset        -28 s3
freq    -330 path delay      1780
May 27 20:28:49 hostname ptp4l[629]: [1101.581] master offset        -60 s3
freq    -370 path delay      1780
May 27 20:28:49 hostname ptp4l[629]: [1102.208] master offset       -112 s3
freq    -440 path delay      1780
May 27 20:28:50 hostname ptp4l[629]: [1102.833] master offset        -95 s3
freq    -457 path delay      1781
```
`/etc/linuxptp.cfg`:
```
[global]
#
# Options carried over from gPTP.
#
gmCapable 1
priority1 255
priority2 255
logSyncInterval -3
syncReceiptTimeout 3
neighborPropDelayThresh 800
min_neighbor_prop_delay -20000000
assume_two_step 1
path_trace_enabled 1
follow_up_info 1
transportSpecific 0x1
ptp_dst_mac 01:80:C2:00:00:0E
network_transport L2
delay_mechanism P2P
#
# Automotive Profile specific options
#
BMCA ptp
slaveOnly 1
inhibit_announce 1
asCapable               true
ignore_source_id 1
# Required to quickly correct Time Jumps in master
step_threshold          1
operLogSyncInterval     0
operLogPdelayReqInterval 2
msg_interval_request     1
servo_offset_threshold   30
servo_num_offset_values  10

[eth0]
```
Process arguments:
```
/usr/sbin/ptp4l -f /etc/linuxptp.cfg

/usr/sbin/phc2sys -a -r -E ntpshm -M 9 --transportSpecific=1
```

Sincerely,

Mehmet Akbulut

-- 
This email contains information belonging to Motional AD LLC or its 
affiliates and may contain confidential, proprietary, copyrighted and/or 
privileged information. Any unauthorized review, use, reliance, disclosure, 
distribution or copying is prohibited. If you are not the intended 
recipient, immediately destroy all copies of the original email and any 
attachments and contact the sender by reply email.
_______________________________________________
Linuxptp-users mailing list
Linuxptp-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linuxptp-users

Reply via email to