Hello,

We see that while a certain process is running, ptp4l switches to the FAULTY state at some point. After that, phc2sys never syncs back to ptp4l.
The process in question is a proprietary application running as a non-root, unprivileged user. At a very high level, it receives UDP packets, transforms the data, and sends TCP packets. It does not perform any high-frequency scheduling or other unusual activity. The CPU load from this application is 50-70%. We have verified that simply running a CPU stress test at 100% load does not cause ptp4l to enter this faulty state, so we don't think CPU load is a contributor by itself. We have also tried increasing `tx_timestamp_timeout` to 10 ms, but the same behavior was observed. The network driver is `ixgbe` on Linux kernel 5.4.

We are looking for any pointers to help resolve this issue, and especially to figure out what is causing the transition to the faulty state.

Log:

```
May 27 20:28:08 hostname phc2sys[534]: [1061.490] CLOCK_REALTIME phc offset 131 s0 freq +0 delay 1800
May 27 20:28:09 hostname ptp4l[629]: [1062.033] master offset -68 s3 freq -423 path delay 1779
May 27 20:28:09 hostname phc2sys[534]: [1062.490] CLOCK_REALTIME phc offset 59 s0 freq +0 delay 1807
May 27 20:28:10 hostname ptp4l[629]: [1062.666] master offset -26 s3 freq -401 path delay 1779
May 27 20:28:10 hostname ptp4l[629]: [1063.291] master offset 39 s3 freq -344 path delay 1779
May 27 20:28:10 hostname ptp4l[629]: [1063.296] timed out while polling for tx timestamp
May 27 20:28:10 hostname ptp4l[629]: [1063.296] increasing tx_timestamp_timeout may correct this issue, but it is likely caused by a driver bug
May 27 20:28:10 hostname ptp4l[629]: [1063.296] port 1: send peer delay response failed
May 27 20:28:10 hostname ptp4l[629]: [1063.296] port 1: SLAVE to FAULTY on FAULT_DETECTED (FT_UNSPECIFIED)
May 27 20:28:10 hostname phc2sys[534]: [1063.490] port 2046a1.fffe.06da8d-1 changed state
May 27 20:28:10 hostname phc2sys[534]: [1063.490] reconfiguring after port state change
May 27 20:28:10 hostname phc2sys[534]: [1063.490] selecting eth0 for synchronization
May 27 20:28:10 hostname phc2sys[534]: [1063.490] nothing to synchronize
May 27 20:28:26 hostname ptp4l[629]: [1079.456] port 1: FAULTY to LISTENING on INIT_COMPLETE
May 27 20:28:28 hostname ptp4l[629]: [1080.548] timed out while polling for tx timestamp
May 27 20:28:28 hostname ptp4l[629]: [1080.548] increasing tx_timestamp_timeout may correct this issue, but it is likely caused by a driver bug
May 27 20:28:28 hostname ptp4l[629]: [1080.548] port 1: send peer delay response failed
May 27 20:28:28 hostname ptp4l[629]: [1080.548] port 1: LISTENING to FAULTY on FAULT_DETECTED (FT_UNSPECIFIED)
May 27 20:28:44 hostname ptp4l[629]: [1096.716] port 1: FAULTY to LISTENING on INIT_COMPLETE
May 27 20:28:45 hostname ptp4l[629]: [1097.779] port 1: new foreign master 020021.fffe.120104-28
May 27 20:28:47 hostname ptp4l[629]: [1099.822] selected best master clock 26cc35.fffe.82241b
May 27 20:28:47 hostname ptp4l[629]: [1099.822] running in a temporal vortex
May 27 20:28:47 hostname ptp4l[629]: [1099.822] port 1: LISTENING to UNCALIBRATED on RS_SLAVE
May 27 20:28:47 hostname ptp4l[629]: [1100.322] master offset 232 s3 freq -139 path delay 1779
May 27 20:28:47 hostname phc2sys[534]: [1100.492] port 2046a1.fffe.06da8d-1 changed state
May 27 20:28:47 hostname phc2sys[534]: [1100.492] reconfiguring after port state change
May 27 20:28:47 hostname phc2sys[534]: [1100.492] master clock not ready, waiting...
May 27 20:28:48 hostname ptp4l[629]: [1100.957] master offset -28 s3 freq -330 path delay 1780
May 27 20:28:49 hostname ptp4l[629]: [1101.581] master offset -60 s3 freq -370 path delay 1780
May 27 20:28:49 hostname ptp4l[629]: [1102.208] master offset -112 s3 freq -440 path delay 1780
May 27 20:28:50 hostname ptp4l[629]: [1102.833] master offset -95 s3 freq -457 path delay 1781
```

`/etc/linuxptp.cfg`:

```
[global]
#
# Options carried over from gPTP.
#
gmCapable 1
priority1 255
priority2 255
logSyncInterval -3
syncReceiptTimeout 3
neighborPropDelayThresh 800
min_neighbor_prop_delay -20000000
assume_two_step 1
path_trace_enabled 1
follow_up_info 1
transportSpecific 0x1
ptp_dst_mac 01:80:C2:00:00:0E
network_transport L2
delay_mechanism P2P
#
# Automotive Profile specific options
#
BMCA ptp
slaveOnly 1
inhibit_announce 1
asCapable true
ignore_source_id 1
# Required to quickly correct Time Jumps in master
step_threshold 1
operLogSyncInterval 0
operLogPdelayReqInterval 2
msg_interval_request 1
servo_offset_threshold 30
servo_num_offset_values 10

[eth0]
```

Process arguments:

```
/usr/sbin/ptp4l -f /etc/linuxptp.cfg
/usr/sbin/phc2sys -a -r -E ntpshm -M 9 --transportSpecific=1
```

Sincerely,
Mehmet Akbulut

--
This email contains information belonging to Motional AD LLC or its affiliates and may contain confidential, proprietary, copyrighted and/or privileged information. Any unauthorized review, use, reliance, disclosure, distribution or copying is prohibited. If you are not the intended recipient, immediately destroy all copies of the original email and any attachments and contact the sender by reply email.
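
P.S. For anyone who wants to line up the tx timestamp timeouts with the port state changes in a longer capture, here is a minimal Python sketch of a log summarizer. It is not part of linuxptp. The function names `summarize` and `fault_durations` are hypothetical, and the regular expressions assume the journal line format shown in the log above; the embedded sample is a subset of those same log lines.

```python
import re

# Assumed line format (taken from the log above):
# "May 27 20:28:10 hostname ptp4l[629]: [1063.296] message"
LINE_RE = re.compile(
    r"(?P<prog>ptp4l|phc2sys)\[\d+\]: \[(?P<mono>\d+\.\d+)\] (?P<msg>.*)"
)
# Port state change, e.g. "port 1: SLAVE to FAULTY on FAULT_DETECTED (...)"
STATE_RE = re.compile(r"port (\d+): (\w+) to (\w+) on (\w+)")


def summarize(lines):
    """Return (timeouts, transitions) for ptp4l lines in an iterable of log lines."""
    timeouts = []      # monotonic timestamps of tx timestamp timeouts
    transitions = []   # (timestamp, old_state, new_state, event)
    for line in lines:
        m = LINE_RE.search(line)
        if not m or m.group("prog") != "ptp4l":
            continue
        t = float(m.group("mono"))
        msg = m.group("msg")
        if msg.startswith("timed out while polling for tx timestamp"):
            timeouts.append(t)
            continue
        s = STATE_RE.match(msg)
        if s:
            transitions.append((t, s.group(2), s.group(3), s.group(4)))
    return timeouts, transitions


def fault_durations(transitions):
    """Seconds spent in FAULTY for each completed FAULTY episode."""
    durations = []
    entered = None
    for t, old, new, _event in transitions:
        if new == "FAULTY":
            entered = t
        elif old == "FAULTY" and entered is not None:
            durations.append(round(t - entered, 3))
            entered = None
    return durations


SAMPLE = """\
May 27 20:28:10 hostname ptp4l[629]: [1063.296] timed out while polling for tx timestamp
May 27 20:28:10 hostname ptp4l[629]: [1063.296] port 1: send peer delay response failed
May 27 20:28:10 hostname ptp4l[629]: [1063.296] port 1: SLAVE to FAULTY on FAULT_DETECTED (FT_UNSPECIFIED)
May 27 20:28:26 hostname ptp4l[629]: [1079.456] port 1: FAULTY to LISTENING on INIT_COMPLETE
May 27 20:28:28 hostname ptp4l[629]: [1080.548] timed out while polling for tx timestamp
May 27 20:28:28 hostname ptp4l[629]: [1080.548] port 1: LISTENING to FAULTY on FAULT_DETECTED (FT_UNSPECIFIED)
May 27 20:28:44 hostname ptp4l[629]: [1096.716] port 1: FAULTY to LISTENING on INIT_COMPLETE
""".splitlines()

if __name__ == "__main__":
    timeouts, transitions = summarize(SAMPLE)
    print("tx timestamp timeouts at:", timeouts)
    print("seconds in FAULTY:", fault_durations(transitions))
```

On the sample, both FAULTY episodes last about 16.2 s, which appears consistent with ptp4l's default `fault_reset_interval` of 4 (2^4 = 16 s). In both episodes the fault is logged at the same timestamp as the tx timestamp timeout.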
_______________________________________________
Linuxptp-users mailing list
Linuxptp-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linuxptp-users