On Fri, Oct 08, 2021 at 03:22:10PM +0000, Brian.Hutchinson--- via Linuxptp-users wrote:
> Hi,
>
> I'm using Christian's DSA patches
> (https://lkml.org/lkml/2020/10/19/633) on a NXP iMX8MM with a Microchip
> ksz9567 with ptp4l.conf setup for E2E G.8275.2 profile. I'm running a
> 1G RGMII interface and my GM and unit under test is connected via a 1G
> Netgear dumb switch.
>
> Using 5.10.32 kernel with CONFIG_HZ_1000 and nohz=off on cmdline.
>
> I've been getting the "timed out while polling for tx timestamp" error
> which causes linuxptp to restart. When linuxptp restarts my 1PPS
> (generated from Microchip switch) walks all over the place on my O
> Scope until linuxptp gets a good sync again and pulls 1PPS back into
> sync with the GM sync out reference I'm also watching on the scope.
>
> Of course increasing tx_timestamp_timeout doesn't appear to help in
> this case. I've tried values all the way up to 8000.
>
> But I can significantly reduce the frequency of the problem if I make
> changes to some ptp4l.conf settings.
>
> With ptp4l.conf settings:
>
> logAnnounceInterval 1
> logSyncInterval 0
> logMinDelayReqInterval 0
> logMinPdelayReqInterval 0
> announceReceiptTimeout 2
>
> I'll see the tx timestamp timeout probably 15 or so times running a
> test overnight.
>
> If I set :
>
> logAnnounceInterval 1
> logSyncInterval 2
> logMinDelayReqInterval 2
> logMinPdelayReqInterval 2
> announceReceiptTimeout 2
>
> ... then I might see tx timestamp only once or twice on an overnight run.
>
> I read a comment from Douglas Arnold from Meinberg that if basically
> anything goes wrong with fulfilling a grant, message rate or grant
> duration, or both, should be reduced.
>
> I've searched the archives and read all of the responses and a few
> caught my attention. Most say it's a driver bug but some said it
> could be a stack issue. So I'm wondering since I can significantly
> decrease the occurrence of the tx timeout by modifying above settings,
> what other settings would affect or tune this particular telco
> profile?
>
> I'm still fairly new to all this and I understand the telco profiles
> are a bit unique so I'm trying to understand what ptp4l.conf settings
> I need to focus on for this particular profile.
>
> If this is a "stack" issue, what can I do to reduce the "message rate"
> or "grant duration" if these are related to whatever a "stack" issue
> is?
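
For reference, the logXxx values in the two sets of settings quoted
above are base-2 logarithms of the message interval in seconds, so 0
means one message per second and 2 means one message every four
seconds; negative values give sub-second intervals. A minimal Python
sketch of that conversion, in case it helps when comparing settings
(the function names are made up for illustration and are not part of
linuxptp):

```python
def log_interval_to_seconds(log_interval: int) -> float:
    """Return the message interval in seconds for a log2 interval value.

    PTP expresses message intervals as base-2 logarithms, so the
    interval is 2 ** log_interval seconds.
    """
    return 2.0 ** log_interval


def messages_per_second(log_interval: int) -> float:
    """Return the nominal message rate implied by a log2 interval value."""
    return 1.0 / log_interval_to_seconds(log_interval)


if __name__ == "__main__":
    # 0 and 2 are the values from the two quoted configurations;
    # -3 is included only to show what a negative setting means.
    for value in (2, 0, -3):
        interval = log_interval_to_seconds(value)
        rate = messages_per_second(value)
        print(f"log interval {value:>2}: {interval:g} s between messages, "
              f"{rate:g} msg/s")
```

Going from 0 to 2 therefore cuts the nominal rate of the affected
message type by a factor of four, which lines up with the timeout
showing up less often on the overnight runs.
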
I'd be willing to put my money on a driver bug. But to establish that,
you'd need to confirm that the issue reproduces with default.cfg and
not just with the G.8275.2 profile. Don't try to run before you can
walk.

Make no mistake, there was a reason why the patches you've pointed to
were not applied to the mainline kernel in their given form at the
time. But regardless, which specific version of the patches have you
applied? Your link points to the RFC (aka "barely works"), whereas the
latest version, before being abandoned, was v5.
https://patchwork.kernel.org/project/netdevbpf/cover/20201203102117.8995-1-cegg...@arri.de/

I specifically had a comment that TX timestamps could get lost if user
space attempted to timestamp one frame while another was still in
progress, and this only got fixed in v5 by the addition of a
ksz9477_defer_xmit() function that waits until the in-flight skb has
been timestamped. There might be other issues too.

The logAnnounceInterval should not be making a difference, because the
driver performs one-step timestamping for Sync messages, so their rate
shouldn't matter, as the TX timestamp isn't reported to user space.
Only the two-step TX timestamp of the Pdelay_Req frame is, and
therefore changing the logMinPdelayReqInterval value is the only thing
that should be able to influence the issue you are observing.

[ also, don't be shy about providing negative values to
  logMinPdelayReqInterval, for example -3 means 2^-3 seconds == 125 ms.
  We should see something really quickly with a setting like that ]

Once you have a simple reproducer with v5, maybe Christian would be
able to tell you where to put some trace points in the kernel for a
better understanding of what goes wrong with the Pdelay_Req messages.

> Regards,
>
> Brian
>
> My complete ptp4l.conf settings. These settings will run with less
> "timed out while polling for tx timestamp" occurrences but increases my
> 1PPS jitter observed on O Scope by +/- 600ish ns.
> When I run with
> first set of logXxx settings above the jitter is much better at +/-
> 200ish ns.
>
> [global]
> #
> # Default Data Set
> #
> twoStepFlag 0
> slaveOnly 1
> priority1 128
> priority2 255
> domainNumber 44
> utc_offset 37
> #clockClass 248
> clockClass 255
> #step_window 3
> clockAccuracy 0xFE
> offsetScaledLogVariance 0xFFFF
> free_running 0
> freq_est_interval 1
> dscp_event 0
> dscp_general 0
> #dataset_comparison ieee1588
> #for G.8275.1
> dataset_comparison G.8275.x
> G.8275.defaultDS.localPriority 128
> #
> # Port Data Set
> #
> logAnnounceInterval 1
> logSyncInterval 2
> logMinDelayReqInterval 2
> logMinPdelayReqInterval 2
> announceReceiptTimeout 2
> syncReceiptTimeout 0
> delayAsymmetry 0
> fault_reset_interval -128
> #fault_reset_interval 4
> neighborPropDelayThresh 20000000
> masterOnly 0
> G.8275.portDS.localPriority 128
> #
> # Run time options
> #
> assume_two_step 0
> logging_level 6
> path_trace_enabled 0
> follow_up_info 0
> hybrid_e2e 1
> inhibit_multicast_service 1
> net_sync_monitor 0
> tc_spanning_tree 0
> #tx_timestamp_timeout 300
> tx_timestamp_timeout 8000
> unicast_listen 1
> unicast_req_duration 300
> unicast_master_table 1
> use_syslog 1
> verbose 0
> summary_interval 4
> kernel_leap 1
> #check_fup_sync 0
> check_fup_sync 1
> #
> # Servo Options
> #
> #write_phase_mode 1
> servo_offset_threshold 100
> servo_num_offset_values 64
> pi_proportional_const 0.0
> #pi_proportional_const 0.7
> pi_integral_const 0.0
> #pi_integral_const 0.3
> pi_proportional_scale 0.0
> pi_proportional_exponent -0.3
> pi_proportional_norm_max 0.7
> pi_integral_scale 0.0
> pi_integral_exponent 0.4
> pi_integral_norm_max 0.3
> step_threshold 0.0
> #step_threshold 0.00002
> first_step_threshold 0.00002
> max_frequency 900000000
> clock_servo pi
> sanity_freq_limit 200000000
> ntpshm_segment 0
> #
> # Transport options
> #
> transportSpecific 0x0
> ptp_dst_mac 01:1B:19:00:00:00
> p2p_dst_mac 01:80:C2:00:00:0E
> udp_ttl 1
> #udp6_scope 0x0E
> uds_address /var/run/ptp4l
> #
> # Default interface options
> #
> clock_type OC
> network_transport UDPv4
> #delay_mechanism P2P
> delay_mechanism E2E
> time_stamping p2p1step
> #time_stamping onestep
> #time_stamping hardware
> #tsproc_mode filter
> tsproc_mode filter_weight
> delay_filter moving_median
> #delay_filter_length 10
> delay_filter_length 100
> egressLatency 0
> ingressLatency 0
> boundary_clock_jbod 0
> #
> # Clock description
> #
> productDescription ;;
> revisionData ;;
> manufacturerIdentity 00:00:00
> userDescription ;
> timeSource 0xA0
> maxStepsRemoved 255
> #
> [unicast_master_table]
> table_id 1
> logQueryInterval 2
> UDPv4 192.168.0.250
> #UDPv4 192.168.1.250
> #
> [lan1]
> unicast_master_table
>
>
>
> CONFIDENTIALITY NOTICE: This email and any attachments are for the
> sole use of the intended recipient and may contain material that is
> proprietary, confidential, privileged or otherwise legally protected
> or restricted under applicable government laws. Any review,
> disclosure, distributing or other use without expressed permission of
> the sender is strictly prohibited. If you are not the intended
> recipient, please contact the sender and delete all copies without
> reading, printing, or saving.

Am I an intended recipient? Let me know so I can delete the email if
needed. What about the sourceforge mail archive?

_______________________________________________
Linuxptp-users mailing list
Linuxptp-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linuxptp-users