Hi Vladimir,

> -----Original Message-----
> From: Vladimir Oltean <olte...@gmail.com>
> Sent: Tuesday, October 12, 2021 7:11 PM
> To: Hutchinson, Brian (US) - PSPC <brian.hutchin...@l3harris.com>
> Cc: linuxptp-users@lists.sourceforge.net; Christian Eggers <cegg...@arri.de>
> Subject: [EXTERNAL] Re: [Linuxptp-users] Using G.8275.2 profile and getting
> tx timestamp timeout, but changing logSyncInterval etc. changes how often
> this happens
>
> On Fri, Oct 08, 2021 at 03:22:10PM +0000, Brian.Hutchinson--- via Linuxptp-users wrote:
> > Hi,
> >
> > I'm using Christian's DSA patches
> > (https://lkml.org/lkml/2020/10/19/633) on a NXP iMX8MM with a Microchip
> > ksz9567 with ptp4l.conf setup for E2E G.8275.2 profile. I'm running a
> > 1G RGMII interface and my GM and unit under test is connected via a 1G
> > Netgear dumb switch.
> >
> > Using 5.10.32 kernel with CONFIG_HZ_1000 and nohz=off on cmdline.
> >
> > I've been getting the "timed out while polling for tx timestamp" error
> > which causes linuxptp to restart. When linuxptp restarts my 1PPS
> > (generated from Microchip switch) walks all over the place on my O
> > Scope until linuxptp gets a good sync again and pulls 1PPS back into
> > sync with the GM sync out reference I'm also watching on the scope.
> >
> > Of course increasing tx_timestamp_timeout doesn't appear to help in
> > this case. I've tried values all the way up to 8000.
> >
> > But I can significantly reduce the frequency of the problem if I make
> > changes to some ptp4l.conf settings.
> >
> > With ptp4l.conf settings:
> >
> > logAnnounceInterval 1
> > logSyncInterval 0
> > logMinDelayReqInterval 0
> > logMinPdelayReqInterval 0
> > announceReceiptTimeout 2
> >
> > I'll see the tx timestamp timeout probably 15 or so times running a
> > test overnight.
> >
> > If I set :
> >
> > logAnnounceInterval 1
> > logSyncInterval 2
> > logMinDelayReqInterval 2
> > logMinPdelayReqInterval 2
> > announceReceiptTimeout 2
> >
> > ... then I might see tx timestamp only once or twice on an overnight run.
> >
> > I read a comment from Douglas Arnold from Meinberg that if basically
> > anything goes wrong with fulfilling a grant, message rate or grant
> > duration, or both, should be reduced.
> >
> > I've searched the archives and read all of the responses and a few
> > caught my attention. Most say it's a driver bug but some said it
> > could be a stack issue. So I'm wondering since I can significantly
> > decrease the occurrence of the tx timeout by modifying above settings,
> > what other settings would affect or tune this particular telco
> > profile?
> >
> > I'm still fairly new to all this and I understand the telco profiles
> > are a bit unique so I'm trying to understand what ptp4l.conf settings
> > I need to focus on for this particular profile.
> >
> > If this is a "stack" issue, what can I do to reduce the "message rate"
> > or "grant duration" if these are related to whatever a "stack" issue
> > is?
>
> I'd be willing to put my money on a driver bug. But for that you'd need to
> confirm that the issue reproduces with the default.cfg and not just with the
> G.8275.2 profile. Don't try to run before you can walk.

Ah, you are using my military saying of "crawl, walk, run ... stumble, fall down" against me! I had this working with the normal 1588 profile with P2P and don't recall having any linuxptp restarts due to tx timeouts. I think the problems I've noticed are only with E2E but I could be mistaken.

>
> Make no mistake, there was a reason why the patches you've pointed to
> were not applied to the mainline kernel in their given form at the time.
>
> But regardless, which specific version of the patches have you applied?
> Your link points to the RFC (aka "barely works"), whereas the latest version,
> before being abandoned, was v5.
> https://patchwork.kernel.org/project/netdevbpf/cover/20201203102117.8995-1-cegg...@arri.de/

Oh I'm aware.
I've been dealing with this Microchip long before I met Christian. I've been working with Christian since we are in the same boat using Microchip switches. I'm using his very latest patch set. I've read the discussions you guys had about the patches. But I have no choice other than to use Christian's patches and do whatever I can to prove them out and make improvements if possible, as Microchip has nothing for us at this time. They have things in the works (they are busy on other things too) that may help one day but it doesn't help us now with our immediate need.

I tried to get their proprietary patch set working on our platform with help from Microchip and could never get anything to work. That's probably partly my fault but this is a direction I never really wanted to go in anyway. I only attempted it as I thought it was the shortest path to get what we needed working the quickest. But we will be changing kernels frequently (cyber guys have to make a living too you know) so I didn't want to continually have to figure out how to keep applying Microchip's proprietary patches to a continually moving target ... especially when their stuff will never be mainlined. I like to believe that going the DSA route and attempting to get something mainlined for the ksz is time better spent. Which may be naïve considering the previous comments regarding all this but I still think it's the right thing to do.

> I specifically had a comment that TX timestamps would potentially get lost if
> user space would attempt timestamping of one frame while another was still
> in progress, and this only got fixed in v5 by the addition of a
> ksz9477_defer_xmit() function that waits until the in-flight skb has been
> timestamped. There might be other issues too.

... the version I have has ksz9477_defer_xmit. I noticed in the "sja1105_port_deferred_xmit" they protected theirs with a mutex and also do a check on a "clone" variable that looks to be associated with "dsa_skb_tx_timestamp" ... but the ksz dsa doesn't have that so I don't know if I'm on the right trail with this or not. Unfortunately I've just recently got into all this so I don't have the knowledge and background you guys do so I probably only know enough to be dangerous. In reading the archives, I do enjoy reading your posts. Glad you chimed in and hope to learn something.

>
> The logAnnounceInterval should not be making a difference, because the
> driver performs one-step timestamping for Sync messages, so their rate
> shouldn't matter, as the TX timestamp isn't reported to user space.
> Just the two-step TX timestamp of the Pdelay_Req frame is, and therefore,
> modulating the logMinPdelayReqInterval value is the only thing that should
> be able to modulate the behavior of your observed issue.
>
> [ also, don't be shy to also provide negative values to
> logMinPdelayReqInterval, for example -3 means 2^-3 seconds == 125 ms.
> We should see something really quickly with a setting like that ]

Oh I've tried negative values. The example Renesas G.8275.2 profile I found and followed had negative values so I used those at first. It makes things happen a lot faster and also makes linuxptp reset with this tx timestamp issue much more frequently, which is why I dialed it back, but that results in a less accurate 1PPS (aka more jitter).

>
> Once you have a simple reproducer with the v5, maybe Christian would be
> able to tell you where to put some trace points in the kernel for a better
> understanding of what goes wrong with the Pdelay_Req messages.

Christian is quite busy with other things now so you're stuck with me 😉

>
> > Regards,
> >
> > Brian
> >
> > My complete ptp4l.conf settings. These settings will run with less
> > "timed out while polling for tx timestamp" occurrences but increases my
> > 1PPS jitter observed on O Scope by +/- 600ish ns. When I run with
> > first set of logXxx settings above the jitter is much better at +/-
> > 200ish ns.
> >
> > [global]
> > #
> > # Default Data Set
> > #
> > twoStepFlag 0
> > slaveOnly 1
> > priority1 128
> > priority2 255
> > domainNumber 44
> > utc_offset 37
> > #clockClass 248
> > clockClass 255
> > #step_window 3
> > clockAccuracy 0xFE
> > offsetScaledLogVariance 0xFFFF
> > free_running 0
> > freq_est_interval 1
> > dscp_event 0
> > dscp_general 0
> > #dataset_comparison ieee1588
> > #for G.8275.1
> > dataset_comparison G.8275.x
> > G.8275.defaultDS.localPriority 128
> > #
> > # Port Data Set
> > #
> > logAnnounceInterval 1
> > logSyncInterval 2
> > logMinDelayReqInterval 2
> > logMinPdelayReqInterval 2
> > announceReceiptTimeout 2
> > syncReceiptTimeout 0
> > delayAsymmetry 0
> > fault_reset_interval -128
> > #fault_reset_interval 4
> > neighborPropDelayThresh 20000000
> > masterOnly 0
> > G.8275.portDS.localPriority 128
> > #
> > # Run time options
> > #
> > assume_two_step 0
> > logging_level 6
> > path_trace_enabled 0
> > follow_up_info 0
> > hybrid_e2e 1
> > inhibit_multicast_service 1
> > net_sync_monitor 0
> > tc_spanning_tree 0
> > #tx_timestamp_timeout 300
> > tx_timestamp_timeout 8000
> > unicast_listen 1
> > unicast_req_duration 300
> > unicast_master_table 1
> > use_syslog 1
> > verbose 0
> > summary_interval 4
> > kernel_leap 1
> > #check_fup_sync 0
> > check_fup_sync 1
> > #
> > # Servo Options
> > #
> > #write_phase_mode 1
> > servo_offset_threshold 100
> > servo_num_offset_values 64
> > pi_proportional_const 0.0
> > #pi_proportional_const 0.7
> > pi_integral_const 0.0
> > #pi_integral_const 0.3
> > pi_proportional_scale 0.0
> > pi_proportional_exponent -0.3
> > pi_proportional_norm_max 0.7
> > pi_integral_scale 0.0
> > pi_integral_exponent 0.4
> > pi_integral_norm_max 0.3
> > step_threshold 0.0
> > #step_threshold 0.00002
> > first_step_threshold 0.00002
> > max_frequency 900000000
> > clock_servo pi
> > sanity_freq_limit 200000000
> > ntpshm_segment 0
> > #
> > # Transport options
> > #
> > transportSpecific 0x0
> > ptp_dst_mac 01:1B:19:00:00:00
> > p2p_dst_mac 01:80:C2:00:00:0E
> > udp_ttl 1
> > #udp6_scope 0x0E
> > uds_address /var/run/ptp4l
> > #
> > # Default interface options
> > #
> > clock_type OC
> > network_transport UDPv4
> > #delay_mechanism P2P
> > delay_mechanism E2E
> > time_stamping p2p1step
> > #time_stamping onestep
> > #time_stamping hardware
> > #tsproc_mode filter
> > tsproc_mode filter_weight
> > delay_filter moving_median
> > #delay_filter_length 10
> > delay_filter_length 100
> > egressLatency 0
> > ingressLatency 0
> > boundary_clock_jbod 0
> > #
> > # Clock description
> > #
> > productDescription ;;
> > revisionData ;;
> > manufacturerIdentity 00:00:00
> > userDescription ;
> > timeSource 0xA0
> > maxStepsRemoved 255
> > #
> > [unicast_master_table]
> > table_id 1
> > logQueryInterval 2
> > UDPv4 192.168.0.250
> > #UDPv4 192.168.1.250
> > #
> > [lan1]
> > unicast_master_table
> >
> >
> >
> > CONFIDENTIALITY NOTICE: This email and any attachments are for the
> > sole use of the intended recipient and may contain material that is
> > proprietary, confidential, privileged or otherwise legally protected
> > or restricted under applicable government laws. Any review,
> > disclosure, distributing or other use without expressed permission of
> > the sender is strictly prohibited. If you are not the intended
> > recipient, please contact the sender and delete all copies without
> > reading, printing, or saving.
>
> Am I an intended recipient? Let me know so I can delete the email if needed.
> What about the sourceforge mail archive?

Ha, ha. Yeah, I usually use my gmail account as my work is a very large bureaucracy that is mostly defense contractor related so our IT puts one-size-fits-all solutions on us even though we do private land mobile radio and public safety (police, fire, dispatch consoles etc.). So it is what it is. My work is all Open Source so forgive me as it's out of my control and just try to ignore it.
I'm like most in Open Source and simply trying to push things along for the common good. The proprietary Gestapo isn't going to come after anyone 😉

Regards,

Brian
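
For reference, the logXxx interval settings discussed in this thread are base-2 exponents of the message interval in seconds, which is why -3 works out to 2^-3 s = 125 ms. A minimal sketch of that arithmetic follows; the helper name log_interval_to_seconds is made up for illustration and is not part of linuxptp.

```python
def log_interval_to_seconds(log_interval: int) -> float:
    """Convert a PTP log2 interval value to an interval in seconds."""
    # Hypothetical helper: PTP expresses message intervals as powers of two.
    return 2.0 ** log_interval


# Values taken from the settings mentioned in this thread.
for name, value in [
    ("logMinPdelayReqInterval", -3),
    ("logSyncInterval", 0),
    ("logSyncInterval", 2),
]:
    seconds = log_interval_to_seconds(value)
    print(f"{name} {value:>2}: one message every {seconds:g} s "
          f"({1 / seconds:g} msg/s)")
```

So moving the interval settings from 0 to 2 cuts the rate of those messages by a factor of four, which lines up with the timeout showing up less often overnight, at the cost of slower servo updates.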