Hi Vladimir,

> -----Original Message-----
> From: Vladimir Oltean <olte...@gmail.com>
> Sent: Tuesday, October 12, 2021 7:11 PM
> To: Hutchinson, Brian (US) - PSPC <brian.hutchin...@l3harris.com>
> Cc: linuxptp-users@lists.sourceforge.net; Christian Eggers <cegg...@arri.de>
> Subject: [EXTERNAL] Re: [Linuxptp-users] Using G.8275.2 profile and getting
> tx timestamp timeout, but changing logSyncInterval etc. changes how often
> this happens
> 
> On Fri, Oct 08, 2021 at 03:22:10PM +0000, Brian.Hutchinson--- via
> Linuxptp-users wrote:
> > Hi,
> >
> > I'm using Christian's DSA patches
> > (https://lkml.org/lkml/2020/10/19/633) on an NXP iMX8MM with a Microchip
> > ksz9567 with ptp4l.conf setup for E2E G.8275.2 profile.  I'm running a
> > 1G RGMII interface and my GM and unit under test is connected via a 1G
> > Netgear dumb switch.
> >
> > Using 5.10.32 kernel with CONFIG_HZ_1000 and nohz=off on cmdline.
> >
> > I've been getting the "timed out while polling for tx timestamp" error
> > which causes linuxptp to restart.  When linuxptp restarts my 1PPS
> > (generated from Microchip switch) walks all over the place on my O
> > Scope until linuxptp gets a good sync again and pulls 1PPS back into
> > sync with the GM sync out reference I'm also watching on the scope.
> >
> > Of course increasing tx_timestamp_timeout doesn't appear to help in
> > this case.  I've tried values all the way up to 8000.
> >
> > But I can significantly reduce the frequency of the problem if I make
> > changes to some ptp4l.conf settings.
> >
> > With ptp4l.conf settings:
> >
> > logAnnounceInterval 1
> > logSyncInterval 0
> > logMinDelayReqInterval 0
> > logMinPdelayReqInterval 0
> > announceReceiptTimeout 2
> >
> > I'll see the tx timestamp timeout probably 15 or so times running a
> > test overnight.
> >
> > If I set :
> >
> > logAnnounceInterval 1
> > logSyncInterval 2
> > logMinDelayReqInterval 2
> > logMinPdelayReqInterval 2
> > announceReceiptTimeout 2
> >
> > ... then I might see tx timestamp only once or twice on an overnight run.
> >
> > I read a comment from Douglas Arnold from Meinberg that if basically
> > anything goes wrong with fulfilling a grant, message rate or grant
> > duration, or both, should be reduced.
> >
> > I've searched the archives and read all of the responses and a few
> > caught my attention.  Most say it's a driver bug but some said it
> > could be a stack issue.  So I'm wondering since I can significantly
> > decrease the occurrence of the tx timeout by modifying above settings,
> > what other settings would affect or tune this particular telco
> > profile?
> >
> > I'm still fairly new to all this and I understand the telco profiles
> > are a bit unique so I'm trying to understand what ptp4l.conf settings
> > I need to focus on for this particular profile.
> >
> > If this is a "stack" issue, what can I do to reduce the "message rate"
> > or "grant duration" if these are related to whatever a "stack" issue
> > is?
> 
> I'd be willing to put my money on a driver bug. But for that you'd need to
> confirm that the issue reproduces with the default.cfg and not just with the
> G.8275.2 profile. Don't try to run before you can walk.

Ah, you are using my military saying of "crawl, walk, run ... stumble, fall 
down" against me!

I had this working with the normal 1588 profile with P2P and don't recall 
having any linuxptp restarts due to tx timeouts.

I think the problems I've noticed are only with E2E but I could be mistaken.

> 
> Make no mistake, there was a reason why the patches you've pointed to
> were not applied to the mainline kernel in their given form at the time.
> 
> But regardless, which specific version of the patches have you applied?
> Your link points to the RFC (aka "barely works"), whereas the latest version,
> before being abandoned, was v5.
> https://patchwork.kernel.org/project/netdevbpf/cover/20201203102117.8995-1-cegg...@arri.de/

Oh, I'm aware.  I've been dealing with this Microchip switch since long before 
I met Christian.  I've been working with Christian since we are in the same 
boat using Microchip switches.  I'm using his very latest patch set.

I've read the discussions you guys had about the patches.  But I have no choice 
other than to use Christian's patches and do whatever I can to prove them out 
and make improvements if possible, as Microchip has nothing for us at this 
time.  They have things in the works (they are busy on other things too) that 
may help one day, but that doesn't help us now with our immediate need.  I 
tried to get their proprietary patch set working on our platform with help 
from Microchip and could never get anything to work.  That's probably partly my 
fault, but this is a direction I never really wanted to go in anyway.  I only 
attempted it because I thought it was the shortest path to getting what we 
needed working.  But we will be changing kernels frequently (cyber guys have 
to make a living too, you know), so I didn't want to keep having to figure out 
how to apply Microchip's proprietary patches to a continually moving target 
... especially when their stuff will never be mainlined.  I like to believe 
that going the DSA route and attempting to get something mainlined for the ksz 
is time better spent.  That may be naïve considering the previous comments 
regarding all this, but I still think it's the right thing to do.

> I specifically had a comment that TX timestamps would potentially get lost if
> user space would attempt timestamping of one frame while another was still
> in progress, and this only got fixed in v5 by the addition of a
> ksz9477_defer_xmit() function that waits until the in-flight skb has been
> timestamped. There might be other issues too.

... the version I have has ksz9477_defer_xmit.

I noticed that "sja1105_port_deferred_xmit" is protected with a mutex and also 
checks a "clone" variable that looks to be associated with 
"dsa_skb_tx_timestamp" ... but the ksz DSA driver doesn't have that, so I don't 
know if I'm on the right trail with this or not.  Unfortunately I've only 
recently gotten into all this, so I don't have the knowledge and background you 
guys do; I probably only know enough to be dangerous.
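
To check my own understanding of the lost-timestamp scenario you described, 
here is a toy Python model.  It is not driver code: the class, the method 
names, and the "one pending timestamp slot" assumption are all mine, made up 
purely for illustration.  It only shows why holding back a second frame until 
the in-flight one has been timestamped avoids losing a timestamp.

```python
from collections import deque


class OneSlotTimestamper:
    """Toy model (hypothetical): hardware with room for one pending TX timestamp."""

    def __init__(self, defer):
        self.defer = defer        # True: hold back frames while one is in flight
        self.in_flight = None     # frame currently waiting for its TX timestamp
        self.waiting = deque()    # frames held back while the slot is busy
        self.delivered = []       # (frame, timestamp) pairs reported back
        self.lost = []            # frames that never got a timestamp

    def xmit(self, frame):
        if self.in_flight is None:
            self.in_flight = frame
        elif self.defer:
            self.waiting.append(frame)
        else:
            # The new frame takes over the slot; the earlier frame's
            # timestamp is never reported (the "timed out" case).
            self.lost.append(self.in_flight)
            self.in_flight = frame

    def timestamp_ready(self, ts):
        if self.in_flight is None:
            return
        self.delivered.append((self.in_flight, ts))
        # Release the next held-back frame, if there is one.
        self.in_flight = self.waiting.popleft() if self.waiting else None


def run(defer):
    hw = OneSlotTimestamper(defer)
    hw.xmit("frame A")
    hw.xmit("frame B")        # sent before frame A has been timestamped
    hw.timestamp_ready(100)
    hw.timestamp_ready(200)
    return hw


for defer in (False, True):
    hw = run(defer)
    print("defer =", defer, "delivered:", hw.delivered, "lost:", hw.lost)
```

Without deferral, frame A never gets its timestamp in this model; with 
deferral, both frames do.  Whether the real ksz path can still hit the first 
case is exactly what I can't tell yet.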

In reading the archives, I do enjoy reading your posts.  Glad you chimed in and 
hope to learn something.

> 
> The logAnnounceInterval should not be making a difference, because the
> driver performs one-step timestamping for Sync messages, so their rate
> shouldn't matter, as the TX timestamp isn't reported to user space.
> Just the two-step TX timestamp of the Pdelay_Req frame is, and therefore,
> modulating the logMinPdelayReqInterval value is the only thing that should
> be able to modulate the behavior of your observed issue.
> 
> [ also, don't be shy to also provide negative values to
>   logMinPdelayReqInterval, for example -3 means 2^-3 seconds == 125 ms.
>   We should see something really quickly with a setting like that ]

Oh, I've tried negative values.  The example Renesas G.8275.2 profile I found 
and followed had negative values, so I used those at first.  They make things 
happen a lot faster and also make linuxptp reset with this tx timestamp issue 
much more frequently.  That's why I dialed it back, but the slower rates give 
a less accurate 1PPS (i.e. more jitter).
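
For reference, this is the arithmetic behind the log interval settings, as a 
small Python sketch.  The function names are my own, and the figures are 
nominal rates only (2^value seconds per message); the actual send times on the 
wire may vary.

```python
def interval_seconds(log_interval):
    """Nominal message interval in seconds for a log2 interval value."""
    return 2.0 ** log_interval


def messages_per_hour(log_interval):
    """Nominal number of messages sent per hour at that interval."""
    return 3600.0 / interval_seconds(log_interval)


# 2 and 0 are the values from my two test runs; -3 is the suggested example.
for value in (2, 0, -3):
    print(value, interval_seconds(value), messages_per_hour(value))
```

So going from 2 to -3 means roughly 32 times as many messages that need a TX 
timestamp, which fits with seeing the timeout much more often at the faster 
settings.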

> 
> Once you have a simple reproducer with the v5, maybe Christian would be
> able to tell you where to put some trace points in the kernel for a better
> understanding of what goes wrong with the Pdelay_Req messages.

Christian is quite busy with other things now so you're stuck with me 😉

> 
> > Regards,
> >
> > Brian
> >
> > My complete ptp4l.conf settings.  These settings will run with fewer
> > "timed out while polling for tx timestamp" occurrences but increase my
> > 1PPS jitter observed on O Scope by +/- 600ish ns.  When I run with
> > first set of logXxx settings above the jitter is much better at +/-
> > 200ish ns.
> >
> > [global]
> > #
> > # Default Data Set
> > #
> > twoStepFlag             0
> > slaveOnly               1
> > priority1               128
> > priority2               255
> > domainNumber            44
> > utc_offset             37
> > #clockClass              248
> > clockClass              255
> > #step_window            3
> > clockAccuracy           0xFE
> > offsetScaledLogVariance 0xFFFF
> > free_running            0
> > freq_est_interval       1
> > dscp_event              0
> > dscp_general            0
> > #dataset_comparison     ieee1588
> > #for G.8275.1
> > dataset_comparison      G.8275.x
> > G.8275.defaultDS.localPriority  128
> > #
> > # Port Data Set
> > #
> > logAnnounceInterval     1
> > logSyncInterval         2
> > logMinDelayReqInterval  2
> > logMinPdelayReqInterval 2
> > announceReceiptTimeout  2
> > syncReceiptTimeout      0
> > delayAsymmetry          0
> > fault_reset_interval    -128
> > #fault_reset_interval    4
> > neighborPropDelayThresh 20000000
> > masterOnly              0
> > G.8275.portDS.localPriority     128
> > #
> > # Run time options
> > #
> > assume_two_step         0
> > logging_level           6
> > path_trace_enabled      0
> > follow_up_info          0
> > hybrid_e2e              1
> > inhibit_multicast_service       1
> > net_sync_monitor        0
> > tc_spanning_tree        0
> > #tx_timestamp_timeout    300
> > tx_timestamp_timeout    8000
> > unicast_listen          1
> > unicast_req_duration    300
> > unicast_master_table    1
> > use_syslog              1
> > verbose                 0
> > summary_interval        4
> > kernel_leap             1
> > #check_fup_sync          0
> > check_fup_sync          1
> > #
> > # Servo Options
> > #
> > #write_phase_mode       1
> > servo_offset_threshold  100
> > servo_num_offset_values 64
> > pi_proportional_const   0.0
> > #pi_proportional_const   0.7
> > pi_integral_const       0.0
> > #pi_integral_const       0.3
> > pi_proportional_scale   0.0
> > pi_proportional_exponent        -0.3
> > pi_proportional_norm_max        0.7
> > pi_integral_scale       0.0
> > pi_integral_exponent    0.4
> > pi_integral_norm_max    0.3
> > step_threshold          0.0
> > #step_threshold         0.00002
> > first_step_threshold    0.00002
> > max_frequency           900000000
> > clock_servo             pi
> > sanity_freq_limit       200000000
> > ntpshm_segment          0
> > #
> > # Transport options
> > #
> > transportSpecific       0x0
> > ptp_dst_mac            01:1B:19:00:00:00
> > p2p_dst_mac            01:80:C2:00:00:0E
> > udp_ttl                 1
> > #udp6_scope             0x0E
> > uds_address             /var/run/ptp4l
> > #
> > # Default interface options
> > #
> > clock_type              OC
> > network_transport       UDPv4
> > #delay_mechanism         P2P
> > delay_mechanism         E2E
> > time_stamping           p2p1step
> > #time_stamping           onestep
> > #time_stamping           hardware
> > #tsproc_mode            filter
> > tsproc_mode             filter_weight
> > delay_filter           moving_median
> > #delay_filter_length    10
> > delay_filter_length    100
> > egressLatency           0
> > ingressLatency          0
> > boundary_clock_jbod     0
> > #
> > # Clock description
> > #
> > productDescription      ;;
> > revisionData            ;;
> > manufacturerIdentity    00:00:00
> > userDescription         ;
> > timeSource              0xA0
> > maxStepsRemoved         255
> > #
> > [unicast_master_table]
> > table_id                        1
> > logQueryInterval                2
> > UDPv4                           192.168.0.250
> > #UDPv4                           192.168.1.250
> > #
> > [lan1]
> > unicast_master_table
> >
> >
> >
> > CONFIDENTIALITY NOTICE: This email and any attachments are for the
> > sole use of the intended recipient and may contain material that is
> > proprietary, confidential, privileged or otherwise legally protected
> > or restricted under applicable government laws. Any review,
> > disclosure, distributing or other use without expressed permission of
> > the sender is strictly prohibited. If you are not the intended
> > recipient, please contact the sender and delete all copies without
> > reading, printing, or saving.
> 
> Am I an intended recipient? Let me know so I can delete the email if needed.
> What about the sourceforge mail archive?

Ha, ha.  Yeah, I usually use my gmail account, as my work is a very large 
bureaucracy that is mostly defense contractor related, so our IT puts 
one-size-fits-all solutions on us even though we do private land mobile radio 
and public safety (police, fire, dispatch consoles etc.).  So it is what it is.

My work is all Open Source so forgive me as it's out of my control and just try 
to ignore it.  I'm like most in Open Source and simply trying to push things 
along for the common good.  The proprietary Gestapo isn't going to come after 
anyone 😉

Regards,

Brian

  


