Re: [Linuxptp-devel] Forwarding management messages to UDS port

2017-12-07 Thread Richard Cochran
On Thu, Dec 07, 2017 at 12:58:16PM +, Rommel, Albrecht wrote:
> Ideally, the UDS port meets the requirements of a "virtual PTP port" as 
> defined in ITU-T G.8275.

> A virtual PTP port supports all attributes as normally transported
> in Announce messages or general PTP headers, except that the timing
> information comes via technology proprietary methods, i.e. GPS like
> via 1PPS and ToD, instead via packet timestamps. In this context,
> having all management, responses, and signaling messages to appear
> at a UDS port as it would appear at normal PTP ports, the UDS could
> support the IWF between two PTP nodes, which are either configured
> for different PTP profiles each, or which are connected via
> non-Ethernet (i.e. transport/access technology specific methods).

Not sure what you are driving at.

The UDS port is for local control only. You can use it to allow a GPS
or similar service to talk to ptp4l just fine. The GPS code couldn't care
less about Announce messages.

We can already run two or more different profiles on two or more
different ports.  No need to abuse UDS for that.

WRT other transports, these should be added in the transport layer.
It already supports new methods by its modular design.

Thanks,
Richard



--
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
___
Linuxptp-devel mailing list
Linuxptp-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linuxptp-devel


Re: [Linuxptp-devel] Forwarding management messages to UDS port

2017-12-07 Thread Richard Cochran
On Thu, Dec 07, 2017 at 10:13:07AM +0100, Miroslav Lichvar wrote:
> The trouble is that forwarding() is called twice from
> clock_forward_mgmt_msg(), once with the source port and then with the
> destination port. So, if it returned 0 for p == uds->port and action
> == REQUEST, requests from pmc would not be forwarded.

Oh.

> Should it pass NULL as a msg in the first call to avoid the action check?

Maybe the check should go into clock_do_forward_mgmt() instead?

That looks cleaner to me.
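
For illustration, here is a minimal, self-contained sketch of how the check might look if it lived in the per-destination forwarding step instead of in forwarding(). The struct layouts, the can_forward flag and the name should_forward_mgmt() are hypothetical stand-ins, not the actual linuxptp code. The action names follow the IEEE 1588 management actions, and "REQUEST" from the discussion is assumed here to mean GET, SET and COMMAND.

```c
#include <assert.h>
#include <stddef.h>

/* Management actions as named in IEEE 1588. */
enum mgmt_action { GET, SET, RESPONSE, COMMAND, ACKNOWLEDGE };

/* Hypothetical stand-ins for the real linuxptp structures. */
struct port { int can_forward; };
struct clock { struct port *uds_port; };

/* Simplified model: the real function inspects the port state. */
static int forwarding(const struct clock *c, const struct port *p)
{
	(void)c;
	return p->can_forward;
}

/* Returns 1 if a message received on 'in' should be sent out on 'out'. */
static int should_forward_mgmt(const struct clock *c, const struct port *in,
			       const struct port *out, enum mgmt_action action)
{
	assert(c != NULL && in != NULL && out != NULL);

	if (in == out || !forwarding(c, out))
		return 0;

	/* Only RESPONSE and ACKNOWLEDGE go out to the UDS port. */
	if (out == c->uds_port) {
		switch (action) {
		case GET:
		case SET:
		case COMMAND:
			return 0;
		default:
			break;
		}
	}
	return 1;
}
```

Because the action is only examined for the destination port, requests that arrive from pmc on the UDS port are still forwarded to the network ports.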


Thanks,
Richard



Re: [Linuxptp-devel] Despite the patch, "timed out while polling for tx timestamp" keeps happening

2017-12-07 Thread Keller, Jacob E
> -Original Message-
> From: Frantisek Rysanek [mailto:frantisek.rysa...@post.cz]
> Sent: Thursday, December 07, 2017 12:41 PM
> To: linuxptp-devel@lists.sourceforge.net
> Subject: Re: [Linuxptp-devel] Despite the patch, "timed out while polling for 
> tx
> timestamp" keeps happening
> 
> On 7 Dec 2017 at 18:47, Keller, Jacob E wrote:
> 
> > > => the Intel NIC hardware is possibly sensitive to "irrelevant"
> > > contents in the traffic. I can come up with the following candidate
> > > culprits/theories:
> > > - absence of the VLAN tag
> > > - correction values of 10-20 ms
> > > - other mcast traffic interfering
> > > - higher/different actual jitter in the messages?
> > >
> > > > Which device (and driver) are you using? (I can't see it in the 
> > > > history).
> > > >
> > > On the ptp4l client?
> > > The PC is a pre-production engineering sample panel PC by Arbor/TW,
> > > with Intel Skylake mobile, the NIC that I'm using is an i219LM
> > > integrated on the mothereboard (not sure if this has a MAC on chip
> > > within the PCH/south, or if it's a stand-alone NIC). Of the two Intel
> > > NIC chips, this one is more precise. The kernel is a fresh vanilla
> > > 4.13.12 and the e1000e driver came with it.
> > > I'm attaching a dump of dmesg and lspci. Ask for more if you want.
> > >
> > > Frank Rysanek
> >
> > Do you know the packet rate for Tx packets? (How often is it
> > requesting timestamps)? There was a recent-ish problem I believe we
> > fixed but it appears to be in 4.13: 5012863b7347 ("e1000e: fix race
> > condition around skb_tstamp_tx()", 2017-06-06), but that definitely
> > should be in the 4.13 kernel..
> >
> > There should also be statistics you can check in ethtool stats on the
> > device. Could you try checking if tx_hwtstamp_timeouts is
> > incrementing? Also whether tx_hwtstamp_skipped?
> >
> > Thanks,
> > Jake
> 
> Dear Mr. Keller, thanks for your immediate responses and for the job
> that you're doing on the drivers. You have my deepest respect.
> 
> Yes that patch is in my e1000e driver:
> https://patchwork.ozlabs.org/patch/758160/
> That's the patch mentioned in the subject of this e-mail thread :-)
> 
> ptp4l sends one PDelay Request per second, and answers one
> PDelay Request received from the upstream switch (per second).
> That's three PTP messages transmitted per second.
> There is no other TX traffic on that same port.
> 

Ok, so that means it probably isn't caused by too many requests for the device 
to handle.

> About ethtool stats - I now understand that you mean the output of
> ethtool -S, namely the lines
>  tx_hwtstamp_timeouts: 0
>  tx_hwtstamp_skipped: 0
>  rx_hwtstamp_cleared: 0
> This is what they look like now, that the error does not occur.
> In a few days I will probably have a chance to try it in the field
> again, on a PTP TC switch wih GOOSE flooding the network... that's
> where the misbehavior was most stubborn. Well now I know what to look
> at :-) I'll report more numbers when I have some.
> 

Ok good. If you see Tx timeouts again, try to measure the stats here and see if 
any of these increment. If they do, that's a sure indication that the driver 
was not able to obtain and send the timestamp to the stack. If they *do not* 
increment, then that means that the driver was likely too late when responding 
with the Tx timestamp, which is a separate problem.

Oh.. It's possible that the device might be going to sleep too quickly. Can you 
check whether it supports EEE, using "ethtool --show-eee" on that interface? 
EEE puts the device into a low power link mode, which substantially increases 
the latency for actual Tx packets when there's little to no traffic. If EEE is 
enabled, that might be why you see dropped timestamps under some circumstances.
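
The reasoning about the counters can be sketched roughly as follows. This is only an illustration: it compares two text snapshots shaped like the "ethtool -S" lines quoted above (how the snapshots are captured is left out), and the names parse_counter(), diagnose() and the enum values are made up for this example.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

enum tx_diag {
	COUNTERS_MISSING,         /* a counter was not found in a snapshot */
	DRIVER_DROPPED_TIMESTAMP, /* a timeout or skip counter went up */
	TIMESTAMP_LIKELY_LATE     /* counters unchanged between snapshots */
};

/*
 * Look for a "name: value" line in the text.
 * Returns 0 and stores the value on success, -1 if the name is absent.
 */
static int parse_counter(const char *text, const char *name,
			 unsigned long long *value)
{
	size_t len;
	const char *line = text;

	assert(text != NULL && name != NULL && value != NULL);
	len = strlen(name);
	while (line && *line) {
		const char *p = line;

		/* Skip the indentation in front of the counter name. */
		while (*p == ' ' || *p == '\t')
			p++;
		if (strncmp(p, name, len) == 0 && p[len] == ':') {
			*value = strtoull(p + len + 1, NULL, 10);
			return 0;
		}
		line = strchr(line, '\n');
		if (line)
			line++;
	}
	return -1;
}

/* Compare the snapshots taken before and after a Tx timeout was logged. */
static enum tx_diag diagnose(const char *before, const char *after)
{
	static const char *const names[] = {
		"tx_hwtstamp_timeouts", "tx_hwtstamp_skipped"
	};
	size_t i;

	for (i = 0; i < sizeof(names) / sizeof(names[0]); i++) {
		unsigned long long a, b;

		if (parse_counter(before, names[i], &a) ||
		    parse_counter(after, names[i], &b))
			return COUNTERS_MISSING;
		if (b > a)
			return DRIVER_DROPPED_TIMESTAMP;
	}
	return TIMESTAMP_LIKELY_LATE;
}
```

If the result is TIMESTAMP_LIKELY_LATE, the EEE setting is the next thing worth checking.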

> BTW do you know what volume of RX buffers does the i219LM have on
> chip? Or its companion MAC integrated in the PCH, if the i219 is just
> a PHY.
> 
> Frank Rysanek
> 

The i219 is a MAC, however I don't know the volume of buffers on the chip 
unfortunately. Most of my work is on the i40e, fm10k, and ixgbe, though I've 
helped some of the work on PTP for other parts. (And, as far as I know, I'm the 
only one here who monitors the ptp4l list directly).

Thanks,
Jake




Re: [Linuxptp-devel] Despite the patch, "timed out while polling for tx timestamp" keeps happening

2017-12-07 Thread Frantisek Rysanek
On 7 Dec 2017 at 18:47, Keller, Jacob E wrote:

> > => the Intel NIC hardware is possibly sensitive to "irrelevant"
> > contents in the traffic. I can come up with the following candidate
> > culprits/theories:
> > - absence of the VLAN tag
> > - correction values of 10-20 ms
> > - other mcast traffic interfering
> > - higher/different actual jitter in the messages?
> > 
> > > Which device (and driver) are you using? (I can't see it in the history).
> > >
> > On the ptp4l client?
> > The PC is a pre-production engineering sample panel PC by Arbor/TW,
> > with Intel Skylake mobile, the NIC that I'm using is an i219LM
> > integrated on the mothereboard (not sure if this has a MAC on chip
> > within the PCH/south, or if it's a stand-alone NIC). Of the two Intel
> > NIC chips, this one is more precise. The kernel is a fresh vanilla
> > 4.13.12 and the e1000e driver came with it.
> > I'm attaching a dump of dmesg and lspci. Ask for more if you want.
> > 
> > Frank Rysanek
> 
> Do you know the packet rate for Tx packets? (How often is it
> requesting timestamps)? There was a recent-ish problem I believe we
> fixed but it appears to be in 4.13: 5012863b7347 ("e1000e: fix race
> condition around skb_tstamp_tx()", 2017-06-06), but that definitely
> should be in the 4.13 kernel.. 
> 
> There should also be statistics you can check in ethtool stats on the
> device. Could you try checking if tx_hwtstamp_timeouts is
> incrementing? Also whether tx_hwtstamp_skipped? 
> 
> Thanks,
> Jake

Dear Mr. Keller, thanks for your immediate responses and for the job 
that you're doing on the drivers. You have my deepest respect.

Yes that patch is in my e1000e driver:
https://patchwork.ozlabs.org/patch/758160/
That's the patch mentioned in the subject of this e-mail thread :-)

ptp4l sends one PDelay Request per second, and answers one
PDelay Request received from the upstream switch (per second).
That's three PTP messages transmitted per second.
There is no other TX traffic on that same port.

About ethtool stats - I now understand that you mean the output of 
ethtool -S, namely the lines
 tx_hwtstamp_timeouts: 0
 tx_hwtstamp_skipped: 0
 rx_hwtstamp_cleared: 0
This is what they look like now that the error does not occur.
In a few days I will probably have a chance to try it in the field 
again, on a PTP TC switch with GOOSE flooding the network... that's 
where the misbehavior was most stubborn. Well now I know what to look 
at :-) I'll report more numbers when I have some.

BTW do you know what volume of RX buffers the i219LM has on 
chip? Or its companion MAC integrated in the PCH, if the i219 is just 
a PHY.

Frank Rysanek



Re: [Linuxptp-devel] Despite the patch, "timed out while polling for tx timestamp" keeps happening

2017-12-07 Thread Keller, Jacob E
> -Original Message-
> From: Frantisek Rysanek [mailto:frantisek.rysa...@post.cz]
> Sent: Thursday, December 07, 2017 8:28 AM
> To: linuxptp-devel@lists.sourceforge.net
> Subject: Re: [Linuxptp-devel] Despite the patch, "timed out while polling for 
> tx
> timestamp" keeps happening
>
> => the Intel NIC hardware is possibly sensitive to "irrelevant"
> contents in the traffic. I can come up with the following candidate
> culprits/theories:
> - absence of the VLAN tag
> - correction values of 10-20 ms
> - other mcast traffic interfering
> - higher/different actual jitter in the messages?
> 
> > Which device (and driver) are you using? (I can't see it in the history).
> >
> On the ptp4l client?
> The PC is a pre-production engineering sample panel PC by Arbor/TW,
> with Intel Skylake mobile, the NIC that I'm using is an i219LM
> integrated on the mothereboard (not sure if this has a MAC on chip
> within the PCH/south, or if it's a stand-alone NIC). Of the two Intel
> NIC chips, this one is more precise. The kernel is a fresh vanilla
> 4.13.12 and the e1000e driver came with it.
> I'm attaching a dump of dmesg and lspci. Ask for more if you want.
> 
> Frank Rysanek

Do you know the packet rate for Tx packets? (How often is it requesting 
timestamps?) There was a recent-ish problem that I believe we fixed in 
5012863b7347 ("e1000e: fix race condition around skb_tstamp_tx()", 
2017-06-06), but that fix should definitely be in the 4.13 kernel already.

There should also be statistics you can check in the ethtool stats for the 
device. Could you check whether tx_hwtstamp_timeouts is incrementing? Also 
whether tx_hwtstamp_skipped is?

Thanks,
Jake



Re: [Linuxptp-devel] Despite the patch, "timed out while polling for tx timestamp" keeps happening

2017-12-07 Thread Frantisek Rysanek
Note: I'm forwarding this message with PNG attachments removed,
as I got politely and deservedly reminded that big attachments are a 
no-no in a mailing list. Here goes the message:

> > The "correction" field inserted by the RuggedCom switch contains
> > values between 10 and 20 million raw units, that's some 150 to 300ns.
> > Sounds about appropriate. Makes me wonder if the contents of the PTP
> > traffic can make the Intel hardware puke :-/ The actual jitter, or
> > the non-zero correction field... it's strange.
> > 
Actually... this is probably wrong. The value in the correction.ns 
field is about 10 to 20 million, i.e. 10 to 20 milliseconds. I can 
see the raw value in the frame (in hex) and that's what Wireshark and 
ptpTrackHound interpret, in unison. 
And, one vendor techsupport insists that a correction value of 20 ms
is perfectly all right in a TC switch, due to SW processing of the PTP 
packets. Yikes, what?
Or, is there any chance that my sniffing rig is broken? 

I've captured the PTP traffic by libpcap, 
A) either with ptp4l running in software mode as a client 
to a TC switch (with a Meinberg GM as the next upstream hop) 
B) or as a pure sniffer, listening to traffic between a 3rd-party
   client and the TC. The Intel NIC does have PTP support, but I 
   understand that it is turned off, at the time of the capture. 

Any chance that the Intel NIC hardware would mangle the correction 
field? (I hope not - after some debate in another thread, the 10-20ms 
really seem all right, even if spooky.)

I'll probably have to borrow a proper "meter" device anyway :-/

I have some other potentially interesting observations, relevant to 
ptp4l and Intel HW.

There are two GMs in play: 

GM A (older), which correlated with a problem reported on site by a 
particular 3rd-party PTP slave. Presumed buggy.

GM B (younger), whose deployment correlated with the 3rd-party slave 
becoming happy. Presumed healthy.

The 3rd-party slave is a black box, expensive, presumably 
high-quality implementation.
Let me focus on the behavior observed in ptp4l with HW accel.


I actually tried ptp4l with HW support under several slightly 
different scenarios. L2 Multicast and 2-step P2P mechanism were 
common, but details were different. 

1) with "grandmaster B", directly attached at 1 Gbps, configured for 
C37.238-2017 (including ALTERNATE_TIME_OFFSET_INDICATOR_TLV),
both ends without a VLAN tag, in my lab. That worked for the most 
part, ptp4l would throw maybe 8 TX timeouts during one night (10 
hours).

2) with "grandmaster B", on site, configured for C37.238-2017 
(including ALTERNATE_TIME_OFFSET_INDICATOR_TLV),
both ends without a VLAN tag, through a PTP-capable switch
(the one adding 10-20 ms of "correction").
Here the ptp4l with HW accel would never stop choking with TX 
timeouts. Sometimes it went for 3 to 10 PDelay transactions without a
timeout, sometimes it would run timeout after timeout.
There was 3rd-party multicast traffic on the network (IEC61850 
GOOSE).

3) with "grandmaster A", on site, direct attached, configured for 
C37.238-2011 (no local timezone TLV), but *with* a VLAN tag 
containing ID=0 configured on the GM, and *without* VLAN tag on the 
ptp4l client, the ptp4l would not synchronize to the GM. In the packet
trace I can see all the messages from the GM, and ptp4l does respond 
to the master's PDelay Requests, but the GM does *not* respond to 
ptp4l's PDelay Requests.
=> I consider this a misconfiguration on my part (PEBKAC),
even though... theoretically... VLAN ID=0 means "this packet has 
802.1p priority assigned, but does not belong to a VLAN".
The GM *could* be a little more tolerant / liberal in what it accepts
:-) Then again, I do not know the wording of the 2011 "power 
profile".

4) with "grandmaster A", direct attached, back home in the lab, 
configured for C37.238-2011 (no local timezone TLV), but *with* a 
VLAN tag containing ID=0 configured on the GM, and *with* a VLAN tag 
ID=0 on the ptp4l client (created a VLAN subinterface eth0.0), 
ptp4l now RUNS LIKE A CHEETAH FOR DAYS !
No TX timeouts in the log.

=> the Intel NIC hardware is possibly sensitive to "irrelevant" 
contents in the traffic. I can come up with the following candidate 
culprits/theories: 
- absence of the VLAN tag
- correction values of 10-20 ms
- other mcast traffic interfering
- higher/different actual jitter in the messages?

> Which device (and driver) are you using? (I can't see it in the history).
> 
On the ptp4l client?
The PC is a pre-production engineering sample panel PC by Arbor/TW, 
with Intel Skylake mobile, the NIC that I'm using is an i219LM 
integrated on the motherboard (not sure if this has a MAC on chip 
within the PCH/south, or if it's a stand-alone NIC). Of the two Intel
NIC chips, this one is more precise. The kernel is a fresh vanilla 
4.13.12 and the e1000e driver came with it.
I'm attaching a dump of dmesg and lspci. Ask for more if you want.

Frank Rysanek




Re: [Linuxptp-devel] Forwarding management messages to UDS port

2017-12-07 Thread Rommel, Albrecht
Hi all,

Ideally, the UDS port would meet the requirements of a "virtual PTP port" as 
defined in ITU-T G.8275.
A virtual PTP port supports all attributes normally transported in Announce 
messages or general PTP headers, except that the timing information comes via 
technology-specific methods, e.g. from GPS via 1PPS and ToD, instead of via 
packet timestamps. In this context, if all management, response, and 
signaling messages appeared at the UDS port as they would at normal PTP 
ports, the UDS port could support the IWF (interworking function) between two 
PTP nodes that are either configured for different PTP profiles or connected 
via non-Ethernet (i.e. transport/access technology specific) methods.

Best regards,
Albrecht


 

-Original Message-
From: Miroslav Lichvar [mailto:mlich...@redhat.com] 
Sent: Thursday, December 7, 2017 10:13 AM
To: Richard Cochran 
Cc: linuxptp-devel@lists.sourceforge.net
Subject: Re: [Linuxptp-devel] Forwarding management messages to UDS port

On Wed, Dec 06, 2017 at 08:02:56AM -0800, Richard Cochran wrote:
> On Wed, Dec 06, 2017 at 02:44:23PM +0100, Miroslav Lichvar wrote:
> > A better option might be to forward only responses to the UDS port. 
> > We don't expect a PTP clock to be listening there, right?
> 
> Right, but we should also forward the ACKNOWLEDGE messages.

Ok.

> Yes, but the check should be made in this function
> 
>   static int forwarding(struct clock *c, struct port *p)
> 
> by passing in the 'msg' as well.

The trouble is that forwarding() is called twice from clock_forward_mgmt_msg(), 
once with the source port and then with the destination port. So, if it 
returned 0 for p == uds->port and action == REQUEST, requests from pmc would 
not be forwarded.

Should it pass NULL as a msg in the first call to avoid the action check?

--
Miroslav Lichvar



Re: [Linuxptp-devel] Forwarding management messages to UDS port

2017-12-07 Thread Miroslav Lichvar
On Wed, Dec 06, 2017 at 08:02:56AM -0800, Richard Cochran wrote:
> On Wed, Dec 06, 2017 at 02:44:23PM +0100, Miroslav Lichvar wrote:
> > A better option might be to forward only responses to the UDS port. We
> > don't expect a PTP clock to be listening there, right?
> 
> Right, but we should also forward the ACKNOWLEDGE messages.

Ok.

> Yes, but the check should be made in this function
> 
>   static int forwarding(struct clock *c, struct port *p)
> 
> by passing in the 'msg' as well.

The trouble is that forwarding() is called twice from
clock_forward_mgmt_msg(), once with the source port and then with the
destination port. So, if it returned 0 for p == uds->port and action
== REQUEST, requests from pmc would not be forwarded.

Should it pass NULL as a msg in the first call to avoid the action check?

-- 
Miroslav Lichvar
