Hi folks,
I'm running into this problem in P2P mode with our PTP capable NIC. I'd
appreciate if you could take a look at the details below and confirm if this is
an issue in ptp4l. The linuxptp version is: 1.4-00060-g93b7807.
After starting ptp4l, it fails to send sync after a few iterations right at
the beginning. The error is "No message of desired type" (ENOMSG - 42). After
a timeout of 16 seconds (fault clear timeout) it restarts and encounters
the same error and this repeats.
Here's the ptp4l command/args used:
ptp4l -f ptp2.cfg -HPm2 -l7 -i ptp2
Note that I've configured one-step sync that seems to be triggering this error
condition.
ptp4l[352396.491]: selected /dev/ptp4 as PTP clock
ptp4l[352396.493]: PI servo: sync interval 1.000 kp 0.700 ki 0.300000
ptp4l[352396.514]: port 1: INITIALIZING to LISTENING on INITIALIZE
ptp4l[352397.514]: port 1: delay timeout
ptp4l[352397.514]: port 1: sending pdelay req seq(0)
ptp4l[352398.514]: port 1: delay timeout
ptp4l[352398.514]: port 1: sending pdelay req seq(1)
ptp4l[352398.544]: port 1: setting asCapable
ptp4l[352399.514]: port 1: delay timeout
ptp4l[352399.514]: port 1: sending pdelay req seq(2)
ptp4l[352400.514]: port 1: delay timeout
ptp4l[352400.514]: port 1: sending pdelay req seq(3)
ptp4l[352401.514]: port 1: delay timeout
ptp4l[352401.514]: port 1: sending pdelay req seq(4)
ptp4l[352402.514]: port 1: delay timeout
ptp4l[352402.515]: port 1: sending pdelay req seq(5)
ptp4l[352403.160]: port 1: announce timeout
ptp4l[352403.160]: port 1: LISTENING to MASTER on
ANNOUNCE_RECEIPT_TIMEOUT_EXPIRES
ptp4l[352403.161]: selected best master clock 0090fa.fffe.6c024e
ptp4l[352403.161]: assuming the grand master role
ptp4l[352403.162]: port 1: master tx announce timeout
ptp4l[352403.515]: port 1: delay timeout
ptp4l[352403.515]: port 1: sending pdelay req seq(6)
ptp4l[352404.161]: port 1: master sync timeout
ptp4l[352404.161]: p->timestamping (3)
ptp4l[352404.515]: port 1: delay timeout
ptp4l[352404.515]: port 1: sending pdelay req seq(7)
ptp4l[352405.161]: port 1: master sync timeout
ptp4l[352405.161]: p->timestamping (3)
ptp4l[352405.162]: port 1: master tx announce timeout
ptp4l[352405.515]: port 1: delay timeout
ptp4l[352405.515]: port 1: sending pdelay req seq(8)
ptp4l[352406.161]: port 1: master sync timeout
ptp4l[352406.161]: send failed: 42 No message of desired type
ptp4l[352406.161]: port 1: send sync failed
ptp4l[352406.161]: port 1: MASTER to FAULTY on FAULT_DETECTED (FT_UNSPECIFIED)
ptp4l[352406.183]: waiting 2^{4} seconds to clear fault on port 1
ptp4l[352422.183]: clearing fault on port 1
Based on further debugging, the sequence of events leading to the error is
shown below:
- Driver/HW receives pdelay request and sync msgs alternately like this:
- pdelay-req
- sync
- pdelay-req
- sync <----- hits ENOMSG error
- Messages are sent and tx timestamps generated for each (all are event msgs).
- Driver makes a copy of skb with timestamp info into socket error queue.
- The first sync message in the error queue is not consumed by ptp4l.
- This is because port_tx_sync() calls transport_send() with event set to
TRANS_ONESTEP. The transport send routines check the event type to know
whether to call sk_receive() after sending the message to get a copy. They do
sk_receive() only if event == TRANS_EVENT.
- In this case since event is TRANS_ONESTEP, there's no call to sk_receive().
- This leaves the sync msg in socket error queue.
- Next pdelay-req is sent; driver clones this skb into error queue.
- ptp4l invokes sk_receive() to get a copy of this pdelay-req.
- The kernel function sock_recv_errqueue() or ip_recv_error() is invoked.
- There's a check in those functions to see if there's more skbs in err queue.
- The kernel function marks the socket in error(sets sk->sk_err) before return.
- The next sync message is sent down by ptp4l.
- Stack calls sock_alloc_send_pskb() to get a skb.
- There, it checks if the socket is in error and returns failure.
- The second sync send fails.
Note that the driver decides whether to make a copy into the error queue based
on SKBTX_HW_TSTAMP. This gets set for packets that are sent down the stack on
the event socket; i.e, the socket on which setsockopt(SO_TIMESTAMPING) has been
done. So the driver seems to be correctly handling the packets.
It looks like a change in ptp4l might be needed to address this. The transport
send functions (raw_send(), udp_send()...) should be modified to consume the
error queue skb also for the case when event is TRANS_ONESTEP. They might just
have to throw away the skb if the timestamp is not of interest since it is
one-step.
Thanks,
-Harsha
------------------------------------------------------------------------------
_______________________________________________
Linuxptp-users mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/linuxptp-users