May be I didn't explain it clearly earlier.

I understand one-step mode is where HW support is needed and our hardware does
insert the timestamp. But because the protocol stack sets the SKBTX_HW_TSTAMP
bit in the skb, the driver (which is supposed to return the tx timestamp back
up the stack based on this flag) can't distinguish this as a special case for
which though the flag is set, it should *not* put a copy of the pkt back on the
error queue.

Here's the sequence:

        1. ptp4l sends down 1-Step sync packet on the event socket (i.e, one
            for which SO_TIMESTAMPING socket option is set).
        2.  Network stack sets the SKBTX_HW_TSTAMP in skb since socket is
            marked for SO_TIMESTAMPING.
        3. Stack sends down the skb to driver with SKBTX_HW_TSTAMP set.
        4. Driver sends down the packet to HW.
        5. HW inserts timestamp (yes, it is HW that's inserting, not driver !!).
        6. HW transmits the packet and generates a completion to driver.
        7. The driver looks up the corresponding skb and since SKBTX_HW_TSTAMP
            is set, makes a copy of skb into error queue.

It is the SKBTX_HW_TSTAMP flag that tells the driver whether it should or
shouldn't pass back a copy of pkt into error queue. In this case, the HW is
working correctly, both inserting the timestamp and providing that inserted
timestamp back to driver. But the driver's decision to pass it back up, is
based on the SKBTX_HW_TSTAMP flag which is driven by ptp4l and the stack. The
reason this flag is set is because ptp4l sent down the skb on a socket that is
marked for HW timestamp insertion/generation.

We could change the driver to make an exception and ignore the flag
SKBTX_HW_TSTAMP and not put a skb back into the error queue, if it's one-step
sync packet that got transmitted. But that doesn't seem like the right approach
as the driver ends up not honoring what the stack is asking it via the flag.
The driver essentially violates the driver "interface" with the stack.

This is why I think that the right way to fix this would be:

1. SKBTX_HW_TSTAMP should not be set on the skb if a copy of skb with
   tx-timestamp is not needed by the consumer. HW would still insert the
   timestamp in the packet while transmitting. But the driver won't copy into
   error queue since SKBTX_HW_TSTAMP is not set in skb.

        OR

2. If the consumer sets the flag SKBTX_HW_TSTAMP, then the consumer should be
   prepared to get a copy of the packet and read it.

I feel that approach 2 is better since SKBTX_HW_TSTAMP implies both inserting
a timestamp and getting back a copy with the timestamp in it.

But the setting of the flag on the skb/pkt is driven by ptp4l. Once that's set,
driver is supposed to take expected action. Like I already said in my previous
email this is also as per the kernel documentation.

Let me know if you still think it's something the driver should workaround.

Thanks,
-Harsha

-----Original Message-----
From: Keller, Jacob E [mailto:[email protected]] 
Sent: Wednesday, August 13, 2014 11:27 PM
To: Sriharsha Basavapatna
Cc: [email protected]
Subject: Re: [Linuxptp-users] One-step sync and P2P mode issue

On Wed, 2014-08-13 at 04:21 +0000, Sriharsha Basavapatna wrote:
> 
> -----Original Message-----
> From: Keller, Jacob E [mailto:[email protected]]
> Sent: Wednesday, August 13, 2014 1:35 AM
> To: [email protected]
> Subject: Re: [Linuxptp-users] One-step sync and P2P mode issue
> 
> On Tue, 2014-08-12 at 06:46 +0000, Sriharsha Basavapatna wrote:
> > Hi folks,
> > 
> >  
> > 
> > I'm running into this problem in P2P mode with our PTP capable NIC.
> > I'd
> > 
> > appreciate if you could take a look at the details below and confirm 
> > if this is
> > 
> > an issue in ptp4l. The linuxptp version is: 1.4-00060-g93b7807.
> > 
> >  
> > 
> > After starting ptp4l, it fails to send sync after a few iterations 
> > right at
> > 
> > the beginning. The error is "No message of desired type" (ENOMSG - 
> > 42). After
> > 
> > a timeout of 16 seconds (fault clear timeout) it restarts and 
> > encounters
> > 
> > the same error and this repeats.
> > 
> >  
> > 
> > Here's the ptp4l command/args used:
> > 
> >         ptp4l -f ptp2.cfg -HPm2 -l7 -i ptp2
> > 
> >  
> > 
> > Note that I've configured one-step sync that seems to be triggering 
> > this error
> > 
> > condition.
> > 
> >  
> > 
> > ptp4l[352396.491]: selected /dev/ptp4 as PTP clock
> > 
> > ptp4l[352396.493]: PI servo: sync interval 1.000 kp 0.700 ki 
> > 0.300000
> > 
> > ptp4l[352396.514]: port 1: INITIALIZING to LISTENING on INITIALIZE
> > 
> > ptp4l[352397.514]: port 1: delay timeout
> > 
> > ptp4l[352397.514]: port 1: sending pdelay req seq(0)
> > 
> >  
> > 
> > ptp4l[352398.514]: port 1: delay timeout
> > 
> > ptp4l[352398.514]: port 1: sending pdelay req seq(1)
> > 
> >  
> > 
> > ptp4l[352398.544]: port 1: setting asCapable
> > 
> > ptp4l[352399.514]: port 1: delay timeout
> > 
> > ptp4l[352399.514]: port 1: sending pdelay req seq(2)
> > 
> >  
> > 
> > ptp4l[352400.514]: port 1: delay timeout
> > 
> > ptp4l[352400.514]: port 1: sending pdelay req seq(3)
> > 
> >  
> > 
> > ptp4l[352401.514]: port 1: delay timeout
> > 
> > ptp4l[352401.514]: port 1: sending pdelay req seq(4)
> > 
> >  
> > 
> > ptp4l[352402.514]: port 1: delay timeout
> > 
> > ptp4l[352402.515]: port 1: sending pdelay req seq(5)
> > 
> >  
> > 
> > ptp4l[352403.160]: port 1: announce timeout
> > 
> > ptp4l[352403.160]: port 1: LISTENING to MASTER on 
> > ANNOUNCE_RECEIPT_TIMEOUT_EXPIRES
> > 
> > ptp4l[352403.161]: selected best master clock 0090fa.fffe.6c024e
> > 
> > ptp4l[352403.161]: assuming the grand master role
> > 
> > ptp4l[352403.162]: port 1: master tx announce timeout
> > 
> > ptp4l[352403.515]: port 1: delay timeout
> > 
> > ptp4l[352403.515]: port 1: sending pdelay req seq(6)
> > 
> >  
> > 
> > ptp4l[352404.161]: port 1: master sync timeout
> > 
> > ptp4l[352404.161]: p->timestamping (3)
> > 
> >  
> > 
> > ptp4l[352404.515]: port 1: delay timeout
> > 
> > ptp4l[352404.515]: port 1: sending pdelay req seq(7)
> > 
> >  
> > 
> > ptp4l[352405.161]: port 1: master sync timeout
> > 
> > ptp4l[352405.161]: p->timestamping (3)
> > 
> >  
> > 
> > ptp4l[352405.162]: port 1: master tx announce timeout
> > 
> > ptp4l[352405.515]: port 1: delay timeout
> > 
> > ptp4l[352405.515]: port 1: sending pdelay req seq(8)
> > 
> >  
> > 
> > ptp4l[352406.161]: port 1: master sync timeout
> > 
> > ptp4l[352406.161]: send failed: 42 No message of desired type
> > 
> > ptp4l[352406.161]: port 1: send sync failed
> > 
> > ptp4l[352406.161]: port 1: MASTER to FAULTY on FAULT_DETECTED
> > (FT_UNSPECIFIED)
> > 
> > ptp4l[352406.183]: waiting 2^{4} seconds to clear fault on port 1
> > 
> > ptp4l[352422.183]: clearing fault on port 1
> > 
> >  
> > 
> > Based on further debugging, the sequence of events leading to the 
> > error is
> > 
> > shown below:
> > 
> >  
> > 
> > - Driver/HW receives pdelay request and sync msgs alternately like
> > this:
> > 
> >         - pdelay-req
> > 
> >         - sync
> > 
> >         - pdelay-req
> > 
> >         - sync          <----- hits ENOMSG error
> > 
> > - Messages are sent and tx timestamps generated for each (all are 
> > event msgs).
> > 
> > - Driver makes a copy of skb with timestamp info into socket error 
> > queue.
> > 
> 
> I don't think we should even put a message into the socket error queue in the 
> first place if the transmit type was ONESTEP. That might be a driver bug? The 
> driver should be modifying the PTP packet and inserting the timestamp payload 
> for us, without then further notifying the stack.
> 
> If we are in ONESTEP mode, and the driver supports it, then the driver should 
> no longer be sending the timestamp clone back up the stack.
> 
> Thanks,
> Jake
> 
> 
> This would mean that the driver needs to inspect packets in the 
> transmit path to determine if it's 1) a PTP pkt, 2) Sync pkt and 3) 
> One-step flag is set. I feel this is more work in the driver and also 
> I don't see this specified as a requirement for driver support, in the 
> kernel documentation for PTP (Documentation/ 
> networking/timestamping.txt). The only information that the 
> kernel/stack provides to the driver is whether timestamp should be 
> generated for a packet, through SKBTX_HW_TSTAMP in skb_shinfo(skb)->tx_flags. 
> This is the only check that the driver is expected to do.
> 
> I looked at a couple of other drivers and they don't seem to have any 
> such extra checking to filter out error queue insertion of one-step 
> sync. The problem might show up with other drivers. So I think it'd be 
> better to filter it out in ptp4l by just reading back the packet in 
> error queue and dropping/ignoring it.
> 
> Thanks,
> -Harsha

That is exactly what it means. You cannot support one-step mode unless hardware 
supports it. One-step mode is where the hardware inserts the hardware timestamp 
into the packet just before sending it with as close accuracy as possible.

There is nothing to filter, or deal with because they simply do not call 
skb_tx_tstamp. In the case of one-step there is no returned timestamp.

If you use the ONESTEP mode of the hwtstamp ioctl, that is where you inform 
that all timestamped packets will be onestep mode.

Do any of those drivers you looked at actually support Onestep mode?

One-step mode is a specialized mode which requires hardware support.

It is not better to filter the packet out because it is a bug. The 
hardware/driver in question needs to be inserting the timestamp directly into 
the payload of the Sync packet it is timestamping. (else it's not "one step" 
mode).

If ptp4l silently ignored this packet it would make it harder to find the bug. 
In addition, if the hardware isn't inserting the timestamp directly into the 
packet and instead is passing it up the stack it won't work anyways!

Thanks,
Jake
------------------------------------------------------------------------------
_______________________________________________
Linuxptp-users mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/linuxptp-users

Reply via email to