On Mon, Nov 08, 2021 at 02:03:56PM -0800, Christopher Wingert wrote:
> 
> 
> On 11/8/2021 1:43 PM, Vladimir Oltean wrote:
> > On Mon, Nov 08, 2021 at 01:28:03PM -0800, Christopher Wingert wrote:
> > > 
> > > On 11/8/2021 12:38 PM, Vladimir Oltean wrote:
> > > > On Mon, Nov 08, 2021 at 12:11:11PM -0800, Christopher Wingert wrote:
> > > > > Hi,
> > > > > 
> > > > > I am working with a Aquantia AQC 107 ethernet interface.  After the 
> > > > > announce
> > > > > message is sent on FD_GENERAL, a poll() of the the FD_GENERAL 
> > > > > descriptor
> > > > > generates a POLLERR.  I see 3 delay messages go out the interface on
> > > > > FD_EVENT (previous to the announce message) without issue (no socket 
> > > > > error
> > > > > on read on the FD_EVENT descriptor).
> > > > > 
> > > > > The only difference i see between the two sockets is how the 
> > > > > sock_filter is
> > > > > setup.
> > > > > 
> > > > > I am thinking this is an issue with the Aquantia driver, as the same 
> > > > > command
> > > > > on a Mellanox Connect X5 works fine.
> > > > > 
> > > > > Has anyone seen this issue or have a clue as to where I should start?
> > > > > 
> > > > > Thanks!
> > > > > Chris
> > > > > 
> > > > > 
> > > > > ptp4l command line : ptp4l -i els1 -H -P -2 -m
> > > > > Kernel is 4.18
> > > > > I downloaded the latest Atlantic driver from the Marvell website 
> > > > > 2.4.14.0
> > > > > I have upgraded the AQC 107 firmware to 3.1.121
> > > > I've no experience with this driver whatsoever, but generally, what
> > > > ptp4l receives on the error queue of a socket is a TX timestamp. What is
> > > > surprising is that there's a TX timestamp for a general (not event)
> > > > message, because ptp4l does not ask these to be timestamped.
> > > > 
> > > > Apart from the error messages, does the system otherwise behave ok?
> > > > 
> > > > You can try to read from the general message socket into a packet buffer
> > > > and hexdump it, put it in tcpdump and see what it is. Then the next step
> > > > might be to process its control messages (cmsg), although my first guess
> > > > would be that TX timestamping is what's going on.
> > > > 
> > > > There are plenty of things that could go wrong in a driver (especially
> > > > in one you downloaded from the vendor's website and not from 
> > > > kernel.org).
> > > > If you're handy with the source code, you can check what is the
> > > > condition based on which this driver offers hardware TX timestamps to
> > > > the stack. It should be if skb_shinfo(skb)->tx_flags & SKBTX_HW_TSTAMP
> > > > is set for that packet, AND hardware TX timestamping was requested
> > > > through HWTSTAMP_TX_ON.
> > > Thank you for the quick response!
> > > 
> > > This is what the current version from git looks like on the 107 without 
> > > any
> > > code changes (3 delay requests, 1 announce), this loops indefinitely and
> > > MASTER never gets enabled.
> > > ptp4l[506134.862]: selected /dev/ptp11 as PTP clock
> > > ptp4l[506134.889]: port 1 (els1): INITIALIZING to LISTENING on 
> > > INIT_COMPLETE
> > > ptp4l[506134.889]: port 0 (/var/run/ptp4l): INITIALIZING to LISTENING on
> > > INIT_COMPLETE
> > > ptp4l[506134.889]: port 0 (/var/run/ptp4lro): INITIALIZING to LISTENING on
> > > INIT_COMPLETE
> > > ptp4l[506141.948]: port 1 (els1): LISTENING to MASTER on
> > > ANNOUNCE_RECEIPT_TIMEOUT_EXPIRES
> > > ptp4l[506141.948]: selected local clock ac1f6b.fffe.dce92d as best master
> > > ptp4l[506141.948]: port 1 (els1): assuming the grand master role
> > > ptp4l[506141.950]: port 1 (els1): unexpected socket error
> > > ptp4l[506141.950]: port 1 (els1): MASTER to FAULTY on FAULT_DETECTED
> > > (FT_UNSPECIFIED)
> > > 
> > > 
> > > I changed raw.c function raw_send() to the below code to get the timestamp
> > > on both sockets.
> > >     /*
> > >      * Get the time stamp right away.
> > >      */
> > >     // return event == TRANS_EVENT ? sk_receive(fd, pkt, len, NULL, hwts, 
> > > MSG_ERRQUEUE) : cnt;
> > >     if ( event == TRANS_EVENT ) return sk_receive(fd, pkt, len, NULL, 
> > > hwts, MSG_ERRQUEUE);
> > >     if ( event == TRANS_GENERAL ) return sk_receive(fd, pkt, len, NULL, 
> > > hwts, MSG_ERRQUEUE);
> > >     return cnt;
> > > 
> > > This is the result.
> > > ptp4l[506201.215]: selected /dev/ptp11 as PTP clock
> > > ptp4l[506201.245]: port 1 (els1): INITIALIZING to LISTENING on 
> > > INIT_COMPLETE
> > > ptp4l[506201.245]: port 0 (/var/run/ptp4l): INITIALIZING to LISTENING on 
> > > INIT_COMPLETE
> > > ptp4l[506201.245]: port 0 (/var/run/ptp4lro): INITIALIZING to LISTENING 
> > > on INIT_COMPLETE
> > > ptp4l[506208.757]: port 1 (els1): LISTENING to MASTER on 
> > > ANNOUNCE_RECEIPT_TIMEOUT_EXPIRES
> > > ptp4l[506208.757]: selected local clock ac1f6b.fffe.dce92d as best master
> > > ptp4l[506208.757]: port 1 (els1): assuming the grand master role
> > > ptp4l[506208.759]: poll for tx timestamp woke up on non ERR event
> > > ptp4l[506208.759]: port 1 (els1): send announce failed
> > > ptp4l[506208.759]: port 1 (els1): MASTER to FAULTY on FAULT_DETECTED
> > > (FT_UNSPECIFIED)
> > > 
> > > Unless there is something wrong in my code change, it doesn't seem to be a
> > > timestamp.
> > > 
> > > Are you saying that every POLLERR should be combined with a message in the
> > > Error Queue?
> > It's still implausible that it's not a timestamp (and I don't know what
> > it can be if that's not it). "man poll" only says:
> > 
> >         POLLERR
> >                Error condition (only returned in revents; ignored in
> >                events).  This bit is also set for a file descriptor
> >                referring to the write end of a pipe when the read end has
> >                been closed.
> > 
> > and since ptp4l does not open connection-oriented sockets for general
> > PTP messages, I don't think it can detect that the read end has been
> > closed.
> > 
> > What seems to be more likely to be going on is that you haven't made all
> > changes necessary for reading TX timestamps from the error queue of the
> > general socket. Have you called sk_timestamping_init?
> > 
> >     flags = 1;
> >     if (setsockopt(fd, SOL_SOCKET, SO_SELECT_ERR_QUEUE,
> >                    &flags, sizeof(flags)) < 0) {
> >             pr_warning("%s: SO_SELECT_ERR_QUEUE: %m", device);
> >             sk_events = 0;
> >             sk_revents = POLLERR;
> >     }
> > 
> > introduced by this kernel commit:
> > 
> > commit 7d4c04fc170087119727119074e72445f2bb192b
> > Author: Keller, Jacob E <jacob.e.kel...@intel.com>
> > Date:   Thu Mar 28 11:19:25 2013 +0000
> > 
> >      net: add option to enable error queue packets waking select
> > 
> >      Currently, when a socket receives something on the error queue it only 
> > wakes up
> >      the socket on select if it is in the "read" list, that is the socket 
> > has
> >      something to read. It is useful also to wake the socket if it is in 
> > the error
> >      list, which would enable software to wait on error queue packets 
> > without waking
> >      up for regular data on the socket. The main use case is for receiving
> >      timestamped transmit packets which return the timestamp to the socket 
> > via the
> >      error queue. This enables an application to select on the socket for 
> > the error
> >      queue only instead of for the regular traffic.
> > 
> >      -v2-
> >      * Added the SO_SELECT_ERR_QUEUE socket option to every architechture 
> > specific file
> >      * Modified every socket poll function that checks error queue
> > 
> >      Signed-off-by: Jacob Keller <jacob.e.kel...@intel.com>
> >      Cc: Jeffrey Kirsher <jeffrey.t.kirs...@intel.com>
> >      Cc: Richard Cochran <richardcoch...@gmail.com>
> >      Cc: Matthew Vick <matthew.v...@intel.com>
> >      Signed-off-by: David S. Miller <da...@davemloft.net>
> > 
> > So you effectively cannot call poll() or select() on the error queue of
> > a socket without enabling this option. Also, I think the sk_receive()
> > function messes up quite badly, because of this incosistent mode in
> > which it's operating. See, it looks at this global variable called
> > sk_revents to figure out which events is poll() supposed to return. But
> > the code was written assuming that there's a single socket on which you
> > will poll for TX timestamps. And you have two, and configured
> > differently, at that: on one you call sk_timestamping_init() and on the
> > other you don't (or at least you don't mention that you do).
> 
> Again, thank you for the quick response.  I owe you a beer.
> 
> Adding the call to sk_timestamping_init() in raw_open() for FD_GENERAL
> allowed ptp to come online to a steady state and assume a MASTER role.
> 
> I am guessing the AQC 107 driver is getting confused on which socket
> actually wants timestamps.

Please read carefully my second reply. The test is invalid as it stands.
Either call only the setsockopt(SO_SELECT_ERR_QUEUE) portion instead of
the whole sk_timestamping_init() for the general message socket, or
remove the revents check from sk_receive().


_______________________________________________
Linuxptp-users mailing list
Linuxptp-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linuxptp-users

Reply via email to