On Mon, Nov 08, 2021 at 02:03:56PM -0800, Christopher Wingert wrote:
>
>
> On 11/8/2021 1:43 PM, Vladimir Oltean wrote:
> > On Mon, Nov 08, 2021 at 01:28:03PM -0800, Christopher Wingert wrote:
> > >
> > > On 11/8/2021 12:38 PM, Vladimir Oltean wrote:
> > > > On Mon, Nov 08, 2021 at 12:11:11PM -0800, Christopher Wingert wrote:
> > > > > Hi,
> > > > >
> > > > > I am working with a Aquantia AQC 107 ethernet interface. After the
> > > > > announce
> > > > > message is sent on FD_GENERAL, a poll() of the the FD_GENERAL
> > > > > descriptor
> > > > > generates a POLLERR. I see 3 delay messages go out the interface on
> > > > > FD_EVENT (previous to the announce message) without issue (no socket
> > > > > error
> > > > > on read on the FD_EVENT descriptor).
> > > > >
> > > > > The only difference i see between the two sockets is how the
> > > > > sock_filter is
> > > > > setup.
> > > > >
> > > > > I am thinking this is an issue with the Aquantia driver, as the same
> > > > > command
> > > > > on a Mellanox Connect X5 works fine.
> > > > >
> > > > > Has anyone seen this issue or have a clue as to where I should start?
> > > > >
> > > > > Thanks!
> > > > > Chris
> > > > >
> > > > >
> > > > > ptp4l command line : ptp4l -i els1 -H -P -2 -m
> > > > > Kernel is 4.18
> > > > > I downloaded the latest Atlantic driver from the Marvell website
> > > > > 2.4.14.0
> > > > > I have upgraded the AQC 107 firmware to 3.1.121
> > > > I've no experience with this driver whatsoever, but generally, what
> > > > ptp4l receives on the error queue of a socket is a TX timestamp. What is
> > > > surprising is that there's a TX timestamp for a general (not event)
> > > > message, because ptp4l does not ask these to be timestamped.
> > > >
> > > > Apart from the error messages, does the system otherwise behave ok?
> > > >
> > > > You can try to read from the general message socket into a packet buffer
> > > > and hexdump it, put it in tcpdump and see what it is. Then the next step
> > > > might be to process its control messages (cmsg), although my first guess
> > > > would be that TX timestamping is what's going on.
> > > >
> > > > There are plenty of things that could go wrong in a driver (especially
> > > > in one you downloaded from the vendor's website and not from
> > > > kernel.org).
> > > > If you're handy with the source code, you can check what is the
> > > > condition based on which this driver offers hardware TX timestamps to
> > > > the stack. It should be if skb_shinfo(skb)->tx_flags & SKBTX_HW_TSTAMP
> > > > is set for that packet, AND hardware TX timestamping was requested
> > > > through HWTSTAMP_TX_ON.
> > > Thank you for the quick response!
> > >
> > > This is what the current version from git looks like on the 107 without
> > > any
> > > code changes (3 delay requests, 1 announce), this loops indefinitely and
> > > MASTER never gets enabled.
> > > ptp4l[506134.862]: selected /dev/ptp11 as PTP clock
> > > ptp4l[506134.889]: port 1 (els1): INITIALIZING to LISTENING on
> > > INIT_COMPLETE
> > > ptp4l[506134.889]: port 0 (/var/run/ptp4l): INITIALIZING to LISTENING on
> > > INIT_COMPLETE
> > > ptp4l[506134.889]: port 0 (/var/run/ptp4lro): INITIALIZING to LISTENING on
> > > INIT_COMPLETE
> > > ptp4l[506141.948]: port 1 (els1): LISTENING to MASTER on
> > > ANNOUNCE_RECEIPT_TIMEOUT_EXPIRES
> > > ptp4l[506141.948]: selected local clock ac1f6b.fffe.dce92d as best master
> > > ptp4l[506141.948]: port 1 (els1): assuming the grand master role
> > > ptp4l[506141.950]: port 1 (els1): unexpected socket error
> > > ptp4l[506141.950]: port 1 (els1): MASTER to FAULTY on FAULT_DETECTED
> > > (FT_UNSPECIFIED)
> > >
> > >
> > > I changed raw.c function raw_send() to the below code to get the timestamp
> > > on both sockets.
> > > /*
> > > * Get the time stamp right away.
> > > */
> > > // return event == TRANS_EVENT ? sk_receive(fd, pkt, len, NULL, hwts,
> > > MSG_ERRQUEUE) : cnt;
> > > if ( event == TRANS_EVENT ) return sk_receive(fd, pkt, len, NULL,
> > > hwts, MSG_ERRQUEUE);
> > > if ( event == TRANS_GENERAL ) return sk_receive(fd, pkt, len, NULL,
> > > hwts, MSG_ERRQUEUE);
> > > return cnt;
> > >
> > > This is the result.
> > > ptp4l[506201.215]: selected /dev/ptp11 as PTP clock
> > > ptp4l[506201.245]: port 1 (els1): INITIALIZING to LISTENING on
> > > INIT_COMPLETE
> > > ptp4l[506201.245]: port 0 (/var/run/ptp4l): INITIALIZING to LISTENING on
> > > INIT_COMPLETE
> > > ptp4l[506201.245]: port 0 (/var/run/ptp4lro): INITIALIZING to LISTENING
> > > on INIT_COMPLETE
> > > ptp4l[506208.757]: port 1 (els1): LISTENING to MASTER on
> > > ANNOUNCE_RECEIPT_TIMEOUT_EXPIRES
> > > ptp4l[506208.757]: selected local clock ac1f6b.fffe.dce92d as best master
> > > ptp4l[506208.757]: port 1 (els1): assuming the grand master role
> > > ptp4l[506208.759]: poll for tx timestamp woke up on non ERR event
> > > ptp4l[506208.759]: port 1 (els1): send announce failed
> > > ptp4l[506208.759]: port 1 (els1): MASTER to FAULTY on FAULT_DETECTED
> > > (FT_UNSPECIFIED)
> > >
> > > Unless there is something wrong in my code change, it doesn't seem to be a
> > > timestamp.
> > >
> > > Are you saying that every POLLERR should be combined with a message in the
> > > Error Queue?
> > It's still implausible that it's not a timestamp (and I don't know what
> > it can be if that's not it). "man poll" only says:
> >
> > POLLERR
> > Error condition (only returned in revents; ignored in
> > events). This bit is also set for a file descriptor
> > referring to the write end of a pipe when the read end has
> > been closed.
> >
> > and since ptp4l does not open connection-oriented sockets for general
> > PTP messages, I don't think it can detect that the read end has been
> > closed.
> >
> > What seems to be more likely to be going on is that you haven't made all
> > changes necessary for reading TX timestamps from the error queue of the
> > general socket. Have you called sk_timestamping_init?
> >
> > flags = 1;
> > if (setsockopt(fd, SOL_SOCKET, SO_SELECT_ERR_QUEUE,
> > &flags, sizeof(flags)) < 0) {
> > pr_warning("%s: SO_SELECT_ERR_QUEUE: %m", device);
> > sk_events = 0;
> > sk_revents = POLLERR;
> > }
> >
> > introduced by this kernel commit:
> >
> > commit 7d4c04fc170087119727119074e72445f2bb192b
> > Author: Keller, Jacob E <[email protected]>
> > Date: Thu Mar 28 11:19:25 2013 +0000
> >
> > net: add option to enable error queue packets waking select
> >
> > Currently, when a socket receives something on the error queue it only
> > wakes up
> > the socket on select if it is in the "read" list, that is the socket
> > has
> > something to read. It is useful also to wake the socket if it is in
> > the error
> > list, which would enable software to wait on error queue packets
> > without waking
> > up for regular data on the socket. The main use case is for receiving
> > timestamped transmit packets which return the timestamp to the socket
> > via the
> > error queue. This enables an application to select on the socket for
> > the error
> > queue only instead of for the regular traffic.
> >
> > -v2-
> > * Added the SO_SELECT_ERR_QUEUE socket option to every architechture
> > specific file
> > * Modified every socket poll function that checks error queue
> >
> > Signed-off-by: Jacob Keller <[email protected]>
> > Cc: Jeffrey Kirsher <[email protected]>
> > Cc: Richard Cochran <[email protected]>
> > Cc: Matthew Vick <[email protected]>
> > Signed-off-by: David S. Miller <[email protected]>
> >
> > So you effectively cannot call poll() or select() on the error queue of
> > a socket without enabling this option. Also, I think the sk_receive()
> > function messes up quite badly, because of this incosistent mode in
> > which it's operating. See, it looks at this global variable called
> > sk_revents to figure out which events is poll() supposed to return. But
> > the code was written assuming that there's a single socket on which you
> > will poll for TX timestamps. And you have two, and configured
> > differently, at that: on one you call sk_timestamping_init() and on the
> > other you don't (or at least you don't mention that you do).
>
> Again, thank you for the quick response. I owe you a beer.
>
> Adding the call to sk_timestamping_init() in raw_open() for FD_GENERAL
> allowed ptp to come online to a steady state and assume a MASTER role.
>
> I am guessing the AQC 107 driver is getting confused on which socket
> actually wants timestamps.
Please read carefully my second reply. The test is invalid as it stands.
Either call only the setsockopt(SO_SELECT_ERR_QUEUE) portion instead of
the whole sk_timestamping_init() for the general message socket, or
remove the revents check from sk_receive().
_______________________________________________
Linuxptp-users mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/linuxptp-users