On Mon, Nov 08, 2021 at 11:43:55PM +0200, Vladimir Oltean wrote:
> On Mon, Nov 08, 2021 at 01:28:03PM -0800, Christopher Wingert wrote:
> >
> >
> > On 11/8/2021 12:38 PM, Vladimir Oltean wrote:
> > > On Mon, Nov 08, 2021 at 12:11:11PM -0800, Christopher Wingert wrote:
> > > > Hi,
> > > >
> > > > I am working with a Aquantia AQC 107 ethernet interface. After the
> > > > announce
> > > > message is sent on FD_GENERAL, a poll() of the the FD_GENERAL descriptor
> > > > generates a POLLERR. I see 3 delay messages go out the interface on
> > > > FD_EVENT (previous to the announce message) without issue (no socket
> > > > error
> > > > on read on the FD_EVENT descriptor).
> > > >
> > > > The only difference i see between the two sockets is how the
> > > > sock_filter is
> > > > setup.
> > > >
> > > > I am thinking this is an issue with the Aquantia driver, as the same
> > > > command
> > > > on a Mellanox Connect X5 works fine.
> > > >
> > > > Has anyone seen this issue or have a clue as to where I should start?
> > > >
> > > > Thanks!
> > > > Chris
> > > >
> > > >
> > > > ptp4l command line : ptp4l -i els1 -H -P -2 -m
> > > > Kernel is 4.18
> > > > I downloaded the latest Atlantic driver from the Marvell website
> > > > 2.4.14.0
> > > > I have upgraded the AQC 107 firmware to 3.1.121
> > > I've no experience with this driver whatsoever, but generally, what
> > > ptp4l receives on the error queue of a socket is a TX timestamp. What is
> > > surprising is that there's a TX timestamp for a general (not event)
> > > message, because ptp4l does not ask these to be timestamped.
> > >
> > > Apart from the error messages, does the system otherwise behave ok?
> > >
> > > You can try to read from the general message socket into a packet buffer
> > > and hexdump it, put it in tcpdump and see what it is. Then the next step
> > > might be to process its control messages (cmsg), although my first guess
> > > would be that TX timestamping is what's going on.
> > >
> > > There are plenty of things that could go wrong in a driver (especially
> > > in one you downloaded from the vendor's website and not from kernel.org).
> > > If you're handy with the source code, you can check what is the
> > > condition based on which this driver offers hardware TX timestamps to
> > > the stack. It should be if skb_shinfo(skb)->tx_flags & SKBTX_HW_TSTAMP
> > > is set for that packet, AND hardware TX timestamping was requested
> > > through HWTSTAMP_TX_ON.
> >
> > Thank you for the quick response!
> >
> > This is what the current version from git looks like on the 107 without any
> > code changes (3 delay requests, 1 announce), this loops indefinitely and
> > MASTER never gets enabled.
> > ptp4l[506134.862]: selected /dev/ptp11 as PTP clock
> > ptp4l[506134.889]: port 1 (els1): INITIALIZING to LISTENING on INIT_COMPLETE
> > ptp4l[506134.889]: port 0 (/var/run/ptp4l): INITIALIZING to LISTENING on
> > INIT_COMPLETE
> > ptp4l[506134.889]: port 0 (/var/run/ptp4lro): INITIALIZING to LISTENING on
> > INIT_COMPLETE
> > ptp4l[506141.948]: port 1 (els1): LISTENING to MASTER on
> > ANNOUNCE_RECEIPT_TIMEOUT_EXPIRES
> > ptp4l[506141.948]: selected local clock ac1f6b.fffe.dce92d as best master
> > ptp4l[506141.948]: port 1 (els1): assuming the grand master role
> > ptp4l[506141.950]: port 1 (els1): unexpected socket error
> > ptp4l[506141.950]: port 1 (els1): MASTER to FAULTY on FAULT_DETECTED
> > (FT_UNSPECIFIED)
> >
> >
> > I changed raw.c function raw_send() to the below code to get the timestamp
> > on both sockets.
> > /*
> > * Get the time stamp right away.
> > */
> > // return event == TRANS_EVENT ? sk_receive(fd, pkt, len, NULL, hwts,
> > MSG_ERRQUEUE) : cnt;
> > if ( event == TRANS_EVENT ) return sk_receive(fd, pkt, len, NULL, hwts,
> > MSG_ERRQUEUE);
> > if ( event == TRANS_GENERAL ) return sk_receive(fd, pkt, len, NULL,
> > hwts, MSG_ERRQUEUE);
> > return cnt;
> >
> > This is the result.
> > ptp4l[506201.215]: selected /dev/ptp11 as PTP clock
> > ptp4l[506201.245]: port 1 (els1): INITIALIZING to LISTENING on INIT_COMPLETE
> > ptp4l[506201.245]: port 0 (/var/run/ptp4l): INITIALIZING to LISTENING on
> > INIT_COMPLETE
> > ptp4l[506201.245]: port 0 (/var/run/ptp4lro): INITIALIZING to LISTENING on
> > INIT_COMPLETE
> > ptp4l[506208.757]: port 1 (els1): LISTENING to MASTER on
> > ANNOUNCE_RECEIPT_TIMEOUT_EXPIRES
> > ptp4l[506208.757]: selected local clock ac1f6b.fffe.dce92d as best master
> > ptp4l[506208.757]: port 1 (els1): assuming the grand master role
> > ptp4l[506208.759]: poll for tx timestamp woke up on non ERR event
> > ptp4l[506208.759]: port 1 (els1): send announce failed
> > ptp4l[506208.759]: port 1 (els1): MASTER to FAULTY on FAULT_DETECTED
> > (FT_UNSPECIFIED)
> >
> > Unless there is something wrong in my code change, it doesn't seem to be a
> > timestamp.
> >
> > Are you saying that every POLLERR should be combined with a message in the
> > Error Queue?
>
> It's still implausible that it's not a timestamp (and I don't know what
> it can be if that's not it). "man poll" only says:
>
> POLLERR
> Error condition (only returned in revents; ignored in
> events). This bit is also set for a file descriptor
> referring to the write end of a pipe when the read end has
> been closed.
>
> and since ptp4l does not open connection-oriented sockets for general
> PTP messages, I don't think it can detect that the read end has been
> closed.
>
> What seems to be more likely to be going on is that you haven't made all
> changes necessary for reading TX timestamps from the error queue of the
> general socket. Have you called sk_timestamping_init?
>
> flags = 1;
> if (setsockopt(fd, SOL_SOCKET, SO_SELECT_ERR_QUEUE,
> &flags, sizeof(flags)) < 0) {
> pr_warning("%s: SO_SELECT_ERR_QUEUE: %m", device);
> sk_events = 0;
> sk_revents = POLLERR;
> }
>
> introduced by this kernel commit:
>
> commit 7d4c04fc170087119727119074e72445f2bb192b
> Author: Keller, Jacob E <[email protected]>
> Date: Thu Mar 28 11:19:25 2013 +0000
>
> net: add option to enable error queue packets waking select
>
> Currently, when a socket receives something on the error queue it only
> wakes up
> the socket on select if it is in the "read" list, that is the socket has
> something to read. It is useful also to wake the socket if it is in the
> error
> list, which would enable software to wait on error queue packets without
> waking
> up for regular data on the socket. The main use case is for receiving
> timestamped transmit packets which return the timestamp to the socket via
> the
> error queue. This enables an application to select on the socket for the
> error
> queue only instead of for the regular traffic.
>
> -v2-
> * Added the SO_SELECT_ERR_QUEUE socket option to every architechture
> specific file
> * Modified every socket poll function that checks error queue
>
> Signed-off-by: Jacob Keller <[email protected]>
> Cc: Jeffrey Kirsher <[email protected]>
> Cc: Richard Cochran <[email protected]>
> Cc: Matthew Vick <[email protected]>
> Signed-off-by: David S. Miller <[email protected]>
>
> So you effectively cannot call poll() or select() on the error queue of
> a socket without enabling this option. Also, I think the sk_receive()
> function messes up quite badly, because of this incosistent mode in
> which it's operating. See, it looks at this global variable called
> sk_revents to figure out which events is poll() supposed to return. But
> the code was written assuming that there's a single socket on which you
> will poll for TX timestamps. And you have two, and configured
> differently, at that: on one you call sk_timestamping_init() and on the
> other you don't (or at least you don't mention that you do).
I think we are running around in circles. If you call sk_timestamping_init()
you are effectively requesting TX timestamps for general messages,
therefore changing the premise of the issue.
Can you instead remove that check for !(pfd.revents & sk_revents)?
_______________________________________________
Linuxptp-users mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/linuxptp-users