On Mon, Nov 08, 2021 at 01:28:03PM -0800, Christopher Wingert wrote:
>
> On 11/8/2021 12:38 PM, Vladimir Oltean wrote:
> > On Mon, Nov 08, 2021 at 12:11:11PM -0800, Christopher Wingert wrote:
> > > Hi,
> > >
> > > I am working with an Aquantia AQC 107 Ethernet interface. After the
> > > announce message is sent on FD_GENERAL, a poll() of the FD_GENERAL
> > > descriptor generates a POLLERR. I see 3 delay messages go out the
> > > interface on FD_EVENT (previous to the announce message) without issue
> > > (no socket error on read on the FD_EVENT descriptor).
> > >
> > > The only difference I see between the two sockets is how the
> > > sock_filter is set up.
> > >
> > > I am thinking this is an issue with the Aquantia driver, as the same
> > > command on a Mellanox Connect X5 works fine.
> > >
> > > Has anyone seen this issue or have a clue as to where I should start?
> > >
> > > Thanks!
> > > Chris
> > >
> > > ptp4l command line: ptp4l -i els1 -H -P -2 -m
> > > Kernel is 4.18
> > > I downloaded the latest Atlantic driver (2.4.14.0) from the Marvell website
> > > I have upgraded the AQC 107 firmware to 3.1.121
> >
> > I've no experience with this driver whatsoever, but generally, what
> > ptp4l receives on the error queue of a socket is a TX timestamp. What is
> > surprising is that there's a TX timestamp for a general (not event)
> > message, because ptp4l does not ask these to be timestamped.
> >
> > Apart from the error messages, does the system otherwise behave OK?
> >
> > You can try to read from the general message socket into a packet buffer
> > and hexdump it, put it in tcpdump and see what it is. Then the next step
> > might be to process its control messages (cmsg), although my first guess
> > would be that TX timestamping is what's going on.
> >
> > There are plenty of things that could go wrong in a driver (especially
> > in one you downloaded from the vendor's website and not from kernel.org).
> > If you're handy with the source code, you can check the condition under
> > which this driver offers hardware TX timestamps to the stack. It should
> > be when skb_shinfo(skb)->tx_flags & SKBTX_HW_TSTAMP is set for that
> > packet, AND hardware TX timestamping was requested through
> > HWTSTAMP_TX_ON.
>
> Thank you for the quick response!
>
> This is what the current version from git looks like on the 107 without any
> code changes (3 delay requests, 1 announce); this loops indefinitely and
> MASTER never gets enabled.
>
> ptp4l[506134.862]: selected /dev/ptp11 as PTP clock
> ptp4l[506134.889]: port 1 (els1): INITIALIZING to LISTENING on INIT_COMPLETE
> ptp4l[506134.889]: port 0 (/var/run/ptp4l): INITIALIZING to LISTENING on INIT_COMPLETE
> ptp4l[506134.889]: port 0 (/var/run/ptp4lro): INITIALIZING to LISTENING on INIT_COMPLETE
> ptp4l[506141.948]: port 1 (els1): LISTENING to MASTER on ANNOUNCE_RECEIPT_TIMEOUT_EXPIRES
> ptp4l[506141.948]: selected local clock ac1f6b.fffe.dce92d as best master
> ptp4l[506141.948]: port 1 (els1): assuming the grand master role
> ptp4l[506141.950]: port 1 (els1): unexpected socket error
> ptp4l[506141.950]: port 1 (els1): MASTER to FAULTY on FAULT_DETECTED (FT_UNSPECIFIED)
>
> I changed the function raw_send() in raw.c to the code below, to get the
> timestamp on both sockets.
>
>     /*
>      * Get the time stamp right away.
>      */
>     // return event == TRANS_EVENT ? sk_receive(fd, pkt, len, NULL, hwts, MSG_ERRQUEUE) : cnt;
>     if (event == TRANS_EVENT)
>         return sk_receive(fd, pkt, len, NULL, hwts, MSG_ERRQUEUE);
>     if (event == TRANS_GENERAL)
>         return sk_receive(fd, pkt, len, NULL, hwts, MSG_ERRQUEUE);
>     return cnt;
>
> This is the result.
>
> ptp4l[506201.215]: selected /dev/ptp11 as PTP clock
> ptp4l[506201.245]: port 1 (els1): INITIALIZING to LISTENING on INIT_COMPLETE
> ptp4l[506201.245]: port 0 (/var/run/ptp4l): INITIALIZING to LISTENING on INIT_COMPLETE
> ptp4l[506201.245]: port 0 (/var/run/ptp4lro): INITIALIZING to LISTENING on INIT_COMPLETE
> ptp4l[506208.757]: port 1 (els1): LISTENING to MASTER on ANNOUNCE_RECEIPT_TIMEOUT_EXPIRES
> ptp4l[506208.757]: selected local clock ac1f6b.fffe.dce92d as best master
> ptp4l[506208.757]: port 1 (els1): assuming the grand master role
> ptp4l[506208.759]: poll for tx timestamp woke up on non ERR event
> ptp4l[506208.759]: port 1 (els1): send announce failed
> ptp4l[506208.759]: port 1 (els1): MASTER to FAULTY on FAULT_DETECTED (FT_UNSPECIFIED)
>
> Unless there is something wrong in my code change, it doesn't seem to be a
> timestamp.
>
> Are you saying that every POLLERR should be combined with a message in the
> Error Queue?
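The earlier suggestion (read from the general socket and process its control messages) can be tried with a small standalone helper. This is a minimal sketch, not ptp4l code: `drain_errqueue()` and `find_hw_tx_timestamp()` are hypothetical names. It empties the error queue with `recvmsg(..., MSG_ERRQUEUE | MSG_DONTWAIT)`, prints the first bytes of each returned frame, and, the same way ptp4l's `sk_receive()` does, reads an `SO_TIMESTAMPING` control message as three `struct timespec` values with the raw hardware stamp in the third slot. In the kernel's generic datagram poll path (which packet sockets also use), `POLLERR` is raised either when the error queue is non-empty or when a pending socket error is set (readable and cleared via `getsockopt(fd, SOL_SOCKET, SO_ERROR, ...)`), so an empty drain while `POLLERR` persists would point at the latter.

```c
#define _GNU_SOURCE
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <time.h>

/*
 * Walk the control messages filled in by recvmsg(..., MSG_ERRQUEUE) and
 * extract the raw hardware TX timestamp. Like ptp4l's sk_receive(), the
 * SO_TIMESTAMPING payload is read as three struct timespec:
 * [0] software, [1] deprecated, [2] raw hardware.
 * Returns 1 and stores the stamp in *hw if a non-zero hardware timestamp
 * is present, 0 otherwise.
 */
static int find_hw_tx_timestamp(struct msghdr *msg, struct timespec *hw)
{
    struct cmsghdr *cm;

    for (cm = CMSG_FIRSTHDR(msg); cm != NULL; cm = CMSG_NXTHDR(msg, cm)) {
        struct timespec ts[3];

        if (cm->cmsg_level != SOL_SOCKET || cm->cmsg_type != SO_TIMESTAMPING)
            continue;
        if (cm->cmsg_len < CMSG_LEN(sizeof(ts)))
            return 0; /* short SO_TIMESTAMPING message */
        memcpy(ts, CMSG_DATA(cm), sizeof(ts));
        if (ts[2].tv_sec != 0 || ts[2].tv_nsec != 0) {
            *hw = ts[2];
            return 1;
        }
    }
    return 0;
}

/*
 * Hypothetical debugging helper: drain everything queued on the error
 * queue of fd (e.g. the general-message socket) without blocking and
 * print what each entry carries. Returns the number of entries read.
 */
static int drain_errqueue(int fd)
{
    unsigned char data[2048];
    union {
        char buf[512];
        struct cmsghdr align; /* forces cmsg alignment of buf */
    } ctrl;
    int n = 0;

    for (;;) {
        struct iovec iov = { .iov_base = data, .iov_len = sizeof(data) };
        struct msghdr msg = {
            .msg_iov = &iov,
            .msg_iovlen = 1,
            .msg_control = ctrl.buf,
            .msg_controllen = sizeof(ctrl.buf),
        };
        struct timespec hw;
        ssize_t len = recvmsg(fd, &msg, MSG_ERRQUEUE | MSG_DONTWAIT);

        if (len < 0)
            break; /* EAGAIN: queue is empty; anything else: real error */
        n++;
        printf("entry %d: %zd bytes:", n, len);
        for (ssize_t i = 0; i < len && i < 16; i++)
            printf(" %02x", data[i]); /* start of the looped-back frame */
        if (find_hw_tx_timestamp(&msg, &hw))
            printf(" | hw TX timestamp %lld.%09ld\n",
                   (long long)hw.tv_sec, hw.tv_nsec);
        else
            printf(" | no hw TX timestamp\n");
    }
    return n;
}
```

Calling `drain_errqueue()` on the general socket right after `poll()` reports `POLLERR` would show whether a timestamp (and for which frame) is actually sitting there.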
It's still implausible that it's not a timestamp (and I don't know what it
can be if that's not it). "man poll" only says:

       POLLERR
              Error condition (only returned in revents; ignored in
              events). This bit is also set for a file descriptor
              referring to the write end of a pipe when the read end
              has been closed.

and since ptp4l does not open connection-oriented sockets for general PTP
messages, I don't think it can detect that the read end has been closed.

What seems more likely is that you haven't made all the changes necessary
for reading TX timestamps from the error queue of the general socket. Have
you called sk_timestamping_init() on it? That function enables
SO_SELECT_ERR_QUEUE, and falls back to waiting for POLLERR if the option
is not available:

	flags = 1;
	if (setsockopt(fd, SOL_SOCKET, SO_SELECT_ERR_QUEUE,
		       &flags, sizeof(flags)) < 0) {
		pr_warning("%s: SO_SELECT_ERR_QUEUE: %m", device);
		sk_events = 0;
		sk_revents = POLLERR;
	}

SO_SELECT_ERR_QUEUE was introduced by this kernel commit:

commit 7d4c04fc170087119727119074e72445f2bb192b
Author: Keller, Jacob E <jacob.e.kel...@intel.com>
Date:   Thu Mar 28 11:19:25 2013 +0000

    net: add option to enable error queue packets waking select

    Currently, when a socket receives something on the error queue it only
    wakes up the socket on select if it is in the "read" list, that is the
    socket has something to read. It is useful also to wake the socket if it
    is in the error list, which would enable software to wait on error queue
    packets without waking up for regular data on the socket. The main use
    case is for receiving timestamped transmit packets which return the
    timestamp to the socket via the error queue. This enables an application
    to select on the socket for the error queue only instead of for the
    regular traffic.

    -v2-
    * Added the SO_SELECT_ERR_QUEUE socket option to every architecture
      specific file
    * Modified every socket poll function that checks error queue

    Signed-off-by: Jacob Keller <jacob.e.kel...@intel.com>
    Cc: Jeffrey Kirsher <jeffrey.t.kirs...@intel.com>
    Cc: Richard Cochran <richardcoch...@gmail.com>
    Cc: Matthew Vick <matthew.v...@intel.com>
    Signed-off-by: David S. Miller <da...@davemloft.net>

So you effectively cannot wait specifically for the error queue of a socket,
with poll() for POLLPRI or with the exception set of select(), unless this
option is enabled on that socket; without it, error queue data is signalled
only as POLLERR.

Also, I think sk_receive() gets quite confused by the inconsistent mode in
which it's operating here. It looks at a global variable called sk_revents
to figure out which events poll() is supposed to return. But the code was
written assuming that there's a single socket on which you will poll for TX
timestamps. You have two, and they are configured differently: on one you
call sk_timestamping_init() and on the other you don't (or at least you
don't mention that you do).

_______________________________________________
Linuxptp-users mailing list
Linuxptp-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linuxptp-users
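The shared-state problem with a single global sk_revents mask can be avoided by keeping the poll masks per socket. This is a minimal sketch under that assumption, using hypothetical names (`struct ts_sock`, `ts_sock_init()`, `ts_sock_wait()`) that do not exist in linuxptp: each socket enables `SO_SELECT_ERR_QUEUE` itself and records whether error queue data will show up as `POLLPRI` or only as `POLLERR`, mirroring the fallback in `sk_timestamping_init()`.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <poll.h>
#include <sys/socket.h>

#ifndef SO_SELECT_ERR_QUEUE
#define SO_SELECT_ERR_QUEUE 45 /* asm-generic value; assumed for old headers */
#endif

/*
 * Hypothetical per-socket replacement for a global sk_events/sk_revents
 * pair: each socket remembers how poll() will signal its own error queue.
 */
struct ts_sock {
    int fd;
    short events; /* bits to request from poll() */
    short ready;  /* bit meaning "error queue has data" */
};

static void ts_sock_init(struct ts_sock *s, int fd)
{
    int on = 1;

    s->fd = fd;
    if (setsockopt(fd, SOL_SOCKET, SO_SELECT_ERR_QUEUE, &on, sizeof(on)) == 0) {
        /* Error queue data now also raises POLLPRI on this socket. */
        s->events = POLLPRI;
        s->ready = POLLPRI;
    } else {
        /* Option unavailable: rely on POLLERR, which poll() always reports. */
        s->events = 0;
        s->ready = POLLERR;
    }
}

/*
 * Wait for a TX timestamp on this socket only.
 * Returns 1 when the error queue is readable, 0 on timeout,
 * -EIO if poll() woke up for another reason, or -errno if poll() failed.
 */
static int ts_sock_wait(const struct ts_sock *s, int timeout_ms)
{
    struct pollfd pfd = { .fd = s->fd, .events = s->events, .revents = 0 };
    int res = poll(&pfd, 1, timeout_ms);

    if (res < 0)
        return -errno;
    if (res == 0)
        return 0;
    return (pfd.revents & s->ready) ? 1 : -EIO;
}
```

With a mask per socket, a socket that never got `SO_SELECT_ERR_QUEUE` is no longer judged against `POLLPRI` just because the other socket did get it; a mismatch of that kind is a plausible source of the "poll for tx timestamp woke up on non ERR event" message.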