The application's main event loop (clock_poll) is woken up by poll() and dispatches the socket receive queue events to the corresponding ports as needed.
So it is a bug if poll() wakes up the process for data availability on a socket's receive queue, and then recvmsg(), called immediately afterwards, goes to sleep trying to retrieve it. This patch will generate an error that will be propagated to the user if this condition happens. Can it happen? As of this patch, ptp4l uses the SO_SELECT_ERR_QUEUE socket option, which means that poll() will wake the process up, with revents == (POLLIN | POLLERR), if data is available in the error queue. But clock_poll() does not check POLLERR, just POLLIN, and draws the wrong conclusion that there is data available in the receive queue (when it is in fact available in the error queue). When the above condition happens, recvmsg() will sleep typically for a whole sync interval waiting for data on the event socket, and will be woken up when the new real frame arrives. It will not dequeue follow-up messages during this time (which are sent to the general message socket) and when it does, it will already be late for them (their seqid will be out of order). So it will drop them and everything that comes after. The synchronization process will fail. The above condition shouldn't typically happen, but exceptional kernel events will trigger it. It helps to be strict in ptp4l in order for those events to not blow up in even stranger symptoms unrelated to the root cause of the problem. Signed-off-by: Vladimir Oltean <olte...@gmail.com> --- Changes in v2: None. raw.c | 2 +- udp.c | 2 +- udp6.c | 2 +- 3 files changed, 3 insertions(+), 3 deletions(-) diff --git a/raw.c b/raw.c index 15c97561066e..0bd15b08e0a2 100644 --- a/raw.c +++ b/raw.c @@ -279,7 +279,7 @@ static int raw_recv(struct transport *t, int fd, void *buf, int buflen, buflen += hlen; hdr = (struct eth_hdr *) ptr; - cnt = sk_receive(fd, ptr, buflen, addr, hwts, 0); + cnt = sk_receive(fd, ptr, buflen, addr, hwts, MSG_DONTWAIT); if (cnt >= 0) cnt -= hlen; diff --git a/udp.c b/udp.c index 36802fb67b74..826bd124deef 100644 --- a/udp.c +++ b/udp.c @@ -210,7 +210,7 @@ no_event: static int udp_recv(struct transport *t, int fd, void *buf, int buflen, struct address *addr, struct hw_timestamp *hwts) { - return sk_receive(fd, buf, buflen, addr, hwts, 0); + return sk_receive(fd, buf, buflen, addr, hwts, MSG_DONTWAIT); } static int udp_send(struct transport *t, struct fdarray *fda, diff --git a/udp6.c b/udp6.c index 744a5bc8adcb..ba5482e3d4c9 100644 --- a/udp6.c +++ b/udp6.c @@ -227,7 +227,7 @@ no_event: static int udp6_recv(struct transport *t, int fd, void *buf, int buflen, struct address *addr, struct hw_timestamp *hwts) { - return sk_receive(fd, buf, buflen, addr, hwts, 0); + return sk_receive(fd, buf, buflen, addr, hwts, MSG_DONTWAIT); } static int udp6_send(struct transport *t, struct fdarray *fda, -- 2.25.1 _______________________________________________ Linuxptp-devel mailing list Linuxptp-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/linuxptp-devel