The application's main event loop, clock_poll(), is woken up by poll() and
dispatches the socket receive queue events to the corresponding ports as
needed.

So it is a bug if poll() wakes up the process for data availability on a
socket's receive queue, and then recvmsg(), called immediately
afterwards, goes to sleep trying to retrieve it. This patch passes
MSG_DONTWAIT to recvmsg() so that, instead of sleeping, it returns an
error that is propagated to the user if this condition happens.
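
As a minimal sketch of the behavior relied upon here (this is not ptp4l
code; the helper names try_recv_nonblocking() and
empty_socket_reports_eagain() are hypothetical, and a UNIX datagram
socketpair stands in for the real transport sockets), recvmsg() with
MSG_DONTWAIT on an empty receive queue fails right away with
EAGAIN/EWOULDBLOCK rather than sleeping:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/*
 * Hypothetical helper (not part of ptp4l): try to read one datagram
 * without blocking. Returns the byte count, or -1 with errno set.
 */
static ssize_t try_recv_nonblocking(int fd, void *buf, size_t len)
{
	struct iovec iov = { .iov_base = buf, .iov_len = len };
	struct msghdr msg = { .msg_iov = &iov, .msg_iovlen = 1 };

	assert(buf != NULL);
	return recvmsg(fd, &msg, MSG_DONTWAIT);
}

/*
 * Returns 1 if reading from an empty socket fails immediately with
 * EAGAIN/EWOULDBLOCK instead of sleeping, 0 otherwise.
 */
static int empty_socket_reports_eagain(void)
{
	char buf[64];
	int sv[2];
	ssize_t cnt;
	int ok;

	/* Assumption: a local socketpair is enough to show the flag's effect. */
	if (socketpair(AF_UNIX, SOCK_DGRAM, 0, sv) < 0)
		return 0;

	/* Nothing has been sent, so the receive queue of sv[0] is empty. */
	cnt = try_recv_nonblocking(sv[0], buf, sizeof(buf));
	ok = (cnt < 0 && (errno == EAGAIN || errno == EWOULDBLOCK));

	close(sv[0]);
	close(sv[1]);
	return ok;
}
```

With flags == 0, the same call on an empty queue would block until a
datagram arrives, which is the sleep described above.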

Can it happen?

As of this patch, ptp4l uses the SO_SELECT_ERR_QUEUE socket option,
which means that poll() will wake the process up, with revents ==
(POLLIN | POLLERR), if data is available in the error queue. But
clock_poll() does not check POLLERR, just POLLIN, and draws the wrong
conclusion that there is data available in the receive queue (when it is
in fact available in the error queue).
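
To illustrate the distinction, here is a rough sketch of how revents
could be classified so that POLLERR is not mistaken for receive queue
data. This is only an illustration and is not what this patch changes;
the enum queue_hint and the function classify_revents() are
hypothetical names, not ptp4l identifiers:

```c
#include <poll.h>

/* Hypothetical classification of a poll() result for one socket. */
enum queue_hint { QUEUE_NONE, QUEUE_RECEIVE, QUEUE_ERROR };

static enum queue_hint classify_revents(short revents)
{
	/*
	 * POLLERR set: the error queue (or a pending socket error) should
	 * be examined before assuming the receive queue holds data, even
	 * if POLLIN is set as well.
	 */
	if (revents & POLLERR)
		return QUEUE_ERROR;

	/* Only POLLIN set: data is available in the receive queue. */
	if (revents & POLLIN)
		return QUEUE_RECEIVE;

	return QUEUE_NONE;
}
```

For revents == (POLLIN | POLLERR), the case described above, this
sketch reports QUEUE_ERROR, whereas a check of POLLIN alone would
treat it as ordinary receive data.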

When the above condition happens, recvmsg() will typically sleep for a
whole sync interval waiting for data on the event socket, and will be
woken up only when the next real frame arrives. During this time, ptp4l
will not dequeue follow-up messages (which are sent to the general
message socket), and by the time it does, it will already be too late
for them (their seqid will be out of order). So it will drop them and
everything that comes after, and the synchronization process will fail.

The above condition shouldn't typically happen, but exceptional kernel
events will trigger it. It helps to be strict in ptp4l so that those
events do not blow up into even stranger symptoms that are hard to
relate to the root cause of the problem.

Signed-off-by: Vladimir Oltean <olte...@gmail.com>
---
Changes in v2:
None.

 raw.c  | 2 +-
 udp.c  | 2 +-
 udp6.c | 2 +-
 3 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/raw.c b/raw.c
index 15c97561066e..0bd15b08e0a2 100644
--- a/raw.c
+++ b/raw.c
@@ -279,7 +279,7 @@ static int raw_recv(struct transport *t, int fd, void *buf, int buflen,
        buflen += hlen;
        hdr = (struct eth_hdr *) ptr;
 
-       cnt = sk_receive(fd, ptr, buflen, addr, hwts, 0);
+       cnt = sk_receive(fd, ptr, buflen, addr, hwts, MSG_DONTWAIT);
 
        if (cnt >= 0)
                cnt -= hlen;
diff --git a/udp.c b/udp.c
index 36802fb67b74..826bd124deef 100644
--- a/udp.c
+++ b/udp.c
@@ -210,7 +210,7 @@ no_event:
 static int udp_recv(struct transport *t, int fd, void *buf, int buflen,
                    struct address *addr, struct hw_timestamp *hwts)
 {
-       return sk_receive(fd, buf, buflen, addr, hwts, 0);
+       return sk_receive(fd, buf, buflen, addr, hwts, MSG_DONTWAIT);
 }
 
 static int udp_send(struct transport *t, struct fdarray *fda,
diff --git a/udp6.c b/udp6.c
index 744a5bc8adcb..ba5482e3d4c9 100644
--- a/udp6.c
+++ b/udp6.c
@@ -227,7 +227,7 @@ no_event:
 static int udp6_recv(struct transport *t, int fd, void *buf, int buflen,
                     struct address *addr, struct hw_timestamp *hwts)
 {
-       return sk_receive(fd, buf, buflen, addr, hwts, 0);
+       return sk_receive(fd, buf, buflen, addr, hwts, MSG_DONTWAIT);
 }
 
 static int udp6_send(struct transport *t, struct fdarray *fda,
-- 
2.25.1


