> -----Original Message-----
> From: Vladimir Oltean <olte...@gmail.com>
> Sent: Monday, December 16, 2019 3:10 PM
> To: richardcoch...@gmail.com
> Cc: linuxptp-devel@lists.sourceforge.net
> Subject: [Linuxptp-devel] [PATCH 0/3] More strict checking against kernel bugs
> 
> The reordering issue reported by me initially on the linuxptp-devel
> list [0] with the sja1105 DSA driver turned out not to be a reordering
> issue at all, in fact.
> 
> Due to a kernel bug described in this patch [1], the DSA driver was in
> race with the master Ethernet driver and would occasionally (very
> rarely) deliver 2 TX timestamps to ptp4l on the event socket.
> 
> The first TX timestamp is consumed in-band in the raw_send function.
> The second is caught by the main poll() syscall in the main ptp4l event
> loop - clock_poll().
> 
> When poll() sees the second TX-timestamped skb, it returns with revents
> == (POLLIN | POLLERR). But the main loop only checks for POLLIN, and
> says "yay, there's data!". So it proceeds to call recvmsg() with flags=0
> (instead of MSG_ERRQUEUE), so it doesn't see any data in
> sk->sk_receive_queue. So, surprise, false alarm, the data that woke it
> up was in sk->sk_error_queue. The ptp4l process goes to sleep waiting
> for data.
> 
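
For readers following along, the wakeup handling described above could be
sketched roughly as below. This is only an illustration under stated
assumptions, not the code from the patches: classify_revents(), drain_one()
and the wake_reason enum are hypothetical names that do not exist in
linuxptp. The point is to look at POLLERR before POLLIN, and to pass
MSG_ERRQUEUE to recvmsg() when the wakeup came from the error queue.
MSG_DONTWAIT keeps a false alarm from blocking the caller.

```c
#include <assert.h>
#include <poll.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>

/* Hypothetical names for illustration only; not part of linuxptp. */
enum wake_reason {
	WAKE_NONE,	/* nothing readable */
	WAKE_DATA,	/* regular packet in the receive queue */
	WAKE_ERRQUEUE,	/* something (e.g. a TX timestamp) in the error queue */
};

/* Decide which queue caused poll() to return for this descriptor. */
enum wake_reason classify_revents(short revents)
{
	/* Test POLLERR first: POLLIN can be set alongside it. */
	if (revents & POLLERR)
		return WAKE_ERRQUEUE;
	if (revents & POLLIN)
		return WAKE_DATA;
	return WAKE_NONE;
}

/*
 * Read one message from whichever queue woke us up, without blocking.
 * Returns the recvmsg() result, or 0 if there was nothing to read.
 */
ssize_t drain_one(int fd, short revents, void *buf, size_t len)
{
	struct iovec iov = { .iov_base = buf, .iov_len = len };
	struct msghdr msg = { .msg_iov = &iov, .msg_iovlen = 1 };
	int flags = MSG_DONTWAIT;	/* never sleep on a false alarm */

	assert(buf != NULL);

	switch (classify_revents(revents)) {
	case WAKE_ERRQUEUE:
		flags |= MSG_ERRQUEUE;	/* read from the error queue */
		break;
	case WAKE_DATA:
		break;			/* read from the receive queue */
	case WAKE_NONE:
		return 0;
	}
	return recvmsg(fd, &msg, flags);
}
```

A caller would pass the revents value that poll() filled in for the event
socket. Real code would also need a control buffer (msg_control) to
extract the timestamp itself, which is left out here.
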
> It sleeps for a whole sync interval.
> 
> When it wakes up, it wakes up with the next sync, even though the
> previous sync's follow-up may have arrived in the meantime.
> 
> Apparent reordering.
> 
> Ptp4l does not print anything, it just appears to freeze.
> 
> So this patch set aims to improve the error reporting in ptp4l, such
> that tracing back to the root cause is easier next time, and the problem
> does not blow up into other, completely unrelated things.
> 

Nice analysis.

Thanks,
Jake

> [0]: https://sourceforge.net/p/linuxptp/mailman/message/36773629/
> [1]: https://patchwork.ozlabs.org/patch/1210871/
> 
> Vladimir Oltean (3):
>   ptp4l: Call recvmsg() with the MSG_DONTWAIT flag
>   clock: Dump unexpected packets received on the error queues of sockets
>   port: Signal sync/follow-up mismatch events loudly
> 
>  clock.c | 11 +++++++++++
>  msg.c   | 12 ++++++++++++
>  msg.h   |  7 +++++++
>  port.c  | 21 +++++++++++++++++++++
>  raw.c   |  2 +-
>  udp.c   |  2 +-
>  udp6.c  |  2 +-
>  7 files changed, 54 insertions(+), 3 deletions(-)
> 
> --
> 2.17.1
> 
> 
> 
> _______________________________________________
> Linuxptp-devel mailing list
> Linuxptp-devel@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/linuxptp-devel

