I just wanted to share the results of an investigation into the TX timestamp timeout problems my project has been experiencing. The tl;dr is that the pfifo_fast qdisc suffers from memory ordering bugs which can cause outgoing packets to get forgotten / stuck in the outgoing buffer. Those may have been fixed in newer kernel versions, but my solution was to switch to the pfifo qdisc, which uses much simpler code paths.
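For reference, a minimal sketch of what such a switch can look like in sysctl.conf, assuming the standard net.core.default_qdisc sysctl. The file path in the comment is an assumption, and this is not necessarily the exact configuration used on the system described here:

```
# /etc/sysctl.conf (assumed location; a file under /etc/sysctl.d/ is also common)
# Use plain pfifo instead of pfifo_fast as the default qdisc.
net.core.default_qdisc = pfifo
```

As far as I know, this default only applies when a qdisc is newly attached to a device, so interfaces that are already up may keep pfifo_fast until they are re-created or the machine is rebooted. Running `tc qdisc show dev <iface>` should show which qdisc is actually in use on each TX queue.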

Our system:
- uses multi-TX-queue ethernet controllers (mostly Intel)
- has a many-core NUMA architecture

What we found was that the ethernet driver and hardware were essentially never at fault. Instead, packets were getting stuck in the qdisc layer.

Due to the consistent hashing load balancing scheme used by default in Linux for multiple TX queues, it's possible to have a busy link where one queue has almost all the traffic, and other queues are either idle or nearly idle. The issue is that sometimes Linux will forget that there's a packet to transmit in the qdisc. For a queue that is relatively busy, packets experience little to no delay in transmission. But on an idle or nearly idle queue, a packet can get stuck until another packet is sent on the same queue.

After discovering this, I found a series of memory ordering bugs over the last few years involving "lockless" qdiscs. Here is just one example:

https://lore.kernel.org/all/20220528101628.120193-1-gjf...@linux.alibaba.com/

I'm not sure exactly which one we were experiencing, but the fact that new bugs were being fixed as recently as a month or two ago does not give me confidence that all of the bugs are now fixed.

So instead I switched the default qdisc from pfifo_fast to pfifo, which is not a lockless qdisc, using sysctl.conf. This seems to have resolved nearly all the timeouts (from ~200 per day to ~1 per 3 days). There is still an occasional blip that seems attributable to the queueing layer; I'll have to do more investigation to get to the bottom of that one.

Hope this helps!

-cliff

_______________________________________________
Linuxptp-users mailing list
Linuxptp-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linuxptp-users