Hi All,

Let me do a quick, clean recap of this issue.

On a Debian arm64 system with a 5.0-rc8 kernel, using the macb driver on zynqmp, enabling tx timestamping (1) breaks networking! The first and most noticeable symptom is that you can no longer connect with ssh. This is a serious bug somewhere and merits some attention.

Debugging ssh directly is a possibility, but I wanted something simpler to work with, hence the netcat testing. The specific issue can be seen in the following strace. In this setup nc just connects to a server and tries to send two packets (2). The first packet goes through fine, but the second doesn't, because nc is stuck forever trying to read from the socket.

    pselect6(4, [0 3], NULL, NULL, NULL, NULL) = 1 (in [0])  <-- waiting on stdin and UDP sock
    read(0, "c1\n", 8192) = 3  <-- read three chars from stdin
    write(3, "c1\n", 3) = 3  <-- write those out on the UDP sock
    pselect6(4, [0 3], NULL, NULL, NULL, NULL) = 1 (in [3])  <-- waiting on stdin and UDP sock
    read(3,  <-- waits forever here as there is no data to read

I've been reading more; an old patch and the timestamping.txt doc helped me understand a little more of what's going on:

https://lore.kernel.org/netdev/20130328211925.7644.15781.st...@jekeller-hub.jf.intel.com/
https://www.kernel.org/doc/Documentation/networking/timestamping.txt

So it is clear that if the SO_SELECT_ERR_QUEUE flag is set then the select should in fact return, but the flag is not set in this case.

I can see everything that is going on in datagram_poll() in datagram.c. The main difference is that in the broken case the mask is 0x30c and in the working case it is 0x304. The difference is EPOLLERR, which the code clearly sets if !skb_queue_empty(&sk->sk_error_queue). Then in select.c, POLLIN_SET includes EPOLLERR, so the error bit alone marks the socket as readable. It almost looks as if it's behaving as it should (except that things break).

My first question is: should the sk_error_queue be empty if there is a tx timestamp available (in datagram_poll() in datagram.c)?
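
For reference, the two masks can be decoded with a small Python sketch. It assumes the EPOLL* bit values from the uapi eventpoll.h header and the POLLIN_SET definition from fs/select.c, both copied by hand here, so they should be checked against the tree in question. The helper names decode() and wakes_reader() are purely illustrative.

```python
# Bit values assumed from include/uapi/linux/eventpoll.h (copied by hand).
EPOLL_BITS = {
    "EPOLLIN": 0x001,
    "EPOLLPRI": 0x002,
    "EPOLLOUT": 0x004,
    "EPOLLERR": 0x008,
    "EPOLLHUP": 0x010,
    "EPOLLRDNORM": 0x040,
    "EPOLLRDBAND": 0x080,
    "EPOLLWRNORM": 0x100,
    "EPOLLWRBAND": 0x200,
}

# Read set used by select(), assumed from fs/select.c.
POLLIN_SET = sum(
    EPOLL_BITS[name]
    for name in ("EPOLLRDNORM", "EPOLLRDBAND", "EPOLLIN", "EPOLLHUP", "EPOLLERR")
)


def decode(mask):
    """Return the names of the known EPOLL* bits set in mask."""
    return [name for name, bit in EPOLL_BITS.items() if mask & bit]


def wakes_reader(mask):
    """True if select() would report the fd in its read set for this mask."""
    return bool(mask & POLLIN_SET)


if __name__ == "__main__":
    for label, mask in (("working", 0x304), ("broken", 0x30C)):
        names = " | ".join(decode(mask))
        print(f"{label}: {mask:#x} = {names}; readable in select(): {wakes_reader(mask)}")
```

Under these assumptions 0x304 is write-ready only, while 0x30c adds EPOLLERR, and that single bit is enough for select() to report fd 3 as readable even though no datagram is queued. That matches the strace: pselect6() returns fd 3 and the following read() blocks.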

If the sk_error_queue is not empty, I don't see what else the SO_SELECT_ERR_QUEUE flag is doing for select(), and I don't see what would be different about the macb/arm64 setup.

Any insight here would be very much appreciated.

thanks,
Paul

(1) hwstamp_ctl -i eth0 -t 1

(2) The actual script to be able to run nc and strace from a single serial console is slightly clever:

    (sleep 3; echo "c1"; sleep 1; echo "c2") | nc -u 10.1.155.100 9999 &
    strace -p $(ps -A | grep nc | awk '{print $1}')

_______________________________________________
Linuxptp-devel mailing list
Linuxptp-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linuxptp-devel