Package: libtirpc3
Version: 1.1.4-0.4
Severity: normal
Tags: patch

Dear Maintainer,

My NIS setup stops working occasionally. The clients rely on subnet
broadcast CALLIT requests to locate the NIS servers. The rpcbind process
on the NIS server sees the requests but fails to send the reply. The
strace output looks like this:

    recvmsg(6, {msg_name={sa_family=AF_INET, sin_port=htons(800), sin_addr=inet_addr("172.27.5.162")}, msg_namelen=128->16, msg_iov=[{iov_base="\0\2:\270\0\0\0\0\0\0\0\2\0\1\206\240\0\0\0\2\0\0\0\5\0\0\0\0\0\0\0\0"..., iov_len=9000}], msg_iovlen=1, msg_control=[{cmsg_len=28, cmsg_level=SOL_IP, cmsg_type=IP_PKTINFO, cmsg_data={ipi_ifindex=if_nametoindex("eth0"), ipi_spec_dst=inet_addr("172.27.8.1"), ipi_addr=inet_addr("172.27.63.255")}}], msg_controllen=32, msg_flags=0}, 0) = 64
    ...
    sendmsg(7, {msg_name={sa_family=AF_INET, sin_port=htons(800), sin_addr=inet_addr("172.27.5.162")}, msg_namelen=16, msg_iov=[{iov_base="\0\2:\270\0\0\0\1\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\2\371\0\0\0\4"..., iov_len=36}], msg_iovlen=1, msg_control=[{cmsg_len=28, cmsg_level=SOL_IP, cmsg_type=IP_PKTINFO, cmsg_data={ipi_ifindex=0, ipi_spec_dst=inet_addr("127.0.0.1"), ipi_addr=inet_addr("127.0.0.1")}}], msg_controllen=32, msg_flags=0}, 0) = -1 EINVAL (Invalid argument)
    ...

where I think the active ingredient is that ipi_spec_dst or ipi_addr is
127.0.0.1 rather than the 172.27.5.162 intended reply address. Once an
rpcbind process has got into this state, it doesn't recover without
being restarted.

rpcbind is calling svc_sendreply on the xprt it created in
create_rmtcall_fd, which isn't where the request originated. That calls
svc_dg_reply which assumes:

    /* cmsg already set in svc_dg_recv */

... as of:

    https://git.linux-nfs.org/?p=steved/libtirpc.git;a=commit;h=74ef3df0236c55185225c62fba34953f2582da72
    (Try to ensure datagram replies come from the address requests were sent to.)

That's been zero-initialized, so everything works fine until a port scan
or some such sends a datagram to the same port.
Its IP_PKTINFO gets remembered and used on every subsequent reply.

I can demonstrate the problem without needing NIS. First enable remote
call support. On Debian, that can be done with:

    sudo tee --append /etc/default/rpcbind <<EOF
    OPTIONS="$OPTIONS -r"
    EOF
    sudo /etc/init.d/rpcbind restart

Then rpcinfo can be used to call the null request of the portmap
program:

    rpcinfo -b 100000 4

When everything's working, this returns results for me like:

    192.168.1.69.0.111 shuttle
    192.168.1.69.0.111 shuttle
    fe80::82ee:73ff:fed9:23d5.0.111 shuttle
    fe80::ccbf:3f03:7b34:c5a1.0.111 shuttle
    192.168.1.69.0.111 shuttle
    192.168.1.69.0.111 shuttle
    fe80::82ee:73ff:fed9:23d5.0.111 shuttle
    fe80::ccbf:3f03:7b34:c5a1.0.111 shuttle

Then find the ephemeral port rpcbind is using for remote calls, with eg:

    sudo lsof -p $(pidof rpcbind)

The last two ports mentioned are the ones for me:

    rpcbind 7610 _rpc 10u IPv4 124008328 0t0 UDP *:55138
    rpcbind 7610 _rpc 11u IPv6 124008329 0t0 UDP *:61854

I saw more convincing results with IPv6, so picking that last port
number, I send it any old UDP request:

    ruby -we 'require "socket"; so = UDPSocket.new(Socket::AF_INET6); so.send("", 0, "::1", 61854)'

Repeating my rpcinfo command, I then see just IPv4:

    192.168.1.69.0.111 shuttle
    192.168.1.69.0.111 shuttle
    192.168.1.69.0.111 shuttle
    192.168.1.69.0.111 shuttle

By zeroing the msg_control information after the reply is sent I can
make rpcbind recover after sending just one reply astray, making the
output like this:

    192.168.1.69.0.111 shuttle
    192.168.1.69.0.111 shuttle
    fe80::ccbf:3f03:7b34:c5a1.0.111 shuttle
    192.168.1.69.0.111 shuttle
    192.168.1.69.0.111 shuttle
    fe80::82ee:73ff:fed9:23d5.0.111 shuttle
    fe80::ccbf:3f03:7b34:c5a1.0.111 shuttle

That lost reply is distasteful but at least I haven't broken the intent
of the "Try to ensure" patch.

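
To make the state leak easier to follow, here is a toy model of it in
Python. It is only a sketch and not libtirpc code: the Transport class,
its field names, the run() helper and the "a loopback source cannot reach
a non-loopback peer" rule are all assumptions I made up for illustration.
It shows control data remembered by a receive being reused by later
replies, and the effect of clearing it after each reply attempt.

```python
import ipaddress
from dataclasses import dataclass
from typing import List, Optional


def is_loopback(addr: str) -> bool:
    return ipaddress.ip_address(addr).is_loopback


@dataclass
class Transport:
    """Hypothetical stand-in for the per-xprt state kept between calls."""
    pktinfo: Optional[str] = None      # models msg_control; None = zeroed
    clear_after_reply: bool = False    # True models the attached patch

    def recv(self, sent_to: str) -> None:
        # Models the receive path remembering where a datagram was sent.
        self.pktinfo = sent_to

    def reply(self, dest: str) -> bool:
        # Models the reply path reusing whatever recv() stored last.
        # Assumed rule: with no pktinfo the kernel picks a source address;
        # otherwise a loopback source cannot reach a non-loopback peer.
        ok = self.pktinfo is None or is_loopback(self.pktinfo) == is_loopback(dest)
        if self.clear_after_reply:
            self.pktinfo = None        # forget the control data after the attempt
        return ok


def run(clear_after_reply: bool) -> List[bool]:
    xprt = Transport(clear_after_reply=clear_after_reply)
    client = "172.27.5.162"
    results = [xprt.reply(client)]     # before any stray datagram
    xprt.recv("127.0.0.1")             # stray datagram hits the port
    results.append(xprt.reply(client)) # first reply after the stray
    results.append(xprt.reply(client)) # second reply after the stray
    return results


if __name__ == "__main__":
    print("unpatched:", run(False))
    print("patched:  ", run(True))
```

In this model the unpatched transport fails every reply after the stray
datagram, while the patched one loses a single reply and then recovers,
which matches what I see with rpcinfo.
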
-- System Information:
Debian Release: 10.10
  APT prefers oldstable-proposed-updates-debug
  APT policy: (500, 'oldstable-proposed-updates-debug'), (500, 'oldstable-debug'), (500, 'oldstable')
Architecture: amd64 (x86_64)
Foreign Architectures: i386

Kernel: Linux 4.19.0-16-amd64 (SMP w/8 CPU cores)
Kernel taint flags: TAINT_PROPRIETARY_MODULE, TAINT_OOT_MODULE, TAINT_UNSIGNED_MODULE
Locale: LANG=en_US.UTF-8, LC_CTYPE=en_US.UTF-8 (charmap=UTF-8), LANGUAGE=en_US.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/dash
Init: systemd (via /run/systemd/system)
LSM: AppArmor: enabled

Versions of packages libtirpc3:amd64 depends on:
ii  libc6             2.28-10
ii  libcom-err2       1.44.5-1+deb10u3
ii  libgssapi-krb5-2  1.17-3+deb10u2
ii  libk5crypto3      1.17-3+deb10u2
ii  libkrb5-3         1.17-3+deb10u2
ii  libtirpc-common   1.1.4-0.4

libtirpc3:amd64 recommends no packages.

libtirpc3:amd64 suggests no packages.

-- no debconf information

--- src/svc_dg.c.orig	2021-09-19 18:24:32.462610751 -0700
+++ src/svc_dg.c	2021-09-19 18:14:52.271066229 -0700
@@ -278,6 +278,8 @@
 			if (su->su_cache)
 				cache_set(xprt, slen);
 		}
+		msg->msg_control = NULL;
+		msg->msg_controllen = 0;
 	}
 	return (stat);
 }