Package: qemu-system-x86
Version: 1:7.2+dfsg-7+deb12u5
Severity: normal
X-Debbugs-Cc: g...@libero.it
I believe I have spotted a race condition in virtio-net or in QEMU/KVM (but it only shows up when virtio-net is involved).

To reproduce it, one needs a virtualization environment similar to the following.

Host:
 - Debian 12 x86_64
 - caching name server listening on 127.0.0.1

Guest:
 - Linux/musl, Linux/glibc, FreeBSD or OpenBSD
 - KVM acceleration
 - virtio netdev, configured in (default) user mode
 - /etc/resolv.conf:
     nameserver 10.0.2.2        (i.e. the caching DNS on the host)
     nameserver 192.168.1.123   (non-existent)

Then run the attached program in the guest. The program opens a UDP socket, sends out a batch of (DNS) requests, poll()s on the socket, and then receives the responses.

If a delay is inserted between the sendto() calls, the single response from the host is received correctly:

  $ ./a.out 10.0.2.2 >/dev/null    # to warm up the host cache
  $ ./a.out 10.0.2.2 delay 192.168.1.123
  poll: 1 1 1
  recvfrom() 45
  <response packet>
  recvfrom() -1

If the sendto() calls are performed in quick succession, the response packet gets lost:

  $ ./a.out 10.0.2.2 192.168.1.123
  poll: 0 1 0
  recvfrom() -1
  recvfrom() -1

A tcpdump capture on the host side shows no difference between the two cases. Tcpdump on the guest side is another story. In the good case it looks like this:

  7:32:44.332 IP 10.0.2.15.43276 > 10.0.2.2.53: 33452+ A? example.com. (29)
  7:32:44.333 IP 10.0.2.2.53 > 10.0.2.15.43276: 33452 1/0/0 A 93.184.216.34 (45)
  7:32:44.349 IP 10.0.2.15.43276 > 192.168.1.123.53: 33452+ A? example.com. (29)

while in the bad case it looks like this:

  7:32:55.358 IP 10.0.2.15.46537 > 10.0.2.2.53: 33452+ A? example.com. (29)
  7:32:55.358 IP 10.0.2.15.46537 > 192.168.1.123.53: 33452+ A? example.com. (29)
  7:32:55.358 IP *127.0.0.1*.53 > 10.0.2.15.46537: 33452 1/0/0 A 93.184.216.34 (45)

where the response packet has the wrong source IP (127.0.0.1 instead of 10.0.2.2).
This looks like a failure of the NAT layer, but it does not happen when the guest uses another emulated network driver. I don't know whether that is because the relevant code is in virtio-net or because other drivers add overhead that masks the issue.

There is nothing special about port 53: I was just investigating a weird name-resolution failure in a musl-based guest (https://www.openwall.com/lists/musl/2024/02/17/3) and wrote the program to mimic the musl resolver's behaviour. The program succeeds or fails in the same way with a different port, and in all guests I tried (as long as the emulated network device is virtio-net).

To trigger the issue, the response to the first request must arrive so quickly that it coincides with the second request.

Best regards,
g.b.

-- System Information:
Debian Release: 12.5
  APT prefers stable-updates
  APT policy: (500, 'stable-updates'), (500, 'stable-security'), (500, 'stable')
Architecture: amd64 (x86_64)

Kernel: Linux 6.1.0-18-amd64 (SMP w/4 CPU threads; PREEMPT)
Locale: LANG=C, LC_CTYPE=C.UTF-8 (charmap=UTF-8), LANGUAGE not set
Shell: /bin/sh linked to /usr/bin/dash
Init: sysvinit (via /sbin/init)
LSM: AppArmor: enabled

Versions of packages qemu-system-x86 depends on:
ii  ipxe-qemu           1.0.0+git-20190125.36a4c85-5.1
ii  libaio1             0.3.113-4
ii  libbpf1             1:1.1.0-1
ii  libc6               2.36-9+deb12u4
ii  libcapstone4        4.0.2-5
ii  libfdt1             1.6.1-4+b1
ii  libfuse3-3          3.14.0-4
ii  libgcc-s1           12.2.0-14
ii  libglib2.0-0        2.74.6-2
ii  libgmp10            2:6.2.1+dfsg1-1.1
ii  libgnutls30         3.7.9-2+deb12u2
ii  libhogweed6         3.8.1-2
ii  libibverbs1         44.0-2
ii  libjpeg62-turbo     1:2.1.5-2
ii  libnettle8          3.8.1-2
ii  libnuma1            2.0.16-1
ii  libpixman-1-0       0.42.2-1
ii  libpmem1            1.12.1-2
ii  libpng16-16         1.6.39-2
ii  librdmacm1          44.0-2
ii  libsasl2-2          2.1.28+dfsg-10
ii  libseccomp2         2.5.4-1+b3
ii  libslirp0           4.7.0-1
ii  libudev1            252.22-1~deb12u1
ii  liburing2           2.3-3
ii  libvdeplug2         4.0.1-4
ii  libzstd1            1.5.4+dfsg2-5
ii  qemu-system-common  1:7.2+dfsg-7+deb12u5
ii  qemu-system-data    1:7.2+dfsg-7+deb12u5
ii  seabios             1.16.2-1
ii  zlib1g              1:1.2.13.dfsg-1

Versions of packages qemu-system-x86 recommends:
ii  ovmf                2022.11-6+deb12u1
pn  qemu-block-extra    <none>
ii  qemu-system-gui     1:7.2+dfsg-7+deb12u5
ii  qemu-utils          1:7.2+dfsg-7+deb12u5

Versions of packages qemu-system-x86 suggests:
pn  samba               <none>
pn  vde2                <none>

-- no debconf information
#include <stdio.h>
#include <time.h>
#include <poll.h>
#include <assert.h>
#include <string.h>
#include <arpa/inet.h>
#include <netdb.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Print a buffer, escaping non-printable bytes (and backslashes) as \ooo. */
static void dump(const char *s, size_t len)
{
    while (len--) {
        char t = *s++;
        if (' ' <= t && t <= '~' && t != '\\')
            printf("%c", t);
        else
            printf("\\%o", t & 0xff);
    }
    printf("\n");
}

int main(int argc, char *argv[])
{
    int sock, rv, n;
    /* DNS query: id 33452, RD set, one question: example.com IN A */
    const char req[] = "\202\254\1\0\0\1\0\0\0\0\0\0\7example\3com\0\0\1\0\1";
    struct timespec delay_l = { 1, 0 };  /* 1 sec */
    struct pollfd pfs;
    struct sockaddr_in me = { 0 };

    /* Non-blocking UDP socket bound to an ephemeral port on all addresses. */
    sock = socket(AF_INET, SOCK_DGRAM | SOCK_CLOEXEC | SOCK_NONBLOCK, IPPROTO_IP);
    assert(sock >= 0);
    me.sin_family = AF_INET;
    me.sin_port = 0;
    me.sin_addr.s_addr = inet_addr("0.0.0.0");
    rv = bind(sock, (struct sockaddr *) &me, sizeof me);
    assert(0 == rv);

    /* Send the same query to each address on the command line;
       the word "delay" inserts a short pause instead. */
    for (n = 1; n < argc; n++) {
        if (0 == strcmp("delay", argv[n])) {
            struct timespec delay_s = { 0, (1 << 24) };  /* ~ 16 msec */
            nanosleep(&delay_s, NULL);
        } else {
            struct sockaddr_in dst = { 0 };
            dst.sin_family = AF_INET;
            dst.sin_port = htons(53);
            dst.sin_addr.s_addr = inet_addr(argv[n]);
            rv = sendto(sock, req, sizeof req - 1, MSG_NOSIGNAL,
                        (struct sockaddr *) &dst, sizeof dst);
            assert(rv >= 0);
        }
    }

    /* Give the replies time to arrive, then check for readability. */
    nanosleep(&delay_l, NULL);
    pfs.fd = sock;
    pfs.events = POLLIN;
    rv = poll(&pfs, 1, 2000);
    printf("poll: %d %d %d\n", rv, pfs.events, pfs.revents);

    /* Try one (non-blocking) receive per destination. */
    for (n = 1; n < argc; n++) {
        char resp[4000];
        if (0 == strcmp("delay", argv[n]))
            continue;
        rv = recvfrom(sock, resp, sizeof resp, 0, NULL, NULL);
        printf("recvfrom() %d\n", rv);
        if (rv > 0)
            dump(resp, rv);
    }
    return 0;
}