Hello, I observed a strange stall in a multi-threaded process (originally OpenLDAP slapd), which stalls during a call to getaddrinfo() after upgrading the kernel from 3.16.x to 4.1.12:
> #0 0x00007f8e6d97b87d in recvmsg () at ../sysdeps/unix/syscall-template.S:82 > #1 0x00007f8e6d99bfbc in make_request (fd=12, pid=2871, seen_ipv4=<optimized > out>, seen_ipv6=<optimized out>, in6ai=<optimized out>, in6ailen=<optimized > out>) at ../sysdeps/unix/sysv/linux/check_pf.c:119 > #2 0x00007f8e6d99c42a in __check_pf (seen_ipv4=0x7f8e6a0969ff, > seen_ipv6=0x7f8e6a0969fe, in6ai=0x7f8e6a0969e0, in6ailen=0x7f8e6a0969d8) at > ../sysdeps/unix/sysv/linux/check_pf.c:272 > #3 0x00007f8e6d959007 in *__GI_getaddrinfo (name=0x7f8e6a096a80 "0.0.0.0", > service=0x400d04 "7636", hints=0x7f8e6a096a50, pai=0x7f8e6a096a48) at > ../sysdeps/posix/getaddrinfo.c:2389 (The glibc tries to get the routing table to see via netlink, if it should support IPv4 and/or IPv6. Neither <https://sourceware.org/bugzilla/show_bug.cgi?id=12926> nor <https://sourceware.org/git/?p=glibc.git;a=commit;h=636064eb4c03397c86aa26e489e68f952bd5e53f> fixed the problem.) I've been able to reproduce it by writing the attached program, which calls getaddrinfo() from 10 thread 10^6 times. On my 8-core Intel I see several threads hanging in the above call after some seconds - you don't have to wait for the process to terminate): > gdb --batch --ex 'set pagination off' --ex 'thread apply all bt' -p `pidof > a.out` | grep rec ... > #0 0x00007f3817bcc87d in recvmsg () at ../sysdeps/unix/syscall-template.S:82 > #0 0x00007f3817bcc87d in recvmsg () at ../sysdeps/unix/syscall-template.S:82 > #0 0x00007f3817bcc87d in recvmsg () at ../sysdeps/unix/syscall-template.S:82 > #0 0x00007f3817bcc87d in recvmsg () at ../sysdeps/unix/syscall-template.S:82 > #0 0x00007f3817bcc87d in recvmsg () at ../sysdeps/unix/syscall-template.S:82 > #0 0x00007f3817bcc87d in recvmsg () at ../sysdeps/unix/syscall-template.S:82 > #0 0x00007f3817bcc87d in recvmsg () at ../sysdeps/unix/syscall-template.S:82 > #0 0x00007f3817bcc87d in recvmsg () at ../sysdeps/unix/syscall-template.S:82 > #0 0x00007f3817bcc87d in recvmsg () at ../sysdeps/unix/syscall-template.S:82 Using "git bisect start v4.1.11 v.4.10" I've been able to identify 4e27762417669cb459971635be550eb7b5598286 (v4.1.9-26-g4e27762), which is a back-port of 1f770c0a09da855a2b51af6d19de97fb955eca85 (v4.2-11228-g1f770c0). I've attached the log. > # git log -1 4e27762417669cb459971635be550eb7b5598286 > commit 4e27762417669cb459971635be550eb7b5598286 > Author: Herbert Xu <herb...@gondor.apana.org.au> > Date: Fri Sep 18 19:16:50 2015 +0800 > > netlink: Fix autobind race condition that leads to zero port ID > > [ Upstream commit 1f770c0a09da855a2b51af6d19de97fb955eca85 ] > > The commit c0bb07df7d981e4091432754e30c9c720e2c0c78 ("netlink: > Reset portid after netlink_insert failure") introduced a race > condition where if two threads try to autobind the same socket > one of them may end up with a zero port ID. This led to kernel > deadlocks that were observed by multiple people. > > This patch reverts that commit and instead fixes it by introducing > a separte rhash_portid variable so that the real portid is only set > after the socket has been successfully hashed. > > Fixes: c0bb07df7d98 ("netlink: Reset portid after netlink_insert failure") > Reported-by: Tejun Heo <t...@kernel.org> > Reported-by: Linus Torvalds <torva...@linux-foundation.org> > Signed-off-by: Herbert Xu <herb...@gondor.apana.org.au> > Signed-off-by: David S. Miller <da...@davemloft.net> > Signed-off-by: Greg Kroah-Hartman <gre...@linuxfoundation.org> I verified 4e27762417669cb459971635be550eb7b5598286~1 does not show the problem. v4.1.13 still has the problem. > # git log --oneline v4.1..v4.1.13 -- net/netlink/af_netlink.c > e962218 netlink: Trim skb to alloc size to avoid MSG_TRUNC > d486236 netlink: Replace rhash_portid with bound > 4e27762 netlink: Fix autobind race condition that leads to zero port ID > 65d48c6 netlink, mmap: transform mmap skb into full skb on taps > d397617 netlink: make sure -EBUSY won't escape from netlink_insert > b265c30 netlink: don't hold mutex in rcu callback when releasing mmapd ring v4.3 does not show the problem. > # git log --oneline v4.2..v4.3 -- net/netlink/af_netlink.c > 47191d6 netlink: fix locking around NETLINK_LIST_MEMBERSHIPS > db65a3a netlink: Trim skb to alloc size to avoid MSG_TRUNC > da314c9 netlink: Replace rhash_portid with bound > 1f770c0 netlink: Fix autobind race condition that leads to zero port ID > 1853c94 netlink, mmap: transform mmap skb into full skb on taps > 6bb0fef netlink, mmap: fix edge-case leakages in nf queue zero-copy > a66e365 netlink, mmap: don't walk rx ring on poll if receive queue non-empty > 0ef7077 netlink: rx mmap: fix POLLIN condition > 7084a31 netlink: mmap: fix lookup frame position > 0a6a3a2 netlink: add NETLINK_CAP_ACK socket option v4.2 does not show the problem (as expected, as the patch isn't in 4.2.0 > # git log --oneline v4.2..v4.2.6 -- net/netlink/af_netlink.c > 326ce2c netlink: Trim skb to alloc size to avoid MSG_TRUNC > e2a3131 netlink: Replace rhash_portid with bound > 6e32e73 netlink: Fix autobind race condition that leads to zero port ID > 62f43b5 netlink, mmap: transform mmap skb into full skb on taps ring I no longer have an idea where to look next, as something outside of af_netlink.c seems to have fixed the issue for at least 4.3. Any idea? Kernel: 4.1.12 (Debian) GLIBC: 2.13-38+deb7u8 CPU: Intel(R) Core(TM) i7 CPU 860 @ 2.80GHz Thank you and a (hopefully) happy weekend. Philipp -- Philipp Hahn Open Source Software Engineer Univention GmbH be open. Mary-Somerville-Str. 1 D-28359 Bremen Tel.: +49 421 22232-0 Fax : +49 421 22232-99 h...@univention.de http://www.univention.de/ Geschäftsführer: Peter H. Ganten HRB 20755 Amtsgericht Bremen Steuer-Nr.: 71-597-02876
/* * <https://forge.univention.org/bugzilla/show_bug.cgi?id=40059> */ #define _GNU_SOURCE #include <stdlib.h> #include <stdio.h> #include <unistd.h> #include <pthread.h> #include <grp.h> #include <string.h> #include <limits.h> #include <stdarg.h> #include <ctype.h> #include <unistd.h> #include <sys/types.h> #include <sys/socket.h> #include <netdb.h> #define ARRAY_SIZE(a) ((sizeof (a)) / sizeof (*(a))) #define BUFLEN 4096 // cyrus-sasl int check(void) { const char *addr = "0.0.0.0;7636"; char hbuf[NI_MAXHOST]; struct addrinfo hints, *ai = NULL; int i, j; /* Parse the address */ for (i = 0; addr[i] != '\0' && addr[i] != ';'; i++) { if (i >= NI_MAXHOST) return 3; hbuf[i] = addr[i]; } hbuf[i] = '\0'; if (addr[i] == ';') i++; /* XXX: Do we need this check? */ for (j = i; addr[j] != '\0'; j++) if (!isdigit((int)(addr[j]))) return 2; memset(&hints, 0, sizeof(hints)); hints.ai_family = PF_UNSPEC; hints.ai_socktype = SOCK_STREAM; hints.ai_flags = AI_PASSIVE | AI_NUMERICHOST; if (getaddrinfo(hbuf, &addr[i], &hints, &ai) != 0) return 1; freeaddrinfo(ai); return 0; } static pthread_t thread[10]; static void *run(void *data) { unsigned int round, j; unsigned int self = (unsigned int)(unsigned long)data; pthread_t tid = pthread_self(); printf("[%d] POSIX thread ID is %u\n", self, (unsigned int)tid); for (round = 0; round < 1000; round++) { for (j = 0; j < 1000; j++) { int i; i = check(); if (i) break; } printf("%d\n", round); unsigned int usecs = (round * (unsigned int)tid) % 1000000; usleep(usecs); if (round % 10 == 0) putchar('0' + (self % 10)); } printf("[%d] end\n", self); pthread_exit(NULL); } int main(int argc, char *argv[], char *envp[]) { pid_t pid = getpid(); printf("pid=%d\n", pid); unsigned int t; for (t = 0; t < ARRAY_SIZE(thread); t++) pthread_create(&thread[t], NULL, run, (void *)(long)t); int ret = 0; void *value_ptr; for (t = 0; t < ARRAY_SIZE(thread); t++) ret |= pthread_join(thread[t], &value_ptr); return ret; } /* :!gcc -O0 -g -Wall -lpthread % */
git bisect start # bad: [205a8514e63a221c3a5071f27521013444e43e5f] Linux 4.1.11 git bisect bad 205a8514e63a221c3a5071f27521013444e43e5f # good: [b953c0d234bc72e8489d3bf51a276c5c4ec85345] Linux 4.1 git bisect good b953c0d234bc72e8489d3bf51a276c5c4ec85345 # good: [4b6bf170e83e666974ce00d7251dd0d4b5ca5439] ocfs2: fix shift left overflow git bisect good 4b6bf170e83e666974ce00d7251dd0d4b5ca5439 # good: [0796f55ba4db268f3a6a6c6bb9e481ce9eaf5d48] ideapad-laptop: Add Lenovo Yoga 3 14 to no_hw_rfkill dmi list git bisect good 0796f55ba4db268f3a6a6c6bb9e481ce9eaf5d48 # bad: [dc7a3d707c9784f26048d6f059e7467a8a92e04b] spi: Fix documentation of spi_alloc_master() git bisect bad dc7a3d707c9784f26048d6f059e7467a8a92e04b # good: [8bb9225a7bd3a07aadd58b19a633f1f7a8884d01] net: eth: altera: fix napi poll_list corruption git bisect good 8bb9225a7bd3a07aadd58b19a633f1f7a8884d01 # bad: [b6ac3aee1c73d5b107ae6ec6e3715b39db782781] iser-target: Put the reference on commands waiting for unsol data git bisect bad b6ac3aee1c73d5b107ae6ec6e3715b39db782781 # bad: [4e27762417669cb459971635be550eb7b5598286] netlink: Fix autobind race condition that leads to zero port ID git bisect bad 4e27762417669cb459971635be550eb7b5598286 # good: [cf9cf6bc253c5d83a8cb593d4ed7042bb4f04e7c] macvtap: fix TUNSETSNDBUF values > 64k git bisect good cf9cf6bc253c5d83a8cb593d4ed7042bb4f04e7c # good: [fce134644303a4ebe0892edc9af7d46e0ebda9ab] fib_rules: fix fib rule dumps across multiple skbs git bisect good fce134644303a4ebe0892edc9af7d46e0ebda9ab # good: [282117acdfac7b5d621b18f002a9751d7f583daf] net: phy: fixed_phy: handle link-down case git bisect good 282117acdfac7b5d621b18f002a9751d7f583daf # good: [d600176461e85a0bf3b963587e4ca92d494a7a47] mvneta: use inband status only when explicitly enabled git bisect good d600176461e85a0bf3b963587e4ca92d494a7a47