Hello,

I observed a strange stall in a multi-threaded process (originally
OpenLDAP slapd), which stalls during a call to getaddrinfo() after
upgrading the kernel from 3.16.x to 4.1.12:

> #0  0x00007f8e6d97b87d in recvmsg () at ../sysdeps/unix/syscall-template.S:82
> #1  0x00007f8e6d99bfbc in make_request (fd=12, pid=2871, seen_ipv4=<optimized 
> out>, seen_ipv6=<optimized out>, in6ai=<optimized out>, in6ailen=<optimized 
> out>) at ../sysdeps/unix/sysv/linux/check_pf.c:119
> #2  0x00007f8e6d99c42a in __check_pf (seen_ipv4=0x7f8e6a0969ff, 
> seen_ipv6=0x7f8e6a0969fe, in6ai=0x7f8e6a0969e0, in6ailen=0x7f8e6a0969d8) at 
> ../sysdeps/unix/sysv/linux/check_pf.c:272
> #3  0x00007f8e6d959007 in *__GI_getaddrinfo (name=0x7f8e6a096a80 "0.0.0.0", 
> service=0x400d04 "7636", hints=0x7f8e6a096a50, pai=0x7f8e6a096a48) at 
> ../sysdeps/posix/getaddrinfo.c:2389

(The glibc tries to get the routing table to see via netlink, if it
should support IPv4 and/or IPv6. Neither
<https://sourceware.org/bugzilla/show_bug.cgi?id=12926> nor
<https://sourceware.org/git/?p=glibc.git;a=commit;h=636064eb4c03397c86aa26e489e68f952bd5e53f>
fixed the problem.)

I've been able to reproduce it by writing the attached program, which
calls getaddrinfo() from 10 thread 10^6 times. On my 8-core Intel I see
several threads hanging in the above call after some seconds - you don't
have to wait for the process to terminate):

> gdb --batch --ex 'set pagination off' --ex 'thread apply all bt' -p `pidof 
> a.out` | grep rec
...
> #0  0x00007f3817bcc87d in recvmsg () at ../sysdeps/unix/syscall-template.S:82
> #0  0x00007f3817bcc87d in recvmsg () at ../sysdeps/unix/syscall-template.S:82
> #0  0x00007f3817bcc87d in recvmsg () at ../sysdeps/unix/syscall-template.S:82
> #0  0x00007f3817bcc87d in recvmsg () at ../sysdeps/unix/syscall-template.S:82
> #0  0x00007f3817bcc87d in recvmsg () at ../sysdeps/unix/syscall-template.S:82
> #0  0x00007f3817bcc87d in recvmsg () at ../sysdeps/unix/syscall-template.S:82
> #0  0x00007f3817bcc87d in recvmsg () at ../sysdeps/unix/syscall-template.S:82
> #0  0x00007f3817bcc87d in recvmsg () at ../sysdeps/unix/syscall-template.S:82
> #0  0x00007f3817bcc87d in recvmsg () at ../sysdeps/unix/syscall-template.S:82

Using "git bisect start v4.1.11 v.4.10" I've been able to identify
4e27762417669cb459971635be550eb7b5598286 (v4.1.9-26-g4e27762), which is
a back-port of 1f770c0a09da855a2b51af6d19de97fb955eca85
(v4.2-11228-g1f770c0). I've attached the log.

> # git log -1 4e27762417669cb459971635be550eb7b5598286
> commit 4e27762417669cb459971635be550eb7b5598286
> Author: Herbert Xu <herb...@gondor.apana.org.au>
> Date:   Fri Sep 18 19:16:50 2015 +0800
> 
>     netlink: Fix autobind race condition that leads to zero port ID
>     
>     [ Upstream commit 1f770c0a09da855a2b51af6d19de97fb955eca85 ]
>     
>     The commit c0bb07df7d981e4091432754e30c9c720e2c0c78 ("netlink:
>     Reset portid after netlink_insert failure") introduced a race
>     condition where if two threads try to autobind the same socket
>     one of them may end up with a zero port ID.  This led to kernel
>     deadlocks that were observed by multiple people.
>     
>     This patch reverts that commit and instead fixes it by introducing
>     a separte rhash_portid variable so that the real portid is only set
>     after the socket has been successfully hashed.
>     
>     Fixes: c0bb07df7d98 ("netlink: Reset portid after netlink_insert failure")
>     Reported-by: Tejun Heo <t...@kernel.org>
>     Reported-by: Linus Torvalds <torva...@linux-foundation.org>
>     Signed-off-by: Herbert Xu <herb...@gondor.apana.org.au>
>     Signed-off-by: David S. Miller <da...@davemloft.net>
>     Signed-off-by: Greg Kroah-Hartman <gre...@linuxfoundation.org>

I verified 4e27762417669cb459971635be550eb7b5598286~1 does not show the
problem.

v4.1.13 still has the problem.
> # git log --oneline  v4.1..v4.1.13 -- net/netlink/af_netlink.c
> e962218 netlink: Trim skb to alloc size to avoid MSG_TRUNC
> d486236 netlink: Replace rhash_portid with bound
> 4e27762 netlink: Fix autobind race condition that leads to zero port ID
> 65d48c6 netlink, mmap: transform mmap skb into full skb on taps
> d397617 netlink: make sure -EBUSY won't escape from netlink_insert
> b265c30 netlink: don't hold mutex in rcu callback when releasing mmapd ring

v4.3 does not show the problem.
> # git log --oneline  v4.2..v4.3 -- net/netlink/af_netlink.c
> 47191d6 netlink: fix locking around NETLINK_LIST_MEMBERSHIPS
> db65a3a netlink: Trim skb to alloc size to avoid MSG_TRUNC
> da314c9 netlink: Replace rhash_portid with bound
> 1f770c0 netlink: Fix autobind race condition that leads to zero port ID
> 1853c94 netlink, mmap: transform mmap skb into full skb on taps
> 6bb0fef netlink, mmap: fix edge-case leakages in nf queue zero-copy
> a66e365 netlink, mmap: don't walk rx ring on poll if receive queue non-empty
> 0ef7077 netlink: rx mmap: fix POLLIN condition
> 7084a31 netlink: mmap: fix lookup frame position
> 0a6a3a2 netlink: add NETLINK_CAP_ACK socket option

v4.2 does not show the problem (as expected, as the patch isn't in 4.2.0
> # git log --oneline  v4.2..v4.2.6 -- net/netlink/af_netlink.c
> 326ce2c netlink: Trim skb to alloc size to avoid MSG_TRUNC
> e2a3131 netlink: Replace rhash_portid with bound
> 6e32e73 netlink: Fix autobind race condition that leads to zero port ID
> 62f43b5 netlink, mmap: transform mmap skb into full skb on taps ring

I no longer have an idea where to look next, as something outside of
af_netlink.c seems to have fixed the issue for at least 4.3. Any idea?

Kernel: 4.1.12 (Debian)
GLIBC: 2.13-38+deb7u8
CPU: Intel(R) Core(TM) i7 CPU         860  @ 2.80GHz

Thank you and a (hopefully) happy weekend.
Philipp
-- 
Philipp Hahn
Open Source Software Engineer

Univention GmbH
be open.
Mary-Somerville-Str. 1
D-28359 Bremen
Tel.: +49 421 22232-0
Fax : +49 421 22232-99
h...@univention.de

http://www.univention.de/
Geschäftsführer: Peter H. Ganten
HRB 20755 Amtsgericht Bremen
Steuer-Nr.: 71-597-02876
/*
 * <https://forge.univention.org/bugzilla/show_bug.cgi?id=40059>
 */
#define _GNU_SOURCE
#include <stdlib.h>
#include <stdio.h>
#include <unistd.h>
#include <pthread.h>
#include <grp.h>
#include <string.h>
#include <limits.h>
#include <stdarg.h>
#include <ctype.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netdb.h>

#define ARRAY_SIZE(a) ((sizeof (a)) / sizeof (*(a)))
#define BUFLEN 4096

// cyrus-sasl
int check(void) {
	const char *addr = "0.0.0.0;7636";
	char hbuf[NI_MAXHOST];
	struct addrinfo hints, *ai = NULL;
	int i, j;

	/* Parse the address */
	for (i = 0; addr[i] != '\0' && addr[i] != ';'; i++) {
		if (i >= NI_MAXHOST)
			return 3;
		hbuf[i] = addr[i];
	}
	hbuf[i] = '\0';

	if (addr[i] == ';')
		i++;
	/* XXX: Do we need this check? */
	for (j = i; addr[j] != '\0'; j++)
		if (!isdigit((int)(addr[j])))
			return 2;

	memset(&hints, 0, sizeof(hints));
	hints.ai_family = PF_UNSPEC;
	hints.ai_socktype = SOCK_STREAM;
	hints.ai_flags = AI_PASSIVE | AI_NUMERICHOST;
	if (getaddrinfo(hbuf, &addr[i], &hints, &ai) != 0)
		return 1;

	freeaddrinfo(ai);

	return 0;
}

static pthread_t thread[10];

static void *run(void *data) {
	unsigned int round, j;

	unsigned int self = (unsigned int)(unsigned long)data;
	pthread_t tid = pthread_self();

	printf("[%d] POSIX thread ID is %u\n", self, (unsigned int)tid);

	for (round = 0; round < 1000; round++) {
		for (j = 0; j < 1000; j++) {
			int i;
			i = check();
			if (i)
				break;
		}
		printf("%d\n", round);

		unsigned int usecs = (round * (unsigned int)tid) % 1000000;
		usleep(usecs);
		if (round % 10 == 0)
			putchar('0' + (self % 10));
	}

	printf("[%d] end\n", self);
	pthread_exit(NULL);
}

int main(int argc, char *argv[], char *envp[]) {
	pid_t pid = getpid();
	printf("pid=%d\n", pid);

	unsigned int t;
	for (t = 0; t < ARRAY_SIZE(thread); t++)
		pthread_create(&thread[t], NULL, run, (void *)(long)t);

	int ret = 0;
	void *value_ptr;
	for (t = 0; t < ARRAY_SIZE(thread); t++)
		ret |= pthread_join(thread[t], &value_ptr);

	return ret;
}
/* :!gcc -O0 -g -Wall -lpthread % */
git bisect start
# bad: [205a8514e63a221c3a5071f27521013444e43e5f] Linux 4.1.11
git bisect bad 205a8514e63a221c3a5071f27521013444e43e5f
# good: [b953c0d234bc72e8489d3bf51a276c5c4ec85345] Linux 4.1
git bisect good b953c0d234bc72e8489d3bf51a276c5c4ec85345
# good: [4b6bf170e83e666974ce00d7251dd0d4b5ca5439] ocfs2: fix shift left 
overflow
git bisect good 4b6bf170e83e666974ce00d7251dd0d4b5ca5439
# good: [0796f55ba4db268f3a6a6c6bb9e481ce9eaf5d48] ideapad-laptop: Add Lenovo 
Yoga 3 14 to no_hw_rfkill dmi list
git bisect good 0796f55ba4db268f3a6a6c6bb9e481ce9eaf5d48
# bad: [dc7a3d707c9784f26048d6f059e7467a8a92e04b] spi: Fix documentation of 
spi_alloc_master()
git bisect bad dc7a3d707c9784f26048d6f059e7467a8a92e04b
# good: [8bb9225a7bd3a07aadd58b19a633f1f7a8884d01] net: eth: altera: fix napi 
poll_list corruption
git bisect good 8bb9225a7bd3a07aadd58b19a633f1f7a8884d01
# bad: [b6ac3aee1c73d5b107ae6ec6e3715b39db782781] iser-target: Put the 
reference on commands waiting for unsol data
git bisect bad b6ac3aee1c73d5b107ae6ec6e3715b39db782781
# bad: [4e27762417669cb459971635be550eb7b5598286] netlink: Fix autobind race 
condition that leads to zero port ID
git bisect bad 4e27762417669cb459971635be550eb7b5598286
# good: [cf9cf6bc253c5d83a8cb593d4ed7042bb4f04e7c] macvtap: fix TUNSETSNDBUF 
values > 64k
git bisect good cf9cf6bc253c5d83a8cb593d4ed7042bb4f04e7c
# good: [fce134644303a4ebe0892edc9af7d46e0ebda9ab] fib_rules: fix fib rule 
dumps across multiple skbs
git bisect good fce134644303a4ebe0892edc9af7d46e0ebda9ab
# good: [282117acdfac7b5d621b18f002a9751d7f583daf] net: phy: fixed_phy: handle 
link-down case
git bisect good 282117acdfac7b5d621b18f002a9751d7f583daf
# good: [d600176461e85a0bf3b963587e4ca92d494a7a47] mvneta: use inband status 
only when explicitly enabled
git bisect good d600176461e85a0bf3b963587e4ca92d494a7a47

Reply via email to