Re: Serious reproducible 2.4.x kernel hang
Hi, What is related in /proc w.r.t. sysrq? Kees - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
Re: Serious reproducible 2.4.x kernel hang
Hi, What is related in /proc w.r.t. sysrq? Kees - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
Re: [Patch]Re: Serious reproducible 2.4.x kernel hang
Prasanna P Subash writes: > I looked at the skb_recv_datagram code and noticed that wait_for_packet is not > returning an error, even while trying to read a closed socket. > Anyways here is a patch against 2.4.1 that will fix the issue. > Please feel free to flame me about the patch :) Please read the rest of today's postings, Alexey Kuznetsov already posted the correct fix, which I'm attached below: diff -u --recursive --new-file --exclude=CVS --exclude=.cvsignore vanilla/linux/net/core/datagram.c linux/net/core/datagram.c --- vanilla/linux/net/core/datagram.c Sat Nov 11 19:02:40 2000 +++ linux/net/core/datagram.c Thu Feb 1 17:15:12 2001 @@ -72,19 +73,19 @@ /* Socket errors? */ error = sock_error(sk); if (error) - goto out; + goto out_err; if (!skb_queue_empty(>receive_queue)) goto ready; /* Socket shut down? */ if (sk->shutdown & RCV_SHUTDOWN) - goto out; + goto out_noerr; /* Sequenced packets can come disconnected. If so we report the problem */ error = -ENOTCONN; if(connection_based(sk) && !(sk->state==TCP_ESTABLISHED || sk->state==TCP_LISTEN)) - goto out; + goto out_err; /* handle signals */ if (signal_pending(current)) @@ -99,11 +100,16 @@ interrupted: error = sock_intr_errno(*timeo_p); +out_err: + *err = error; out: current->state = TASK_RUNNING; remove_wait_queue(sk->sleep, ); - *err = error; return error; +out_noerr: + *err = 0; + error = 1; + goto out; } /* - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
[Patch]Re: Serious reproducible 2.4.x kernel hang
> #include > #include > #include > #include > > int > main(int argc, const char* argv[]) > { > int retval; > int sockets[2]; > char buf[1]; > > retval = socketpair(PF_UNIX, SOCK_DGRAM, 0, sockets); > if (retval != 0) > { > perror("socketpair"); > exit(1); > } > shutdown(sockets[0], SHUT_RDWR); > read(sockets[0], buf, 1); > } I tried to debug this issue with the kdb on 2.4.1-pre7. Here is the stack trace mcount+0x1f9 wait_for_packet+0x13 skb_recv_datagram+0xbb unix_dgram_recvmsg+0x53 sock_recvmsg+0x41 sock_read+0x8f sys_read+0xa4 system_call+0x3c I looked at the skb_recv_datagram code and noticed that wait_for_packet is not returning an error, even while trying to read a closed socket. Anyways here is a patch against 2.4.1 that will fix the issue. Please feel free to flame me about the patch :) thanks -- Prasanna Subash --- [EMAIL PROTECTED] --- TurboLinux, INC Linux, the choice | Q: How do you keep a moron in suspense? of a GNU generation -o) | Kernel 2.2.16 /\\ | on a i686_\\_v | | --- 2.4.1/net/core/datagram.c Fri Feb 2 01:00:10 2001 +++ linux/net/core/datagram.c Fri Feb 2 01:06:59 2001 @@ -74,15 +74,15 @@ if (error) goto out; - if (!skb_queue_empty(>receive_queue)) - goto ready; - + error = -ENOTCONN; /* Socket shut down? */ if (sk->shutdown & RCV_SHUTDOWN) goto out; + if (!skb_queue_empty(>receive_queue)) + goto ready; + /* Sequenced packets can come disconnected. If so we report the problem */ - error = -ENOTCONN; if(connection_based(sk) && !(sk->state==TCP_ESTABLISHED || sk->state==TCP_LISTEN)) goto out; PGP signature
[Patch]Re: Serious reproducible 2.4.x kernel hang
#include stdio.h #include unistd.h #include sys/types.h #include sys/socket.h int main(int argc, const char* argv[]) { int retval; int sockets[2]; char buf[1]; retval = socketpair(PF_UNIX, SOCK_DGRAM, 0, sockets); if (retval != 0) { perror("socketpair"); exit(1); } shutdown(sockets[0], SHUT_RDWR); read(sockets[0], buf, 1); } I tried to debug this issue with the kdb on 2.4.1-pre7. Here is the stack trace mcount+0x1f9 wait_for_packet+0x13 skb_recv_datagram+0xbb unix_dgram_recvmsg+0x53 sock_recvmsg+0x41 sock_read+0x8f sys_read+0xa4 system_call+0x3c I looked at the skb_recv_datagram code and noticed that wait_for_packet is not returning an error, even while trying to read a closed socket. Anyways here is a patch against 2.4.1 that will fix the issue. Please feel free to flame me about the patch :) thanks -- Prasanna Subash --- [EMAIL PROTECTED] --- TurboLinux, INC Linux, the choice | Q: How do you keep a moron in suspense? of a GNU generation -o) | Kernel 2.2.16 /\\ | on a i686_\\_v | | --- 2.4.1/net/core/datagram.c Fri Feb 2 01:00:10 2001 +++ linux/net/core/datagram.c Fri Feb 2 01:06:59 2001 @@ -74,15 +74,15 @@ if (error) goto out; - if (!skb_queue_empty(sk-receive_queue)) - goto ready; - + error = -ENOTCONN; /* Socket shut down? */ if (sk-shutdown RCV_SHUTDOWN) goto out; + if (!skb_queue_empty(sk-receive_queue)) + goto ready; + /* Sequenced packets can come disconnected. If so we report the problem */ - error = -ENOTCONN; if(connection_based(sk) !(sk-state==TCP_ESTABLISHED || sk-state==TCP_LISTEN)) goto out; PGP signature
Re: Serious reproducible 2.4.x kernel hang
Chris Evans <[EMAIL PROTECTED]> writes: > [cc: davem because of the severity] > > On Thu, 1 Feb 2001, Malcolm Beattie wrote: > > > rid of the hang. So it looks as though some combination of > > shutdown(2) and SIGABRT is at fault. After the hang the kernel-side > > Nope - I've nailed it to a _really_ simple test case. It looks like a > read() on a shutdown() unix dgram socket just kills the kernel. Demo code > below. I wonder if this affects UP or is SMP only? Kills my UP K6-2 dead as a doornail (except for pings, as you say). -Doug - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
Re: Serious reproducible 2.4.x kernel hang
On Thu, 1 Feb 2001, Chris Evans wrote: > > Nope - I've nailed it to a _really_ simple test case. It looks like a > read() on a shutdown() unix dgram socket just kills the kernel. Demo code > below. I wonder if this affects UP or is SMP only? It surely killed my PIII UP machine (running 2.4.1) Rui Sousa - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
Re: Serious reproducible 2.4.x kernel hang
[cc: davem because of the severity] On Thu, 1 Feb 2001, Malcolm Beattie wrote: > rid of the hang. So it looks as though some combination of > shutdown(2) and SIGABRT is at fault. After the hang the kernel-side Nope - I've nailed it to a _really_ simple test case. It looks like a read() on a shutdown() unix dgram socket just kills the kernel. Demo code below. I wonder if this affects UP or is SMP only? Malcolm, does the below code reproduce the problem for you? Cheers Chris #include #include #include #include int main(int argc, const char* argv[]) { int retval; int sockets[2]; char buf[1]; retval = socketpair(PF_UNIX, SOCK_DGRAM, 0, sockets); if (retval != 0) { perror("socketpair"); exit(1); } shutdown(sockets[0], SHUT_RDWR); read(sockets[0], buf, 1); } - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
Re: Serious reproducible 2.4.x kernel hang
On Thu, 1 Feb 2001, Malcolm Beattie wrote: > Mapping the addresses from whichever ScrollLock combination produced > the task list to symbols produces the call trace > do_exit <- do_signal <- tcp_destroy_sock <- inet_ioctl <- signal_return > > The inet_ioctl is odd there--vsftpd doesn't explicitly call ioctl > anywhere at all and the next function before it in memory is > inet_shutdown which looks more believable. I have checked I'm looking Probably, the empty SIGPIPE handler triggered. The response to this is a lot of shutdown() close() and finally an exit(). The trace you give above looks like the child process trace. I always see the parent process go nuts. The parent process is almost always blocking on read() of a unix dgram socket, which it shares with the child. The child does a shutdown() on this socket just before exit(). Cheers Chris - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
Re: Serious reproducible 2.4.x kernel hang
Chris Evans writes: > > On Thu, 1 Feb 2001, Malcolm Beattie wrote: > > > Mapping the addresses from whichever ScrollLock combination produced > > the task list to symbols produces the call trace > > do_exit <- do_signal <- tcp_destroy_sock <- inet_ioctl <- signal_return > > > > The inet_ioctl is odd there--vsftpd doesn't explicitly call ioctl > > anywhere at all and the next function before it in memory is > > inet_shutdown which looks more believable. I have checked I'm looking > > Probably, the empty SIGPIPE handler triggered. The response to this is a > lot of shutdown() close() and finally an exit(). > > The trace you give above looks like the child process trace. I always see > the parent process go nuts. The parent process is almost always blocking > on read() of a unix dgram socket, which it shares with the child. The > child does a shutdown() on this socket just before exit(). We've done some more detective work. I can reproduce the hang too by quitting the ftp client abruptly (^Z and kill %1 in my case). Inducing the hang while stracing the daemon shows a recv returning 0 as expected when the socket closes. The daemon then calls "die": die(const char* p_text) { /* Going down hard... */ #ifdef DIE_DEBUG bug(p_text); #endif and DIE_DEBUG is defined. bug() writes an error message and then does three things: shutdown(2) on the sockets close(2) on the sockets abort() the last of which libc implements as rt_sigprocmask(SIG_UNBLOCK, [SIGABRT]) kill(getpid(), SIGABRT) Here's the interesting thing: doing an exit(0) before the shutdowns and abort gets rid of the hang. The only unusual and potentially untested thing I could find about the program was that it uses capset() and prctl(PR_SET_KEEPCAPS). However, replacing the "retval = capset(...)" call with a dummy "retval = 0" doesn't get rid of the hang. So it looks as though some combination of shutdown(2) and SIGABRT is at fault. After the hang the kernel-side stack trace is always either the one I gave above (and I *did* write down the address for inet_ioctl correctly; it's definitely not inet_shutdown) or else do_exit <- do_signal <- schedule <- syscall_trace <- signal_return (with exactly the same addresses as above except for the differing schedule and syscall_trace ones) which appeared after the hang while vsftpd was being run under strace. --Malcolm -- Malcolm Beattie <[EMAIL PROTECTED]> Unix Systems Programmer Oxford University Computing Services - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
Re: Serious reproducible 2.4.x kernel hang
Malcolm Beattie writes: > Chris Evans writes: > > I've just managed to reproduce this personally on 2.4.0. I've had a report > > that 2.4.1 is also affected. Both myself and the other person who > > reproduced this have SMP i686 machines, which may or may not be relevant. > > > > To reproduce, all you need to do is get my vsftpd ftp server: > > ftp://ferret.lmh.ox.ac.uk/pub/linux/vsftpd-0.0.9.tar.gz [...] > As in Chris' case, vzftpd was a zombie (so Foo-ScrollLock told me) and > all other processes were looking OK in R or S state. Mapping the addresses from whichever ScrollLock combination produced the task list to symbols produces the call trace do_exit <- do_signal <- tcp_destroy_sock <- inet_ioctl <- signal_return The inet_ioctl is odd there--vsftpd doesn't explicitly call ioctl anywhere at all and the next function before it in memory is inet_shutdown which looks more believable. I have checked I'm looking at the right System.map but I suppose I may have mis-transcribed the address when writing it down. vsftpd doesn't make use of signal handlers except to unset some existing ones and a SIGALRM handler which I don't think would have triggered. Something like a seg fault may have caused it (I should have seen an oops if it had happened in kernel space) perhaps? --Malcolm -- Malcolm Beattie <[EMAIL PROTECTED]> Unix Systems Programmer Oxford University Computing Services - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
Re: Serious reproducible 2.4.x kernel hang
On Thu, 1 Feb 2001, Malcolm Beattie wrote: > Chris Evans writes: > > I've just managed to reproduce this personally on 2.4.0. I've had a report > > that 2.4.1 is also affected. Both myself and the other person who > > reproduced this have SMP i686 machines, which may or may not be relevant. > > > > To reproduce, all you need to do is get my vsftpd ftp server: > > ftp://ferret.lmh.ox.ac.uk/pub/linux/vsftpd-0.0.9.tar.gz > > I got this just before lunch too. I was trying out 2.4.1 + zerocopy > (with netfilter configured off, see the sendfile/zerocopy thread for [...] I reproduced with 2.4.1. > Looking at the kernel's EIP every so often to see what was going > showed remove_wait_queue, add_wait_queue, skb_recv_datagram and > wait_for_packet mostly. Random thought: if vsftpd did a sendfile and > then exited, becoming a zombie, could there be a problem with > tearing down a sendfile mapping? I'm off to read some code. I get it simply doing CTRL-C at the ftp logon prompt. No sendfile has been used at this point. Trying to distill a test case... Cheers Chris - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
Re: Serious reproducible 2.4.x kernel hang
Chris Evans writes: > I've just managed to reproduce this personally on 2.4.0. I've had a report > that 2.4.1 is also affected. Both myself and the other person who > reproduced this have SMP i686 machines, which may or may not be relevant. > > To reproduce, all you need to do is get my vsftpd ftp server: > ftp://ferret.lmh.ox.ac.uk/pub/linux/vsftpd-0.0.9.tar.gz I got this just before lunch too. I was trying out 2.4.1 + zerocopy (with netfilter configured off, see the sendfile/zerocopy thread for more details and hardware specs) and tried running vsftpd on the slower machine instead of the faster machine as before. I connected to vsftpd with an ftp client and got a 500 OOPS: chdir Login failed. 421 Service not available, remote server has closed connection (ftpd's idea of an OOPS; not the kernel's idea of an oops, of course). That was probably because I hadn't configured the directory properly but following that the machine hung, in the following way: userland hung: no more logins, existent xterm processes didn't refresh their windows on my (remote) display. The machine was still pingable, though. I configured Magic SysRq into the kernel but hadn't played with it before so I hadn't enabled it in /proc (D'oh. Next time I'll know.) As in Chris' case, vzftpd was a zombie (so Foo-ScrollLock told me) and all other processes were looking OK in R or S state. Looking at the kernel's EIP every so often to see what was going showed remove_wait_queue, add_wait_queue, skb_recv_datagram and wait_for_packet mostly. Random thought: if vsftpd did a sendfile and then exited, becoming a zombie, could there be a problem with tearing down a sendfile mapping? I'm off to read some code. --Malcolm -- Malcolm Beattie <[EMAIL PROTECTED]> Unix Systems Programmer Oxford University Computing Services - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
Re: Serious reproducible 2.4.x kernel hang
Chris Evans writes: I've just managed to reproduce this personally on 2.4.0. I've had a report that 2.4.1 is also affected. Both myself and the other person who reproduced this have SMP i686 machines, which may or may not be relevant. To reproduce, all you need to do is get my vsftpd ftp server: ftp://ferret.lmh.ox.ac.uk/pub/linux/vsftpd-0.0.9.tar.gz I got this just before lunch too. I was trying out 2.4.1 + zerocopy (with netfilter configured off, see the sendfile/zerocopy thread for more details and hardware specs) and tried running vsftpd on the slower machine instead of the faster machine as before. I connected to vsftpd with an ftp client and got a 500 OOPS: chdir Login failed. 421 Service not available, remote server has closed connection (ftpd's idea of an OOPS; not the kernel's idea of an oops, of course). That was probably because I hadn't configured the directory properly but following that the machine hung, in the following way: userland hung: no more logins, existent xterm processes didn't refresh their windows on my (remote) display. The machine was still pingable, though. I configured Magic SysRq into the kernel but hadn't played with it before so I hadn't enabled it in /proc (D'oh. Next time I'll know.) As in Chris' case, vzftpd was a zombie (so Foo-ScrollLock told me) and all other processes were looking OK in R or S state. Looking at the kernel's EIP every so often to see what was going showed remove_wait_queue, add_wait_queue, skb_recv_datagram and wait_for_packet mostly. Random thought: if vsftpd did a sendfile and then exited, becoming a zombie, could there be a problem with tearing down a sendfile mapping? I'm off to read some code. --Malcolm -- Malcolm Beattie [EMAIL PROTECTED] Unix Systems Programmer Oxford University Computing Services - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
Re: Serious reproducible 2.4.x kernel hang
On Thu, 1 Feb 2001, Malcolm Beattie wrote: Chris Evans writes: I've just managed to reproduce this personally on 2.4.0. I've had a report that 2.4.1 is also affected. Both myself and the other person who reproduced this have SMP i686 machines, which may or may not be relevant. To reproduce, all you need to do is get my vsftpd ftp server: ftp://ferret.lmh.ox.ac.uk/pub/linux/vsftpd-0.0.9.tar.gz I got this just before lunch too. I was trying out 2.4.1 + zerocopy (with netfilter configured off, see the sendfile/zerocopy thread for [...] I reproduced with 2.4.1. Looking at the kernel's EIP every so often to see what was going showed remove_wait_queue, add_wait_queue, skb_recv_datagram and wait_for_packet mostly. Random thought: if vsftpd did a sendfile and then exited, becoming a zombie, could there be a problem with tearing down a sendfile mapping? I'm off to read some code. I get it simply doing CTRL-C at the ftp logon prompt. No sendfile has been used at this point. Trying to distill a test case... Cheers Chris - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
Re: Serious reproducible 2.4.x kernel hang
Malcolm Beattie writes: Chris Evans writes: I've just managed to reproduce this personally on 2.4.0. I've had a report that 2.4.1 is also affected. Both myself and the other person who reproduced this have SMP i686 machines, which may or may not be relevant. To reproduce, all you need to do is get my vsftpd ftp server: ftp://ferret.lmh.ox.ac.uk/pub/linux/vsftpd-0.0.9.tar.gz [...] As in Chris' case, vzftpd was a zombie (so Foo-ScrollLock told me) and all other processes were looking OK in R or S state. Mapping the addresses from whichever ScrollLock combination produced the task list to symbols produces the call trace do_exit - do_signal - tcp_destroy_sock - inet_ioctl - signal_return The inet_ioctl is odd there--vsftpd doesn't explicitly call ioctl anywhere at all and the next function before it in memory is inet_shutdown which looks more believable. I have checked I'm looking at the right System.map but I suppose I may have mis-transcribed the address when writing it down. vsftpd doesn't make use of signal handlers except to unset some existing ones and a SIGALRM handler which I don't think would have triggered. Something like a seg fault may have caused it (I should have seen an oops if it had happened in kernel space) perhaps? --Malcolm -- Malcolm Beattie [EMAIL PROTECTED] Unix Systems Programmer Oxford University Computing Services - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
Re: Serious reproducible 2.4.x kernel hang
Chris Evans writes: On Thu, 1 Feb 2001, Malcolm Beattie wrote: Mapping the addresses from whichever ScrollLock combination produced the task list to symbols produces the call trace do_exit - do_signal - tcp_destroy_sock - inet_ioctl - signal_return The inet_ioctl is odd there--vsftpd doesn't explicitly call ioctl anywhere at all and the next function before it in memory is inet_shutdown which looks more believable. I have checked I'm looking Probably, the empty SIGPIPE handler triggered. The response to this is a lot of shutdown() close() and finally an exit(). The trace you give above looks like the child process trace. I always see the parent process go nuts. The parent process is almost always blocking on read() of a unix dgram socket, which it shares with the child. The child does a shutdown() on this socket just before exit(). We've done some more detective work. I can reproduce the hang too by quitting the ftp client abruptly (^Z and kill %1 in my case). Inducing the hang while stracing the daemon shows a recv returning 0 as expected when the socket closes. The daemon then calls "die": die(const char* p_text) { /* Going down hard... */ #ifdef DIE_DEBUG bug(p_text); #endif and DIE_DEBUG is defined. bug() writes an error message and then does three things: shutdown(2) on the sockets close(2) on the sockets abort() the last of which libc implements as rt_sigprocmask(SIG_UNBLOCK, [SIGABRT]) kill(getpid(), SIGABRT) Here's the interesting thing: doing an exit(0) before the shutdowns and abort gets rid of the hang. The only unusual and potentially untested thing I could find about the program was that it uses capset() and prctl(PR_SET_KEEPCAPS). However, replacing the "retval = capset(...)" call with a dummy "retval = 0" doesn't get rid of the hang. So it looks as though some combination of shutdown(2) and SIGABRT is at fault. After the hang the kernel-side stack trace is always either the one I gave above (and I *did* write down the address for inet_ioctl correctly; it's definitely not inet_shutdown) or else do_exit - do_signal - schedule - syscall_trace - signal_return (with exactly the same addresses as above except for the differing schedule and syscall_trace ones) which appeared after the hang while vsftpd was being run under strace. --Malcolm -- Malcolm Beattie [EMAIL PROTECTED] Unix Systems Programmer Oxford University Computing Services - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
Re: Serious reproducible 2.4.x kernel hang
On Thu, 1 Feb 2001, Malcolm Beattie wrote: Mapping the addresses from whichever ScrollLock combination produced the task list to symbols produces the call trace do_exit - do_signal - tcp_destroy_sock - inet_ioctl - signal_return The inet_ioctl is odd there--vsftpd doesn't explicitly call ioctl anywhere at all and the next function before it in memory is inet_shutdown which looks more believable. I have checked I'm looking Probably, the empty SIGPIPE handler triggered. The response to this is a lot of shutdown() close() and finally an exit(). The trace you give above looks like the child process trace. I always see the parent process go nuts. The parent process is almost always blocking on read() of a unix dgram socket, which it shares with the child. The child does a shutdown() on this socket just before exit(). Cheers Chris - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
Re: Serious reproducible 2.4.x kernel hang
On Thu, 1 Feb 2001, Chris Evans wrote: Nope - I've nailed it to a _really_ simple test case. It looks like a read() on a shutdown() unix dgram socket just kills the kernel. Demo code below. I wonder if this affects UP or is SMP only? It surely killed my PIII UP machine (running 2.4.1) Rui Sousa - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
Re: Serious reproducible 2.4.x kernel hang
[cc: davem because of the severity] On Thu, 1 Feb 2001, Malcolm Beattie wrote: rid of the hang. So it looks as though some combination of shutdown(2) and SIGABRT is at fault. After the hang the kernel-side Nope - I've nailed it to a _really_ simple test case. It looks like a read() on a shutdown() unix dgram socket just kills the kernel. Demo code below. I wonder if this affects UP or is SMP only? Malcolm, does the below code reproduce the problem for you? Cheers Chris #include stdio.h #include unistd.h #include sys/types.h #include sys/socket.h int main(int argc, const char* argv[]) { int retval; int sockets[2]; char buf[1]; retval = socketpair(PF_UNIX, SOCK_DGRAM, 0, sockets); if (retval != 0) { perror("socketpair"); exit(1); } shutdown(sockets[0], SHUT_RDWR); read(sockets[0], buf, 1); } - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
Re: Serious reproducible 2.4.x kernel hang
Chris Evans [EMAIL PROTECTED] writes: [cc: davem because of the severity] On Thu, 1 Feb 2001, Malcolm Beattie wrote: rid of the hang. So it looks as though some combination of shutdown(2) and SIGABRT is at fault. After the hang the kernel-side Nope - I've nailed it to a _really_ simple test case. It looks like a read() on a shutdown() unix dgram socket just kills the kernel. Demo code below. I wonder if this affects UP or is SMP only? Kills my UP K6-2 dead as a doornail (except for pings, as you say). -Doug - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/