Re: Serious reproducible 2.4.x kernel hang

2001-02-03 Thread kees

Hi,

Which entry in /proc is related to SysRq?

Kees

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



Re: [Patch]Re: Serious reproducible 2.4.x kernel hang

2001-02-02 Thread David S. Miller


Prasanna P Subash writes:
 > I looked at the skb_recv_datagram code and noticed that wait_for_packet is not
 > returning an error, even while trying to read a closed socket.
 > Anyways here is a patch against 2.4.1 that will fix the issue.
 > Please feel free to flame me about the patch :)

Please read the rest of today's postings, Alexey Kuznetsov already
posted the correct fix, which I've attached below:

diff -u --recursive --new-file --exclude=CVS --exclude=.cvsignore vanilla/linux/net/core/datagram.c linux/net/core/datagram.c
--- vanilla/linux/net/core/datagram.c   Sat Nov 11 19:02:40 2000
+++ linux/net/core/datagram.c   Thu Feb  1 17:15:12 2001
@@ -72,19 +73,19 @@
 	/* Socket errors? */
 	error = sock_error(sk);
 	if (error)
-		goto out;
+		goto out_err;
 
 	if (!skb_queue_empty(&sk->receive_queue))
 		goto ready;
 
 	/* Socket shut down? */
 	if (sk->shutdown & RCV_SHUTDOWN)
-		goto out;
+		goto out_noerr;
 
 	/* Sequenced packets can come disconnected. If so we report the problem */
 	error = -ENOTCONN;
 	if(connection_based(sk) && !(sk->state==TCP_ESTABLISHED || sk->state==TCP_LISTEN))
-		goto out;
+		goto out_err;
 
 	/* handle signals */
 	if (signal_pending(current))
@@ -99,11 +100,16 @@
 
 interrupted:
 	error = sock_intr_errno(*timeo_p);
+out_err:
+	*err = error;
 out:
 	current->state = TASK_RUNNING;
 	remove_wait_queue(sk->sleep, &wait);
-	*err = error;
 	return error;
+out_noerr:
+	*err = 0;
+	error = 1;
+	goto out;
 }
 
 /*
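
The reason the patch separates out_err from out_noerr can be modelled in user space. What follows is a toy model, not kernel code. It assumes that the caller of wait_for_packet keeps retrying for as long as that function returns 0; this fits the symptoms reported in this thread but is not visible in the quoted hunks. The names toy_sock, toy_wait_old, toy_wait_fixed and toy_recv_loop are invented for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for the socket state the real function inspects. */
struct toy_sock {
    bool rcv_shutdown;   /* models sk->shutdown & RCV_SHUTDOWN */
    int  queued;         /* models a non-empty receive queue */
};

/* Pre-patch shape: the shutdown branch leaves with error still 0. */
int toy_wait_old(const struct toy_sock *sk, int *err)
{
    int error = 0;               /* no pending socket error */
    if (sk->queued)
        return 0;                /* "ready": caller retries the dequeue */
    if (sk->rcv_shutdown) {
        *err = error;
        return error;            /* 0, indistinguishable from "ready" */
    }
    return 0;                    /* pretend we slept and were woken */
}

/* Post-patch shape: out_noerr reports no error but returns non-zero. */
int toy_wait_fixed(const struct toy_sock *sk, int *err)
{
    if (sk->queued)
        return 0;
    if (sk->rcv_shutdown) {
        *err = 0;
        return 1;                /* non-zero: caller must stop retrying */
    }
    return 0;
}

/*
 * Models a caller that retries while the wait function returns 0.
 * Returns the number of retries performed, or -1 when max_iters is
 * exhausted, which stands in for the hang.
 */
int toy_recv_loop(int (*wait_fn)(const struct toy_sock *, int *),
                  const struct toy_sock *sk, int *err, int max_iters)
{
    assert(max_iters > 0);
    for (int i = 0; i < max_iters; i++) {
        if (sk->queued)
            return i;            /* a datagram would be dequeued here */
        if (wait_fn(sk, err) != 0)
            return i;            /* leave the loop; *err holds the result */
    }
    return -1;
}
```

With rcv_shutdown set and nothing queued, toy_recv_loop with toy_wait_old exhausts max_iters and returns -1, while the toy_wait_fixed variant leaves on the first pass with *err set to 0.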



[Patch]Re: Serious reproducible 2.4.x kernel hang

2001-02-02 Thread Prasanna P Subash

 
> #include <stdio.h>
> #include <unistd.h>
> #include <sys/types.h>
> #include <sys/socket.h>
> 
> int
> main(int argc, const char* argv[])
> {
>   int retval;
>   int sockets[2];
>   char buf[1];
> 
>   retval = socketpair(PF_UNIX, SOCK_DGRAM, 0, sockets);
>   if (retval != 0)
>   {
>     perror("socketpair");
>     exit(1);
>   }
>   shutdown(sockets[0], SHUT_RDWR);
>   read(sockets[0], buf, 1);
> }

I tried to debug this issue with the kdb on 2.4.1-pre7.
Here is the stack trace

mcount+0x1f9
wait_for_packet+0x13
skb_recv_datagram+0xbb
unix_dgram_recvmsg+0x53
sock_recvmsg+0x41
sock_read+0x8f
sys_read+0xa4
system_call+0x3c

I looked at the skb_recv_datagram code and noticed that wait_for_packet is not
returning an error, even while trying to read a closed socket.
Anyways here is a patch against 2.4.1 that will fix the issue.
Please feel free to flame me about the patch :)

thanks
-- 
Prasanna Subash   ---   [EMAIL PROTECTED]   --- TurboLinux, INC




--- 2.4.1/net/core/datagram.c	Fri Feb  2 01:00:10 2001
+++ linux/net/core/datagram.c	Fri Feb  2 01:06:59 2001
@@ -74,15 +74,15 @@
 	if (error)
 		goto out;
 
-	if (!skb_queue_empty(&sk->receive_queue))
-		goto ready;
-
+	error = -ENOTCONN;
 	/* Socket shut down? */
 	if (sk->shutdown & RCV_SHUTDOWN)
 		goto out;
 
+	if (!skb_queue_empty(&sk->receive_queue))
+		goto ready;
+
 	/* Sequenced packets can come disconnected. If so we report the problem */
-	error = -ENOTCONN;
 	if(connection_based(sk) && !(sk->state==TCP_ESTABLISHED || sk->state==TCP_LISTEN))
 		goto out;
 



Re: Serious reproducible 2.4.x kernel hang

2001-02-01 Thread Doug McNaught

Chris Evans <[EMAIL PROTECTED]> writes:

> [cc: davem because of the severity]
> 
> On Thu, 1 Feb 2001, Malcolm Beattie wrote:
> 
> > rid of the hang. So it looks as though some combination of
> > shutdown(2) and SIGABRT is at fault. After the hang the kernel-side
> 
> Nope - I've nailed it to a _really_ simple test case. It looks like a
> read() on a shutdown() unix dgram socket just kills the kernel. Demo code
> below. I wonder if this affects UP or is SMP only?

Kills my UP K6-2 dead as a doornail (except for pings, as you say). 

-Doug



Re: Serious reproducible 2.4.x kernel hang

2001-02-01 Thread rui.sousa

On Thu, 1 Feb 2001, Chris Evans wrote:

>
> Nope - I've nailed it to a _really_ simple test case. It looks like a
> read() on a shutdown() unix dgram socket just kills the kernel. Demo code
> below. I wonder if this affects UP or is SMP only?

It surely killed my PIII UP machine (running 2.4.1)

Rui Sousa




Re: Serious reproducible 2.4.x kernel hang

2001-02-01 Thread Chris Evans


[cc: davem because of the severity]

On Thu, 1 Feb 2001, Malcolm Beattie wrote:

> rid of the hang. So it looks as though some combination of
> shutdown(2) and SIGABRT is at fault. After the hang the kernel-side

Nope - I've nailed it to a _really_ simple test case. It looks like a
read() on a shutdown() unix dgram socket just kills the kernel. Demo code
below. I wonder if this affects UP or is SMP only?

Malcolm, does the below code reproduce the problem for you?

Cheers
Chris

#include <stdio.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/socket.h>

int
main(int argc, const char* argv[])
{
  int retval;
  int sockets[2];
  char buf[1];

  retval = socketpair(PF_UNIX, SOCK_DGRAM, 0, sockets);
  if (retval != 0)
  {
    perror("socketpair");
    exit(1);
  }
  shutdown(sockets[0], SHUT_RDWR);
  read(sockets[0], buf, 1);
}
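
A more defensive variant of the test case above is sketched below, for readers who want to see what read() reports instead of only whether the machine survives. It is a sketch only: read_after_shutdown is a hypothetical helper name, and the code should not be run on an unpatched 2.4.0 or 2.4.1 machine, where the thread reports a hard hang. On a kernel where the shutdown case ends the wait cleanly I would expect a return value of 0 (end of file), but that is my expectation and is not stated anywhere in this thread.

```c
#include <assert.h>
#include <errno.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Returns what read() reports on a shut-down AF_UNIX datagram socket:
 * a value >= 0 is the byte count (0 meaning end of file); -1 means one
 * of the calls failed, with errno left as that call set it.
 */
long read_after_shutdown(void)
{
    int fds[2];
    char buf[1];
    ssize_t n;
    int saved;

    if (socketpair(AF_UNIX, SOCK_DGRAM, 0, fds) != 0)
        return -1;
    assert(fds[0] >= 0 && fds[1] >= 0);

    if (shutdown(fds[0], SHUT_RDWR) != 0) {
        saved = errno;
        close(fds[0]);
        close(fds[1]);
        errno = saved;
        return -1;
    }

    /* This is the call that never returned in the reports above. */
    n = read(fds[0], buf, sizeof buf);

    /* Preserve errno from read() across the cleanup calls. */
    saved = errno;
    close(fds[0]);
    close(fds[1]);
    errno = saved;
    return (long)n;
}
```

Checking every return value makes it possible to tell a failed socketpair() or shutdown() apart from the read() result itself, which the original eleven-line test deliberately does not bother with.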




Re: Serious reproducible 2.4.x kernel hang

2001-02-01 Thread Chris Evans


On Thu, 1 Feb 2001, Malcolm Beattie wrote:

> Mapping the addresses from whichever ScrollLock combination produced
> the task list to symbols produces the call trace
>  do_exit <- do_signal <- tcp_destroy_sock <- inet_ioctl <- signal_return
>
> The inet_ioctl is odd there--vsftpd doesn't explicitly call ioctl
> anywhere at all and the next function before it in memory is
> inet_shutdown which looks more believable. I have checked I'm looking

Probably, the empty SIGPIPE handler triggered. The response to this is a
lot of shutdown() close() and finally an exit().

The trace you give above looks like the child process trace. I always see
the parent process go nuts. The parent process is almost always blocking
on read() of a unix dgram socket, which it shares with the child. The
child does a shutdown() on this socket just before exit().

Cheers
Chris




Re: Serious reproducible 2.4.x kernel hang

2001-02-01 Thread Malcolm Beattie

Chris Evans writes:
> 
> On Thu, 1 Feb 2001, Malcolm Beattie wrote:
> 
> > Mapping the addresses from whichever ScrollLock combination produced
> > the task list to symbols produces the call trace
> >  do_exit <- do_signal <- tcp_destroy_sock <- inet_ioctl <- signal_return
> >
> > The inet_ioctl is odd there--vsftpd doesn't explicitly call ioctl
> > anywhere at all and the next function before it in memory is
> > inet_shutdown which looks more believable. I have checked I'm looking
> 
> Probably, the empty SIGPIPE handler triggered. The response to this is a
> lot of shutdown() close() and finally an exit().
> 
> The trace you give above looks like the child process trace. I always see
> the parent process go nuts. The parent process is almost always blocking
> on read() of a unix dgram socket, which it shares with the child. The
> child does a shutdown() on this socket just before exit().

We've done some more detective work. I can reproduce the hang too
by quitting the ftp client abruptly (^Z and kill %1 in my case).
Inducing the hang while stracing the daemon shows a recv returning 0
as expected when the socket closes. The daemon then calls "die":

die(const char* p_text)
{
  /* Going down hard... */
#ifdef DIE_DEBUG
  bug(p_text);
#endif

and DIE_DEBUG is defined. bug() writes an error message and then does
three things:
shutdown(2) on the sockets
close(2) on the sockets
abort()
the last of which libc implements as
rt_sigprocmask(SIG_UNBLOCK, [SIGABRT])
kill(getpid(), SIGABRT)

Here's the interesting thing: doing an exit(0) before the shutdowns
and abort gets rid of the hang. The only unusual and potentially
untested thing I could find about the program was that it uses
capset() and prctl(PR_SET_KEEPCAPS). However, replacing the
"retval = capset(...)" call with a dummy "retval = 0" doesn't get
rid of the hang. So it looks as though some combination of
shutdown(2) and SIGABRT is at fault. After the hang the kernel-side
stack trace is always either the one I gave above (and I *did*
write down the address for inet_ioctl correctly; it's definitely
not inet_shutdown) or else
  do_exit <- do_signal <- schedule <- syscall_trace <- signal_return
(with exactly the same addresses as above except for the differing
schedule and syscall_trace ones) which appeared after the hang while
vsftpd was being run under strace.

--Malcolm

-- 
Malcolm Beattie <[EMAIL PROTECTED]>
Unix Systems Programmer
Oxford University Computing Services



Re: Serious reproducible 2.4.x kernel hang

2001-02-01 Thread Malcolm Beattie

Malcolm Beattie writes:
> Chris Evans writes:
> > I've just managed to reproduce this personally on 2.4.0. I've had a report
> > that 2.4.1 is also affected. Both myself and the other person who
> > reproduced this have SMP i686 machines, which may or may not be relevant.
> > 
> > To reproduce, all you need to do is get my vsftpd ftp server:
> > ftp://ferret.lmh.ox.ac.uk/pub/linux/vsftpd-0.0.9.tar.gz
[...]
> As in Chris' case, vsftpd was a zombie (so Foo-ScrollLock told me) and
> all other processes were looking OK in R or S state.

Mapping the addresses from whichever ScrollLock combination produced
the task list to symbols produces the call trace
 do_exit <- do_signal <- tcp_destroy_sock <- inet_ioctl <- signal_return

The inet_ioctl is odd there--vsftpd doesn't explicitly call ioctl
anywhere at all and the next function before it in memory is
inet_shutdown which looks more believable. I have checked I'm looking
at the right System.map but I suppose I may have mis-transcribed the
address when writing it down. vsftpd doesn't make use of signal
handlers except to unset some existing ones and a SIGALRM handler
which I don't think would have triggered. Something like a seg fault
may have caused it (I should have seen an oops if it had happened in
kernel space) perhaps?

--Malcolm

-- 
Malcolm Beattie <[EMAIL PROTECTED]>
Unix Systems Programmer
Oxford University Computing Services



Re: Serious reproducible 2.4.x kernel hang

2001-02-01 Thread Chris Evans


On Thu, 1 Feb 2001, Malcolm Beattie wrote:

> Chris Evans writes:
> > I've just managed to reproduce this personally on 2.4.0. I've had a report
> > that 2.4.1 is also affected. Both myself and the other person who
> > reproduced this have SMP i686 machines, which may or may not be relevant.
> >
> > To reproduce, all you need to do is get my vsftpd ftp server:
> > ftp://ferret.lmh.ox.ac.uk/pub/linux/vsftpd-0.0.9.tar.gz
>
> I got this just before lunch too. I was trying out 2.4.1 + zerocopy
> (with netfilter configured off, see the sendfile/zerocopy thread for

[...]

I reproduced with 2.4.1.

> Looking at the kernel's EIP every so often to see what was going on
> showed remove_wait_queue, add_wait_queue, skb_recv_datagram and
> wait_for_packet mostly. Random thought: if vsftpd did a sendfile and
> then exited, becoming a zombie, could there be a problem with
> tearing down a sendfile mapping? I'm off to read some code.

I get it simply doing CTRL-C at the ftp logon prompt. No sendfile has been
used at this point. Trying to distill a test case...

Cheers
Chris




Re: Serious reproducible 2.4.x kernel hang

2001-02-01 Thread Malcolm Beattie

Chris Evans writes:
> I've just managed to reproduce this personally on 2.4.0. I've had a report
> that 2.4.1 is also affected. Both myself and the other person who
> reproduced this have SMP i686 machines, which may or may not be relevant.
> 
> To reproduce, all you need to do is get my vsftpd ftp server:
> ftp://ferret.lmh.ox.ac.uk/pub/linux/vsftpd-0.0.9.tar.gz

I got this just before lunch too. I was trying out 2.4.1 + zerocopy
(with netfilter configured off, see the sendfile/zerocopy thread for
more details and hardware specs) and tried running vsftpd on the
slower machine instead of the faster machine as before. I connected
to vsftpd with an ftp client and got a
500 OOPS: chdir
Login failed.
421 Service not available, remote server has closed connection
(ftpd's idea of an OOPS; not the kernel's idea of an oops, of course).
That was probably because I hadn't configured the directory properly
but following that the machine hung, in the following way: userland
hung: no more logins, existent xterm processes didn't refresh their
windows on my (remote) display. The machine was still pingable, though.

I configured Magic SysRq into the kernel but hadn't played with it
before so I hadn't enabled it in /proc (D'oh. Next time I'll know.)
As in Chris' case, vsftpd was a zombie (so Foo-ScrollLock told me) and
all other processes were looking OK in R or S state.

Looking at the kernel's EIP every so often to see what was going on
showed remove_wait_queue, add_wait_queue, skb_recv_datagram and
wait_for_packet mostly. Random thought: if vsftpd did a sendfile and
then exited, becoming a zombie, could there be a problem with
tearing down a sendfile mapping? I'm off to read some code.

--Malcolm

-- 
Malcolm Beattie <[EMAIL PROTECTED]>
Unix Systems Programmer
Oxford University Computing Services


