Bug#599496: linux-2.6: cannot kill process hung on syscall

2010-10-07 Thread brian m. carlson
Package: linux-2.6
Version: 2.6.32-22
Severity: normal

Recently, I've had a problem with tasks that apparently get hung on a
syscall.  For example:

  Oct  8 02:57:54 lakeview kernel: [84840.484280] INFO: task Xorg:4560 blocked for more than 120 seconds.

When this occurs, even "sudo kill -9 4560" does not work.  The kernel
should properly and immediately terminate processes receiving a SIGKILL
as root, even (especially) if that process is hung on a syscall.  The
inability to do this means that the machine becomes unusable when Xorg
hangs.
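A task that ignores "kill -9" like this is normally in uninterruptible sleep inside the kernel, which ps shows as state D. That is also the state the 120-second hung-task warning reports. One way to confirm it is to read the state letter, the third field of /proc/<pid>/stat. The following is a minimal sketch that assumes a Linux /proc filesystem. The helper names (proc_state_from_stat, describe_state, read_proc_state) are hypothetical.

```c
#include <stdio.h>
#include <string.h>
#include <sys/types.h>

/* Return the one-letter state field from a /proc/<pid>/stat line, or '?'
 * if the line cannot be parsed. Field 2 (the command name) is wrapped in
 * parentheses and may itself contain spaces or ')', so search for the
 * LAST ')' instead of splitting the line on whitespace. */
char proc_state_from_stat(const char *line)
{
    const char *close = strrchr(line, ')');
    if (close == NULL || close[1] != ' ' || close[2] == '\0')
        return '?';
    return close[2];
}

/* Human-readable meaning of the most common state letters. */
const char *describe_state(char state)
{
    switch (state) {
    case 'R': return "running";
    case 'S': return "interruptible sleep (signals are delivered)";
    case 'D': return "uninterruptible sleep (even SIGKILL waits for the sleep to end, unless it is a killable sleep)";
    case 'T': return "stopped";
    case 'Z': return "zombie (exited, not yet reaped)";
    default:  return "other/unknown";
    }
}

/* Read /proc/<pid>/stat and return the task's state letter, or '?' on error. */
char read_proc_state(pid_t pid)
{
    char path[64];
    char buf[1024];
    FILE *f;

    snprintf(path, sizeof path, "/proc/%d/stat", (int)pid);
    f = fopen(path, "r");
    if (f == NULL)
        return '?';
    if (fgets(buf, sizeof buf, f) == NULL) {
        fclose(f);
        return '?';
    }
    fclose(f);
    return proc_state_from_stat(buf);
}
```

For the Xorg task above, read_proc_state(4560) should return 'D' while the task is hung. The letter can then be passed to describe_state() to get a readable label.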

-- System Information:
Debian Release: squeeze/sid
  APT prefers unstable
  APT policy: (500, 'unstable'), (1, 'experimental')
Architecture: amd64 (x86_64)

Kernel: Linux 2.6.32-5-amd64 (SMP w/2 CPU cores)
Locale: LANG=en_US.UTF-8, LC_CTYPE=en_US.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/dash

-- 
brian m. carlson / brian with sandals: Houston, Texas, US
+1 832 623 2791 | http://www.crustytoothpaste.net/~bmc | My opinion only
OpenPGP: RSA v4 4096b: 88AC E9B2 9196 305B A994 7552 F1BA 225C 0223 B187





2010-10-07 Thread Ben Hutchings
On Fri, 2010-10-08 at 03:25 +, brian m. carlson wrote:
> Package: linux-2.6
> Version: 2.6.32-22
> Severity: normal
> 
> Recently, I've had a problem with tasks that apparently get hung on a
> syscall.  For example:
> 
>   Oct  8 02:57:54 lakeview kernel: [84840.484280] INFO: task Xorg:4560 blocked for more than 120 seconds.
> 
> When this occurs, even "sudo kill -9 4560" does not work.  The kernel
> should properly and immediately terminate processes receiving a SIGKILL
> as root, even (especially) if that process is hung on a syscall.

In general, terminating a task at an arbitrary point inside a system call
can leave kernel structures in an invalid state.  This is not a desirable
result, so fatal signals do not work that way.
There are variants of sleeping and locking functions that return
immediately on receipt of a fatal signal, and it is desirable that these
are used in the implementation of system calls.  However, the
implementations of system calls include parts of every file system,
driver and network protocol in the system.  There is no central place
where such changes could be made.
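The difference between an ordinary sleep and a "killable" one can be shown with a small userspace model. In the kernel, examples of such pairs are mutex_lock() and mutex_lock_killable(), or wait_event() and wait_event_killable(). The sketch below is not kernel code. The struct and function names are hypothetical, and "time" is simulated as a loop of ticks. An uninterruptible wait ends only when the awaited event happens. A killable wait also returns early with -EINTR, the same error mutex_lock_killable() returns, when a fatal signal is pending. Every caller then has to handle that error and unwind, which is why the change cannot be made in one central place.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model of a task: only tracks whether SIGKILL is pending. */
struct model_task {
    bool fatal_signal_pending;
};

/* Hypothetical model of what a syscall waits for (a lock, an I/O
 * completion...). ready_at_tick < 0 means it never becomes ready,
 * e.g. a hung device. */
struct model_event {
    int ready_at_tick;
};

/* Uninterruptible wait: only the event can end the sleep.
 * Returns the tick at which the task woke, or -ETIMEDOUT if it was
 * still asleep after max_ticks simulated ticks. */
int model_wait_uninterruptible(const struct model_task *t,
                               const struct model_event *ev, int max_ticks)
{
    (void)t; /* a pending fatal signal is deliberately ignored here */
    for (int tick = 0; tick < max_ticks; tick++) {
        if (ev->ready_at_tick >= 0 && tick >= ev->ready_at_tick)
            return tick;
    }
    return -ETIMEDOUT;
}

/* Killable wait: either the event or a pending fatal signal ends the
 * sleep. Returns the wake tick, -EINTR if a fatal signal interrupted
 * the wait, or -ETIMEDOUT if neither happened within max_ticks. */
int model_wait_killable(const struct model_task *t,
                        const struct model_event *ev, int max_ticks)
{
    for (int tick = 0; tick < max_ticks; tick++) {
        if (ev->ready_at_tick >= 0 && tick >= ev->ready_at_tick)
            return tick;
        if (t->fatal_signal_pending)
            return -EINTR; /* caller must undo its partial work and return */
    }
    return -ETIMEDOUT;
}
```

With a pending SIGKILL and an event that never arrives, model_wait_uninterruptible() stays asleep for the whole simulated interval. model_wait_killable() returns -EINTR at once. That is the behaviour the reporter expects, but only a system call written around the killable variant can provide it.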

> The
> inability to do this means that the machine becomes unusable when Xorg
> hangs.

Would you care to provide more information about the context in which
this happens?

Ben.

-- 
Ben Hutchings
Once a job is fouled up, anything done to improve it makes it worse.

