This is what's happening:

   1 - boot UML using root_fs
   2 - shut down UML - it doesn't shut down properly, leaving root_fs open
   and locked
   3 - boot UML again
   4 - new UML cannot use root_fs due to lock held by old UML

I am seeing this problem on my setup. "shutdown" on a UML does not
terminate. It gets to the "power down" point, but the /usr/bin/linux
process does not exit; it hangs forever. If I hit ^C then
I get a shell prompt back, but there is still a [linux] process
which cannot be killed.

This is UML 2.4.26-3um running on a 2.6.10 host kernel with the skas3-v8 patch.

Observe:

baron:~# ps axuww | grep linux
usermode   954  0.0  0.0   304  300 ?        T    Apr29   1:15 [linux]
usermode 24870  0.0  0.4 262232 8620 ?       T    18:32   0:00 linux con0=fd:0,fd:1 mem=256m eth0=tuntap,,,10.1.0.1ubd0=linux-mantis.image
usermode 24991  0.9  3.2 262248 67180 pts/4  S+   18:38   0:11 linux [depmod] con0=fd:0,fd:1 mem=256m eth0=tuntap,,,10.1.0.1ubd0=linux-mantis.image
usermode 24994  0.2  0.1 16392 2124 pts/4    T+   18:38   0:03 [linux]
usermode 25003  0.0  3.2 262248 67180 pts/4  S+   18:38   0:00 linux [depmod] con0=fd:0,fd:1 mem=256m eth0=tuntap,,,10.1.0.1ubd0=linux-mantis.image
usermode 25004  0.0  3.2 262248 67180 pts/4  S+   18:38   0:00 linux [depmod] con0=fd:0,fd:1 mem=256m eth0=tuntap,,,10.1.0.1ubd0=linux-mantis.image
usermode 25005  0.0  3.2 262248 67180 pts/4  S+   18:38   0:00 linux [depmod] con0=fd:0,fd:1 mem=256m eth0=tuntap,,,10.1.0.1ubd0=linux-mantis.image

The last five processes (all on pts/4) are a working UML instance. The first
two are old UMLs which did not shut down or start up correctly.

Status 'T' means "traced or stopped". Not a zombie, so it should be killable.
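
For anyone who wants to check the state without going through ps: the
STAT letter comes from the state field in /proc/<pid>/stat. Below is a
minimal sketch of reading it. The function names are my own, not part of
any tool. As far as I know, newer kernels report a ptrace stop as a
separate lowercase 't', while on a kernel of this vintage both cases
show up as 'T'.

```python
def proc_state(stat_line: str) -> str:
    """Return the one-letter state from the contents of /proc/<pid>/stat."""
    # The command name is wrapped in parentheses and may itself contain
    # spaces or parentheses, so split after the LAST ')'.
    after_comm = stat_line.rsplit(")", 1)[1]
    # The first whitespace-separated field after the name is the state.
    return after_comm.split()[0]


def read_state(pid: int, proc_root: str = "/proc") -> str:
    """Read and parse the state of a live process (hypothetical helper)."""
    with open(f"{proc_root}/{pid}/stat") as f:
        return proc_state(f.read())
```

For example, read_state(954) would have returned 'T' for the stuck
process shown above.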

baron:~# ls -l /proc/954/fd
total 20
lrwx------  1 root root 64 May  5 18:59 0 -> /dev/pts/0 (deleted)
lrwx------  1 root root 64 May  5 18:59 1 -> /dev/pts/0 (deleted)
lrwx------  1 root root 64 May  5 18:59 10 -> socket:[2912]
lrwx------  1 root root 64 May  5 18:59 11 -> socket:[2913]
lrwx------  1 root root 64 May  5 18:59 12 -> socket:[2914]
lrwx------  1 root root 64 May  5 18:59 13 -> /baron/FileSystems/linux-mantis.image (deleted)
l-wx------  1 root root 64 May  5 18:59 14 -> /proc/mm
l-wx------  1 root root 64 May  5 18:59 15 -> /proc/mm
l-wx------  1 root root 64 May  5 18:59 16 -> /proc/mm
l-wx------  1 root root 64 May  5 18:59 17 -> /proc/mm
lr-x------  1 root root 64 May  5 18:59 18 -> /baron/.uml/wnTBF0 (deleted)
lrwx------  1 root root 64 May  5 18:59 19 -> socket:[2924]
lrwx------  1 root root 64 May  5 18:59 2 -> /dev/pts/0 (deleted)
lrwx------  1 root root 64 May  5 18:59 3 -> /tmp/vm_file-iUsEEz (deleted)
lrwx------  1 root root 64 May  5 18:59 4 -> socket:[2905]
lrwx------  1 root root 64 May  5 18:59 5 -> socket:[2906]
lrwx------  1 root root 64 May  5 18:59 6 -> socket:[2907]
lrwx------  1 root root 64 May  5 18:59 7 -> socket:[2909]
lrwx------  1 root root 64 May  5 18:59 8 -> socket:[2910]
lrwx------  1 root root 64 May  5 18:59 9 -> socket:[2911]

I had to copy the filesystem image to another file and rename it
over the top of the original before I could run UML again.
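
To find out which process is still holding an image open, one option is
to walk /proc/*/fd and compare the symlink targets, which is what the ls
listing above shows for a single PID. This is only a rough sketch: the
function name and the proc_root parameter are made up for illustration,
and it only sees processes whose fd directories the caller is allowed to
read.

```python
import os


def find_open_holders(target, proc_root="/proc"):
    """Return sorted (pid, fd) pairs whose fd symlink points at target."""
    holders = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as "self" or "meminfo"
        fd_dir = os.path.join(proc_root, entry, "fd")
        try:
            fds = os.listdir(fd_dir)
        except OSError:
            continue  # process went away, or we lack permission
        for fd in fds:
            try:
                link = os.readlink(os.path.join(fd_dir, fd))
            except OSError:
                continue
            # A file that has been unlinked or renamed over is shown
            # with a " (deleted)" suffix, as in the listing above.
            if link == target or link == target + " (deleted)":
                holders.append((int(entry), int(fd)))
    return sorted(holders)
```

Run as root, find_open_holders("/baron/FileSystems/linux-mantis.image")
should have reported fd 13 of PID 954 in the situation above.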

baron:~# kill -9 954
baron:~# kill -9 954
baron:~# kill -9 954
baron:~# kill -9 954
baron:~# kill -9 24870
baron:~# kill -9 24870
baron:~# kill -9 24870
baron:~# kill -9 24870
baron:~# ps uww 954
USER       PID %CPU %MEM   VSZ  RSS TTY      STAT START   TIME COMMAND
usermode   954  0.0  0.0   304  300 ?        T    Apr29   1:15 [linux]

These processes cannot be killed.

But I just found that I can do this:

baron:~# kill -CONT 954
baron:~# kill -9 954
-bash: kill: (954) - No such process
baron:~# kill -CONT 24870
baron:~# ps uww 24860
USER       PID %CPU %MEM   VSZ  RSS TTY      STAT START   TIME COMMAND

So sending them a SIGCONT gets the processes going again, at which point
the earlier SIGKILL appears to take effect. That leaves the questions:
why did the shutdown hang in the first place, is the process SIGSTOPing
itself, and if I had done a "kill -CONT" before doing a "kill -9" would
the process have exited cleanly?
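
For reference, here is the SIGCONT-then-SIGKILL sequence as a small
script. It is only a sketch of the signal order: it stops an ordinary
child process (a Python interpreter that just sleeps) rather than a UML,
and on a normal kernel SIGKILL alone is enough to terminate such a
child, so it does not reproduce the hang or answer the questions above.
The function names are mine.

```python
import os
import signal
import subprocess
import sys


def cont_then_kill(pid: int) -> None:
    """Resume a stopped process, then send it SIGKILL."""
    os.kill(pid, signal.SIGCONT)
    os.kill(pid, signal.SIGKILL)


def demo() -> int:
    """Stop a throwaway child, apply the sequence, return its exit code."""
    child = subprocess.Popen(
        [sys.executable, "-c", "import time; time.sleep(60)"]
    )
    os.kill(child.pid, signal.SIGSTOP)
    # Block until the child is reported as stopped. WUNTRACED reports
    # the stop without reaping the child, so child.wait() still works.
    os.waitpid(child.pid, os.WUNTRACED)
    cont_then_kill(child.pid)
    # Popen reports death by signal N as a return code of -N.
    return child.wait(timeout=10)
```

Calling demo() should return -signal.SIGKILL, i.e. the child was
terminated by the kill rather than exiting on its own.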

Nick.
-- 
PGP Key ID = 0x418487E7                      http://www.nick-andrew.net/
PGP Key fingerprint = B3ED 6894 8E49 1770 C24A  67E3 6266 6EB9 4184 87E7
"I'm not out to destroy Microsoft. That will just be a completely
unintentional side effect."                  -- Linus Torvalds, Sep 2003
