Re: [uml-devel] negative pid -516 possible ?

2014-02-15 Thread Toralf Förster

On 01/13/2014 08:54 PM, Toralf Förster wrote:
 On 01/13/2014 12:21 AM, Richard Weinberger wrote:
 On Sat, Jan 11, 2014 at 11:47 AM, Toralf Förster toralf.foers...@gmx.de 
 wrote:
 I do fuzz testing with trinity (latest git version) in a stable 32-bit Gentoo
 Linux user mode linux image.
 The host is a stable 32-bit vanilla 3.12.7 kernel; the guest runs the latest git
 tree + 2 patches (attached).
 
 The trinity call in the UML guest is :
 $ trinity -q -l off -N 1 -C 2 -x move_pages -x mremap -v /mnt/ramdisk
 
 After a while there's no more progress on the command line seen at the host
 system - the trinity process seems to just hang/idle. When this occurs I
 can no longer ssh into the system. The system keeps running, however. In
 another terminal I still see the output of this command:
 
 Does it consume 100% CPU?
 
 No.
 It just doesn't allow new ssh connections. Existing ssh connections are still
 working.
 
 $ ssh root@trinity tail -f /var/log/messages
 
 That's why I do know that the system does not hang completely. The output of 
 top at the host system gives me the pid of the linux exe. A gdb call gives 
 for that pid :
 
 $ date; sudo gdb /home/tfoerste/devel/linux/linux 25224 -n -batch -ex 'bt 
 full'
 Sat Jan 11 11:36:47 CET 2014
 
 warning: Could not load shared library symbols for linux-gate.so.1.
 Do you need set solib-search-path or set sysroot?
 0xb7800424 in __kernel_vsyscall ()
 #0  0xb7800424 in __kernel_vsyscall ()
 No symbol table info available.
 #1  0x083d63ff in __nanosleep_nocancel ()
 No symbol table info available.
 #2  0x0807266c in idle_sleep (nsecs=602496380195307520) at 
 arch/um/os-Linux/time.c:183
 ts = {tv_sec = 0, tv_nsec = 8436602}
 #3  0x0805fc0f in arch_cpu_idle () at arch/um/kernel/process.c:208
 No locals.
 #4  0x080a8971 in cpu_idle_loop () at kernel/cpu/idle.c:98
 No locals.
 #5  cpu_startup_entry (state=CPUHP_ONLINE) at kernel/cpu/idle.c:140
 No locals.
 #6  0x084215e9 in rest_init () at init/main.c:402
 pid = -516
 __func__ = rest_init
 #7  0x080487e1 in start_kernel () at init/main.c:656
 command_line = 0x85b8400 command_line earlyprintk 
 ubda=/home/tfoerste/virtual/uml/trinity ubdb=/mnt/ramdisk/trinity_swap 
 eth0=tuntap,tap0,72:ef:3d:9f:c3:5a mem=1025M con0=fd:0,fd:1 con=pts 
 rootfstype=ext4  root=98:0
 #8  0x08049e42 in start_kernel_proc (unused=0x0) at 
 arch/um/kernel/skas/process.c:48
 pid = -516
 __func__ = start_kernel_proc
 #9  0x0805f7cb in new_thread_handler () at arch/um/kernel/process.c:129
 fn = 0x0
 #10 0x in ?? ()
 No symbol table info available.
 
 
 
 Please note that BUG_ON was not triggered. For completeness here are the gdb 
 traces from all linux processes currently running at the host:
 
 So let's forget the 516 issue for now.
 What we know for now is that you managed to trigger a lockup within UML.
 
 Agreed, especially b/c I added this patch too :
 $ cat ~/devel/priv/uml/pid516_2.patch
 --- init/main.c_orig	2014-01-12 16:43:48.585439158 +0100
 +++ init/main.c	2014-01-12 16:44:01.706438453 +0100
 @@ -389,6 +389,7 @@
  	BUG_ON(pid == -516);
  	rcu_read_lock();
  	kthreadd_task = find_task_by_pid_ns(pid, &init_pid_ns);
 +	BUG_ON(pid == -516);
  	rcu_read_unlock();
  	complete(&kthreadd_done);
 
 and this wasn't triggered (/me wonders if the -516 is somehow garbage).
 
 But I can narrow down the problem. In a still-open ssh session I ran:
 
 $ lsof | grep t3
 bash  6129  tfoerste  cwd   DIR   98,0 4096734 
 /home/tfoerste/t3
 logger6135  tfoerste  cwd   DIR   98,0 4096734 
 /home/tfoerste/t3
 
 (t3 is the ~/t3 directory that I cd into before I run trinity.)
 
 And after killing the logger command the trinity batch continues :
 
 $ ps xf -eo pid,start_time,command | grep trinity
  6412 20:48  |   \_ grep --colour=auto trinity
  6129 19:17  \_ bash -c cd ~; sudo su -c 'if [[ -d ./t3 ]]; then sudo 
 chmod -R a+rwx ./t3; sudo rm -rf ./t3; fi'; mkdir ./t3; cd ./t3; logger 
 17#-1, M=/mnt/ramdisk; if [[ -n /mnt/ramdisk ]]; then if [[ -d 
 /mnt/ramdisk/victims/v1 ]]; then sudo chmod -R a+rwx /mnt/ramdisk/victims/v1; 
 sudo rm -rf /mnt/ramdisk/victims/v1; fi; mkdir -p /mnt/ramdisk/victims/v1/v2; 
 for i in $(seq -w 0 99); do touch /mnt/ramdisk/victims/v1/v2/f$i 2/dev/null; 
 mkdir /mnt/ramdisk/victims/v1/v2/d$i 2/dev/null; done; fi;  trinity -q -N 
 1 -C 2 -x move_pages -x mremap -V /mnt/ramdisk/victims/v1/v2
  6390 20:46  \_ trinity -q -N 1 -C 2 -x move_pages -x mremap 
 -V /mnt/ramdisk/victims/v1/v2
  6391 20:46  \_ trinity -q -N 1 -C 2 -x move_pages -x 
 mremap -V /mnt/ramdisk/victims/v1/v2
  6392 20:46  \_ trinity -q -N 1 -C 2 -x move_pages -x 
 mremap -V /mnt/ramdisk/victims/v1/v2
  6408 20:47  \_ trinity -q -N 1 -C 2 -x move_pages -x 
 

Re: [uml-devel] negative pid -516 possible ?

2014-01-13 Thread Toralf Förster

On 01/13/2014 12:21 AM, Richard Weinberger wrote:
 On Sat, Jan 11, 2014 at 11:47 AM, Toralf Förster toralf.foers...@gmx.de 
 wrote:
 I do fuzz testing with trinity (latest git version) in a stable 32-bit Gentoo
 Linux user mode linux image.
 The host is a stable 32-bit vanilla 3.12.7 kernel; the guest runs the latest git
 tree + 2 patches (attached).
 
 The trinity call in the UML guest is :
 $ trinity -q -l off -N 1 -C 2 -x move_pages -x mremap -v /mnt/ramdisk
 
 After a while there's no more progress on the command line seen at the host system
 - the trinity process seems to just hang/idle. When this occurs I
 can no longer ssh into the system. The system keeps running, however. In
 another terminal I still see the output of this command:
 
 Does it consume 100% CPU?
 
No.
It just doesn't allow new ssh connections. Existing ssh connections are still
working.

 $ ssh root@trinity tail -f /var/log/messages
 
 That's why I do know that the system does not hang completely. The output of 
 top at the host system gives me the pid of the linux exe. A gdb call gives 
 for that pid :
 
 $ date; sudo gdb /home/tfoerste/devel/linux/linux 25224 -n -batch -ex 'bt 
 full'
 Sat Jan 11 11:36:47 CET 2014
 
 warning: Could not load shared library symbols for linux-gate.so.1.
 Do you need set solib-search-path or set sysroot?
 0xb7800424 in __kernel_vsyscall ()
 #0  0xb7800424 in __kernel_vsyscall ()
 No symbol table info available.
 #1  0x083d63ff in __nanosleep_nocancel ()
 No symbol table info available.
 #2  0x0807266c in idle_sleep (nsecs=602496380195307520) at 
 arch/um/os-Linux/time.c:183
 ts = {tv_sec = 0, tv_nsec = 8436602}
 #3  0x0805fc0f in arch_cpu_idle () at arch/um/kernel/process.c:208
 No locals.
 #4  0x080a8971 in cpu_idle_loop () at kernel/cpu/idle.c:98
 No locals.
 #5  cpu_startup_entry (state=CPUHP_ONLINE) at kernel/cpu/idle.c:140
 No locals.
 #6  0x084215e9 in rest_init () at init/main.c:402
 pid = -516
 __func__ = rest_init
 #7  0x080487e1 in start_kernel () at init/main.c:656
 command_line = 0x85b8400 command_line earlyprintk 
 ubda=/home/tfoerste/virtual/uml/trinity ubdb=/mnt/ramdisk/trinity_swap 
 eth0=tuntap,tap0,72:ef:3d:9f:c3:5a mem=1025M con0=fd:0,fd:1 con=pts 
 rootfstype=ext4  root=98:0
 #8  0x08049e42 in start_kernel_proc (unused=0x0) at 
 arch/um/kernel/skas/process.c:48
 pid = -516
 __func__ = start_kernel_proc
 #9  0x0805f7cb in new_thread_handler () at arch/um/kernel/process.c:129
 fn = 0x0
 #10 0x in ?? ()
 No symbol table info available.
 
 
 
 Please note that BUG_ON was not triggered. For completeness here are the gdb 
 traces from all linux processes currently running at the host:
 
 So let's forget the 516 issue for now.
 What we know for now is that you managed to trigger a lockup within UML.
 
Agreed, especially b/c I added this patch too :
$ cat ~/devel/priv/uml/pid516_2.patch
--- init/main.c_orig	2014-01-12 16:43:48.585439158 +0100
+++ init/main.c	2014-01-12 16:44:01.706438453 +0100
@@ -389,6 +389,7 @@
 	BUG_ON(pid == -516);
 	rcu_read_lock();
 	kthreadd_task = find_task_by_pid_ns(pid, &init_pid_ns);
+	BUG_ON(pid == -516);
 	rcu_read_unlock();
 	complete(&kthreadd_done);

and this wasn't triggered (/me wonders if the -516 is somehow garbage).

But I can narrow down the problem. In a still-open ssh session I ran:

$ lsof | grep t3
bash  6129  tfoerste  cwd   DIR   98,0 4096734 
/home/tfoerste/t3
logger6135  tfoerste  cwd   DIR   98,0 4096734 
/home/tfoerste/t3

(t3 is the ~/t3 directory that I cd into before I run trinity.)

And after killing the logger command the trinity batch continues :

$ ps xf -eo pid,start_time,command | grep trinity
 6412 20:48  |   \_ grep --colour=auto trinity
 6129 19:17  \_ bash -c cd ~; sudo su -c 'if [[ -d ./t3 ]]; then sudo 
chmod -R a+rwx ./t3; sudo rm -rf ./t3; fi'; mkdir ./t3; cd ./t3; logger 17#-1, 
M=/mnt/ramdisk; if [[ -n /mnt/ramdisk ]]; then if [[ -d 
/mnt/ramdisk/victims/v1 ]]; then sudo chmod -R a+rwx /mnt/ramdisk/victims/v1; 
sudo rm -rf /mnt/ramdisk/victims/v1; fi; mkdir -p /mnt/ramdisk/victims/v1/v2; 
for i in $(seq -w 0 99); do touch /mnt/ramdisk/victims/v1/v2/f$i 2/dev/null; 
mkdir /mnt/ramdisk/victims/v1/v2/d$i 2/dev/null; done; fi;  trinity -q -N 
1 -C 2 -x move_pages -x mremap -V /mnt/ramdisk/victims/v1/v2
 6390 20:46  \_ trinity -q -N 1 -C 2 -x move_pages -x mremap -V 
/mnt/ramdisk/victims/v1/v2
 6391 20:46  \_ trinity -q -N 1 -C 2 -x move_pages -x 
mremap -V /mnt/ramdisk/victims/v1/v2
 6392 20:46  \_ trinity -q -N 1 -C 2 -x move_pages -x 
mremap -V /mnt/ramdisk/victims/v1/v2
 6408 20:47  \_ trinity -q -N 1 -C 2 -x move_pages -x 
mremap -V /mnt/ramdisk/victims/v1/v2
 6410 20:48  \_ trinity -q -N 1 -C 

Re: [uml-devel] negative pid -516 possible ?

2014-01-12 Thread Richard Weinberger
On Sat, Jan 11, 2014 at 11:47 AM, Toralf Förster toralf.foers...@gmx.de wrote:

 I do fuzz testing with trinity (latest git version) in a stable 32-bit Gentoo
 Linux user mode linux image.
 The host is a stable 32-bit vanilla 3.12.7 kernel; the guest runs the latest git
 tree + 2 patches (attached).

 The trinity call in the UML guest is :
 $ trinity -q -l off -N 1 -C 2 -x move_pages -x mremap -v /mnt/ramdisk

 After a while there's no more progress on the command line seen at the host system
 - the trinity process seems to just hang/idle. When this occurs I
 can no longer ssh into the system. The system keeps running, however. In
 another terminal I still see the output of this command:

Does it consume 100% CPU?

 $ ssh root@trinity tail -f /var/log/messages

 That's why I do know that the system does not hang completely. The output of 
 top at the host system gives me the pid of the linux exe. A gdb call gives 
 for that pid :

 $ date; sudo gdb /home/tfoerste/devel/linux/linux 25224 -n -batch -ex 'bt 
 full'
 Sat Jan 11 11:36:47 CET 2014

 warning: Could not load shared library symbols for linux-gate.so.1.
 Do you need set solib-search-path or set sysroot?
 0xb7800424 in __kernel_vsyscall ()
 #0  0xb7800424 in __kernel_vsyscall ()
 No symbol table info available.
 #1  0x083d63ff in __nanosleep_nocancel ()
 No symbol table info available.
 #2  0x0807266c in idle_sleep (nsecs=602496380195307520) at 
 arch/um/os-Linux/time.c:183
 ts = {tv_sec = 0, tv_nsec = 8436602}
 #3  0x0805fc0f in arch_cpu_idle () at arch/um/kernel/process.c:208
 No locals.
 #4  0x080a8971 in cpu_idle_loop () at kernel/cpu/idle.c:98
 No locals.
 #5  cpu_startup_entry (state=CPUHP_ONLINE) at kernel/cpu/idle.c:140
 No locals.
 #6  0x084215e9 in rest_init () at init/main.c:402
 pid = -516
 __func__ = rest_init
 #7  0x080487e1 in start_kernel () at init/main.c:656
 command_line = 0x85b8400 command_line earlyprintk 
 ubda=/home/tfoerste/virtual/uml/trinity ubdb=/mnt/ramdisk/trinity_swap 
 eth0=tuntap,tap0,72:ef:3d:9f:c3:5a mem=1025M con0=fd:0,fd:1 con=pts 
 rootfstype=ext4  root=98:0
 #8  0x08049e42 in start_kernel_proc (unused=0x0) at 
 arch/um/kernel/skas/process.c:48
 pid = -516
 __func__ = start_kernel_proc
 #9  0x0805f7cb in new_thread_handler () at arch/um/kernel/process.c:129
 fn = 0x0
 #10 0x in ?? ()
 No symbol table info available.



 Please note that BUG_ON was not triggered. For completeness here are the gdb 
 traces from all linux processes currently running at the host:

So let's forget the 516 issue for now.
What we know for now is that you managed to trigger a lockup within UML.



 $ pgrep linux | xargs -n1 -I {} sudo gdb /home/tfoerste/devel/linux/linux {} 
 -n -batch -ex 'bt'
 warning: process 1613 is already traced by process 25224
 ptrace: Operation not permitted.
 /home/tfoerste/1613: No such file or directory.
 No stack.
 warning: process 21849 is already traced by process 25224
 ptrace: Operation not permitted.
 /home/tfoerste/21849: No such file or directory.
 No stack.

 warning: Could not load shared library symbols for linux-gate.so.1.
 Do you need set solib-search-path or set sysroot?
 0xb7800424 in __kernel_vsyscall ()
 #0  0xb7800424 in __kernel_vsyscall ()
 #1  0x083d63ff in __nanosleep_nocancel ()
 #2  0x0807266c in idle_sleep (nsecs=602496380205307520) at 
 arch/um/os-Linux/time.c:183
 #3  0x0805fc0f in arch_cpu_idle () at arch/um/kernel/process.c:208
 #4  0x080a8971 in cpu_idle_loop () at kernel/cpu/idle.c:98
 #5  cpu_startup_entry (state=CPUHP_ONLINE) at kernel/cpu/idle.c:140
 #6  0x084215e9 in rest_init () at init/main.c:402
 #7  0x080487e1 in start_kernel () at init/main.c:656
 #8  0x08049e42 in start_kernel_proc (unused=0x0) at 
 arch/um/kernel/skas/process.c:48
 #9  0x0805f7cb in new_thread_handler () at arch/um/kernel/process.c:129
 #10 0x in ?? ()

 warning: process 25231 is a cloned process

 warning: Could not load shared library symbols for linux-gate.so.1.
 Do you need set solib-search-path or set sysroot?
 0xb7800424 in __kernel_vsyscall ()
 #0  0xb7800424 in __kernel_vsyscall ()
 #1  0x083da446 in syscall ()
 #2  0x0806e861 in io_getevents (events=<optimized out>, ctx_id=<optimized out>,
 min_nr=<optimized out>, nr=<optimized out>, timeout=<optimized out>) at
 arch/um/os-Linux/aio.c:49
 #3  aio_thread (arg=0x0) at arch/um/os-Linux/aio.c:109
 #4  0x083db56e in clone ()

 warning: process 25232 is a cloned process

 warning: Could not load shared library symbols for linux-gate.so.1.
 Do you need set solib-search-path or set sysroot?
 0xb7800424 in __kernel_vsyscall ()
 #0  0xb7800424 in __kernel_vsyscall ()
 #1  0x083d82c2 in __read_nocancel ()
 #2  0x0806f3ff in read (__nbytes=<optimized out>, __buf=<optimized out>,
 __fd=<optimized out>) at /usr/include/bits/unistd.h:44
 #3  os_read_file (fd=-512, buf=0xfe00, len=-512) at 
 arch/um/os-Linux/file.c:253
 #4  0x0806bafc in 

Re: [uml-devel] negative pid -516 possible ?

2014-01-11 Thread Toralf Förster

I do fuzz testing with trinity (latest git version) in a stable 32-bit Gentoo
Linux user mode linux image.
The host is a stable 32-bit vanilla 3.12.7 kernel; the guest runs the latest git
tree + 2 patches (attached).

The trinity call in the UML guest is :
$ trinity -q -l off -N 1 -C 2 -x move_pages -x mremap -v /mnt/ramdisk

After a while there's no more progress on the command line seen at the host system -
the trinity process seems to just hang/idle. When this occurs I can no
longer ssh into the system. The system keeps running, however. In another
terminal I still see the output of this command:

$ ssh root@trinity tail -f /var/log/messages

That's why I do know that the system does not hang completely. The output of 
top at the host system gives me the pid of the linux exe. A gdb call gives for 
that pid :

$ date; sudo gdb /home/tfoerste/devel/linux/linux 25224 -n -batch -ex 'bt full'
Sat Jan 11 11:36:47 CET 2014

warning: Could not load shared library symbols for linux-gate.so.1.
Do you need set solib-search-path or set sysroot?
0xb7800424 in __kernel_vsyscall ()
#0  0xb7800424 in __kernel_vsyscall ()
No symbol table info available.
#1  0x083d63ff in __nanosleep_nocancel ()
No symbol table info available.
#2  0x0807266c in idle_sleep (nsecs=602496380195307520) at 
arch/um/os-Linux/time.c:183
ts = {tv_sec = 0, tv_nsec = 8436602}
#3  0x0805fc0f in arch_cpu_idle () at arch/um/kernel/process.c:208
No locals.
#4  0x080a8971 in cpu_idle_loop () at kernel/cpu/idle.c:98
No locals.
#5  cpu_startup_entry (state=CPUHP_ONLINE) at kernel/cpu/idle.c:140
No locals.
#6  0x084215e9 in rest_init () at init/main.c:402
pid = -516
__func__ = rest_init
#7  0x080487e1 in start_kernel () at init/main.c:656
command_line = 0x85b8400 <command_line> "earlyprintk
ubda=/home/tfoerste/virtual/uml/trinity ubdb=/mnt/ramdisk/trinity_swap
eth0=tuntap,tap0,72:ef:3d:9f:c3:5a mem=1025M con0=fd:0,fd:1 con=pts
rootfstype=ext4  root=98:0"
#8  0x08049e42 in start_kernel_proc (unused=0x0) at 
arch/um/kernel/skas/process.c:48
pid = -516
__func__ = start_kernel_proc
#9  0x0805f7cb in new_thread_handler () at arch/um/kernel/process.c:129
fn = 0x0
#10 0x in ?? ()
No symbol table info available.



Please note that BUG_ON was not triggered. For completeness here are the gdb 
traces from all linux processes currently running at the host:


$ pgrep linux | xargs -n1 -I {} sudo gdb /home/tfoerste/devel/linux/linux {} -n 
-batch -ex 'bt'
warning: process 1613 is already traced by process 25224
ptrace: Operation not permitted.
/home/tfoerste/1613: No such file or directory.
No stack.
warning: process 21849 is already traced by process 25224
ptrace: Operation not permitted.
/home/tfoerste/21849: No such file or directory.
No stack.

warning: Could not load shared library symbols for linux-gate.so.1.
Do you need set solib-search-path or set sysroot?
0xb7800424 in __kernel_vsyscall ()
#0  0xb7800424 in __kernel_vsyscall ()
#1  0x083d63ff in __nanosleep_nocancel ()
#2  0x0807266c in idle_sleep (nsecs=602496380205307520) at 
arch/um/os-Linux/time.c:183
#3  0x0805fc0f in arch_cpu_idle () at arch/um/kernel/process.c:208
#4  0x080a8971 in cpu_idle_loop () at kernel/cpu/idle.c:98
#5  cpu_startup_entry (state=CPUHP_ONLINE) at kernel/cpu/idle.c:140
#6  0x084215e9 in rest_init () at init/main.c:402
#7  0x080487e1 in start_kernel () at init/main.c:656
#8  0x08049e42 in start_kernel_proc (unused=0x0) at 
arch/um/kernel/skas/process.c:48
#9  0x0805f7cb in new_thread_handler () at arch/um/kernel/process.c:129
#10 0x in ?? ()

warning: process 25231 is a cloned process

warning: Could not load shared library symbols for linux-gate.so.1.
Do you need set solib-search-path or set sysroot?
0xb7800424 in __kernel_vsyscall ()
#0  0xb7800424 in __kernel_vsyscall ()
#1  0x083da446 in syscall ()
#2  0x0806e861 in io_getevents (events=<optimized out>, ctx_id=<optimized out>,
min_nr=<optimized out>, nr=<optimized out>, timeout=<optimized out>) at
arch/um/os-Linux/aio.c:49
#3  aio_thread (arg=0x0) at arch/um/os-Linux/aio.c:109
#4  0x083db56e in clone ()

warning: process 25232 is a cloned process

warning: Could not load shared library symbols for linux-gate.so.1.
Do you need set solib-search-path or set sysroot?
0xb7800424 in __kernel_vsyscall ()
#0  0xb7800424 in __kernel_vsyscall ()
#1  0x083d82c2 in __read_nocancel ()
#2  0x0806f3ff in read (__nbytes=optimized out, __buf=optimized out, 

Re: [uml-devel] negative pid -516 possible ?

2014-01-02 Thread Richard Weinberger
On Sun, Dec 29, 2013 at 2:14 PM,  st...@nixia.no wrote:
 #6  0x08421d02 in rest_init () at init/main.c:401
 pid = -516
 #7  0x080487e1 in start_kernel () at init/main.c:655
 command_line = 0x85b6400 command_line earlyprintk
 ubda=/home/tfoerste/virtual/uml/trinity ubdb=/mnt/ramdisk/trinity_swap
 eth0=tuntap,tap0,72:ef:3d:9f:c3:5a mem=1025M con0=fd:0,fd:1 con=pts
 rootfstype=ext4  root=98:0
 #8  0x08049e09 in start_kernel_proc (unused=0x0) at
 arch/um/kernel/skas/process.c:46
 pid = -516
 #9  0x0805f7cb in new_thread_handler () at
 arch/um/kernel/process.c:129
 fn = 0x0
 #10 0x in ?? ()
 No symbol table info available.


 Is this a valid number ?
 I'm asking b/c there's no process group id 516, and -516 always
 shows up in the back traces.
 Furthermore, after a while the UML system no longer serves any
 ssh login attempts.

 -516 == -ERESTART_RESTARTBLOCK ??

Yeah, maybe.

Toralf, where exactly does this back trace come from? "gives for a guest"
is not a good error description.
Did it crash and you took it from the core dump?
Did it panic() and you attached to it?
Did it hang...?
IOW, don't throw random back traces at us without much detail. ;-)

The number -516 is a bit odd because you see it in
arch/um/kernel/skas/process.c.
In that function it comes from os_getpid() which indicates that the
host kernel reports that number.
...very strange.

init/main.c makes a bit more sense. Maybe a kthread creation within
UML returned that internal error.

Can you try the attached debug patch?
If the BUG_ON() triggers, please show us the panic from UML, not just the
gdb back trace.

-- 
Thanks,
//richard
diff --git a/arch/um/kernel/skas/process.c b/arch/um/kernel/skas/process.c
index 4da11b3..71a5828 100644
--- a/arch/um/kernel/skas/process.c
+++ b/arch/um/kernel/skas/process.c
@@ -38,6 +38,8 @@ static int __init start_kernel_proc(void *unused)
 	block_signals();
 	pid = os_getpid();
 
+	BUG_ON(pid == -516);
+
 	cpu_tasks[0].pid = pid;
 	cpu_tasks[0].task = current;
 #ifdef CONFIG_SMP
diff --git a/init/main.c b/init/main.c
index febc511..9ad68ab 100644
--- a/init/main.c
+++ b/init/main.c
@@ -386,6 +386,7 @@ static noinline void __init_refok rest_init(void)
 	kernel_thread(kernel_init, NULL, CLONE_FS | CLONE_SIGHAND);
 	numa_default_policy();
 	pid = kernel_thread(kthreadd, NULL, CLONE_FS | CLONE_FILES);
+	BUG_ON(pid == -516);
 	rcu_read_lock();
 	kthreadd_task = find_task_by_pid_ns(pid, &init_pid_ns);
 	rcu_read_unlock();
--
Rapidly troubleshoot problems before they affect your business. Most IT 
organizations don't have a clear picture of how application performance 
affects their revenue. With AppDynamics, you get 100% visibility into your 
Java,.NET,  PHP application. Start your 15-day FREE TRIAL of AppDynamics Pro!
http://pubads.g.doubleclick.net/gampad/clk?id=84349831iu=/4140/ostg.clktrk
___
User-mode-linux-devel mailing list
User-mode-linux-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/user-mode-linux-devel


Re: [uml-devel] negative pid -516 possible ?

2013-12-29 Thread Toralf Förster

On 12/21/2013 03:36 PM, Toralf Förster wrote:
 Trinity'ing a 32 bit linux user mode linux (still the raid x tree issue ) 
 gives for a guest :
 
 tfoerste@n22 ~ $ date; sudo gdb /home/tfoerste/devel/linux/linux 10044 -n 
 -batch -ex 'bt full'
 Sat Dec 21 15:33:03 CET 2013
 0xb7710424 in __kernel_vsyscall ()
 #0  0xb7710424 in __kernel_vsyscall ()
 No symbol table info available.
 #1  0x083d5d2f in __nanosleep_nocancel ()
 No symbol table info available.
 #2  0x0807267c in idle_sleep (nsecs=602496466104653440) at 
 arch/um/os-Linux/time.c:183
 ts = {tv_sec = 0, tv_nsec = 6471789}
 #3  0x0805fc0f in arch_cpu_idle () at arch/um/kernel/process.c:208
 No locals.
 #4  0x080a8981 in cpu_idle_loop () at kernel/cpu/idle.c:98
 No locals.
 #5  cpu_startup_entry (state=CPUHP_ONLINE) at kernel/cpu/idle.c:140
 No locals.
 #6  0x08421d02 in rest_init () at init/main.c:401
 pid = -516
 #7  0x080487e1 in start_kernel () at init/main.c:655
 command_line = 0x85b6400 command_line earlyprintk 
 ubda=/home/tfoerste/virtual/uml/trinity ubdb=/mnt/ramdisk/trinity_swap 
 eth0=tuntap,tap0,72:ef:3d:9f:c3:5a mem=1025M con0=fd:0,fd:1 con=pts 
 rootfstype=ext4  root=98:0
 #8  0x08049e09 in start_kernel_proc (unused=0x0) at 
 arch/um/kernel/skas/process.c:46
 pid = -516
 #9  0x0805f7cb in new_thread_handler () at arch/um/kernel/process.c:129
 fn = 0x0
 #10 0x in ?? ()
 No symbol table info available.
 
 
 Is this a valid number ?
 
 

I'm asking b/c there's no process group id 516, and -516 always shows up in the
back traces.
Furthermore, after a while the UML system no longer serves any ssh login
attempts.

-- 
MfG/Sincerely
Toralf Förster
pgp finger print:1A37 6F99 4A9D 026F 13E2 4DCF C4EA CDDE 0076 E94E



Re: [uml-devel] negative pid -516 possible ?

2013-12-29 Thread stian
 #6  0x08421d02 in rest_init () at init/main.c:401
 pid = -516
 #7  0x080487e1 in start_kernel () at init/main.c:655
 command_line = 0x85b6400 command_line earlyprintk 
 ubda=/home/tfoerste/virtual/uml/trinity ubdb=/mnt/ramdisk/trinity_swap 
 eth0=tuntap,tap0,72:ef:3d:9f:c3:5a mem=1025M con0=fd:0,fd:1 con=pts 
 rootfstype=ext4  root=98:0
 #8  0x08049e09 in start_kernel_proc (unused=0x0) at 
 arch/um/kernel/skas/process.c:46
 pid = -516
 #9  0x0805f7cb in new_thread_handler () at 
 arch/um/kernel/process.c:129
 fn = 0x0
 #10 0x in ?? ()
 No symbol table info available.


 Is this a valid number ?
 I'm asking b/c there's no process group id 516, and -516 always
 shows up in the back traces.
 Furthermore, after a while the UML system no longer serves any
 ssh login attempts.

-516 == -ERESTART_RESTARTBLOCK ??

Stian
