[ kvm-Bugs-2042889 ] guest: device offline, then kernel panic

SourceForge.net Tue, 30 Nov 2010 03:39:29 -0800

Bugs item #2042889, was opened at 2008-08-08 13:16
Message generated for change (Settings changed) made by jessorensen
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=893831&aid=2042889&group_id=180599


Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: None
Group: None
>Status: Closed
>Resolution: Duplicate
Priority: 5
Private: No
Submitted By: Rafal Wijata (ravpl)
Assigned to: Nobody/Anonymous (nobody)
Summary: guest: device offline, then kernel panic

Initial Comment:
host: kvm71, 64bit 2.6.18-92.1.6.el5, 16G RAM, 2*X5450 (8 cores)
guest: 64bit 2.6.18-92.1.6.el5, 3.5G RAM, 2 cpus, 5 hdds on raw partitions(!).

In the guest, I quite often get messages like
kernel: sd 0:0:0:0: ABORT operation started.
kernel: sd 0:0:0:0: ABORT operation timed-out.
[repeated many times]
[there were more messages saying the device is offline, but I lost them;
will update if it happens again]
Then the filesystem gets remounted read-only, and the kernel panics with the
message below (only part of it; that's what I got on the screen):
FS:  0000000000000000(0000) GS:ffffffff8039f000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000400000013 CR3: 0000000000201000 CR4: 00000000000006e0
Process sshd (pid: 23911, threadinfo ffff81006f53a000, task ffff8100dc2ca0c0)
Stack:
    ffffffff800075dc
    ffff8100dc1ba960
    ffff8100dc1ba688
    ffff810096b52300
    ffff8100dd15acc0
    ffff8100dc1ba758
    ffff8100dc1ba758
    ffff810003f2a680
    ffffffff8000d11c
    0000000000000008
    0000000000000008
    ffff8100dd15acc0
Call Trace:
    [<ffffffff800075dc>] kmem_cache_free+0x13c/0x1dd
    [<ffffffff8000d11c>] dput+0xf6/0x114
    [<ffffffff800125f3>] __fput+0x16c/0x198
    [<ffffffff8001a6a7>] remove_vma+0x3d/0x64
    [<ffffffff80039c60>] exit_mmap+0xcf/0xf3
    [<ffffffff8003bd73>] mmput+0x30/0x83
    [<ffffffff800151b6>] do_exit+0x28b/0x8d0
    [<ffffffff80048a1c>] cpuset_exit+0x0/0x6c
    [<ffffffff8005d28d>] tracesys+0xd5/0xe0
Code: f0 ff 0f 0f 88 6c 01 00 00 c3 f0 81 2f 00 00 00 01 74 05 e8
RIP  [<ffffffff80064a2d>] _spin_lock+0x0/0xa
RSP <ffff81006f53be10>
CR2: 0000000400000013
<0>Kernel panic - not syncing: Fatal exception 

Even though the kernel panicked, the kvm process was still taking 100% CPU.
gdb shows the following info; I have no clue whether it's helpful in any way.

Thread 4 (Thread 1938626880 (LWP 17006)):
#0  0x000000368bec6fa7 in ioctl () from /lib64/libc.so.6
#1  0x000000000050f726 in kvm_run (kvm=0x11b15010, vcpu=0) at libkvm.c:903
#2  0x00000000004e9426 in kvm_cpu_exec (env=<value optimized out>) at /usr/src/kvm-71/qemu/qemu-kvm.c:218
#3  0x00000000004e9700 in ap_main_loop (_env=<value optimized out>) at /usr/src/kvm-71/qemu/qemu-kvm.c:407
#4  0x000000368ca062e7 in start_thread () from /lib64/libpthread.so.0
#5  0x000000368bece3bd in clone () from /lib64/libc.so.6

Thread 3 (Thread 1087498560 (LWP 17007)):
#0  0x000000368bec6fa7 in ioctl () from /lib64/libc.so.6
#1  0x000000000050f726 in kvm_run (kvm=0x11b15010, vcpu=1) at libkvm.c:903
#2  0x00000000004e9426 in kvm_cpu_exec (env=<value optimized out>) at /usr/src/kvm-71/qemu/qemu-kvm.c:218
#3  0x00000000004e9700 in ap_main_loop (_env=<value optimized out>) at /usr/src/kvm-71/qemu/qemu-kvm.c:407
#4  0x000000368ca062e7 in start_thread () from /lib64/libpthread.so.0
#5  0x000000368bece3bd in clone () from /lib64/libc.so.6

Thread 2 (Thread 1949133120 (LWP 17014)):
#0  0x000000368ca0a687 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x0000003692202ee5 in handle_fildes_io () from /lib64/librt.so.1
#2  0x000000368ca062e7 in start_thread () from /lib64/libpthread.so.0
#3  0x000000368bece3bd in clone () from /lib64/libc.so.6

Thread 1 (Thread 47523282295136 (LWP 16990)):
#0  0x000000368bec7922 in select () from /lib64/libc.so.6
#1  0x00000000004094b2 in main_loop_wait (timeout=<value optimized out>) at /usr/src/kvm-71/qemu/vl.c:7545
#2  0x00000000004e9342 in kvm_main_loop () at /usr/src/kvm-71/qemu/qemu-kvm.c:587
#3  0x0000000000411662 in main (argc=20, argv=0x7fffca7a9b38) at /usr/src/kvm-71/qemu/vl.c:7705
#0  0x000000368bec7922 in select () from /lib64/libc.so.6


----------------------------------------------------------------------

>Comment By: Jes Sorensen (jessorensen)
Date: 2010-11-30 12:39

Message:
This looks like a duplicate of https://bugs.launchpad.net/qemu/+bug/587993

If you can reproduce this problem, it would be great if you can add the
info to the bug in launchpad.

Thanks,
Jes


----------------------------------------------------------------------

Comment By: Rafal Wijata (ravpl)
Date: 2008-08-13 10:53

Message:
Logged In: YES 
user_id=996150
Originator: YES

Another crash, with the guest backtrace below. Please advise how to debug this.

R13: ffff8100dd107000 R14: ffffffff80077090 R15: ffffffff80418e80
FS:  0000000000000000(0000) GS:ffffffff8039f000(0000) knlGS:0000000000000000
CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 00002aec1f42e000 CR3: 00000000d49e4000 CR4: 00000000000006e0

Call Trace:
<IRQ>  [<ffffffff8003eadd>] dev_watchdog+0x98/0xc0
[<ffffffff800953c2>] run_timer_softirq+0x133/0x1af
[<ffffffff80011ed2>] __do_softirq+0x5e/0xd6
[<ffffffff8005e2fc>] call_softirq+0x1c/0x28
[<ffffffff8006c6e4>] do_softirq+0x2c/0x85
[<ffffffff8005dc8e>] apic_timer_interrupt+0x66/0x6c
<EOI>  [<ffffffff800d403a>] drain_array+0x28/0xc0
[<ffffffff800d4aea>] cache_reap+0x0/0x219
[<ffffffff800d4b8f>] cache_reap+0xa5/0x219
[<ffffffff8004cea9>] run_workqueue+0x94/0xe4
[<ffffffff800497be>] worker_thread+0x0/0x122
[<ffffffff800498ae>] worker_thread+0xf0/0x122
[<ffffffff8008ad76>] default_wake_function+0x0/0xe
[<ffffffff8003253d>] kthread+0xfe/0x132
[<ffffffff8005dfb1>] child_rip+0xa/0x11
[<ffffffff8003243f>] kthread+0x0/0x132
[<ffffffff8005dfa7>] child_rip+0x0/0x11

----------------------------------------------------------------------

Comment By: Rafal Wijata (ravpl)
Date: 2008-08-13 08:40

Message:
Logged In: YES 
user_id=996150
Originator: YES

[guest] And finally the device goes offline
[guest] sd 0:0:0:0: rejecting I/O to offline device

Is it possible that these problems come from the fact that I have
configured raw devices as kvm disks? E.g.:
-drive media=disk,if=scsi,boot=on,file=/dev/sdb2
-drive media=disk,if=scsi,boot=off,file=/dev/sdc2 ...
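
The guest messages quoted in this report ("ABORT operation timed-out",
"rejecting I/O to offline device") could be counted per device from a saved
guest log, to see which disks are affected and how often. A minimal sketch,
assuming the log lines look like the ones quoted here; the function name and
the sample lines are hypothetical, not part of kvm or the guest kernel:

```python
import re
from collections import Counter

# Matches lines such as
#   "kernel: sd 0:0:0:0: ABORT operation timed-out."
#   "sd 0:0:0:0: rejecting I/O to offline device"
# The message wording is taken from this report; other kernels may differ.
PATTERN = re.compile(
    r"sd (?P<dev>\d+:\d+:\d+:\d+): "
    r"(?P<event>ABORT operation timed-out|rejecting I/O to offline device)"
)


def count_scsi_errors(lines):
    """Return a Counter keyed by (device, event) for matching log lines."""
    counts = Counter()
    for line in lines:
        match = PATTERN.search(line)
        if match:
            counts[(match.group("dev"), match.group("event"))] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical sample; in practice, iterate over a saved guest log file.
    sample = [
        "kernel: sd 0:0:0:0: ABORT operation started.",
        "kernel: sd 0:0:0:0: ABORT operation timed-out.",
        "sd 0:0:0:0: rejecting I/O to offline device",
    ]
    for (dev, event), n in sorted(count_scsi_errors(sample).items()):
        print(dev, event, n)
```

"ABORT operation started." lines are deliberately not counted, since only
the timeouts and the offline rejections indicate a failure.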

----------------------------------------------------------------------

Comment By: Rafal Wijata (ravpl)
Date: 2008-08-11 15:26

Message:
Logged In: YES 
user_id=996150
Originator: YES

Update_1:
While the guest was panicking, I saw a SEGV in its qemu process on the
host. No core file though; I'll try to get one next time.
This happened on kvm72 as well.
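
Regarding the missing core file: one possible way to get one next time is to
raise the core-size limit before starting qemu. A minimal sketch, assuming
qemu is launched from a small Python wrapper; the helper names are
hypothetical and the argv passed in is whatever command line is normally
used, not something taken from this report:

```python
import resource
import subprocess


def raise_core_limit():
    """Raise the soft core-file size limit to the hard limit.

    Returns the resulting (soft, hard) pair. If the hard limit is 0,
    core files stay disabled and it has to be raised by root first.
    """
    _soft, hard = resource.getrlimit(resource.RLIMIT_CORE)
    resource.setrlimit(resource.RLIMIT_CORE, (hard, hard))
    return resource.getrlimit(resource.RLIMIT_CORE)


def run_with_core_dumps(argv):
    """Run argv with the raised limit applied in the child before exec."""
    return subprocess.call(argv, preexec_fn=raise_core_limit)
```

Where the core file ends up depends on the host's kernel.core_pattern
setting and on the working directory of the process, so it is worth checking
both before reproducing the crash.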

----------------------------------------------------------------------

