Bugs item #1895893, was opened at 2008-02-18 01:44
Message generated for change (Comment added) made by alfmel
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=893831&aid=1895893&group_id=180599

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: None
Group: None
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: Technologov (technologov)
Assigned to: Nobody/Anonymous (nobody)
Summary: KVM-60+ halts, when using SCSI

Initial Comment:
Host: Intel CPU, F7/x64, KVM-60+ from git (userspace: kvm-60-155-g4422f97, 
kernelspace: kvm-60-10207-g9ef1f35)

When installing a Windows XP guest on an emulated SCSI disk, KVM locks up.

The Command sent to Qemu/KVM: 

/usr/local/bin/qemu-system-x86_64 -drive 
file=/vm/WindowsXP.qcow2,if=scsi,boot=on -m 128 -monitor 
tcp:localhost:4503,server,nowait -cdrom 
/isos/windows/WindowsXP-SP2-Home-Pro-Tablet.iso -boot d -name WindowsXP

Reproducible: Sometimes.

Symptoms:
-The image during XP setup looks halted/locked, with no progress over 12 hours.
-kvm_stat shows zero KVM activity.
-Host CPU is 100% busy.
-Qemu doesn't respond to any commands (such as alt+f2).

GNU Debugger shows:
(gdb) bt
#0  lsi_execute_script (s=0x2bed030) at ../cpu-all.h:848
#1  0x000000000048a2e9 in qcow_aio_write_cb (opaque=0x2c8a050, ret=0)
    at block-qcow2.c:947
#2  0x000000000041898f in qemu_aio_poll ()
    at /root/git/kvm/qemu/block-raw-posix.c:318
#3  0x000000000040de3c in main_loop_wait (timeout=0)
    at /root/git/kvm/qemu/vl.c:7822
#4  0x00000000004fd81d in kvm_eat_signals (env=0x2b52400, timeout=0)
    at /root/git/kvm/qemu/qemu-kvm.c:204
#5  0x00000000004fd859 in kvm_main_loop_wait (env=0x2b52400, timeout=0)
    at /root/git/kvm/qemu/qemu-kvm.c:211
#6  0x00000000004fe0a6 in kvm_main_loop_cpu (env=0x2b52400)
    at /root/git/kvm/qemu/qemu-kvm.c:309
#7  0x0000000000410e3d in main (argc=<value optimized out>,
    argv=0x7fff06235728) at /root/git/kvm/qemu/vl.c:7856
====================================================
Dmesg shows:

apic write: bad size=1 fee00030
Ignoring de-assert INIT to vcpu 0
Ignoring de-assert INIT to vcpu 0
apic write: bad size=1 fee00030
Ignoring de-assert INIT to vcpu 0
Ignoring de-assert INIT to vcpu 0

...looping forever.

-Alexey "Technologov", 18.02.2008.

----------------------------------------------------------------------

Comment By: Alf Mel (alfmel)
Date: 2008-04-28 14:11

Message:
Logged In: YES 
user_id=1865908
Originator: NO

OK.  I've applied the matley patch and your debug patch to KVM 66.  I've
also been able to reproduce the problem on a raw SCSI disk while installing
Windows 2003.  You can find the log at:

http://mel.byu.edu/kvm-scsi-debug.log.bz2

----------------------------------------------------------------------

Comment By: Marcelo Tosatti (mtosatti)
Date: 2008-04-26 17:45

Message:
Logged In: YES 
user_id=2022487
Originator: NO

Alexey, Alberto,

I'm unable to reproduce the problem with the Linux driver.

The Windows SCSI SCRIPTS code is different, so that might be the reason. The
state machine is relatively complex and depends on this SCRIPTS code.

Please try the following:

1 - Attempt to reproduce the problem with raw disk instead of qcow2.
2 - Apply matley's patch below, and on top of that, this debug patch:
http://people.redhat.com/~mtosatti/lsi-debug-crash.patch

And then run qemu-kvm as usual, but redirect stderr output to a file:

# qemu-kvm options 2> log-scsi-crash.txt

Once the crash happens, there should be a pattern that repeats in this
output. With that information it's easier to understand what is going on.
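
One possible way to spot such a repeating pattern at the end of a log is
sketched below. This is only an illustration: tail_period() and its
parameters are hypothetical names, not part of qemu-kvm or the debug patch,
and reading the log file into an array of lines is left out. It returns the
smallest block length p for which the last reps*p lines consist of the same
p-line block repeated.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: find the smallest p >= 1 such that the last
 * reps * p lines are periodic with period p (the same p-line block
 * repeated reps times).  Returns 0 if no such p exists.
 */
static size_t tail_period(const char *const *lines, size_t n, size_t reps)
{
    assert(lines != NULL);
    if (reps < 2)
        return 0;                       /* a single copy is not a repeat */

    for (size_t p = 1; p * reps <= n; p++) {
        size_t span = (reps - 1) * p;   /* comparisons needed for this p */
        size_t i;

        /* Walk backwards from the last line, comparing each line with
         * the one p lines earlier. */
        for (i = 0; i < span; i++) {
            if (strcmp(lines[n - 1 - i], lines[n - 1 - i - p]) != 0)
                break;
        }
        if (i == span)
            return p;
    }
    return 0;
}
```

Requiring at least three repetitions (reps = 3) avoids reporting a period of
1 just because the last two lines happen to be identical, as with the
duplicated "Ignoring de-assert INIT" lines in the dmesg excerpt from the
initial comment.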

Thanks.


----------------------------------------------------------------------

Comment By: Alf Mel (alfmel)
Date: 2008-04-11 16:46

Message:
Logged In: YES 
user_id=1865908
Originator: NO

I've confirmed the problem with KVM-65 as well.  I applied the patch but
it didn't work; I still experienced lockups.  I am trying to install
Windows Server 2003 on a SCSI disk and the installation keeps locking up on
different parts of the file copy process.  I'm using qcow2 disk format.  I
tried using raw format and it would lock up consistently when formatting
the disk.  I have tried installing W2K3 at least a dozen times with the
same lockups.  As part of my configuration, I move the monitor to run on a
telnet server.  When the lockup occurs, I can't connect to the monitor via
telnet.

I am also experiencing boot problems with Grub on SCSI disks.  I reported
the problem on the mailing list:

http://article.gmane.org/gmane.comp.emulators.kvm.devel/15884

I don't know if the problems are related.

----------------------------------------------------------------------

Comment By: lanconnected (lanconnected)
Date: 2008-04-08 10:17

Message:
Logged In: YES 
user_id=2041746
Originator: NO

Applied proposed patch on kvm-65. Windows XP Pro can be installed on scsi
disk and boots up, but hangs unpredictably during disk activity. SDL
windows can't be closed, kvm can only be killed with kill -9.

----------------------------------------------------------------------

Comment By: Matteo Frigo (matley)
Date: 2008-03-30 06:58

Message:
Logged In: YES 
user_id=35769
Originator: NO

The bug seems to have nothing to do with Windows.  You can reproduce the
bug in kvm-63 and kvm-64 by creating an empty qcow2 scsi disk and running
``dd if=/dev/sda of=/dev/null bs=1M'' in linux.

The patch below seems to fix the problem (at least with linux, I haven't
tried Windows).  If I understand the AIO layer correctly, scsi_read_data()
and scsi_write_data() can be called again before the bdrv_aio_read
call returns.  If this happens, the original code issues the same
request twice, which is incorrect.  The patch updates the sector counters
(r->sector and r->sector_count) before invoking the AIO layer.

diff -aur kvm-64.old/qemu/hw/scsi-disk.c kvm-64.new/qemu/hw/scsi-disk.c
--- kvm-64.old/qemu/hw/scsi-disk.c      2008-03-26 08:49:35.000000000 -0400
+++ kvm-64.new/qemu/hw/scsi-disk.c      2008-03-30 08:37:25.000000000 -0400
@@ -196,12 +196,12 @@
         n = SCSI_DMA_BUF_SIZE / 512;
 
     r->buf_len = n * 512;
-    r->aiocb = bdrv_aio_read(s->bdrv, r->sector, r->dma_buf, n,
+    r->sector += n;
+    r->sector_count -= n;
+    r->aiocb = bdrv_aio_read(s->bdrv, r->sector - n, r->dma_buf, n,
                              scsi_read_complete, r);
     if (r->aiocb == NULL)
         scsi_command_complete(r, SENSE_HARDWARE_ERROR);
-    r->sector += n;
-    r->sector_count -= n;
 }
 
 static void scsi_write_complete(void * opaque, int ret)
@@ -248,12 +248,12 @@
         BADF("Data transfer already in progress\n");
     n = r->buf_len / 512;
     if (n) {
-        r->aiocb = bdrv_aio_write(s->bdrv, r->sector, r->dma_buf, n,
+        r->sector += n;
+        r->sector_count -= n;
+        r->aiocb = bdrv_aio_write(s->bdrv, r->sector - n, r->dma_buf, n,
                                   scsi_write_complete, r);
         if (r->aiocb == NULL)
             scsi_command_complete(r, SENSE_HARDWARE_ERROR);
-        r->sector += n;
-        r->sector_count -= n;
     } else {
         /* Invoke completion routine to fetch data from host.  */
         scsi_write_complete(r, 0);


----------------------------------------------------------------------

Comment By: lanconnected (lanconnected)
Date: 2008-03-20 13:23

Message:
Logged In: YES 
user_id=2041746
Originator: NO

Can confirm it on kvm-63, 100% reproducible, same symptoms. System can be
installed and always boots in safe mode, but never boots in normal mode.
ACPI/noACPI settings have no influence.

----------------------------------------------------------------------

Comment By: Technologov (technologov)
Date: 2008-02-18 03:21

Message:
Logged In: YES 
user_id=1839746
Originator: YES

ps axu:
alexeye  21429 84.2  4.1 296740 166712 pts/4   Rl+  04:40  16:22
/usr/local/bin/qemu-system-x86_64 -drive
file=/vm/WindowsXP.qcow2,if=scsi,boot=on -m 128 -monitor
tcp:localhost:4503,server,nowait -cdrom
/isos/windows/WindowsXP-SP2-Home-Pro-Tablet.iso -boot c -name
WindowsXP-SCSI-manual -no-kvm

Another symptom I forgot to mention:
Qemu (both KVM and -no-kvm) cannot be killed by pressing "X" on the SDL
window, only by doing ctrl+C on the console.

Does anyone know what "Rl+" means in the "ps" command output?

-Alexey "Technologov", 18.02.2008.

----------------------------------------------------------------------

Comment By: Technologov (technologov)
Date: 2008-02-18 03:18

Message:
Logged In: YES 
user_id=1839746
Originator: YES

Well, the same problem is reproducible with Qemu (-no-kvm):

Same symptoms.

(gdb) bt
#0  0x000000000048ea9d in cpu_physical_memory_rw (addr=72552,
    buf=0x7fff26397b70 "???\200", len=4, is_write=0)
    at /root/git/kvm/qemu/exec.c:2682
#1  0x000000000041b0db in lsi_execute_script (s=0x2bed030)
    at ../cpu-all.h:848
#2  0x000000000048a2e9 in qcow_aio_write_cb (opaque=0x2bcefa0, ret=0)
    at block-qcow2.c:947
#3  0x000000000041898f in qemu_aio_poll ()
    at /root/git/kvm/qemu/block-raw-posix.c:318
#4  0x000000000040de3c in main_loop_wait (timeout=10)
    at /root/git/kvm/qemu/vl.c:7822
#5  0x0000000000410d97 in main (argc=<value optimized out>,
    argv=0x7fff2639c858) at /root/git/kvm/qemu/vl.c:7926

-Alexey "Technologov", 18.02.2008.

----------------------------------------------------------------------
