Re: Windows 2008 Guest BSODS with CLOCK_WATCHDOG_TIMEOUT on VM migration

2015-01-28 Thread Mikhail Sennikovskii

Hi Jidong,

right, this issue is SMP-specific.

Mikhail

On 27.01.2015 20:09, Jidong Xiao wrote:

On Tue, Jan 27, 2015 at 5:55 AM, Mikhail Sennikovskii
 wrote:

Hi all,

I've posted the below mail to the qemu-devel mailing list, but got no
response there.
That's why I decided to re-post it here as well; besides, I think
this could be a KVM-specific issue.

Some additional things to note:
I can reproduce the issue on my Debian 7 with the 3.16.0-0.bpo.4-amd64
kernel as well.
I would typically use a max_downtime adjusted to 1 second instead of the
default 30 ms.
I also noticed that the issue happens much more rarely if I increase the
migration bandwidth, e.g.:

diff --git a/migration.c b/migration.c
index 26f4b65..d2e3b39 100644
--- a/migration.c
+++ b/migration.c
@@ -36,7 +36,7 @@ enum {
  MIG_STATE_COMPLETED,
  };

-#define MAX_THROTTLE  (32 << 20)  /* Migration speed throttling */
+#define MAX_THROTTLE  (90 << 20)  /* Migration speed throttling */
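For reference, both tweaks can also be made at runtime via the QEMU human
monitor instead of carrying a local patch; this is just a sketch of my setup,
with values matching the 1 s downtime and the 90 MiB/s throttle above:

```
migrate_set_downtime 1
migrate_set_speed 90m
```

migrate_set_downtime takes seconds, and migrate_set_speed takes a bandwidth
value with a size suffix.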

Like I said below, I would be glad to provide you with any additional
information.

Thanks,
Mikhail


Hi, Mikhail,

So if you choose to use a single vCPU instead of SMP, this issue would not
happen, right?

-Jidong


On 23.01.2015 15:03, Mikhail Sennikovskii wrote:

Hi all,

I'm running a slightly modified migration-over-TCP test in virt-test, which
migrates from one "smp=2" VM to another on the same host over TCP and puts
some dummy CPU load inside the GUEST during migration. After a series of
runs I always get a CLOCK_WATCHDOG_TIMEOUT BSOD inside the guest,
which happens when:
"
An expected clock interrupt was not received on a secondary processor in
an
MP system within the allocated interval. This indicates that the specified
processor is hung and not processing interrupts.
"

This seems to happen with any qemu version I've tested (1.2 and above,
including upstream). I was testing with the 3.13.0-44-generic kernel on my
Ubuntu 14.04.1 LTS 4-way SMP host, as well as with the 3.12.26-1 kernel on
a Debian 6 6-way SMP host.

One thing I noticed is that putting a dummy CPU load on the HOST (like
running multiple instances of a "while true; do false; done" script) in
parallel with the migration makes the issue quite easy to reproduce.


Looking inside the Windows crash dump, the second CPU is just running at
IRQL 0, and it is apparently not hung, since Windows was able to save its
state in the crash dump correctly, which implies it was still executing code.
So this apparently is a timing issue (e.g. the host scheduler does not
schedule the thread executing the secondary CPU's code in time).

Could you give me some insight into this, i.e. is there a way to configure
QEMU/KVM to avoid such an issue?

If you think this might be a qemu/kvm issue, I can provide you with any
info, like Windows crash dumps, or the test case to reproduce it.


qemu is started as:

from-VM:

qemu-system-x86_64 \
 -S  \
 -name 'virt-tests-vm1'  \
 -sandbox off  \
 -M pc-1.0  \
 -nodefaults  \
 -vga std  \
 -chardev
socket,id=qmp_id_qmp1,path=/tmp/monitor-qmp1-20150123-112624-aFZmIkNT,server,nowait
\
 -mon chardev=qmp_id_qmp1,mode=control  \
 -chardev
socket,id=serial_id_serial0,path=/tmp/serial-serial0-20150123-112624-aFZmIkNT,server,nowait
\
 -device isa-serial,chardev=serial_id_serial0  \
 -chardev
socket,id=seabioslog_id_20150123-112624-aFZmIkNT,path=/tmp/seabios-20150123-112624-aFZmIkNT,server,nowait
\
 -device
isa-debugcon,chardev=seabioslog_id_20150123-112624-aFZmIkNT,iobase=0x402 \
 -device ich9-usb-uhci1,id=usb1,bus=pci.0,addr=03 \
 -drive id=drive_image1,if=none,file=/path/to/image.qcow2 \
 -device
virtio-blk-pci,id=image1,drive=drive_image1,bootindex=0,bus=pci.0,addr=04 \
 -device
virtio-net-pci,mac=9a:74:75:76:77:78,id=idFdaC4M,vectors=4,netdev=idKFZNXH,bus=pci.0,addr=05
\
 -netdev
user,id=idKFZNXH,hostfwd=tcp::5000-:22,hostfwd=tcp::5001-:10023  \
 -m 2G  \
 -smp 2,maxcpus=2,cores=1,threads=1,sockets=2  \
 -cpu phenom \
 -device usb-tablet,id=usb-tablet1,bus=usb1.0,port=1  \
 -vnc :0  \
 -rtc base=localtime,clock=host,driftfix=none  \
 -boot order=cdn,once=c,menu=off \
 -enable-kvm

to-VM:

qemu-system-x86_64 \
 -S  \
 -name 'virt-tests-vm1'  \
 -sandbox off  \
 -M pc-1.0  \
 -nodefaults  \
 -vga std  \
 -chardev
socket,id=qmp_id_qmp1,path=/tmp/monitor-qmp1-20150123-112750-VehjvEqK,server,nowait
\
 -mon chardev=qmp_id_qmp1,mode=control  \
 -chardev
socket,id=serial_id_serial0,path=/tmp/serial-serial0-20150123-112750-VehjvEqK,server,nowait
\
 -device isa-serial,chardev=serial_id_serial0  \
 -chardev
socket,id=seabioslog_id_20150123-112750-VehjvEqK,path=/tmp/seabios-20150123-112750-VehjvEqK,server,nowait
\
 -device
isa-debugcon,chardev=seabioslog_id_20150123-112750-VehjvEqK,iobase=0x402 \
 -device ich9-usb-uhci1,id=usb1,bus=pci.0,addr=03 \
 -drive id=drive_image1,if=none,file=/path/to/image.qco

Re: Windows 2008 Guest BSODS with CLOCK_WATCHDOG_TIMEOUT on VM migration

2015-01-28 Thread Mikhail Sennikovskii

Hi Zhang,

Thanks a lot for the suggestion, it indeed worked for me!
I.e. after adding hv_relaxed to the list of CPU properties I can no
longer reproduce the BSOD on migration with any of the kernel versions
I have tried so far.
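For the record, the only change relative to the command lines quoted below
was the extra CPU property, i.e. (with the phenom model from the test setup):

```
-cpu phenom,hv_relaxed \
```

hv_relaxed enables the Hyper-V "relaxed timing" enlightenment, which
recommends that Windows tolerate delayed clock interrupts instead of
tripping the watchdog.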


Thanks for your help,
Mikhail

On 28.01.2015 07:42, Zhang Haoyu wrote:

On 2015-01-28 03:10:23, Jidong Xiao wrote:

On Tue, Jan 27, 2015 at 5:55 AM, Mikhail Sennikovskii
 wrote:

Hi all,

I've posted the below mail to the qemu-devel mailing list, but got no
response there.
That's why I decided to re-post it here as well; besides, I think
this could be a KVM-specific issue.

Some additional things to note:
I can reproduce the issue on my Debian 7 with the 3.16.0-0.bpo.4-amd64
kernel as well.
I would typically use a max_downtime adjusted to 1 second instead of the
default 30 ms.
I also noticed that the issue happens much more rarely if I increase the
migration bandwidth, e.g.:

diff --git a/migration.c b/migration.c
index 26f4b65..d2e3b39 100644
--- a/migration.c
+++ b/migration.c
@@ -36,7 +36,7 @@ enum {
  MIG_STATE_COMPLETED,
  };

-#define MAX_THROTTLE  (32 << 20)  /* Migration speed throttling */
+#define MAX_THROTTLE  (90 << 20)  /* Migration speed throttling */

Like I said below, I would be glad to provide you with any additional
information.

Thanks,
Mikhail


Hi, Mikhail,

So if you choose to use a single vCPU instead of SMP, this issue would not
happen, right?


I think you can try the CPU feature hv_relaxed, e.g.:
-cpu Haswell,hv_relaxed


-Jidong


On 23.01.2015 15:03, Mikhail Sennikovskii wrote:

Hi all,

I'm running a slightly modified migration-over-TCP test in virt-test, which
migrates from one "smp=2" VM to another on the same host over TCP and puts
some dummy CPU load inside the GUEST during migration. After a series of
runs I always get a CLOCK_WATCHDOG_TIMEOUT BSOD inside the guest,
which happens when:
"
An expected clock interrupt was not received on a secondary processor in
an
MP system within the allocated interval. This indicates that the specified
processor is hung and not processing interrupts.
"

This seems to happen with any qemu version I've tested (1.2 and above,
including upstream). I was testing with the 3.13.0-44-generic kernel on my
Ubuntu 14.04.1 LTS 4-way SMP host, as well as with the 3.12.26-1 kernel on
a Debian 6 6-way SMP host.

One thing I noticed is that putting a dummy CPU load on the HOST (like
running multiple instances of a "while true; do false; done" script) in
parallel with the migration makes the issue quite easy to reproduce.


Looking inside the Windows crash dump, the second CPU is just running at
IRQL 0, and it is apparently not hung, since Windows was able to save its
state in the crash dump correctly, which implies it was still executing code.
So this apparently is a timing issue (e.g. the host scheduler does not
schedule the thread executing the secondary CPU's code in time).

Could you give me some insight into this, i.e. is there a way to configure
QEMU/KVM to avoid such an issue?

If you think this might be a qemu/kvm issue, I can provide you with any
info, like Windows crash dumps, or the test case to reproduce it.


qemu is started as:

from-VM:

qemu-system-x86_64 \
 -S  \
 -name 'virt-tests-vm1'  \
 -sandbox off  \
 -M pc-1.0  \
 -nodefaults  \
 -vga std  \
 -chardev
socket,id=qmp_id_qmp1,path=/tmp/monitor-qmp1-20150123-112624-aFZmIkNT,server,nowait
\
 -mon chardev=qmp_id_qmp1,mode=control  \
 -chardev
socket,id=serial_id_serial0,path=/tmp/serial-serial0-20150123-112624-aFZmIkNT,server,nowait
\
 -device isa-serial,chardev=serial_id_serial0  \
 -chardev
socket,id=seabioslog_id_20150123-112624-aFZmIkNT,path=/tmp/seabios-20150123-112624-aFZmIkNT,server,nowait
\
 -device
isa-debugcon,chardev=seabioslog_id_20150123-112624-aFZmIkNT,iobase=0x402 \
 -device ich9-usb-uhci1,id=usb1,bus=pci.0,addr=03 \
 -drive id=drive_image1,if=none,file=/path/to/image.qcow2 \
 -device
virtio-blk-pci,id=image1,drive=drive_image1,bootindex=0,bus=pci.0,addr=04 \
 -device
virtio-net-pci,mac=9a:74:75:76:77:78,id=idFdaC4M,vectors=4,netdev=idKFZNXH,bus=pci.0,addr=05
\
 -netdev
user,id=idKFZNXH,hostfwd=tcp::5000-:22,hostfwd=tcp::5001-:10023  \
 -m 2G  \
 -smp 2,maxcpus=2,cores=1,threads=1,sockets=2  \
 -cpu phenom \
 -device usb-tablet,id=usb-tablet1,bus=usb1.0,port=1  \
 -vnc :0  \
 -rtc base=localtime,clock=host,driftfix=none  \
 -boot order=cdn,once=c,menu=off \
 -enable-kvm

to-VM:

qemu-system-x86_64 \
 -S  \
 -name 'virt-tests-vm1'  \
 -sandbox off  \
 -M pc-1.0  \
 -nodefaults  \
 -vga std  \
 -chardev
socket,id=qmp_id_qmp1,path=/tmp/monitor-qmp1-20150123-112750-VehjvEqK,server,nowait
\
 -mon chardev=qmp_id_qmp1,mode=control  \
 -chardev
socket,id=serial_id_serial0,path=/tmp/serial-serial0-20150123-112750-VehjvEqK,server,nowait
\
 -device isa-serial,chardev=serial_id_serial0  \
   

Re: Windows 2008 Guest BSODS with CLOCK_WATCHDOG_TIMEOUT on VM migration

2015-01-27 Thread Zhang Haoyu

On 2015-01-28 03:10:23, Jidong Xiao wrote:
> On Tue, Jan 27, 2015 at 5:55 AM, Mikhail Sennikovskii
>  wrote:
> > Hi all,
> >
> > I've posted the below mail to the qemu-devel mailing list, but got no
> > response there.
> > That's why I decided to re-post it here as well; besides, I think
> > this could be a KVM-specific issue.
> >
> > Some additional things to note:
> > I can reproduce the issue on my Debian 7 with the 3.16.0-0.bpo.4-amd64
> > kernel as well.
> > I would typically use a max_downtime adjusted to 1 second instead of the
> > default 30 ms.
> > I also noticed that the issue happens much more rarely if I increase the
> > migration bandwidth, e.g.:
> >
> > diff --git a/migration.c b/migration.c
> > index 26f4b65..d2e3b39 100644
> > --- a/migration.c
> > +++ b/migration.c
> > @@ -36,7 +36,7 @@ enum {
> >  MIG_STATE_COMPLETED,
> >  };
> >
> > -#define MAX_THROTTLE  (32 << 20)  /* Migration speed throttling */
> > +#define MAX_THROTTLE  (90 << 20)  /* Migration speed throttling */
> >
> > Like I said below, I would be glad to provide you with any additional
> > information.
> >
> > Thanks,
> > Mikhail
> >
> Hi, Mikhail,
>
> So if you choose to use a single vCPU instead of SMP, this issue would not
> happen, right?
> 
I think you can try the CPU feature hv_relaxed, e.g.:
-cpu Haswell,hv_relaxed

> -Jidong
> 
> > On 23.01.2015 15:03, Mikhail Sennikovskii wrote:
> >>
> >> Hi all,
> >>
> >> I'm running a slightly modified migration-over-TCP test in virt-test,
> >> which migrates from one "smp=2" VM to another on the same host over TCP
> >> and puts some dummy CPU load inside the GUEST during migration. After a
> >> series of runs I always get a CLOCK_WATCHDOG_TIMEOUT BSOD inside the
> >> guest, which happens when:
> >> "
> >> An expected clock interrupt was not received on a secondary processor in
> >> an
> >> MP system within the allocated interval. This indicates that the specified
> >> processor is hung and not processing interrupts.
> >> "
> >>
> >> This seems to happen with any qemu version I've tested (1.2 and above,
> >> including upstream). I was testing with the 3.13.0-44-generic kernel on
> >> my Ubuntu 14.04.1 LTS 4-way SMP host, as well as with the 3.12.26-1
> >> kernel on a Debian 6 6-way SMP host.
> >>
> >> One thing I noticed is that putting a dummy CPU load on the HOST (like
> >> running multiple instances of a "while true; do false; done" script) in
> >> parallel with the migration makes the issue quite easy to reproduce.
> >>
> >>
> >> Looking inside the Windows crash dump, the second CPU is just running
> >> at IRQL 0, and it is apparently not hung, since Windows was able to
> >> save its state in the crash dump correctly, which implies it was still
> >> executing code. So this apparently is a timing issue (e.g. the host
> >> scheduler does not schedule the thread executing the secondary CPU's
> >> code in time).
> >>
> >> Could you give me some insight into this, i.e. is there a way to
> >> configure QEMU/KVM to avoid such an issue?
> >>
> >> If you think this might be a qemu/kvm issue, I can provide you with any
> >> info, like Windows crash dumps, or the test case to reproduce it.
> >>
> >>
> >> qemu is started as:
> >>
> >> from-VM:
> >>
> >> qemu-system-x86_64 \
> >> -S  \
> >> -name 'virt-tests-vm1'  \
> >> -sandbox off  \
> >> -M pc-1.0  \
> >> -nodefaults  \
> >> -vga std  \
> >> -chardev
> >> socket,id=qmp_id_qmp1,path=/tmp/monitor-qmp1-20150123-112624-aFZmIkNT,server,nowait
> >> \
> >> -mon chardev=qmp_id_qmp1,mode=control  \
> >> -chardev
> >> socket,id=serial_id_serial0,path=/tmp/serial-serial0-20150123-112624-aFZmIkNT,server,nowait
> >> \
> >> -device isa-serial,chardev=serial_id_serial0  \
> >> -chardev
> >> socket,id=seabioslog_id_20150123-112624-aFZmIkNT,path=/tmp/seabios-20150123-112624-aFZmIkNT,server,nowait
> >> \
> >> -device
> >> isa-debugcon,chardev=seabioslog_id_20150123-112624-aFZmIkNT,iobase=0x402 \
> >> -device ich9-usb-uhci1,id=usb1,bus=pci.0,addr=03 \
> >> -drive id=drive_image1,if=none,file=/path/to/image.qcow2 \
> >> -device
> >> virtio-blk-pci,id=image1,drive=drive_image1,bootindex=0,bus=pci.0,addr=04 \
> >> -device
> >> virtio-net-pci,mac=9a:74:75:76:77:78,id=idFdaC4M,vectors=4,netdev=idKFZNXH,bus=pci.0,addr=05
> >> \
> >> -netdev
> >> user,id=idKFZNXH,hostfwd=tcp::5000-:22,hostfwd=tcp::5001-:10023  \
> >> -m 2G  \
> >> -smp 2,maxcpus=2,cores=1,threads=1,sockets=2  \
> >> -cpu phenom \
> >> -device usb-tablet,id=usb-tablet1,bus=usb1.0,port=1  \
> >> -vnc :0  \
> >> -rtc base=localtime,clock=host,driftfix=none  \
> >> -boot order=cdn,once=c,menu=off \
> >> -enable-kvm
> >>
> >> to-VM:
> >>
> >> qemu-system-x86_64 \
> >> -S  \
> >> -name 'virt-tests-vm1'  \
> >> -sandbox off  \
> >> -M pc-1.0  \
> >> -nodefaults  \
> >> -vga std  \
> >> -chardev
> >> socket,id=qmp_id_qmp1,path=/tmp/monitor-qmp1-2

Re: Windows 2008 Guest BSODS with CLOCK_WATCHDOG_TIMEOUT on VM migration

2015-01-27 Thread Jidong Xiao
On Tue, Jan 27, 2015 at 5:55 AM, Mikhail Sennikovskii
 wrote:
> Hi all,
>
> I've posted the below mail to the qemu-devel mailing list, but got no
> response there.
> That's why I decided to re-post it here as well; besides, I think
> this could be a KVM-specific issue.
>
> Some additional things to note:
> I can reproduce the issue on my Debian 7 with the 3.16.0-0.bpo.4-amd64
> kernel as well.
> I would typically use a max_downtime adjusted to 1 second instead of the
> default 30 ms.
> I also noticed that the issue happens much more rarely if I increase the
> migration bandwidth, e.g.:
>
> diff --git a/migration.c b/migration.c
> index 26f4b65..d2e3b39 100644
> --- a/migration.c
> +++ b/migration.c
> @@ -36,7 +36,7 @@ enum {
>  MIG_STATE_COMPLETED,
>  };
>
> -#define MAX_THROTTLE  (32 << 20)  /* Migration speed throttling */
> +#define MAX_THROTTLE  (90 << 20)  /* Migration speed throttling */
>
> Like I said below, I would be glad to provide you with any additional
> information.
>
> Thanks,
> Mikhail
>
Hi, Mikhail,

So if you choose to use a single vCPU instead of SMP, this issue would not
happen, right?

-Jidong

> On 23.01.2015 15:03, Mikhail Sennikovskii wrote:
>>
>> Hi all,
>>
>> I'm running a slightly modified migration-over-TCP test in virt-test,
>> which migrates from one "smp=2" VM to another on the same host over TCP
>> and puts some dummy CPU load inside the GUEST during migration. After a
>> series of runs I always get a CLOCK_WATCHDOG_TIMEOUT BSOD inside the
>> guest, which happens when:
>> "
>> An expected clock interrupt was not received on a secondary processor in
>> an
>> MP system within the allocated interval. This indicates that the specified
>> processor is hung and not processing interrupts.
>> "
>>
>> This seems to happen with any qemu version I've tested (1.2 and above,
>> including upstream). I was testing with the 3.13.0-44-generic kernel on
>> my Ubuntu 14.04.1 LTS 4-way SMP host, as well as with the 3.12.26-1
>> kernel on a Debian 6 6-way SMP host.
>>
>> One thing I noticed is that putting a dummy CPU load on the HOST (like
>> running multiple instances of a "while true; do false; done" script) in
>> parallel with the migration makes the issue quite easy to reproduce.
>>
>>
>> Looking inside the Windows crash dump, the second CPU is just running at
>> IRQL 0, and it is apparently not hung, since Windows was able to save its
>> state in the crash dump correctly, which implies it was still executing
>> code. So this apparently is a timing issue (e.g. the host scheduler does
>> not schedule the thread executing the secondary CPU's code in time).
>>
>> Could you give me some insight into this, i.e. is there a way to
>> configure QEMU/KVM to avoid such an issue?
>>
>> If you think this might be a qemu/kvm issue, I can provide you with any
>> info, like Windows crash dumps, or the test case to reproduce it.
>>
>>
>> qemu is started as:
>>
>> from-VM:
>>
>> qemu-system-x86_64 \
>> -S  \
>> -name 'virt-tests-vm1'  \
>> -sandbox off  \
>> -M pc-1.0  \
>> -nodefaults  \
>> -vga std  \
>> -chardev
>> socket,id=qmp_id_qmp1,path=/tmp/monitor-qmp1-20150123-112624-aFZmIkNT,server,nowait
>> \
>> -mon chardev=qmp_id_qmp1,mode=control  \
>> -chardev
>> socket,id=serial_id_serial0,path=/tmp/serial-serial0-20150123-112624-aFZmIkNT,server,nowait
>> \
>> -device isa-serial,chardev=serial_id_serial0  \
>> -chardev
>> socket,id=seabioslog_id_20150123-112624-aFZmIkNT,path=/tmp/seabios-20150123-112624-aFZmIkNT,server,nowait
>> \
>> -device
>> isa-debugcon,chardev=seabioslog_id_20150123-112624-aFZmIkNT,iobase=0x402 \
>> -device ich9-usb-uhci1,id=usb1,bus=pci.0,addr=03 \
>> -drive id=drive_image1,if=none,file=/path/to/image.qcow2 \
>> -device
>> virtio-blk-pci,id=image1,drive=drive_image1,bootindex=0,bus=pci.0,addr=04 \
>> -device
>> virtio-net-pci,mac=9a:74:75:76:77:78,id=idFdaC4M,vectors=4,netdev=idKFZNXH,bus=pci.0,addr=05
>> \
>> -netdev
>> user,id=idKFZNXH,hostfwd=tcp::5000-:22,hostfwd=tcp::5001-:10023  \
>> -m 2G  \
>> -smp 2,maxcpus=2,cores=1,threads=1,sockets=2  \
>> -cpu phenom \
>> -device usb-tablet,id=usb-tablet1,bus=usb1.0,port=1  \
>> -vnc :0  \
>> -rtc base=localtime,clock=host,driftfix=none  \
>> -boot order=cdn,once=c,menu=off \
>> -enable-kvm
>>
>> to-VM:
>>
>> qemu-system-x86_64 \
>> -S  \
>> -name 'virt-tests-vm1'  \
>> -sandbox off  \
>> -M pc-1.0  \
>> -nodefaults  \
>> -vga std  \
>> -chardev
>> socket,id=qmp_id_qmp1,path=/tmp/monitor-qmp1-20150123-112750-VehjvEqK,server,nowait
>> \
>> -mon chardev=qmp_id_qmp1,mode=control  \
>> -chardev
>> socket,id=serial_id_serial0,path=/tmp/serial-serial0-20150123-112750-VehjvEqK,server,nowait
>> \
>> -device isa-serial,chardev=serial_id_serial0  \
>> -chardev
>> socket,id=seabioslog_id_20150123-112750-VehjvEqK,path=/tmp/seabios-20150123-112750-VehjvEqK,server,n

Windows 2008 Guest BSODS with CLOCK_WATCHDOG_TIMEOUT on VM migration

2015-01-27 Thread Mikhail Sennikovskii

Hi all,

I've posted the below mail to the qemu-devel mailing list, but got no
response there.
That's why I decided to re-post it here as well; besides, I think this
could be a KVM-specific issue.


Some additional things to note:
I can reproduce the issue on my Debian 7 with the 3.16.0-0.bpo.4-amd64
kernel as well.
I would typically use a max_downtime adjusted to 1 second instead of
the default 30 ms.
I also noticed that the issue happens much more rarely if I increase
the migration bandwidth, e.g.:


diff --git a/migration.c b/migration.c
index 26f4b65..d2e3b39 100644
--- a/migration.c
+++ b/migration.c
@@ -36,7 +36,7 @@ enum {
 MIG_STATE_COMPLETED,
 };

-#define MAX_THROTTLE  (32 << 20)  /* Migration speed throttling */
+#define MAX_THROTTLE  (90 << 20)  /* Migration speed throttling */

Like I said below, I would be glad to provide you with any additional 
information.


Thanks,
Mikhail

On 23.01.2015 15:03, Mikhail Sennikovskii wrote:

Hi all,

I'm running a slightly modified migration-over-TCP test in virt-test,
which migrates from one "smp=2" VM to another on the same host over
TCP and puts some dummy CPU load inside the GUEST during migration.
After a series of runs I always get a CLOCK_WATCHDOG_TIMEOUT BSOD
inside the guest, which happens when:
"
An expected clock interrupt was not received on a secondary processor 
in an
MP system within the allocated interval. This indicates that the 
specified

processor is hung and not processing interrupts.
"

This seems to happen with any qemu version I've tested (1.2 and above,
including upstream). I was testing with the 3.13.0-44-generic kernel on
my Ubuntu 14.04.1 LTS 4-way SMP host, as well as with the 3.12.26-1
kernel on a Debian 6 6-way SMP host.


One thing I noticed is that putting a dummy CPU load on the HOST
(like running multiple instances of a "while true; do false; done"
script) in parallel with the migration makes the issue quite easy to
reproduce.
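The host-load trick can be scripted; below is a minimal sketch of my own
(it assumes coreutils nproc and timeout, and the 3-second default is only
a quick smoke test, so export LOAD_SECS to cover the whole migration run):

```shell
#!/bin/sh
# Spin one busy loop per host CPU for LOAD_SECS seconds.
LOAD_SECS="${LOAD_SECS:-3}"
NCPUS="$(nproc)"
i=0
while [ "$i" -lt "$NCPUS" ]; do
    # each loop is killed by timeout(1) once LOAD_SECS expires
    timeout "$LOAD_SECS" sh -c 'while :; do :; done' &
    i=$((i + 1))
done
wait
echo "ran $NCPUS busy loops for ${LOAD_SECS}s"
```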



Looking inside the Windows crash dump, the second CPU is just running
at IRQL 0, and it is apparently not hung, since Windows was able to save
its state in the crash dump correctly, which implies it was still
executing code. So this apparently is a timing issue (e.g. the host
scheduler does not schedule the thread executing the secondary CPU's
code in time).


Could you give me some insight into this, i.e. is there a way to
configure QEMU/KVM to avoid such an issue?

If you think this might be a qemu/kvm issue, I can provide you with
any info, like Windows crash dumps, or the test case to reproduce it.



qemu is started as:

from-VM:

qemu-system-x86_64 \
-S  \
-name 'virt-tests-vm1'  \
-sandbox off  \
-M pc-1.0  \
-nodefaults  \
-vga std  \
-chardev 
socket,id=qmp_id_qmp1,path=/tmp/monitor-qmp1-20150123-112624-aFZmIkNT,server,nowait 
\

-mon chardev=qmp_id_qmp1,mode=control  \
-chardev 
socket,id=serial_id_serial0,path=/tmp/serial-serial0-20150123-112624-aFZmIkNT,server,nowait 
\

-device isa-serial,chardev=serial_id_serial0  \
-chardev 
socket,id=seabioslog_id_20150123-112624-aFZmIkNT,path=/tmp/seabios-20150123-112624-aFZmIkNT,server,nowait 
\
-device 
isa-debugcon,chardev=seabioslog_id_20150123-112624-aFZmIkNT,iobase=0x402 
\

-device ich9-usb-uhci1,id=usb1,bus=pci.0,addr=03 \
-drive id=drive_image1,if=none,file=/path/to/image.qcow2 \
-device 
virtio-blk-pci,id=image1,drive=drive_image1,bootindex=0,bus=pci.0,addr=04 
\
-device 
virtio-net-pci,mac=9a:74:75:76:77:78,id=idFdaC4M,vectors=4,netdev=idKFZNXH,bus=pci.0,addr=05 
\
-netdev 
user,id=idKFZNXH,hostfwd=tcp::5000-:22,hostfwd=tcp::5001-:10023  \

-m 2G  \
-smp 2,maxcpus=2,cores=1,threads=1,sockets=2  \
-cpu phenom \
-device usb-tablet,id=usb-tablet1,bus=usb1.0,port=1  \
-vnc :0  \
-rtc base=localtime,clock=host,driftfix=none  \
-boot order=cdn,once=c,menu=off \
-enable-kvm

to-VM:

qemu-system-x86_64 \
-S  \
-name 'virt-tests-vm1'  \
-sandbox off  \
-M pc-1.0  \
-nodefaults  \
-vga std  \
-chardev 
socket,id=qmp_id_qmp1,path=/tmp/monitor-qmp1-20150123-112750-VehjvEqK,server,nowait 
\

-mon chardev=qmp_id_qmp1,mode=control  \
-chardev 
socket,id=serial_id_serial0,path=/tmp/serial-serial0-20150123-112750-VehjvEqK,server,nowait 
\

-device isa-serial,chardev=serial_id_serial0  \
-chardev 
socket,id=seabioslog_id_20150123-112750-VehjvEqK,path=/tmp/seabios-20150123-112750-VehjvEqK,server,nowait 
\
-device 
isa-debugcon,chardev=seabioslog_id_20150123-112750-VehjvEqK,iobase=0x402 
\

-device ich9-usb-uhci1,id=usb1,bus=pci.0,addr=03 \
-drive id=drive_image1,if=none,file=/path/to/image.qcow2 \
-device 
virtio-blk-pci,id=image1,drive=drive_image1,bootindex=0,bus=pci.0,addr=04 
\
-device 
virtio-net-pci,mac=9a:74:75:76:77:78,id=idI46M9C,vectors=4,netdev=idl9vRQt,bus=pci.0,addr=05 
\
-netdev 
user,id=idl9vRQt,hostfwd=tcp::5002-: