Re: [Qemu-devel] Block Migration Assertion in qemu-kvm 1.2.0

2012-09-25 Thread Kevin Wolf
On 19.09.2012 07:49, Peter Lieven wrote:
 On 09/18/12 12:31, Kevin Wolf wrote:
 On 18.09.2012 12:28, Peter Lieven wrote:
 On 09/17/12 22:12, Peter Lieven wrote:
 On 09/17/12 10:41, Kevin Wolf wrote:
 On 16.09.2012 12:13, Peter Lieven wrote:
 Hi,

 when trying to block migrate a VM from one node to another, the source
 VM crashed with the following assertion:
 block.c:3829: bdrv_set_in_use: Assertion `bs->in_use != in_use' failed.

 Is this something already addressed/known?
 Not that I'm aware of, at least.

 Block migration doesn't seem to check whether the device is already in
 use, maybe this is the problem. Not sure why it would be in use, though,
 and in my quick test it didn't crash.

 So we need some more information: What's your command line, did you do
 anything specific in the monitor with block devices, what does the
 stacktrace look like, etc.?
 kevin, it seems that i can very easily force a crash if I cancel a
 running block migration.
 if I understand correctly what happens, there are aio callbacks coming in
 after blk_mig_cleanup() has been called.

 what is the proper way to detect this in blk_mig_read_cb()?
 You could try this, it doesn't detect the situation in
 blk_mig_read_cb(), but ensures that all callbacks happen before we do
 the actual cleanup (completely untested):
 after testing it for half an hour i can say, it seems to fix the problem.
 no segfaults and also no other assertions.
 
 while searching I have seen that the queues blk_list and bmds_list are
 initialized at qemu startup. wouldn't it be better to initialize them at
 init_blk_migration, or at least check that they are really empty? i have
 also seen that prev_time_offset is not initialized.

Probably. If you sent this as a proper patch with a SoB, I wouldn't
reject it, but considering that block migration is deprecated anyway, I
won't bother myself as long as there's no real bug.

Kevin
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] Block Migration Assertion in qemu-kvm 1.2.0

2012-09-18 Thread Peter Lieven

On 09/17/12 22:12, Peter Lieven wrote:

On 09/17/12 10:41, Kevin Wolf wrote:

On 16.09.2012 12:13, Peter Lieven wrote:

Hi,

when trying to block migrate a VM from one node to another, the source
VM crashed with the following assertion:
block.c:3829: bdrv_set_in_use: Assertion `bs->in_use != in_use' failed.

Is this something already addressed/known?

Not that I'm aware of, at least.

Block migration doesn't seem to check whether the device is already in
use, maybe this is the problem. Not sure why it would be in use, though,
and in my quick test it didn't crash.

So we need some more information: What's your command line, did you do
anything specific in the monitor with block devices, what does the
stacktrace look like, etc.?
kevin, it seems that i can very easily force a crash if I cancel a 
running block migration.
if I understand correctly what happens, there are aio callbacks coming in
after blk_mig_cleanup() has been called.

what is the proper way to detect this in blk_mig_read_cb()?

Thanks,
Peter



Re: [Qemu-devel] Block Migration Assertion in qemu-kvm 1.2.0

2012-09-18 Thread Kevin Wolf
On 18.09.2012 12:28, Peter Lieven wrote:
 On 09/17/12 22:12, Peter Lieven wrote:
 On 09/17/12 10:41, Kevin Wolf wrote:
 On 16.09.2012 12:13, Peter Lieven wrote:
 Hi,

 when trying to block migrate a VM from one node to another, the source
 VM crashed with the following assertion:
 block.c:3829: bdrv_set_in_use: Assertion `bs->in_use != in_use' failed.

 Is this something already addressed/known?
 Not that I'm aware of, at least.

 Block migration doesn't seem to check whether the device is already in
 use, maybe this is the problem. Not sure why it would be in use, though,
 and in my quick test it didn't crash.

 So we need some more information: What's your command line, did you do
 anything specific in the monitor with block devices, what does the
 stacktrace look like, etc.?
 kevin, it seems that i can very easily force a crash if I cancel a 
 running block migration.
 if I understand correctly what happens, there are aio callbacks coming in
 after blk_mig_cleanup() has been called.
 
 what is the proper way to detect this in blk_mig_read_cb()?

You could try this, it doesn't detect the situation in
blk_mig_read_cb(), but ensures that all callbacks happen before we do
the actual cleanup (completely untested):

diff --git a/block-migration.c b/block-migration.c
index 7def8ab..ed93301 100644
--- a/block-migration.c
+++ b/block-migration.c
@@ -519,6 +519,8 @@ static void blk_mig_cleanup(void)
     BlkMigDevState *bmds;
     BlkMigBlock *blk;

+    bdrv_drain_all();
+
     set_dirty_tracking(0);

     while ((bmds = QSIMPLEQ_FIRST(&block_mig_state.bmds_list)) != NULL) {
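
To illustrate why draining before cleanup helps, here is a minimal,
self-contained model of the ordering problem. It is only a sketch: the
names MigModel, PendingRead, submit_read, read_cb, drain_all and cleanup
are hypothetical and are not QEMU code. drain_all() merely stands in for
what bdrv_drain_all() is expected to achieve, namely that no completion
callback is still outstanding when the state is torn down.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for one in-flight AIO read; not a QEMU type. */
typedef struct PendingRead {
    struct PendingRead *next;
} PendingRead;

/* Hypothetical, heavily reduced model of the migration state. */
typedef struct MigModel {
    PendingRead *in_flight; /* submitted, completion callback not yet run */
    int read_done;          /* completed reads waiting to be flushed */
    bool cleaned_up;        /* true once cleanup has torn the state down */
    int late_callbacks;     /* callbacks that ran after cleanup: the bug */
} MigModel;

static void submit_read(MigModel *s)
{
    PendingRead *r = malloc(sizeof(*r));
    if (r == NULL) {
        abort();
    }
    r->next = s->in_flight;
    s->in_flight = r;
}

/* Completion callback: only valid while the state is still alive. */
static void read_cb(MigModel *s)
{
    if (s->cleaned_up) {
        s->late_callbacks++;
    } else {
        s->read_done++;
    }
}

/* Run every outstanding completion; models the effect of a drain call. */
static void drain_all(MigModel *s)
{
    while (s->in_flight != NULL) {
        PendingRead *r = s->in_flight;
        s->in_flight = r->next;
        free(r);
        read_cb(s);
    }
}

/* Tear the state down, optionally waiting for in-flight requests first. */
static void cleanup(MigModel *s, bool drain_first)
{
    if (drain_first) {
        drain_all(s);
    }
    s->read_done = 0;
    s->cleaned_up = true;
}
```

Calling cleanup(&s, false) while reads are still in flight leaves
callbacks that later run against torn-down state; cleanup(&s, true) does
not. The model says nothing about how expensive a full drain is in the
real block layer.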


Re: [Qemu-devel] Block Migration Assertion in qemu-kvm 1.2.0

2012-09-18 Thread Peter Lieven

On 09/18/12 12:31, Kevin Wolf wrote:

On 18.09.2012 12:28, Peter Lieven wrote:

On 09/17/12 22:12, Peter Lieven wrote:

On 09/17/12 10:41, Kevin Wolf wrote:

On 16.09.2012 12:13, Peter Lieven wrote:

Hi,

when trying to block migrate a VM from one node to another, the source
VM crashed with the following assertion:
block.c:3829: bdrv_set_in_use: Assertion `bs->in_use != in_use' failed.

Is this something already addressed/known?

Not that I'm aware of, at least.

Block migration doesn't seem to check whether the device is already in
use, maybe this is the problem. Not sure why it would be in use, though,
and in my quick test it didn't crash.

So we need some more information: What's your command line, did you do
anything specific in the monitor with block devices, what does the
stacktrace look like, etc.?

kevin, it seems that i can very easily force a crash if I cancel a
running block migration.

if I understand correctly what happens, there are aio callbacks coming in
after blk_mig_cleanup() has been called.

what is the proper way to detect this in blk_mig_read_cb()?

You could try this, it doesn't detect the situation in
blk_mig_read_cb(), but ensures that all callbacks happen before we do
the actual cleanup (completely untested):

after testing it for half an hour i can say, it seems to fix the problem.
no segfaults and also no other assertions.

while searching I have seen that the queues blk_list and bmds_list are
initialized at qemu startup. wouldn't it be better to initialize them at
init_blk_migration, or at least check that they are really empty? i have
also seen that prev_time_offset is not initialized.

thank you,
peter

something like this:

--- qemu-kvm-1.2.0/block-migration.c.orig    2012-09-17 21:14:44.458429855 +0200
+++ qemu-kvm-1.2.0/block-migration.c    2012-09-17 21:15:40.599736962 +0200
@@ -311,8 +311,12 @@ static void init_blk_migration(QEMUFile
     block_mig_state.prev_progress = -1;
     block_mig_state.bulk_completed = 0;
     block_mig_state.total_time = 0;
+    block_mig_state.prev_time_offset = 0;
     block_mig_state.reads = 0;

+    QSIMPLEQ_INIT(&block_mig_state.bmds_list);
+    QSIMPLEQ_INIT(&block_mig_state.blk_list);
+
     bdrv_iterate(init_blk_migration_it, NULL);
 }

@@ -760,9 +764,6 @@ SaveVMHandlers savevm_block_handlers = {

 void blk_mig_init(void)
 {
-    QSIMPLEQ_INIT(&block_mig_state.bmds_list);
-    QSIMPLEQ_INIT(&block_mig_state.blk_list);
-
     register_savevm_live(NULL, "block", 0, 1, &savevm_block_handlers,
                          &block_mig_state);
 }
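
The idea behind this patch, resetting all per-migration state when a
migration starts rather than once at program startup, can be sketched
outside of QEMU as follows. This is only an illustration under
assumptions: Queue, Node, RunState and run_state_begin are hypothetical
names, and the hand-rolled queue merely imitates the head/tail-pointer
layout of a simple queue.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical singly linked queue with a tail pointer. */
typedef struct Node {
    struct Node *next;
} Node;

typedef struct Queue {
    Node *first;
    Node **last; /* points at the 'next' slot where a new node is linked */
} Queue;

static void queue_init(Queue *q)
{
    q->first = NULL;
    q->last = &q->first;
}

static void queue_push(Queue *q, Node *n)
{
    n->next = NULL;
    *q->last = n;
    q->last = &n->next;
}

static int queue_empty(const Queue *q)
{
    return q->first == NULL;
}

/* Hypothetical per-run state, loosely modelled on the fields in the patch. */
typedef struct RunState {
    Queue devices;
    Queue blocks;
    long prev_time_offset;
    int reads;
} RunState;

/*
 * Reset everything that belongs to a single run. Calling this at the start
 * of every run means a cancelled earlier run cannot leak stale counters or
 * queue entries into the next one.
 */
static void run_state_begin(RunState *s)
{
    queue_init(&s->devices);
    queue_init(&s->blocks);
    s->prev_time_offset = 0;
    s->reads = 0;
}
```

Re-initialising a queue that still holds entries silently drops them, so
in real code the stricter alternative mentioned above, asserting that
both queues are empty at this point, would also expose a cleanup path
that left entries behind.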


diff --git a/block-migration.c b/block-migration.c
index 7def8ab..ed93301 100644
--- a/block-migration.c
+++ b/block-migration.c
@@ -519,6 +519,8 @@ static void blk_mig_cleanup(void)
     BlkMigDevState *bmds;
     BlkMigBlock *blk;

+    bdrv_drain_all();
+
     set_dirty_tracking(0);

     while ((bmds = QSIMPLEQ_FIRST(&block_mig_state.bmds_list)) != NULL) {




Re: [Qemu-devel] Block Migration Assertion in qemu-kvm 1.2.0

2012-09-17 Thread Kevin Wolf
On 16.09.2012 12:13, Peter Lieven wrote:
 Hi,
 
 when trying to block migrate a VM from one node to another, the source 
 VM crashed with the following assertion:
 block.c:3829: bdrv_set_in_use: Assertion `bs->in_use != in_use' failed.
 
 Is this something already addressed/known?

Not that I'm aware of, at least.

Block migration doesn't seem to check whether the device is already in
use, maybe this is the problem. Not sure why it would be in use, though,
and in my quick test it didn't crash.

So we need some more information: What's your command line, did you do
anything specific in the monitor with block devices, what does the
stacktrace look like, etc.?
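
For readers unfamiliar with this kind of assertion, the following is a
simplified model of a strict in-use flag and of a caller that checks it
first. It is a sketch only: Device, device_set_in_use, device_try_claim
and device_release are hypothetical names, not the QEMU functions.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical device with a single "claimed by someone" flag. */
typedef struct Device {
    bool in_use;
} Device;

/*
 * Strict setter: every call must actually change the flag. Claiming an
 * already claimed device, or releasing a free one, trips the assertion.
 */
static void device_set_in_use(Device *d, bool in_use)
{
    assert(d->in_use != in_use);
    d->in_use = in_use;
}

/* Checked variant for callers: refuse politely instead of aborting. */
static bool device_try_claim(Device *d)
{
    if (d->in_use) {
        return false;
    }
    device_set_in_use(d, true);
    return true;
}

static void device_release(Device *d)
{
    device_set_in_use(d, false);
}
```

With a setter like this, a user that claims the device twice, or
releases it twice, aborts the process, which is why a check before
claiming matters.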

Kevin


Re: [Qemu-devel] Block Migration Assertion in qemu-kvm 1.2.0

2012-09-17 Thread Peter Lieven

On 09/17/12 10:41, Kevin Wolf wrote:

On 16.09.2012 12:13, Peter Lieven wrote:

Hi,

when trying to block migrate a VM from one node to another, the source
VM crashed with the following assertion:
block.c:3829: bdrv_set_in_use: Assertion `bs->in_use != in_use' failed.

Is this something already addressed/known?

Not that I'm aware of, at least.

Block migration doesn't seem to check whether the device is already in
use, maybe this is the problem. Not sure why it would be in use, though,
and in my quick test it didn't crash.
It seems to happen only when a vServer that has already been block
migrated once is block migrated again.

So we need some more information: What's your command line, did you do
anything specific in the monitor with block devices, what does the
stacktrace look like, etc.?

Here is my cmdline:
/usr/bin/qemu-kvm-1.2.0 \
  -net tap,vlan=164,script=no,downscript=no,ifname=tap0 \
  -net nic,vlan=164,model=e1000,macaddr=52:54:00:ff:01:19 \
  -drive format=host_device,file=/dev/7cf58855099771c2/lieven-storage-migration-t-hd0,if=virtio,cache=none,aio=native \
  -m 2048 -smp 2,sockets=1,cores=2,threads=1 \
  -monitor tcp:0:4001,server,nowait -vnc :1 -qmp tcp:0:3001,server,nowait \
  -name 'lieven-storage-migration-test' \
  -boot order=dc,menu=off -k de \
  -incoming tcp:172.21.55.34:5001 \
  -pidfile /var/run/qemu/vm-254.pid \
  -mem-path /hugepages -mem-prealloc \
  -rtc base=utc -usb -usbdevice tablet -no-hpet -vga cirrus \
  -cpu host,+x2apic,model_id='Intel(R) Xeon(R) CPU   L5640  @ 2.27GHz',-tsc


I have seen other errors as well in the meantime:
block-migration.c:471: flush_blks: Assertion `block_mig_state.read_done >= 0' failed.
qemu-kvm-1.2.0[27851]: segfault at 7f00746e78d7 ip 7f67eca6226d sp 7fff56ae3340 error 4 in qemu-system-x86_64[7f67ec9e9000+418000]


I will now try to catch the situation in the debugger.

Thanks,
Peter


Kevin




Re: [Qemu-devel] Block Migration Assertion in qemu-kvm 1.2.0

2012-09-17 Thread Peter Lieven

On 09/17/12 10:41, Kevin Wolf wrote:

On 16.09.2012 12:13, Peter Lieven wrote:

Hi,

when trying to block migrate a VM from one node to another, the source
VM crashed with the following assertion:
block.c:3829: bdrv_set_in_use: Assertion `bs->in_use != in_use' failed.

Is this something already addressed/known?

Not that I'm aware of, at least.

Block migration doesn't seem to check whether the device is already in
use, maybe this is the problem. Not sure why it would be in use, though,
and in my quick test it didn't crash.

So we need some more information: What's your command line, did you do
anything specific in the monitor with block devices, what does the
stacktrace look like, etc.?
i was also able to reproduce the failure
flush_blks: Assertion `block_mig_state.read_done >= 0' failed.
by cancelling a block migration and restarting it afterwards.
however, how can I get a stack trace after an assert?
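
On the stack trace question: the usual approaches are to run the process
under gdb and type "bt" once it stops on the abort, or to enable core
dumps and load the core file into gdb afterwards. If neither is
practical, a program can also print its own backtrace when assert() calls
abort(). The sketch below assumes glibc on Linux (it relies on
<execinfo.h>); dump_backtrace and install_abort_backtrace are
hypothetical helper names, and nothing like this is claimed to exist in
qemu-kvm.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <assert.h>
#include <execinfo.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

#define MAX_FRAMES 64

/* Write the current call stack to fd; returns the number of frames. */
static int dump_backtrace(int fd)
{
    void *frames[MAX_FRAMES];
    int n = backtrace(frames, MAX_FRAMES);

    /* Unlike backtrace_symbols(), this variant does not call malloc(). */
    backtrace_symbols_fd(frames, n, fd);
    return n;
}

/* SIGABRT handler: a failed assert() ends in abort(), which raises it. */
static void on_abort(int sig)
{
    (void)sig;
    dump_backtrace(STDERR_FILENO);
    /* After the handler returns, abort() still terminates the process. */
}

/* Install the handler once, early in main(); returns 0 on success. */
static int install_abort_backtrace(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = on_abort;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESETHAND; /* restore the default action after one hit */
    return sigaction(SIGABRT, &sa, NULL);
}
```

The output contains raw addresses unless the binary is linked with
-rdynamic, and backtrace() is not formally async-signal-safe, so this is
a debugging aid rather than something to rely on in production. A
debugger or a core dump gives far more detail, including local variables.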

thanks,
peter


Kevin

