Re: [Qemu-devel] Qemu-KVM 0.12.3 and Multipath -> Assertion

2010-05-18 Thread Kevin Wolf
Am 18.05.2010 13:13, schrieb Peter Lieven:
> hi,
> 
> will this patch make it into 0.12.4.1 ?
> 
> br,
> peter

Anthony, can you please cherry-pick commit 38d8dfa1 into stable-0.12?
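
For reference, that amounts to roughly the following - a minimal sketch,
assuming a local checkout of the qemu-kvm stable tree with a stable-0.12
branch:

    git checkout stable-0.12
    git cherry-pick 38d8dfa1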

Kevin

> 
> Christoph Hellwig wrote:
>> On Tue, May 04, 2010 at 04:01:35PM +0200, Kevin Wolf wrote:
>>   
>>> Great, I'm going to submit it as a proper patch then.
>>>
>>> Christoph, by now I'm pretty sure it's right, but can you have another
>>> look if this is correct, anyway?
>>> 
>>
>> It looks correct to me - we really shouldn't update the fields
>> until bdrv_aio_cancel has returned.  In fact, more often than not we
>> can't actually cancel a request, so there's a fairly high chance it
>> will complete.
>>
>>
>> Reviewed-by: Christoph Hellwig 



Re: [Qemu-devel] Qemu-KVM 0.12.3 and Multipath -> Assertion

2010-05-18 Thread Peter Lieven

hi,

will this patch make it into 0.12.4.1 ?

br,
peter

Christoph Hellwig wrote:

On Tue, May 04, 2010 at 04:01:35PM +0200, Kevin Wolf wrote:
  

Great, I'm going to submit it as a proper patch then.

Christoph, by now I'm pretty sure it's right, but can you have another
look if this is correct, anyway?



It looks correct to me - we really shouldn't update the fields
until bdrv_aio_cancel has returned.  In fact, more often than not we
can't actually cancel a request, so there's a fairly high chance it
will complete.


Reviewed-by: Christoph Hellwig 

  



--
Mit freundlichen Grüßen/Kind Regards

Peter Lieven

..

  KAMP Netzwerkdienste GmbH
  Vestische Str. 89-91 | 46117 Oberhausen
  Tel: +49 (0) 208.89 402-50 | Fax: +49 (0) 208.89 402-40
  mailto:p...@kamp.de | http://www.kamp.de

  Geschäftsführer: Heiner Lante | Michael Lante
  Amtsgericht Duisburg | HRB Nr. 12154
  USt-Id-Nr.: DE 120607556

. 





qemu-kvm hangs if multipath device is queueing (was: Re: [Qemu-devel] Qemu-KVM 0.12.3 and Multipath -> Assertion)

2010-05-12 Thread Peter Lieven

Hi Kevin,

here we go. I created a blocking multipath device (interrupted all
paths). qemu-kvm hangs with 100% cpu.

the monitor is also not responding.

If I restore at least one path, the vm continues.
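
For reference, one way to provoke this - a rough sketch, assuming an
open-iscsi backed multipath device (the device name and the use of
iptables are only illustrative):

    # block all iSCSI traffic so every path fails
    iptables -A OUTPUT -p tcp --dport 3260 -j DROP
    # watch the paths go to "failed"; with queue_if_no_path set,
    # guest I/O now queues instead of returning an error
    multipath -ll mpath0
    # later, restore the paths again
    iptables -D OUTPUT -p tcp --dport 3260 -j DROP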

BR,
Peter


^C
Program received signal SIGINT, Interrupt.
0x7fd8a6aaea94 in __lll_lock_wait () from /lib/libpthread.so.0
(gdb) bt
#0  0x7fd8a6aaea94 in __lll_lock_wait () from /lib/libpthread.so.0
#1  0x7fd8a6aaa190 in _L_lock_102 () from /lib/libpthread.so.0
#2  0x7fd8a6aa9a7e in pthread_mutex_lock () from /lib/libpthread.so.0
#3  0x0042e739 in kvm_mutex_lock () at 
/usr/src/qemu-kvm-0.12.4/qemu-kvm.c:2524
#4  0x0042e76e in qemu_mutex_lock_iothread () at 
/usr/src/qemu-kvm-0.12.4/qemu-kvm.c:2537
#5  0x0040c262 in main_loop_wait (timeout=1000) at 
/usr/src/qemu-kvm-0.12.4/vl.c:3995
#6  0x0042dcf1 in kvm_main_loop () at 
/usr/src/qemu-kvm-0.12.4/qemu-kvm.c:2126

#7  0x0040c98c in main_loop () at /usr/src/qemu-kvm-0.12.4/vl.c:4212
#8  0x0041054b in main (argc=30, argv=0x7fff266a77e8, 
envp=0x7fff266a78e0) at /usr/src/qemu-kvm-0.12.4/vl.c:6252

(gdb) bt full
#0  0x7fd8a6aaea94 in __lll_lock_wait () from /lib/libpthread.so.0
No symbol table info available.
#1  0x7fd8a6aaa190 in _L_lock_102 () from /lib/libpthread.so.0
No symbol table info available.
#2  0x7fd8a6aa9a7e in pthread_mutex_lock () from /lib/libpthread.so.0
No symbol table info available.
#3  0x0042e739 in kvm_mutex_lock () at 
/usr/src/qemu-kvm-0.12.4/qemu-kvm.c:2524

No locals.
#4  0x0042e76e in qemu_mutex_lock_iothread () at 
/usr/src/qemu-kvm-0.12.4/qemu-kvm.c:2537

No locals.
#5  0x0040c262 in main_loop_wait (timeout=1000) at 
/usr/src/qemu-kvm-0.12.4/vl.c:3995

   ioh = (IOHandlerRecord *) 0x0
   rfds = {fds_bits = {1048576, 0 }}
   wfds = {fds_bits = {0 }}
   xfds = {fds_bits = {0 }}
   ret = 1
   nfds = 21
   tv = {tv_sec = 0, tv_usec = 999761}
#6  0x0042dcf1 in kvm_main_loop () at 
/usr/src/qemu-kvm-0.12.4/qemu-kvm.c:2126

   fds = {18, 19}
   mask = {__val = {268443712, 0 }}
   sigfd = 20
#7  0x0040c98c in main_loop () at /usr/src/qemu-kvm-0.12.4/vl.c:4212
   r = 0
#8  0x0041054b in main (argc=30, argv=0x7fff266a77e8, 
envp=0x7fff266a78e0) at /usr/src/qemu-kvm-0.12.4/vl.c:6252

   gdbstub_dev = 0x0
   boot_devices_bitmap = 12
   i = 0
   snapshot = 0
   linux_boot = 0
   initrd_filename = 0x0
   kernel_filename = 0x0
   kernel_cmdline = 0x588fac ""
   boot_devices = "dc", '\0' 
   ds = (DisplayState *) 0x198bf00
   dcl = (DisplayChangeListener *) 0x0
   cyls = 0
   heads = 0
   secs = 0
   translation = 0
   hda_opts = (QemuOpts *) 0x0
   opts = (QemuOpts *) 0x1957390
   optind = 30
   r = 0x7fff266a8a23 "-usbdevice"
   optarg = 0x7fff266a8a2e "tablet"
   loadvm = 0x0
   machine = (QEMUMachine *) 0x861720
   cpu_model = 0x7fff266a8917 "qemu64,model_id=Intel(R) Xeon(R) CPU", ' 
' , "E5520  @ 2.27GHz"

   fds = {644511720, 32767}
   tb_size = 0
   pid_file = 0x7fff266a89bb "/var/run/qemu/vm-150.pid"
   incoming = 0x0
   fd = 0
   pwd = (struct passwd *) 0x0
   chroot_dir = 0x0
   run_as = 0x0
   env = (struct CPUX86State *) 0x0
   show_vnc_port = 0
   params = {0x58cc76 "order", 0x58cc7c "once", 0x58cc81 "menu", 0x0}

Kevin Wolf wrote:

Am 04.05.2010 15:42, schrieb Peter Lieven:
  

hi kevin,

you did it *g*

looks promising. applied this patch and was not able to reproduce yet :-)

a sure way to reproduce was to shut down all multipath paths, then
initiate i/o in the vm (e.g. start an application). of course,
everything hangs at this point.


previously, after reenabling one path, the vm crashed. now it seems to
behave correctly, just reports a DMA timeout and continues normally
afterwards.



Great, I'm going to submit it as a proper patch then.

Christoph, by now I'm pretty sure it's right, but can you have another
look if this is correct, anyway?

  

can you think of any way to prevent the vm from consuming 100% cpu in
that waiting state?
my current approach is to run all vms with nice 1, which helped to keep
the machine responsive when all vms (in my test case 64 on a box) have
hanging i/o at the same time.



I don't have anything particular in mind, but you could just attach gdb
and get another backtrace while it consumes 100% CPU (you'll need to use
"thread apply all bt" to catch everything). Then we should see where
it's hanging.

Kevin



  



--
Mit freundlichen Grüßen/Kind Regards

Peter Lieven

..

  KAMP Netzwerkdienste GmbH
  Vestische Str. 89-91 | 46117 Oberhausen
  Tel: +49 (0) 208.89 402-50 | Fax: +49 (0) 208.89 402-40
  mailto:p...@kamp.de | http://www.kamp.de

  Geschäftsführer: Heiner Lante | Michael Lante
  Amtsgericht Duisburg | HRB Nr. 12154
  USt-Id-Nr.: DE 120607556

..

Re: [Qemu-devel] Qemu-KVM 0.12.3 and Multipath -> Assertion

2010-05-08 Thread André Weidemann

Hi Kevin,
On 04.05.2010 14:20, Kevin Wolf wrote:


Am 04.05.2010 13:38, schrieb Peter Lieven:

hi kevin,

i set a breakpoint at bmdma_active_if. the first 2 breaks were
encountered when the last path in the multipath failed, but the
assertion did not fail.
when i kicked one path back in, the breakpoint was reached again, this
time leading to an assertion failure.
the stacktrace is from the point shortly before.

hope this helps.


Hm, looks like there's something wrong with cancelling requests -
bdrv_aio_cancel might decide that it completes a request (and
consequently calls the callback for it) whereas the IDE emulation
decides that it's done with the request before calling bdrv_aio_cancel.

I haven't looked in much detail what this could break, but does
something like this help?


Your attached patch fixes the problem I had as well. I ran 3 consecutive
tests tonight, which all finished without crashing the VM.
I reported my "assertion failed" error on March 14th while doing disk
performance tests using iozone in an Ubuntu 9.10 VM with qemu-kvm 0.12.3.


Thank you very much.
 André




Re: [Qemu-devel] Qemu-KVM 0.12.3 and Multipath -> Assertion

2010-05-04 Thread Christoph Hellwig
On Tue, May 04, 2010 at 04:01:35PM +0200, Kevin Wolf wrote:
> Great, I'm going to submit it as a proper patch then.
> 
> Christoph, by now I'm pretty sure it's right, but can you have another
> look if this is correct, anyway?

It looks correct to me - we really shouldn't update the fields
until bdrv_aio_cancel has returned.  In fact, more often than not we
can't actually cancel a request, so there's a fairly high chance it
will complete.


Reviewed-by: Christoph Hellwig 




Re: [Qemu-devel] Qemu-KVM 0.12.3 and Multipath -> Assertion

2010-05-04 Thread Kevin Wolf
Am 04.05.2010 15:42, schrieb Peter Lieven:
> hi kevin,
> 
> you did it *g*
> 
> looks promising. applied this patch and was not able to reproduce yet :-)
> 
> a sure way to reproduce was to shut down all multipath paths, then
> initiate i/o in the vm (e.g. start an application). of course,
> everything hangs at this point.
> 
> previously, after reenabling one path, the vm crashed. now it seems to
> behave correctly, just reports a DMA timeout and continues normally
> afterwards.

Great, I'm going to submit it as a proper patch then.

Christoph, by now I'm pretty sure it's right, but can you have another
look if this is correct, anyway?

> can you think of any way to prevent the vm from consuming 100% cpu in
> that waiting state?
> my current approach is to run all vms with nice 1, which helped to keep
> the machine responsive when all vms (in my test case 64 on a box) have
> hanging i/o at the same time.

I don't have anything particular in mind, but you could just attach gdb
and get another backtrace while it consumes 100% CPU (you'll need to use
"thread apply all bt" to catch everything). Then we should see where
it's hanging.
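
Something along these lines should do it - a minimal sketch, the pid
lookup is only illustrative:

    gdb /usr/bin/qemu-kvm $(pidof qemu-kvm)    # or: gdb -p <pid of the VM>
    (gdb) thread apply all bt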

Kevin




Re: [Qemu-devel] Qemu-KVM 0.12.3 and Multipath -> Assertion

2010-05-04 Thread Peter Lieven

hi kevin,

you did it *g*

looks promising. applied this patch and was not able to reproduce yet :-)

a sure way to reproduce was to shut down all multipath paths, then
initiate i/o in the vm (e.g. start an application). of course,
everything hangs at this point.


previously, after reenabling one path, the vm crashed. now it seems to
behave correctly, just reports a DMA timeout and continues normally
afterwards.

can you think of any way to prevent the vm from consuming 100% cpu in
that waiting state?
my current approach is to run all vms with nice 1, which helped to keep
the machine responsive when all vms (in my test case 64 on a box) have
hanging i/o at the same time.
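
for illustration, that approach looks roughly like this (the qemu-kvm
arguments here are only an example, not the actual command line used):

    nice -n 1 qemu-kvm -m 1024 \
        -drive file=/dev/mapper/mpath0,if=ide,cache=none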

br,
peter



Kevin Wolf wrote:

Am 04.05.2010 13:38, schrieb Peter Lieven:
  

hi kevin,

i set a breakpoint at bmdma_active_if. the first 2 breaks were
encountered when the last path in the multipath failed, but the
assertion did not fail.
when i kicked one path back in, the breakpoint was reached again, this
time leading to an assertion failure.

the stacktrace is from the point shortly before.

hope this helps.



Hm, looks like there's something wrong with cancelling requests -
bdrv_aio_cancel might decide that it completes a request (and
consequently calls the callback for it) whereas the IDE emulation
decides that it's done with the request before calling bdrv_aio_cancel.

I haven't looked in much detail what this could break, but does
something like this help?

diff --git a/hw/ide/core.c b/hw/ide/core.c
index 0757528..3cd55e3 100644
--- a/hw/ide/core.c
+++ b/hw/ide/core.c
@@ -2838,10 +2838,6 @@ static void ide_dma_restart(IDEState *s, int is_read)
 void ide_dma_cancel(BMDMAState *bm)
 {
 if (bm->status & BM_STATUS_DMAING) {
-bm->status &= ~BM_STATUS_DMAING;
-/* cancel DMA request */
-bm->unit = -1;
-bm->dma_cb = NULL;
 if (bm->aiocb) {
 #ifdef DEBUG_AIO
 printf("aio_cancel\n");
@@ -2849,6 +2845,10 @@ void ide_dma_cancel(BMDMAState *bm)
 bdrv_aio_cancel(bm->aiocb);
 bm->aiocb = NULL;
 }
+bm->status &= ~BM_STATUS_DMAING;
+/* cancel DMA request */
+bm->unit = -1;
+bm->dma_cb = NULL;
 }
 }

Kevin

  






Re: [Qemu-devel] Qemu-KVM 0.12.3 and Multipath -> Assertion

2010-05-04 Thread Kevin Wolf
Am 04.05.2010 13:38, schrieb Peter Lieven:
> hi kevin,
> 
> i set a breakpoint at bmdma_active_if. the first 2 breaks were
> encountered when the last path in the multipath failed, but the
> assertion did not fail.
> when i kicked one path back in, the breakpoint was reached again, this
> time leading to an assertion failure.
> the stacktrace is from the point shortly before.
> 
> hope this helps.

Hm, looks like there's something wrong with cancelling requests -
bdrv_aio_cancel might decide that it completes a request (and
consequently calls the callback for it) whereas the IDE emulation
decides that it's done with the request before calling bdrv_aio_cancel.

I haven't looked in much detail what this could break, but does
something like this help?

diff --git a/hw/ide/core.c b/hw/ide/core.c
index 0757528..3cd55e3 100644
--- a/hw/ide/core.c
+++ b/hw/ide/core.c
@@ -2838,10 +2838,6 @@ static void ide_dma_restart(IDEState *s, int is_read)
 void ide_dma_cancel(BMDMAState *bm)
 {
 if (bm->status & BM_STATUS_DMAING) {
-bm->status &= ~BM_STATUS_DMAING;
-/* cancel DMA request */
-bm->unit = -1;
-bm->dma_cb = NULL;
 if (bm->aiocb) {
 #ifdef DEBUG_AIO
 printf("aio_cancel\n");
@@ -2849,6 +2845,10 @@ void ide_dma_cancel(BMDMAState *bm)
 bdrv_aio_cancel(bm->aiocb);
 bm->aiocb = NULL;
 }
+bm->status &= ~BM_STATUS_DMAING;
+/* cancel DMA request */
+bm->unit = -1;
+bm->dma_cb = NULL;
 }
 }

Kevin




Re: [Qemu-devel] Qemu-KVM 0.12.3 and Multipath -> Assertion

2010-05-04 Thread Peter Lieven

hi kevin,

i set a breakpoint at bmdma_active_if. the first 2 breaks were
encountered when the last path in the multipath failed, but the
assertion did not fail.
when i kicked one path back in, the breakpoint was reached again, this
time leading to an assertion failure.

the stacktrace is from the point shortly before.

hope this helps.

br,
peter
--

(gdb) b bmdma_active_if
Breakpoint 2 at 0x43f2e0: file 
/usr/src/qemu-kvm-0.12.3/hw/ide/internal.h, line 507.

(gdb) c
Continuing.
[Switching to Thread 0x7f7b3300d950 (LWP 21171)]

Breakpoint 2, bmdma_active_if (bmdma=0xe31fd8) at 
/usr/src/qemu-kvm-0.12.3/hw/ide/internal.h:507

507         assert(bmdma->unit != (uint8_t)-1);
(gdb) c
Continuing.

Breakpoint 2, bmdma_active_if (bmdma=0xe31fd8) at 
/usr/src/qemu-kvm-0.12.3/hw/ide/internal.h:507

507         assert(bmdma->unit != (uint8_t)-1);
(gdb) c
Continuing.

Breakpoint 2, bmdma_active_if (bmdma=0xe31fd8) at 
/usr/src/qemu-kvm-0.12.3/hw/ide/internal.h:507

507         assert(bmdma->unit != (uint8_t)-1);
(gdb) bt full
#0  bmdma_active_if (bmdma=0xe31fd8) at 
/usr/src/qemu-kvm-0.12.3/hw/ide/internal.h:507

   __PRETTY_FUNCTION__ = "bmdma_active_if"
#1  0x0043f6ba in ide_read_dma_cb (opaque=0xe31fd8, ret=0) at 
/usr/src/qemu-kvm-0.12.3/hw/ide/core.c:554

   bm = (BMDMAState *) 0xe31fd8
   s = (IDEState *) 0xe17940
   n = 0
   sector_num = 0
#2  0x0058730c in dma_bdrv_cb (opaque=0xe17940, ret=0) at 
/usr/src/qemu-kvm-0.12.3/dma-helpers.c:94

   dbs = (DMAAIOCB *) 0xe17940
   cur_addr = 0
   cur_len = 0
   mem = (void *) 0x0
#3  0x0049e510 in qemu_laio_process_completion (s=0xe119c0, 
laiocb=0xe179c0) at linux-aio.c:68

   ret = 0
#4  0x0049e611 in qemu_laio_enqueue_completed (s=0xe119c0, 
laiocb=0xe179c0) at linux-aio.c:107

No locals.
#5  0x0049e787 in qemu_laio_completion_cb (opaque=0xe119c0) at 
linux-aio.c:144

   iocb = (struct iocb *) 0xe179f0
   laiocb = (struct qemu_laiocb *) 0xe179c0
   val = 1
   ret = 8
   nevents = 1
   i = 0
   events = {{data = 0x0, obj = 0xe179f0, res = 4096, res2 = 0}, {data 
= 0x0, obj = 0x0, res = 0, res2 = 0} , {data = 0x0, 
obj = 0x0, res = 0,
   res2 = 4365191}, {data = 0x429abf, obj = 0x7f7b3300c410, res = 
4614129721674825936, res2 = 14777248}, {data = 0x300018, obj = 
0x7f7b3300c4c0, res = 140167113393152,
   res2 = 47259417504}, {data = 0xe17740, obj = 0xa3300c4e0, res = 
140167113393184, res2 = 0}, {data = 0xe17740, obj = 0x0, res = 0, res2 = 
17}, {data = 0x7f7b3300ccf0,
   obj = 0x92, res = 32, res2 = 168}, {data = 0x7f7b33797a00, obj = 
0x801000, res = 0, res2 = 140167141433408}, {data = 0x7f7b34496e00, obj 
= 0x7f7b33797a00,
   res = 140167113393392, res2 = 8392704}, {data = 0x0, obj = 
0x7f7b34aca040, res = 140167134932480, res2 = 140167118209654}, {data = 
0x7f7b3300d950, obj = 0x42603d, res = 0,
   res2 = 42949672960}, {data = 0x7f7b3300c510, obj = 0xe17ba0, res = 
14776128, res2 = 43805361568}, {data = 0x7f7b3300ced0, obj = 0x42797e, 
res = 0, res2 = 14777248}, {
   data = 0x174, obj = 0x0, res = 373, res2 = 0}, {data = 0x176, obj = 
0x0, res = 3221225601, res2 = 0}, {data = 0x4008ae89c083, obj = 0x0, 
res = 209379655938, res2 = 0}, {
   data = 0x7f7bc084, obj = 0x0, res = 3221225602, res2 = 0}, {data 
= 0x7f7b0012, obj = 0x0, res = 17, res2 = 0}, {data = 0x0, obj = 
0x11, res = 140167113395840,
   res2 = 146}, {data = 0x20, obj = 0xa8, res = 140167121304064, res2 = 
8392704}, {data = 0x0, obj = 0x7f7b34aca040, res = 140167134932480, res2 
= 140167121304064}, {
   data = 0x7f7b3300c680, obj = 0x801000, res = 0, res2 = 
140167141433408}, {data = 0x7f7b34496e00, obj = 0x7f7b334a4276, res = 
140167113398608, res2 = 4350013}, {data = 0x0,
   obj = 0xa, res = 140167113393824, res2 = 14777248}, {data = 
0xe2c010, obj = 0xa3300c730, res = 140167113396320, res2 = 4356478}, 
{data = 0x0, obj = 0xe17ba0,
   res = 372, res2 = 0}, {data = 0x175, obj = 0x0, res = 374, res2 = 
0}, {data = 0xc081, obj = 0x0, res = 3221225603, res2 = 0}, {data = 
0xc102, obj = 0x0,
   res = 3221225604, res2 = 0}, {data = 0xc082, obj = 0x0, res = 
18, res2 = 0}, {data = 0x11, obj = 0x0, res = 0, res2 = 0}, {data = 0x0, 
obj = 0x0, res = 0, res2 = 0}, {
   data = 0x0, obj = 0x0, res = 0, res2 = 0}, {data = 0x0, obj = 0x0, 
res = 0, res2 = 0}, {data = 0x0, obj = 0x0, res = 0, res2 = 0}, {data = 
0x0, obj = 0x0, res = 0,
   res2 = 140167139245116}, {data = 0x0, obj = 0x7f7b34abe118, res = 9, 
res2 = 13}, {data = 0x25bf5fc6, obj = 0x7f7b348b40f0, res = 
140167117719264, res2 = 6}, {
   data = 0x96fd7f, obj = 0x7f7b3300c850, res = 140167113394680, res2 = 
140167117724520}, {data = 0x0, obj = 0x7f7b34abe168, res = 
140167141388288, res2 = 4206037}, {
   data = 0x7f7b3343a210, obj = 0x402058, res = 21474836480, res2 = 
4294968102}, {data = 0x0, obj = 0x7f7b34ac8358, res = 140167113394736, 
res2 = 140167113394680}, {
   data = 0x25bf5fc6, obj = 0x7f7b3300c9e0, res = 0, res2 = 
140167139246910}, {data = 0x0, obj = 0x7f7b34abe168, res = 5, res2 =

Re: [Qemu-devel] Qemu-KVM 0.12.3 and Multipath -> Assertion

2010-05-04 Thread Kevin Wolf
Am 03.05.2010 23:26, schrieb Peter Lieven:
> Hi Qemu/KVM Devel Team,
> 
> i'm using qemu-kvm 0.12.3 with latest Kernel 2.6.33.3.
> As backend we use open-iSCSI with dm-multipath.
> 
> Multipath is configured to queue i/o if no path is available.
> 
> If we create a failure on all paths, qemu starts to consume 100%
> CPU due to i/o waits which is ok so far.
> 
> 1 odd thing: The Monitor Interface is not responding any more ...
> 
> What is a real blocker is that KVM crashes with:
> kvm: /usr/src/qemu-kvm-0.12.3/hw/ide/internal.h:507: bmdma_active_if: 
> Assertion `bmdma->unit != (uint8_t)-1' failed.
> 
> after the multipath has reestablished at least one path.

Can you get a stack backtrace with gdb?

> Any ideas? I remember this was working with earlier kernel/kvm/qemu 
> versions.

If it works in the same setup with an older qemu version, bisecting
might help.

Kevin
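
For reference, the "queue i/o if no path is available" behaviour described
above is typically configured along these lines - a minimal sketch of a
multipath.conf fragment, not the exact configuration used here:

    defaults {
            # queue I/O indefinitely instead of failing it when all
            # paths are down; this is what makes qemu-kvm block
            features "1 queue_if_no_path"
    }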