hi kevin,
you did it *g*
looks promising. applied this patched and was not able to reproduce yet :-)
secure way to reproduce was to shut down all multipath paths, then
initiate i/o
in the vm (e.g. start an application). of course, everything hangs at
this point.
after reenabling one path, vm crashed. now it seems to behave correctly and
just report an DMA timeout and continues normally afterwards.
can you imagine of any way preventing the vm to consume 100% cpu in
that waiting state?
my current approach is to run all vms with nice 1, which helped to keep the
machine responsible if all vms (in my test case 64 on a box) have hanging
i/o at the same time.
br,
peter
Kevin Wolf wrote:
Am 04.05.2010 13:38, schrieb Peter Lieven:
hi kevin,
i set a breakpint at bmdma_active_if. the first 2 breaks encountered
when the last path in the multipath
failed, but the assertion was not true.
when i kicked one path back in the breakpoint was reached again, this
time leading to an assert.
the stacktrace is from the point shortly before.
hope this helps.
Hm, looks like there's something wrong with cancelling requests -
bdrv_aio_cancel might decide that it completes a request (and
consequently calls the callback for it) whereas the IDE emulation
decides that it's done with the request before calling bdrv_aio_cancel.
I haven't looked in much detail what this could break, but does
something like this help?
diff --git a/hw/ide/core.c b/hw/ide/core.c
index 0757528..3cd55e3 100644
--- a/hw/ide/core.c
+++ b/hw/ide/core.c
@@ -2838,10 +2838,6 @@ static void ide_dma_restart(IDEState *s, int is_read)
void ide_dma_cancel(BMDMAState *bm)
{
if (bm->status & BM_STATUS_DMAING) {
- bm->status &= ~BM_STATUS_DMAING;
- /* cancel DMA request */
- bm->unit = -1;
- bm->dma_cb = NULL;
if (bm->aiocb) {
#ifdef DEBUG_AIO
printf("aio_cancel\n");
@@ -2849,6 +2845,10 @@ void ide_dma_cancel(BMDMAState *bm)
bdrv_aio_cancel(bm->aiocb);
bm->aiocb = NULL;
}
+ bm->status &= ~BM_STATUS_DMAING;
+ /* cancel DMA request */
+ bm->unit = -1;
+ bm->dma_cb = NULL;
}
}
Kevin