Paolo, Kevin, I recently hit an assertion failure in the SCSI code (pseries machine), which I pinned down to a use-after-free of the BlockAcctCookie within a SCSIDiskReq.
I can't say I understand the refcounting well enough to be confident of this, but when attempting to debug I noticed that the scsi_req_unref() at the bottom of scsi_do_read() looks suspicious - either the dma_bdrv_read() or bdrv_aio_readv() called immediately above will do an unref after the AIO is complete - which seems to match the scsi_req_ref() in scsi_read_data (the called of scsi_do_read() either directly or by queuing aio). I couldn't spot another ref to match the extra unref in scsi_do_read() except in the failure case. So the patch below fixes my assertion failure, but again, I can't say I understand this well enough to be confident - it might be leaking scsi reqs instead. But if this isn't the right fix, I hope one of you can help me find where the real refcounting bug is. Signed-off-by: David Gibson <da...@gibson.dropbear.id.au> --- hw/scsi-disk.c | 10 ++++------ 1 file changed, 4 insertions(+), 6 deletions(-) diff --git a/hw/scsi-disk.c b/hw/scsi-disk.c index a029ab6..e639b3a 100644 --- a/hw/scsi-disk.c +++ b/hw/scsi-disk.c @@ -292,7 +292,10 @@ static void scsi_do_read(void *opaque, int ret) if (ret < 0) { if (scsi_handle_rw_error(r, -ret)) { - goto done; + if (!r->req.io_canceled) { + scsi_req_unref(&r->req); + return; + } } } @@ -307,11 +310,6 @@ static void scsi_do_read(void *opaque, int ret) r->req.aiocb = bdrv_aio_readv(s->qdev.conf.bs, r->sector, &r->qiov, n, scsi_read_complete, r); } - -done: - if (!r->req.io_canceled) { - scsi_req_unref(&r->req); - } } /* Read more data from scsi device into buffer. */ -- 1.7.10