An in-flight ATAPI buffered CD read (ide_buffered_readv(), used by both the PIO and DMA reply paths in hw/ide/atapi.c) can collide with a concurrent port operation, and it breaks in two distinct ways. Both are guest-triggerable and reproduce on current master.
Neither was found by inspection: both are real issues reported against the qemu-kvm 10.0 downstream -- the crash from production, and the deadlock from the downstream live-migration validation suite -- on CD-ROMs backed by high-latency network storage, where a backend read outlives the guest's SATA command timeout. 1. Command engine restart -> NULL-deref crash (patches 1-2) A guest can stop and restart a port's command engine (PxCMD.ST 1 -> 0 -> 1) while a read is in flight. The restart re-maps the command list and clears AHCIDevice.cur_cmd, but nothing tears down the outstanding read; its completion then dereferences the cleared cur_cmd in ahci_pio_transfer() / ahci_dma_rw_buf(). This is not the COMRESET path -- an engine restart does not run ide_reset(), so the drive's transfer state is preserved and the read still runs. Patch 1 cancels the outstanding I/O when the command list is unmapped, reusing ide_cancel_dma_sync() exactly as ATAPI DEVICE RESET already does from the same MMIO context. Patch 2 is a qtest covering the PIO and DMA reply paths. 2. Concurrent drain -> main-loop deadlock (patch 8, repro in patch 9) A multi-sector PIO read whose byte-count limit spans more than one sector must fetch the later sectors of a DRQ burst from inside the completion of the first, asynchronous read. cd_read_sector_sync() did that with a synchronous blk_pread(), which queues behind any drain in progress. When a guest reset reaches virtio_blk_stop_ioeventfd() -> bdrv_drain_all_begin() while such a read is in flight, the nested read is queued while the outer completion still holds blk->in_flight, so the drain never finishes: the main loop wedges in the drain with the BQL held and every QMP/monitor operation blocks behind it. Patch 8 reads the whole elementary transfer in a single asynchronous request, so no read is ever issued mid-burst; cd_read_sector_sync() is removed and the DMA path is unchanged. Patch 9 reproduces the hang via x-blockdev-set-iothread, which takes the same bdrv_drain_all_begin() path a guest reset does. Patches 3-7 are the test groundwork the deadlock fix needs: they fold the IDE CD-ROM read tests into one parametrized helper and add the previously missing coverage -- multi-sector DMA, and raw 2352-byte READ CD on both IDE and AHCI -- so the rewritten ATAPI read path (including its in-place 2352 unpack) is exercised on both consumers. The test patches pass with or without patch 8, keeping the series bisectable. Patches 1-2 stand alone as the engine-restart crash fix; they are included here because the deadlock work shares and extends the same ATAPI read path and its test scaffolding. checkpatch is clean on all nine patches; the new and existing IDE and AHCI qtests pass. Denis V. Lunev (9): hw/ide/ahci: cancel in-flight buffered reads on command engine restart tests/qtest/ahci: test ATAPI read completing after engine restart tests/qtest/ide-test: parametrize the ATAPI CD-ROM read test tests/qtest/ide-test: add a multi-sector ATAPI DMA read test tests/qtest/ide-test: cover raw (2352-byte) ATAPI CD reads tests/qtest/libqos/ahci: support raw (2352-byte) READ CD tests/qtest/ahci: cover raw (2352-byte) ATAPI CD reads hw/ide/atapi: read the whole elementary transfer asynchronously tests/qtest/ahci: regression test for ATAPI read vs. drain hw/ide/ahci.c | 3 + hw/ide/atapi.c | 180 +++++++++++++++----------------- tests/qtest/ahci-test.c | 201 ++++++++++++++++++++++++++++++++++++ tests/qtest/ide-test.c | 211 ++++++++++++++++++++++++-------------- tests/qtest/libqos/ahci.c | 8 ++ tests/qtest/libqos/ahci.h | 2 + 6 files changed, 433 insertions(+), 172 deletions(-) -- 2.53.0
