On 6/19/26 13:21, Denis V. Lunev wrote:
> An in-flight ATAPI buffered CD read (ide_buffered_readv(), used by both
> the PIO and DMA reply paths in hw/ide/atapi.c) can collide with a
> concurrent port operation, and it breaks in two distinct ways. Both are
> guest-triggerable and reproduce on current master.
>
> Neither was found by inspection: both are real issues reported against
> the qemu-kvm 10.0 downstream -- the crash from production, and the
> deadlock from the downstream live-migration validation suite -- on
> CD-ROMs backed by high-latency network storage, where a backend read
> outlives the guest's SATA command timeout.
>
> 1. Command engine restart -> NULL-deref crash (patches 1-2)
>
>    A guest can stop and restart a port's command engine
>    (PxCMD.ST 1 -> 0 -> 1) while a read is in flight. The restart re-maps
>    the command list and clears AHCIDevice.cur_cmd, but nothing tears
>    down the outstanding read; its completion then dereferences the
>    cleared cur_cmd in ahci_pio_transfer() / ahci_dma_rw_buf(). This is
>    not the COMRESET path -- an engine restart does not run ide_reset(),
>    so the drive's transfer state is preserved and the read still runs.
>
>    Patch 1 cancels the outstanding I/O when the command list is
>    unmapped, reusing ide_cancel_dma_sync() exactly as ATAPI DEVICE
>    RESET already does from the same MMIO context. Patch 2 is a qtest
>    covering the PIO and DMA reply paths.
>
> 2. Concurrent drain -> main-loop deadlock (patch 8, repro in patch 9)
>
>    A multi-sector PIO read whose byte-count limit spans more than one
>    sector must fetch the later sectors of a DRQ burst from inside the
>    completion of the first, asynchronous read. cd_read_sector_sync()
>    did that with a synchronous blk_pread(), which queues behind any
>    drain in progress. When a guest reset reaches
>    virtio_blk_stop_ioeventfd() -> bdrv_drain_all_begin() while such a
>    read is in flight, the nested read is queued while the outer
>    completion still holds blk->in_flight, so the drain never finishes:
>    the main loop wedges in the drain with the BQL held and every
>    QMP/monitor operation blocks behind it.
>
>    Patch 8 reads the whole elementary transfer in a single asynchronous
>    request, so no read is ever issued mid-burst; cd_read_sector_sync()
>    is removed and the DMA path is unchanged. Patch 9 reproduces the
>    hang via x-blockdev-set-iothread, which takes the same
>    bdrv_drain_all_begin() path a guest reset does.
>
> Patches 3-7 are the test groundwork the deadlock fix needs: they fold
> the IDE CD-ROM read tests into one parametrized helper and add the
> previously missing coverage -- multi-sector DMA, and raw 2352-byte
> READ CD on both IDE and AHCI -- so the rewritten ATAPI read path
> (including its in-place 2352 unpack) is exercised on both consumers.
> The test patches pass with or without patch 8, keeping the series
> bisectable.
>
> Patches 1-2 stand alone as the engine-restart crash fix; they are
> included here because the deadlock work shares and extends the same
> ATAPI read path and its test scaffolding.
>
> checkpatch is clean on all nine patches; the new and existing IDE and
> AHCI qtests pass.
>
> Denis V. Lunev (9):
>   hw/ide/ahci: cancel in-flight buffered reads on command engine restart
>   tests/qtest/ahci: test ATAPI read completing after engine restart
>   tests/qtest/ide-test: parametrize the ATAPI CD-ROM read test
>   tests/qtest/ide-test: add a multi-sector ATAPI DMA read test
>   tests/qtest/ide-test: cover raw (2352-byte) ATAPI CD reads
>   tests/qtest/libqos/ahci: support raw (2352-byte) READ CD
>   tests/qtest/ahci: cover raw (2352-byte) ATAPI CD reads
>   hw/ide/atapi: read the whole elementary transfer asynchronously
>   tests/qtest/ahci: regression test for ATAPI read vs. drain
>
>  hw/ide/ahci.c             |   3 +
>  hw/ide/atapi.c            | 180 +++++++++++++++-----------------
>  tests/qtest/ahci-test.c   | 201 ++++++++++++++++++++++++++++++++++++
>  tests/qtest/ide-test.c    | 211 ++++++++++++++++++++++++--------------
>  tests/qtest/libqos/ahci.c |   8 ++
>  tests/qtest/libqos/ahci.h |   2 +
>  6 files changed, 433 insertions(+), 172 deletions(-)
>
ping

Reply via email to