On 2014-10-12 15:34, Kevin Wolf wrote:
On 11.10.2014 at 09:14, Zhang Haoyu wrote:
In qcow2_update_snapshot_refcount -> qcow2_process_discards() -> bdrv_discard(),
the bdrv_discard() call may free the Qcow2DiscardRegion that the next pointer
in qcow2_process_discards() currently references. In the next iteration,
d = next, so g_free(d) will double-free this Qcow2DiscardRegion.
qcow2_snapshot_delete
|- qcow2_update_snapshot_refcount
|-- qcow2_process_discards
|--- bdrv_discard
|---- aio_poll
|----- aio_dispatch
|------ bdrv_co_io_em_complete
|------- qemu_coroutine_enter(co->coroutine, NULL); <=== coroutine entry is bdrv_co_do_rw
|--- g_free(d) <== freeing the first Qcow2DiscardRegion is okay
|--- d = next; <== this assignment is done in the QTAILQ_FOREACH_SAFE() macro
|--- g_free(d); <== double-free happens if bdrv_discard freed this object during the previous iteration
Do you have a reproducer for this or did code review lead you to this?
This problem can be reproduced with a loop of savevm - delvm - savevm -
delvm ..., running for about 4 hours.
When I deleted the VM snapshot, qemu crashed and left a core file.
I debugged the core file and found the double-free and the stack.
So I added a breakpoint at g_free(d); and found that a double-free did
indeed happen: the same address was freed twice.
Only the first discard region was not double-freed.
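To illustrate why only the first region would escape, here is a minimal
simulation of the sequence I believe happens. It is only a sketch, not
the QEMU code: Region, pool, simulated_discard() and drain_cached_next()
are hypothetical stand-ins, the loop imitates the cached next pointer of
QTAILQ_FOREACH_SAFE(), and "freeing" is replaced by a counter so that
nothing is really freed twice.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NREGIONS 3

/* Hypothetical stand-in for Qcow2DiscardRegion; not the QEMU definition. */
typedef struct Region {
    struct Region *next;
    int release_count;      /* how many times "g_free" was called on it */
} Region;

static Region pool[NREGIONS];   /* static storage: nothing is really freed */
static Region *head;            /* pending discard list */
static bool reenter_once;       /* let the simulated discard re-enter once */

static void drain_cached_next(void);

/* Assumption: stands in for bdrv_discard() polling the event loop and
 * thereby running another request that drains the same list. */
static void simulated_discard(void)
{
    if (reenter_once) {
        reenter_once = false;
        drain_cached_next();
    }
}

/* Imitates a FOREACH_SAFE loop: next is cached before the body runs. */
static void drain_cached_next(void)
{
    Region *d, *next;

    for (d = head; d != NULL; d = next) {
        next = d->next;         /* cached, may go stale during the call */
        head = d->next;         /* unlink d from the list */
        simulated_discard();    /* nested drain releases the rest */
        d->release_count++;     /* stands in for g_free(d) */
    }
}

static void run_simulation(void)
{
    for (int i = 0; i < NREGIONS; i++) {
        pool[i].next = (i + 1 < NREGIONS) ? &pool[i + 1] : NULL;
        pool[i].release_count = 0;
    }
    head = &pool[0];
    reenter_once = true;
    drain_cached_next();
    assert(head == NULL);
}
```

After run_simulation(), the first region has been released once and
every later region twice, which matches what I see at the breakpoint.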
At the moment I can't see how bdrv_discard(bs->file) could ever free a
Qcow2DiscardRegion of bs, as it's working on a completely different
BlockDriverState (which usually won't even be a qcow2 one).
I think the aio_context in bdrv_discard -> aio_poll(aio_context, true)
is the qemu_aio_context, regardless of whether bs or bs->file is passed
to bdrv_discard, so aio_poll(aio_context) will poll all pending AIO.
bdrv_co_do_rw
|- bdrv_co_do_writev
|-- bdrv_co_do_pwritev
|--- bdrv_aligned_pwritev
|---- qcow2_co_writev
|----- qcow2_alloc_cluster_link_l2
|------ qcow2_free_any_clusters
|------- qcow2_free_clusters
|-------- update_refcount
|--------- qcow2_process_discards
|---------- g_free(d) <== in the next iteration, this Qcow2DiscardRegion will be double-freed
This shouldn't happen in a nested call either, as s->lock can't be taken
recursively.
Could you explain in detail how s->lock prevents that? The stack above is
from gdb, taken when I set a breakpoint at g_free(d).
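
If re-entry of this kind is possible, one way to make such a loop
tolerate it would be to unlink the head element before the call that may
re-enter, and to never keep a pointer to the following element. The
sketch below only illustrates that pattern under my assumptions; it is
not a patch against qcow2, and Region, Queue, simulated_discard() and
drain_with_reentry() are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for Qcow2DiscardRegion; not the QEMU definition. */
typedef struct Region {
    long offset;
    struct Region *next;
} Region;

typedef struct Queue {
    Region *head;
    int freed;          /* number of regions released so far */
    int nested_budget;  /* how many times the simulated discard re-enters */
} Queue;

static void queue_push(Queue *q, long offset)
{
    Region *r = malloc(sizeof(*r));
    assert(r != NULL);
    r->offset = offset;
    r->next = q->head;
    q->head = r;
}

static void process_discards(Queue *q);

/* Assumption: stands in for a discard call that polls the event loop
 * and ends up processing the same queue again. */
static void simulated_discard(Queue *q)
{
    if (q->nested_budget > 0) {
        q->nested_budget--;
        process_discards(q);
    }
}

/* Drain that always restarts from the current head, so a nested drain
 * cannot leave this invocation holding a stale pointer. */
static void process_discards(Queue *q)
{
    Region *d;

    while ((d = q->head) != NULL) {
        q->head = d->next;      /* d is now private to this invocation */
        simulated_discard(q);   /* may drain the rest of the queue */
        free(d);
        q->freed++;
    }
}

/* Queue n regions, drain them with up to `nested` re-entries, and
 * return how many regions were freed. */
static int drain_with_reentry(int n, int nested)
{
    Queue q = { NULL, 0, nested };

    for (int i = 0; i < n; i++) {
        queue_push(&q, i * 65536L);
    }
    process_discards(&q);
    assert(q.head == NULL);
    return q.freed;
}
```

With this shape, each region is freed exactly once even when the drain
is re-entered, because every invocation only frees elements it has
unlinked itself.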
Thanks,
Zhang Haoyu
Kevin