Re: [Qemu-devel] [PATCH] qcow2: fix double-free of Qcow2DiscardRegion in qcow2_process_discards

2014-10-12 Thread Kevin Wolf
Am 11.10.2014 um 09:14 hat Zhang Haoyu geschrieben:
 In qcow2_update_snapshot_refcount - qcow2_process_discards() - 
 bdrv_discard()
 may free the Qcow2DiscardRegion which is referenced by next pointer in
 qcow2_process_discards() now, in next iteration, d = next, so g_free(d)
 will double-free this Qcow2DiscardRegion.
 
 qcow2_snapshot_delete
 |- qcow2_update_snapshot_refcount
 |-- qcow2_process_discards
 |--- bdrv_discard
 | aio_poll
 |- aio_dispatch
 |-- bdrv_co_io_em_complete
 |--- qemu_coroutine_enter(co-coroutine, NULL); === coroutine entry is 
 bdrv_co_do_rw
 |--- g_free(d) == free first Qcow2DiscardRegion is okay
 |--- d = next;  == this set is done in QTAILQ_FOREACH_SAFE() macro.
 |--- g_free(d);  == double-free will happen if during previous iteration, 
 bdrv_discard had free this object.

Do you have a reproducer for this or did code review lead you to this?

At the moment I can't see how bdrv_discard(bs-file) could ever free a
Qcow2DiscardRegion of bs, as it's working on a completely different
BlockDriverState (which usually won't even be a qcow2 one).

 bdrv_co_do_rw
 |- bdrv_co_do_writev
 |-- bdrv_co_do_pwritev
 |--- bdrv_aligned_pwritev
 | qcow2_co_writev
 |- qcow2_alloc_cluster_link_l2
 |-- qcow2_free_any_clusters
 |--- qcow2_free_clusters
 | update_refcount
 |- qcow2_process_discards
 |-- g_free(d)  == In next iteration, this Qcow2DiscardRegion will be 
 double-free.

This shouldn't happen in a nested call either, as s-lock can't be taken
recursively.

Kevin



Re: [Qemu-devel] [PATCH] qcow2: fix double-free of Qcow2DiscardRegion in qcow2_process_discards

2014-10-12 Thread Zhang Haoyu


On 2014-10-12 15:34, Kevin Wolf wrote:

Am 11.10.2014 um 09:14 hat Zhang Haoyu geschrieben:

In qcow2_update_snapshot_refcount - qcow2_process_discards() - bdrv_discard()
may free the Qcow2DiscardRegion which is referenced by next pointer in
qcow2_process_discards() now, in next iteration, d = next, so g_free(d)
will double-free this Qcow2DiscardRegion.

qcow2_snapshot_delete
|- qcow2_update_snapshot_refcount
|-- qcow2_process_discards
|--- bdrv_discard
| aio_poll
|- aio_dispatch
|-- bdrv_co_io_em_complete
|--- qemu_coroutine_enter(co-coroutine, NULL); === coroutine entry is 
bdrv_co_do_rw
|--- g_free(d) == free first Qcow2DiscardRegion is okay
|--- d = next;  == this set is done in QTAILQ_FOREACH_SAFE() macro.
|--- g_free(d);  == double-free will happen if during previous iteration, 
bdrv_discard had free this object.

Do you have a reproducer for this or did code review lead you to this?
This problem can be reproduced with loop of savevm - delvm - savem - 
delvm ..., about 4 hours.

When I delete the vm snapshot, qemu crashed with a core file,
I debug the core file and find the double-free and the stack.

So I add a breakpoint at g_free(d);, and find that indeed a double-free 
happened,  twice free with the same address.

And only the first discard region have not happened with double-free.


At the moment I can't see how bdrv_discard(bs-file) could ever free a
Qcow2DiscardRegion of bs, as it's working on a completely different
BlockDriverState (which usually won't even be a qcow2 one).
I think the aio_context in bdrv_discard - aio_poll(aio_context, true) 
is the qemu_aio_context,
no matter the bs or bs-file passed to bdrv_discard, so 
aio_poll(aio_context) will poll all of the aio.



bdrv_co_do_rw
|- bdrv_co_do_writev
|-- bdrv_co_do_pwritev
|--- bdrv_aligned_pwritev
| qcow2_co_writev
|- qcow2_alloc_cluster_link_l2
|-- qcow2_free_any_clusters
|--- qcow2_free_clusters
| update_refcount
|- qcow2_process_discards
|-- g_free(d)  == In next iteration, this Qcow2DiscardRegion will be 
double-free.

This shouldn't happen in a nested call either, as s-lock can't be taken
recursively.
Could you detail how s-lock prevent that, above stack is from the gdb, 
when I add a breakpoint in g_free(d).


Thanks,
Zhang Haoyu


Kevin