Re: [Qemu-block] [Qemu-devel] [PATCH v18 7/8] Implement new driver for block replication

Stefan Hajnoczi Fri, 06 May 2016 08:49:13 -0700

On Fri, Apr 15, 2016 at 04:10:37PM +0800, Changlong Xie wrote:
> +static void replication_close(BlockDriverState *bs)
> +{
> +    BDRVReplicationState *s = bs->opaque;
> +
> +    if (s->mode == REPLICATION_MODE_SECONDARY) {
> +        g_free(s->top_id);
> +    }
> +
> +    if (s->replication_state == BLOCK_REPLICATION_RUNNING) {
> +        replication_stop(s->rs, false, NULL);
> +    }


There is a possible use-after-free with s->top_id.  If we free it above
then replication_stop() must not call backup_job_cleanup().  I think it
could call it from replication_stop().

It would be safer to call replication_stop() before freeing s->top_id.

> +        top_bs = bdrv_lookup_bs(s->top_id, s->top_id, errp);

Please check that bs is a child of top_bs.  If it is not a child then
strange things could happen, for example the AioContexts might not match
(meaning it's not thread-safe) so this should be forbidden.

> +        if (!top_bs) {
> +            aio_context_release(aio_context);
> +            return;
> +        }

Error return paths after reopen_backing_file(s, true, &local_err) should
undo the operation.

> +        bdrv_op_block_all(top_bs, s->blocker);
> +        bdrv_op_unblock(top_bs, BLOCK_OP_TYPE_DATAPLANE, s->blocker);
> +
> +        /*
> +         * Must protect backup target if backup job was stopped/cancelled
> +         * unexpectedly
> +         */
> +        bdrv_ref(s->hidden_disk->bs);
> +
> +        backup_start(s->secondary_disk->bs, s->hidden_disk->bs, 0,
> +                     MIRROR_SYNC_MODE_NONE, NULL, BLOCKDEV_ON_ERROR_REPORT,
> +                     BLOCKDEV_ON_ERROR_REPORT, backup_job_completed,
> +                     s, NULL, &local_err);

Did you run stress tests where the primary is writing to the disk while
the secondary reads from the same sectors?

I thought about this some more and I'm wondering about the following
scenario:

NBD writes to secondary_disk and the guest reads from the disk at the
same time.  There is a coroutine mutex in qcow2.c that protects both
read and write requests, but only until they perform the data I/O.  It
may be possible that the read request from the Secondary VM could be
started before the NBD write but the preadv() syscall isn't entered
because of CPU scheduling decisions.  In the meantime the
secondary_disk->hidden_disk backup operation takes place.  With some
unlucky timing it may be possible for the Secondary VM to read the new
contents from secondary_disk instead of the old contents that were
backed up into hidden_disk.

Extra serialization would be needed.
block/backup.c:wait_for_overlapping_requests() and
block/io.c:mark_request_serialising() are good starting points for
solving this.

> +    cleanup_imgs();

Please use qtest_add_abrt_handler() so cleanup happens even when SIGABRT
is received.

signature.asc
Description: PGP signature

Re: [Qemu-block] [Qemu-devel] [PATCH v18 7/8] Implement new driver for block replication

Reply via email to