Race condition in overlayed qcow2?

dovgaluk Wed, 19 Feb 2020 06:33:11 -0800

Hi!

I encountered a problem with record/replay of QEMU execution and figured out the following. It happens when QEMU is started with one virtual disk connected to a qcow2 image with the 'snapshot' option applied.

The patch d710cf575ad5fb3ab329204620de45bfe50caa53 "block/qcow2: introduce parallel subrequest handling in read and write" introduces some kind of race condition, which causes differences in the data read from the disk.

I detected this by adding the following code, which logs a checksum for each IO operation. This checksum may differ between runs of the same recorded execution.


Logging in the blk_aio_complete function:

    qemu_log("%"PRId64": blk_aio_complete\n", replay_get_current_icount());

    QEMUIOVector *qiov = acb->rwco.iobuf;
    if (qiov && qiov->iov) {
        size_t i, j;
        uint64_t sum = 0;
        int count = 0;
        /* Sum every byte of every iovec element in the request buffer */
        for (i = 0; i < qiov->niov; ++i) {
            for (j = 0; j < qiov->iov[i].iov_len; ++j) {
                sum += ((uint8_t *)qiov->iov[i].iov_base)[j];
                ++count;
            }
        }

        qemu_log("--- iobuf offset %"PRIx64" len %x sum:%"PRIx64"\n",
                 acb->rwco.offset, count, sum);
    }
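For reference, the same checksum can be pulled out into a standalone helper that works on a plain POSIX struct iovec array. QEMUIOVector carries exactly such an iov/niov pair, which is what the loop above walks. This is only a sketch: the name iov_checksum() is hypothetical and not part of QEMU. Note that a plain byte sum detects differing contents but not bytes that were merely reordered within the buffer.

```c
#include <stddef.h>
#include <stdint.h>
#include <sys/uio.h>

/* Hypothetical helper (not a QEMU API): byte-sum checksum over a
 * scatter/gather list, mirroring the logging loop above.
 * Stores the total number of bytes in *total_len when it is non-NULL. */
static uint64_t iov_checksum(const struct iovec *iov, size_t niov,
                             size_t *total_len)
{
    uint64_t sum = 0;
    size_t count = 0;

    for (size_t i = 0; i < niov; ++i) {
        const uint8_t *p = iov[i].iov_base;
        for (size_t j = 0; j < iov[i].iov_len; ++j) {
            sum += p[j];
        }
        count += iov[i].iov_len;
    }
    if (total_len) {
        *total_len = count;
    }
    return sum;
}
```

Inside blk_aio_complete this would be called as iov_checksum(qiov->iov, qiov->niov, &len), so the logged sum and length match what the inline loop produces.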

I tried to get rid of the aio task by patching qcow2_co_preadv_part to call the task function directly:

    ret = qcow2_co_preadv_task(bs, ret, cluster_offset, offset, cur_bytes, qiov, qiov_offset);

That change fixed the bug, but I have no idea what to debug next to figure out the exact cause of the failure.


Do you have any ideas or hints?

Pavel Dovgalyuk
