Hi!
I encountered a problem with record/replay of QEMU execution. It shows up
when QEMU is started with one virtual disk connected to a qcow2 image with
the 'snapshot' option applied.
The patch d710cf575ad5fb3ab329204620de45bfe50caa53 "block/qcow2: introduce
parallel subrequest handling in read and write" introduces some kind of race
condition, which causes differences in the data read from the disk.
I detected this by adding the following code, which logs a checksum of each
I/O operation. This checksum may differ between runs of the same recorded
execution.
Logging in the blk_aio_complete function:
    qemu_log("%"PRId64": blk_aio_complete\n", replay_get_current_icount());
    QEMUIOVector *qiov = acb->rwco.iobuf;
    if (qiov && qiov->iov) {
        size_t i, j;
        uint64_t sum = 0;
        int count = 0;
        /* Sum every byte of every iovec element in the request buffer */
        for (i = 0; i < qiov->niov; ++i) {
            for (j = 0; j < qiov->iov[i].iov_len; ++j) {
                sum += ((uint8_t *)qiov->iov[i].iov_base)[j];
                ++count;
            }
        }
        /* Log request offset, total byte count and checksum */
        qemu_log("--- iobuf offset %"PRIx64" len %x sum: %"PRIx64"\n",
                 acb->rwco.offset, count, sum);
    }
I tried to get rid of the aio task by patching qcow2_co_preadv_part to call
the task function directly:
    ret = qcow2_co_preadv_task(bs, ret, cluster_offset, offset, cur_bytes,
                               qiov, qiov_offset);
That change fixed the problem, but I have no idea what to debug next to
figure out the exact cause of the failure.
Do you have any ideas or hints?
Pavel Dovgalyuk