On 2015-02-27 at 13:09, Max Reitz wrote:
On 2015-02-27 at 12:42, Paolo Bonzini wrote:
On 27/02/2015 15:05, Max Reitz wrote:
Concurrently modifying the bmap does not seem to be a good idea; this patch adds
a lock for it. See https://bugs.launchpad.net/qemu/+bug/1422307 for what can go
wrong without it.
Cc: qemu-stable <qemu-sta...@nongnu.org>
Signed-off-by: Max Reitz <mre...@redhat.com>
---
v2:
- Make the mutex cover vdi_co_write() completely [Kevin]
- Add a TODO comment [Kevin]
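The v2 idea (one mutex held across the whole write) can be sketched outside QEMU roughly as follows. This is a hypothetical toy, not the patch itself: pthread_mutex_t stands in for QEMU's coroutine mutex, and toy_image, toy_write, the 64K block size and the in-memory "data slots" are all invented for illustration.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical toy image: blocks are allocated on first write, loosely
 * in the spirit of VDI's bmap. Sizes are arbitrary. */
#define BLOCK_SIZE 65536
#define NUM_BLOCKS 4
#define UNALLOCATED UINT32_MAX

struct toy_image {
    pthread_mutex_t lock;          /* stand-in for a coroutine mutex */
    uint32_t bmap[NUM_BLOCKS];     /* virtual block -> data slot */
    uint32_t next_free;            /* next unused data slot */
    uint8_t data[NUM_BLOCKS][BLOCK_SIZE];
};

void toy_init(struct toy_image *img)
{
    pthread_mutex_init(&img->lock, NULL);
    for (int i = 0; i < NUM_BLOCKS; i++) {
        img->bmap[i] = UNALLOCATED;
    }
    img->next_free = 0;
}

/* Write len bytes at byte offset off, which must stay inside one block.
 * The bmap check, the allocation and the data copy all happen under the
 * same lock, so no other writer can see a half-finished allocation. */
void toy_write(struct toy_image *img, uint64_t off,
               const uint8_t *buf, size_t len)
{
    uint64_t block = off / BLOCK_SIZE;
    size_t in_block = off % BLOCK_SIZE;
    assert(block < NUM_BLOCKS && in_block + len <= BLOCK_SIZE);

    pthread_mutex_lock(&img->lock);
    if (img->bmap[block] == UNALLOCATED) {
        /* Allocating write: zero-fill the whole block around the payload,
         * which models the write being "enlarged to contain zeros". */
        uint32_t slot = img->next_free++;
        memset(img->data[slot], 0, BLOCK_SIZE);
        memcpy(img->data[slot] + in_block, buf, len);
        img->bmap[block] = slot;
    } else {
        memcpy(img->data[img->bmap[block]] + in_block, buf, len);
    }
    pthread_mutex_unlock(&img->lock);
}
```

The price of this simplicity is that every write is serialized, including writes to blocks that are already allocated.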
I think I know what the bug is. Suppose you have two concurrent writes
to a non-allocated block, one at 16K...32K (in bytes) and one at
32K...48K. The first write is enlarged to contain zeros, the second is
not. Then you have two writes in flight:
offset   write 1   write 2
0        zeros
...      zeros
16K      data1
...      data1
32K      zeros     data2
...      zeros     data2
48K      zeros
...      zeros
64K
And the contents of 32K...48K are undefined. If the above diagnosis is
correct, I'm not even sure why Max's v1 patch worked...
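A deterministic, single-threaded model of the two in-flight writes in the diagram may make the "undefined" outcome concrete. It assumes (this is an interpretation, not stated above) that write 2 is not enlarged because the bmap already marks the block allocated by the time it is checked. The function names and fill bytes are hypothetical; the only variable is which write reaches the block last.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define K 1024
#define BLOCK (64 * K)

/* Write 1 (16K..32K), enlarged to cover the whole block with zeros. */
static void apply_write1(uint8_t *blk)
{
    memset(blk, 0, BLOCK);
    memset(blk + 16 * K, 0xD1, 16 * K);   /* data1 */
}

/* Write 2 (32K..48K), not enlarged. */
static void apply_write2(uint8_t *blk)
{
    memset(blk + 32 * K, 0xD2, 16 * K);   /* data2 */
}

/* Returns the byte at offset 32K once both writes have landed,
 * in the order selected by write2_lands_first. */
uint8_t byte_at_32k(int write2_lands_first)
{
    static uint8_t blk[BLOCK];
    memset(blk, 0x5A, BLOCK);             /* arbitrary stale content */
    if (write2_lands_first) {
        apply_write2(blk);
        apply_write1(blk);                /* its zeros clobber data2 */
    } else {
        apply_write1(blk);
        apply_write2(blk);
    }
    /* data1 survives either order; only 32K..48K is racy. */
    assert(blk[16 * K] == 0xD1);
    return blk[32 * K];
}
```

With write 1 landing first, offset 32K holds data2; with write 2 landing first, it holds zeros, i.e. data2 is silently lost.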
Maybe that's an issue, too; but the test case I sent out does 1 MB
requests (and it fails), so this shouldn't matter there.
Considering that test case didn't work for Stefan (Weil), and it fails
in a pretty strange way for me (no output from the qemu-io command at
all; while most reads from raw were successful, all reads from vdi
failed the pattern verification), maybe that's something
completely different.
Indeed, when I do sub-MB writes, I get sporadic errors which seem much
more related to the original bug report, so it's probably the issue you
found that's the real problem.
Also, my test case suddenly stopped reproducing the issue on my HDD and
only does it on tmpfs. Weird.
Max
An optimized fix could be to use a CoRwLock, then:
- take it shared (read) around the write in the "VDI_IS_ALLOCATED(bmap_entry)" path
- take it exclusive (write) around the write in the "!VDI_IS_ALLOCATED(bmap_entry)" path
Paolo
Yes, I'm actually already working on that.
Max