See patch 03 for the problem and the fix. Patch 01 is preparation and
patch 02 is the test.

The actual previous version of this series is
[PATCH v2(RFC) 0/3] qcow2: fix parallel rewrite and discard
Still, [PATCH v3 0/6] qcow2: compressed write cache
includes another (more complicated) fix for the bug, so this one is
called v4.

So, what's new:

It's still a CoRwlock-based solution, as suggested by Kevin.

Now I think that the "writer" of the lock should be the code in
update_refcount() that wants to set a refcount to zero. If we consider
only guest discard requests as the "writer", we may miss other sources
of discarding host clusters (like rewriting a compressed cluster to a
normal one, maybe some snapshot operations, who knows what else).

This means that we want to take the rw-lock under qcow2 s->lock. And
this brings an ordering restriction for the two locks: if we want both
locks taken, we should always take s->lock first, and never take
s->lock when the rw-lock is already taken (otherwise we get a classic
deadlock).

This leads us to taking the rd-lock for in-flight writes under s->lock,
in the same critical section where the cluster is allocated (or just
got from metadata), and releasing it after the data write completes.

This in turn leads to somewhat tricky logic around transferring the
rd-lock to the task coroutine on the normal write path (see 03). But
this is still simpler than the inflight-write-counters solution in v3.

Vladimir Sementsov-Ogievskiy (3):
  qemu-io: add aio_discard
  iotests: add qcow2-discard-during-rewrite
  block/qcow2: introduce discard_rw_lock: fix discarding host clusters

 block/qcow2.h                                 |  20 +++
 block/qcow2-refcount.c                        |  22 ++++
 block/qcow2.c                                 |  73 +++++++++--
 qemu-io-cmds.c                                | 117 ++++++++++++++++++
 .../tests/qcow2-discard-during-rewrite        |  99 +++++++++++++++
 .../tests/qcow2-discard-during-rewrite.out    |  17 +++
 6 files changed, 341 insertions(+), 7 deletions(-)
 create mode 100755 tests/qemu-iotests/tests/qcow2-discard-during-rewrite
 create mode 100644 tests/qemu-iotests/tests/qcow2-discard-during-rewrite.out

-- 
2.29.2