This weekend I discovered that one of my VMs had started to act weird.

While investigating, I found that it and most of my other VMs
have developed qcow2 corruption.

So after some bisecting, digging through dumps, and debugging,
I think I found the root cause and a fix.

In addition to that, I would like to raise a few points:

1. I had to use qcow2-dump from (*)
 (it is also on GitHub, but without source, which is weird)
 to examine the L1/L2 tables and refcount tables.

 It seems that there were a few attempts (**), (***) to add an official tool
 that would dump at least the L1/L2/refcount tables, but nothing has been
 accepted so far.

 I think that an official tool to dump at least the basic qcow2 structure
 would be very helpful for discovering and debugging qcow2 corruption.
 I had to study the qcow2 format again for this, so I can help with that.

2. 'qemu-img check -r all' is happy to create clusters that are referenced
 from multiple L2 entries.

 This isn't technically wrong, since a write through any of these L2 entries
 will COW the cluster.

 However I would be happy to know that my images don't have such clusters,
 so I would like qemu-img check to at least notify about this.
 Can we add some -check-weird-but-legal flag to it to check this?
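
To illustrate points 1 and 2, here is a rough sketch of the kind of check I
mean, in Python for brevity. It is not how qemu-img does it and not a
proposed implementation. It walks the active L1/L2 tables and reports host
clusters that are referenced from more than one L2 entry. Assumptions: the
standard 8-byte L2 entries (no extended L2), no external data file,
compressed clusters are skipped, and snapshot L1 tables are ignored. The
function names are made up for this example.

```python
import struct
from collections import defaultdict

QCOW2_MAGIC = 0x514649FB           # "QFI\xfb"
OFFSET_MASK = 0x00FFFFFFFFFFFE00   # bits 9-55 of an L1/L2 entry
COMPRESSED_FLAG = 1 << 62          # L2 entry describes a compressed cluster


def read_header(f):
    """Parse the fixed 72-byte part of the qcow2 header (big-endian)."""
    f.seek(0)
    (magic, version, _bf_off, _bf_size, cluster_bits, size, _crypt,
     l1_size, l1_offset, _rt_off, _rt_clusters,
     _nb_snap, _snap_off) = struct.unpack(">IIQIIQIIQQIIQ", f.read(72))
    if magic != QCOW2_MAGIC:
        raise ValueError("not a qcow2 image")
    return {"version": version, "cluster_bits": cluster_bits,
            "size": size, "l1_size": l1_size, "l1_offset": l1_offset}


def iter_mappings(f, hdr):
    """Yield (guest_offset, host_offset) for every allocated L2 entry."""
    cluster_size = 1 << hdr["cluster_bits"]
    l2_entries = cluster_size // 8
    f.seek(hdr["l1_offset"])
    l1 = struct.unpack(f">{hdr['l1_size']}Q", f.read(8 * hdr["l1_size"]))
    for l1_idx, l1_entry in enumerate(l1):
        l2_offset = l1_entry & OFFSET_MASK
        if l2_offset == 0:
            continue  # no L2 table allocated for this range
        f.seek(l2_offset)
        l2 = struct.unpack(f">{l2_entries}Q", f.read(cluster_size))
        for l2_idx, l2_entry in enumerate(l2):
            if l2_entry & COMPRESSED_FLAG:
                continue  # compressed clusters use a different layout
            host = l2_entry & OFFSET_MASK
            if host:
                guest = (l1_idx * l2_entries + l2_idx) * cluster_size
                yield guest, host


def find_shared_clusters(f):
    """Map host cluster offset -> guest offsets, for clusters used twice or more."""
    hdr = read_header(f)
    users = defaultdict(list)
    for guest, host in iter_mappings(f, hdr):
        users[host].append(guest)
    return {host: guests for host, guests in users.items() if len(guests) > 1}
```

It would be used on an image that is not open in a running VM, e.g.
find_shared_clusters(open("disk.qcow2", "rb")), where disk.qcow2 is a
placeholder name. An empty result means no host cluster is shared within
the active L1 table.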

A few notes about the conditions under which this corruption occurs:

I have a bunch of VMs, each running on two qcow2 files: a base image and
a snapshot on top of it, which I 'qemu-img commit' once in a while.
Discard is enabled to avoid wasting disk space.

Since discard is enabled, 'qemu-img commit' often discards data on the base
disk.
The corruption happens after such a commit, and manifests as a stale L2
entry that was supposed to be discarded but now points to an unused cluster.
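
For reference, the symptom above can be looked for with something like the
following Python sketch. It is only an illustration, not the tool I used and
not what 'qemu-img check' does internally. It lists L2 entries whose host
cluster has a refcount of 0. Assumptions: 16-bit refcounts (the default),
8-byte L2 entries, compressed clusters skipped, and only the active L1 table
is walked. The function name is made up for this example.

```python
import struct

QCOW2_MAGIC = 0x514649FB             # "QFI\xfb"
OFFSET_MASK = 0x00FFFFFFFFFFFE00     # host offset bits (9-55) in L1/L2 entries
REFTABLE_MASK = 0xFFFFFFFFFFFFFE00   # refcount block offset bits (9-63)
COMPRESSED_FLAG = 1 << 62


def find_stale_l2_entries(f):
    """Return [(guest_offset, host_offset)] where the host cluster has refcount 0."""
    f.seek(0)
    hdr = f.read(72)
    magic, version = struct.unpack_from(">II", hdr, 0)
    if magic != QCOW2_MAGIC:
        raise ValueError("not a qcow2 image")
    (cluster_bits,) = struct.unpack_from(">I", hdr, 20)
    l1_size, l1_offset, rt_offset, rt_clusters = struct.unpack_from(">IQQI", hdr, 36)
    if version >= 3:
        # Version 3 stores refcount_order at byte 96; 4 means 16-bit refcounts.
        f.seek(96)
        (order,) = struct.unpack(">I", f.read(4))
        if order != 4:
            raise ValueError("this sketch only handles 16-bit refcounts")

    cluster_size = 1 << cluster_bits
    l2_entries = cluster_size // 8
    block_entries = cluster_size // 2    # 16-bit refcounts per refcount block

    f.seek(rt_offset)
    n_rt = rt_clusters * cluster_size // 8
    reftable = struct.unpack(f">{n_rt}Q", f.read(8 * n_rt))

    def refcount(host):
        idx = host >> cluster_bits
        table_idx = idx // block_entries
        if table_idx >= len(reftable):
            return 0
        block = reftable[table_idx] & REFTABLE_MASK
        if block == 0:
            return 0                     # no refcount block: all entries are 0
        f.seek(block + 2 * (idx % block_entries))
        return struct.unpack(">H", f.read(2))[0]

    f.seek(l1_offset)
    l1 = struct.unpack(f">{l1_size}Q", f.read(8 * l1_size))
    stale = []
    for l1_idx, l1_entry in enumerate(l1):
        l2_offset = l1_entry & OFFSET_MASK
        if l2_offset == 0:
            continue
        f.seek(l2_offset)
        l2 = struct.unpack(f">{l2_entries}Q", f.read(cluster_size))
        for l2_idx, l2_entry in enumerate(l2):
            host = l2_entry & OFFSET_MASK
            if host == 0 or l2_entry & COMPRESSED_FLAG:
                continue
            if refcount(host) == 0:
                guest = (l1_idx * l2_entries + l2_idx) * cluster_size
                stale.append((guest, host))
    return stale
```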

So far I haven't been able to reproduce this with a small test case.

Best regards,
    Maxim Levitsky

(*) https://lists.gnu.org/archive/html/qemu-devel/2017-08/msg02760.html
(**) https://patchwork.kernel.org/project/qemu-devel/patch/20180328133845.20632-1-be...@igalia.com/
(***) https://patchwork.kernel.org/project/qemu-devel/cover/1578990137-308222-1-git-send-email-andrey.shinkev...@virtuozzo.com/

Maxim Levitsky (1):
  Fix qcow2 corruption on discard

 block/qcow2-cluster.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

-- 
2.26.2


