This weekend I discovered that one of my VMs had started to act weird. While investigating, I found that it, and most of my other VMs, had developed qcow2 corruption.
After some bisecting, digging through dumps, and debugging, I think I have found the root cause and a fix.

In addition to that, I would like to raise a few points:

1. I had to use qcow2-dump from (*) (it is also on github, but without source. Weird...) to examine the L1/L2 tables and the refcount tables. It seems that there were a few attempts (**), (***) to make an official tool that would dump at least the L1/L2/refcount tables, but nothing has been accepted so far. I think an official tool that dumps at least the basic qcow2 structure would be very helpful for discovering and debugging qcow2 corruptions. I had to study the qcow2 format again for this, so I can help with that.

2. 'qemu-img check -r all' is happy to create clusters that are referenced from multiple L2 entries. This isn't technically wrong, since a write through any of these L2 entries will COW the cluster. However, I would like to know that my images don't have such clusters, so I would like qemu-img check to at least report them. Can we add some -check-weird-but-legal flag to it to check for this?

A few notes about the conditions under which this corruption occurs:

I have a bunch of VMs, each running on two qcow2 files: a base image and a snapshot on top of it, which I 'qemu-img commit' once in a while. Discard is enabled to avoid wasting disk space. Since discard is enabled, 'qemu-img commit' often discards data on the base disk.

The corruption happens after such a commit, and manifests as a stale L2 entry that was supposed to be discarded but now points to an unused cluster.

I haven't been able to reproduce this with a small test case so far.
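For illustration only, the kind of check asked for in point 2 could be sketched roughly as below. This is not code from qemu or from the patch. It is a minimal Python sketch that assumes the standard qcow2 header layout, walks only the active L1 table (internal snapshots are ignored), skips compressed clusters, and assumes no extended L2 entries. The function name find_shared_clusters is made up for this example.

```python
import struct
from collections import defaultdict

QCOW2_MAGIC = 0x514649FB          # "QFI\xfb"
OFFSET_MASK = 0x00FFFFFFFFFFFE00  # bits 9-55 of an L1/L2 entry hold the offset
COMPRESSED = 1 << 62              # L2 flag: entry describes a compressed cluster


def find_shared_clusters(f):
    """Return {host_offset: [(l1_index, l2_index), ...]} for every host
    cluster referenced by more than one L2 entry of the active L1 table.

    Hypothetical helper: 'f' is a seekable binary file object.
    """
    f.seek(0)
    header = f.read(48)
    if len(header) < 48:
        raise ValueError("file too short for a qcow2 header")
    (magic,) = struct.unpack_from(">I", header, 0)
    if magic != QCOW2_MAGIC:
        raise ValueError("not a qcow2 image")
    (cluster_bits,) = struct.unpack_from(">I", header, 20)
    l1_size, l1_offset = struct.unpack_from(">IQ", header, 36)
    cluster_size = 1 << cluster_bits
    entries_per_l2 = cluster_size // 8  # assumes 8-byte (non-extended) entries

    # Read the active L1 table: l1_size big-endian 64-bit entries.
    f.seek(l1_offset)
    l1 = struct.unpack(f">{l1_size}Q", f.read(8 * l1_size))

    refs = defaultdict(list)
    for l1_idx, l1_entry in enumerate(l1):
        l2_offset = l1_entry & OFFSET_MASK
        if l2_offset == 0:
            continue  # L2 table not allocated
        f.seek(l2_offset)
        l2 = struct.unpack(f">{entries_per_l2}Q", f.read(cluster_size))
        for l2_idx, l2_entry in enumerate(l2):
            if l2_entry & COMPRESSED:
                continue  # compressed descriptors use a different layout
            host = l2_entry & OFFSET_MASK
            if host:
                refs[host].append((l1_idx, l2_idx))

    # Keep only clusters that more than one L2 entry points at.
    return {host: where for host, where in refs.items() if len(where) > 1}
```

It would be called on an image that no VM has open, e.g. find_shared_clusters(open("base.qcow2", "rb")), where base.qcow2 is a placeholder name. An empty result means no host cluster is shared between L2 entries of the active L1 table.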

Best regards,
	Maxim Levitsky

(*) https://lists.gnu.org/archive/html/qemu-devel/2017-08/msg02760.html
(**) https://patchwork.kernel.org/project/qemu-devel/patch/20180328133845.20632-1-be...@igalia.com/
(***) https://patchwork.kernel.org/project/qemu-devel/cover/1578990137-308222-1-git-send-email-andrey.shinkev...@virtuozzo.com/

Maxim Levitsky (1):
  Fix qcow2 corruption on discard

 block/qcow2-cluster.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

-- 
2.26.2