On 01/11/2016 07:33 PM, Kevin Wolf wrote:
On 24.12.2015 at 06:43, Denis V. Lunev wrote:
On 12/22/2015 07:46 PM, Kevin Wolf wrote:
Enough innocent images have died because users called 'qemu-img snapshot' while
the VM was still running. Educating the users doesn't seem to be a working
strategy, so this series adds locking to qcow2 that refuses to access the image
read-write from two processes.
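
A minimal sketch of the idea, assuming the lock is a single "in use" flag kept in the image header. The names Image, open_image and close_image are hypothetical and not taken from the series, and allowing read-only opens while the flag is set is likewise only an assumption of this toy model:

```python
class ImageLockedError(Exception):
    """Raised when a second writer tries to open an image already in use."""


class Image:
    """Toy stand-in for a qcow2 file; `in_use` models a header lock flag."""

    def __init__(self):
        self.in_use = False


def open_image(image, read_write, override_lock=False):
    """Open the image; only read-write opens check and take the lock flag."""
    if read_write:
        if image.in_use and not override_lock:
            raise ImageLockedError("image is in use by another process")
        image.in_use = True
    return image


def close_image(image, read_write):
    """Release the lock flag if this opener was a writer."""
    if read_write:
        image.in_use = False


# A running VM holds the image read-write...
disk = Image()
open_image(disk, read_write=True)

# ...so a concurrent read-write open (e.g. from a snapshot tool) is refused.
try:
    open_image(disk, read_write=True)
    second_open_refused = False
except ImageLockedError:
    second_open_refused = True
```

The override_lock parameter stands in for whatever option a management tool would use to break a stale lock after a crash.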

Eric, this will require a libvirt update to deal with qemu crashes which leave
locked images behind. The simplest conceivable way would be to unconditionally
override the lock in libvirt whenever the option is present. In that case,
libvirt VMs would be protected against concurrent non-libvirt accesses, but not
the other way round. If you want more than that, libvirt would have to check
somehow if it was its own VM that used the image and left the lock behind. I
imagine that can't be too hard either.
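
The two options for the management layer could be sketched roughly as follows. This is an illustration only: should_override_lock, the policy names and the image paths are hypothetical and do not correspond to any libvirt API.

```python
def should_override_lock(lock_present, policy, image_path=None, own_images=()):
    """Decide whether a management layer may break a leftover lock.

    policy "always":   override whenever a lock is present (simplest option).
    policy "own-only": override only if one of the manager's own VMs was the
                       last user of the image (needs some record of that).
    """
    if not lock_present:
        return False  # nothing to override
    if policy == "always":
        return True
    if policy == "own-only":
        return image_path in own_images
    raise ValueError(f"unknown policy: {policy}")


# Hypothetical record of images last used by the manager's own VMs.
own = {"/images/vm1.qcow2"}

always = should_override_lock(True, "always", "/images/other.qcow2", own)
own_hit = should_override_lock(True, "own-only", "/images/vm1.qcow2", own)
own_miss = should_override_lock(True, "own-only", "/images/other.qcow2", own)
```

With "always", the manager's VMs are protected against outside access but not the other way round. "own-only" closes that gap at the cost of tracking which images the manager itself left locked.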

Also note that this kind of depends on Max's bdrv_close_all() series, but only
in order to pass test case 142. This is not a bug in this series, but a
preexisting one (bs->file can be closed before bs), and it becomes apparent
when qemu fails to unlock an image due to this bug. Max's series fixes this.


This approach has a hole with qcow2_invalidate_cache().
The lock is released (and can't be kept by design) in
between the qcow2_close() and qcow2_open() calls, if
I understand this correctly.
qcow2_invalidate_cache() is only called with BDRV_O_INCOMING set, i.e.
the instance isn't the current user and doesn't release the lock. It
requires, however, that a previous user has released the lock, otherwise
the qcow2_open() would fail.

But even if qcow2_invalidate_cache() were called by an instance that is
the current user, the behaviour wouldn't be wrong. In the period while
the flag is cleared, there is no write access to the image file. If
someone opened the image from another process at this point, they would
get the image, but the qcow2_open() in the original process would then
fail, so we wouldn't corrupt the image.
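
The window described above can be modelled in a few lines. This is a toy model under stated assumptions: the header is a plain dict, try_open and close are hypothetical helpers, and the "owner" field exists only to make the outcome visible (nothing in this thread says the real flag records an owner).

```python
class LockedError(Exception):
    """Raised when an open finds the lock flag already set."""


def try_open(header, who):
    """Take the lock flag, or fail if somebody else already holds it."""
    if header["in_use"]:
        raise LockedError(f"{who}: image is locked by {header['owner']}")
    header["in_use"] = True
    header["owner"] = who


def close(header):
    """Clear the lock flag."""
    header["in_use"] = False
    header["owner"] = None


header = {"in_use": False, "owner": None}

try_open(header, "A")   # process A is the current user
close(header)           # first half of A's close/reopen sequence
try_open(header, "B")   # process B opens the image during the window

# Second half of A's sequence: the reopen fails instead of writing
# to an image that B now owns.
try:
    try_open(header, "A")
    reopen_failed = False
except LockedError:
    reopen_failed = True
```

In this model exactly one process holds the image at any time, so the race costs A its access but never produces two concurrent writers.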

And finally, the goal of this series is to protect the users against
stupid mistakes, not against malicious acts. Even if there were some
windows in which the image wouldn't be protected (though I don't think
they exist), having the common case safe would already be a huge
improvement over the current state.

Kevin
But how will we recover after a node power loss (which will happen
sooner or later)?

In this case we are doomed to make a blind guess about whether
we are allowed to clear the flag.

Den
