On 13.09.2010 15:42, Anthony Liguori wrote:
> On 09/13/2010 08:39 AM, Kevin Wolf wrote:
>>> Yeah, one of the key design points of live migration is to minimize the
>>> number of failure scenarios where you lose a VM.  If someone typed the
>>> wrong command line or shared storage hasn't been mounted yet and we
>>> delay failure until live migration is in the critical path, that would
>>> be terribly unfortunate.
>>>      
>> We would catch most of them if we try to open the image when migration
>> starts and immediately close it again until migration is (almost)
>> completed, so that no other code can possibly use it before the source
>> has really closed it.
>>    
> 
> I think the only real advantage is that we fix NFS migration, right?

That's the one that we know about, yes.

The rest is not a specific scenario, but a strong sense that having an
image open twice at the same time is dangerous. As soon as an open/close
sequence writes to the image for some format, we probably have a bug.
For example, what about that mounted flag you were discussing for QED?
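
To make that concrete, here is a rough sketch of a format whose open
path writes to the image (invented names and header layout, not the
actual QED code):

#include <stdint.h>
#include <unistd.h>

/* Hypothetical image header; layout and names are made up. */
typedef struct {
    uint32_t magic;
    uint32_t features;   /* bit 0: image is mounted / needs check */
} ImgHeader;

#define IMG_F_MOUNTED 0x1

static int img_open(int fd, ImgHeader *h)
{
    if (pread(fd, h, sizeof(*h), 0) != (ssize_t)sizeof(*h)) {
        return -1;
    }

    /* If the destination runs this while the source still has the image
     * open, both instances think they own it.  When the source closes
     * the image and clears the flag again, it looks clean even though
     * the destination never ran its consistency check - exactly the
     * kind of silent bug I'm worried about. */
    h->features |= IMG_F_MOUNTED;
    if (pwrite(fd, h, sizeof(*h), 0) != (ssize_t)sizeof(*h)) {
        return -1;
    }
    return 0;
}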

> But if we do invalidate_cache() as you suggested with a close/open of 
> the qcow2 layer, and also acquire and release a lock in the file layer 
> by propagating the invalidate_cache(), that should work robustly with NFS.
> 
> I think that's a simpler change.  Do you see additional advantages to 
> delaying the open?

Just that it makes it very obvious when a device model misbehaves and
accesses the image before it should. The difference is a failed request
vs. silently corrupted data.
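
To spell out what I mean by delaying the open, roughly (invented
helpers, not the real block layer; the real thing would of course go
through bdrv_open/bdrv_close rather than raw open/close):

#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

static int image_fd = -1;

static int image_open(const char *filename)
{
    image_fd = open(filename, O_RDWR);
    return image_fd < 0 ? -1 : 0;
}

static void image_close(void)
{
    if (image_fd >= 0) {
        close(image_fd);
        image_fd = -1;
    }
}

/* Called when the incoming migration is set up. */
int incoming_migration_start(const char *filename)
{
    /* Probe the image early, so a typo'd path or unmounted shared
     * storage fails before migration is in the critical path ... */
    if (image_open(filename) < 0) {
        fprintf(stderr, "cannot open %s, refusing incoming migration\n",
                filename);
        return -1;
    }
    /* ... but close it again immediately, so no code on the destination
     * can touch the image while the source still owns it. */
    image_close();
    return 0;
}

/* Called only once the source has stopped and flushed the image. */
int incoming_migration_complete(const char *filename)
{
    /* Any device model that accessed the image before this point would
     * have hit a failed request, not silently stale data. */
    return image_open(filename);
}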

Kevin
