Anthony Liguori <anth...@codemonkey.ws> wrote:
> On 06/16/2010 08:11 AM, Juan Quintela wrote:

> It's only ensured if you've got the same disk image running on another
> machine.  Considering that we support migrating from a file and we
> support migrating block devices, I don't think it's practical.
>
>> - outgoing migration
>>
>> After a successful migration, we can issue the "cont" command on the source
>> and have source and target running at the same time ->  disk corruption
>> again.
>>
>> My suggestion:
>> - add a third state "incoming", and cont/stop don't work on that state
>> - add a fourth state "migrated", and "cont" gives an explicit error, and you
>>    have to run "cont --force" or "cont" twice (whatever) to get it to
>>    continue.
>>
>
> Very few users are going to do manual migration like this and those
> that do have no good reason to execute cont in either of these
> scenarios.

As of today, libvirt uses it (guess who filed that bug with me).

>  A --force command like this is equivalent to popping up a
> message box saying "are you sure you really want to do this" which
> most users find to be extremely annoying.

I had to debug this one from testers/the field.  They were testing things,
and it was very "practical" to launch a guest on machine A, configure
whatever they wanted, migrate to machine B, test whatever on machine B,
go back to machine A, and continue.

You can guess what happened.  The problem here is that qemu is not
giving the user the _minimal_ advice that something could go wrong.  And it
is not that it could go wrong: it is going to cause disk corruption for sure :(

> We should try to inform users when it's likely that they'll stumble
> upon a dangerous action.  cache=volatile is a good example of this
> because a user could have used it pretty easily and it's a reasonable
> expectation that we wouldn't expose a feature that could lead to
> corruption in obscure cases.

This is not _so_ obscure if you run qemu by hand :(
You have a nice "(qemu)" prompt, and if you issue "cont", bad things happen.

> If a user executes cont in either of these scenarios and has two
> copies of a virtual machine running accessing the same resources, then
> they surely ought to expect bad behavior.

It is not _so_ easy O:-).
Consider the example that I showed you:

(host A)                (host B)
launch qemu             launch qemu -incoming
migrate host B
                        .....
                        do your things
                        exit/poweroff/...

At this point you have a qemu launched on machine A, with nothing on
machine B.  Running "cont" on machine A has disastrous consequences,
and there is no way to prevent it :(

As I have received this bug from users a couple of times, I would like
to be able to prevent this case.
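
To make the suggestion concrete, here is a minimal sketch of the state check
in Python.  It is not qemu code: the class, the state names and the method
names are all made up for illustration, and the "force" argument only stands
in for the "cont --force" idea above.

```python
from enum import Enum, auto


class RunState(Enum):
    RUNNING = auto()
    STOPPED = auto()
    INCOMING = auto()   # proposed: waiting for an incoming migration
    MIGRATED = auto()   # proposed: outgoing migration has completed


class MonitorError(Exception):
    """Raised when a monitor command is refused in the current state."""


class Guest:
    """Hypothetical model of one qemu instance and its run state."""

    def __init__(self, incoming=False):
        # A guest started with -incoming begins in the "incoming" state.
        self.state = RunState.INCOMING if incoming else RunState.RUNNING

    def stop(self):
        if self.state is RunState.INCOMING:
            raise MonitorError("stop: waiting for incoming migration")
        if self.state is RunState.RUNNING:
            self.state = RunState.STOPPED

    def cont(self, force=False):
        if self.state is RunState.INCOMING:
            raise MonitorError("cont: waiting for incoming migration")
        if self.state is RunState.MIGRATED and not force:
            # Explicit error instead of silently resuming the source.
            raise MonitorError("cont: guest was migrated; use force to continue")
        self.state = RunState.RUNNING

    def migration_completed(self):
        # Called on the source once the outgoing migration has finished.
        self.state = RunState.MIGRATED

    def incoming_completed(self):
        # Called on the target once the incoming migration has finished.
        self.state = RunState.RUNNING
```

With this, the host A side of the example above would get an error on a
plain "cont" after the migration, and would only resume if the user
explicitly forces it.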

Later, Juan.
