On 04/15/2013 04:34 AM, Paolo Bonzini wrote:
On 15/04/2013 03:10, Michael R. Hines wrote:
And when someone writes them one day, we'll have to carry the old code
around for interoperability as well. Not pretty.  To avoid that, you
need to explicitly say in the documentation that it's experimental and
unsupported.

That's what protocols are for.

As I've already said, I've incorporated this into the design of the
protocol already.

The protocol already has a field called "repeat" which allows a user to
request multiple chunk registrations at the same time.

If you insist, I can add a capability / command to the protocol called
"unregister chunk", but I'm not volunteering to implement that command,
as I don't have any data showing it to be of any value.
Implementing it on the destination side would be of value because it
would make the implementation interoperable.

A very basic implementation would be "during the bulk phase, unregister
the previous chunk every time you register a chunk".  It would work
great when migrating an idle guest, for example.  It would probably be
faster than TCP (which is now at 4.2 Gbps).
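
A minimal sketch of that "very basic implementation", as a pure
simulation: nothing here touches real RDMA hardware, and the names
PinTracker and bulk_phase are hypothetical, not part of the patches.
It only illustrates that dropping the previous chunk before registering
the next one keeps at most one chunk pinned during the bulk phase.

```python
# Illustrative simulation only: no real RDMA calls are made.
class PinTracker:
    """Hypothetical stand-in for the destination's chunk registry."""

    def __init__(self):
        self.pinned = set()  # chunk indexes currently registered (pinned)
        self.peak = 0        # highest number of chunks pinned at once

    def register(self, chunk):
        # A real implementation would register the memory region here.
        self.pinned.add(chunk)
        self.peak = max(self.peak, len(self.pinned))

    def unregister(self, chunk):
        # A real implementation would deregister the memory region here.
        self.pinned.discard(chunk)


def bulk_phase(tracker, num_chunks):
    """Walk the chunks in order, unregistering the previous one each time."""
    previous = None
    for chunk in range(num_chunks):
        if previous is not None:
            tracker.unregister(previous)
        tracker.register(chunk)
        previous = chunk
```

With eight chunks, the tracker ends with only the last chunk pinned and
a peak of one pinned chunk. Whether that is actually faster than TCP is
exactly what would need to be measured.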

On one hand this should not block merging the patches; on the other
hand, "agreeing to disagree" without having done any test is not very
fruitful.  You can disagree on the priorities (and I agree with you on
this), but what mst is proposing is absolutely reasonable.

Paolo

OK, I think I understand the disconnect here. Let's continue to use
the example that you described above, and let me ask another question.

Let's say the above mentioned idle VM is chosen, for whatever reason,
*not* to use TCP migration, and instead use RDMA. (I recommend against
choosing RDMA in the current docs, but let's stick to this example for
the sake of argument).

Now, in this example, let's say the migration starts up and the hypervisor
has run out of physical memory and starts swapping during the migration
(also for the sake of argument).

The next thing that would immediately happen is the
next IB verbs function call: "ibv_reg_mr()".

This function call would probably fail because there's nothing else left to pin,
and it would return an error.

So my question is: Is it not sufficient to send a message back to the primary-VM
side of the connection which says:

"Your migration cannot proceed anymore, please resume the VM and try again somewhere else".

In this case, both the system administrator and the virtual machine are safe,
nothing has been killed, nothing has crashed, and the management software
can proceed to make a new management decision.

Is there something wrong with this sequence of events?
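
To make the sequence concrete, here is a rough sketch of it as a
simulation. Everything in it is hypothetical (the Reply values, the
Destination and Source classes, the "pinnable_chunks" budget); it is
not the protocol from the patches, only the shape of the error path I
am describing: a failed registration turns into a message, and the
source resumes the VM instead of anything being killed.

```python
from enum import Enum


class Reply(Enum):
    REGISTER_OK = 1
    REGISTER_FAILED = 2  # hypothetical "migration cannot proceed" message


class Destination:
    """Simulated destination with a fixed budget of pinnable chunks."""

    def __init__(self, pinnable_chunks):
        self.remaining = pinnable_chunks

    def register(self, chunk):
        # Stands in for a memory registration call that fails
        # once nothing is left to pin.
        if self.remaining == 0:
            return Reply.REGISTER_FAILED
        self.remaining -= 1
        return Reply.REGISTER_OK


class Source:
    """Simulated source side that reacts to the destination's replies."""

    def __init__(self):
        self.state = "migrating"

    def migrate(self, dest, num_chunks):
        for chunk in range(num_chunks):
            if dest.register(chunk) is Reply.REGISTER_FAILED:
                # Nothing crashes: the VM keeps running on the source
                # and management can pick another destination.
                self.state = "resumed"
                return False
        self.state = "completed"
        return True
```

A destination that can pin only two of five chunks leaves the source in
the "resumed" state; one that can pin all five lets it complete.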

- Michael


