Jamie Lokier wrote:
> Anthony Liguori wrote:
>> This patch replaces the static memory savevm/loadvm handler with a
>> "live" one. This handler is used even if performing a non-live
>> migration.
>
> Excellent. One of the annoyances of savevm currently is that it pauses
> the VM for a significant time, so you can't use it to snapshot
> production systems while they're in use.
qcow2 needs some modification to allow this, but yeah, that's on my
todo list. When you do a savevm today, you write everything to one
chunk of the qcow2 file (presumably at the end). The only thing keeping
others from allocating over you is that you're essentially holding the
big qemu lock (because we're single threaded). With an asynchronous
savevm, this no longer holds. So what we really need to do is let
snapshots chain within a qcow2 file. We can then write savevm data a
chunk at a time and chain the chunks together.

Shouldn't be that hard, and it should be possible to do in a backwards
compatible way.
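To illustrate the chaining idea, here is a minimal sketch of chunks
linked by a next-offset field. The `ChunkHeader` layout is entirely
hypothetical (qcow2's real metadata needs endian conversion and
integration with the cluster allocator); a flat buffer stands in for
the image file.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical on-disk header for one chained chunk of savevm data.
 * This is only a layout sketch, not qcow2's actual format. */
typedef struct ChunkHeader {
    uint32_t length;       /* bytes of savevm payload in this chunk */
    uint64_t next_offset;  /* offset of the next chunk; 0 = end of chain */
} ChunkHeader;

/* Write one chunk at 'offset' in a flat buffer standing in for the
 * image file, and link the previous chunk to it. */
static void append_chunk(uint8_t *file, uint64_t offset,
                         uint64_t prev_offset,
                         const uint8_t *data, uint32_t len)
{
    ChunkHeader h = { .length = len, .next_offset = 0 };
    memcpy(file + offset, &h, sizeof(h));
    memcpy(file + offset + sizeof(h), data, len);
    if (prev_offset != 0) {
        /* Patch the previous header to point at this new chunk. */
        ChunkHeader prev;
        memcpy(&prev, file + prev_offset, sizeof(prev));
        prev.next_offset = offset;
        memcpy(file + prev_offset, &prev, sizeof(prev));
    }
}

/* Walk the chain from the first chunk, summing payload bytes. */
static uint64_t chain_total(const uint8_t *file, uint64_t first_offset)
{
    uint64_t total = 0, off = first_offset;
    while (off != 0) {
        ChunkHeader h;
        memcpy(&h, file + off, sizeof(h));
        total += h.length;
        off = h.next_offset;
    }
    return total;
}
```

The point is that each new burst of savevm data can land wherever the
allocator puts it, with only the previous header rewritten to link it
in, so no single contiguous region has to be reserved up front.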
>> The key difference between this handler and the previous one is that
>> each page is prefixed with the address of the page. The QEMUFile rate
>> limiting code, in combination with the live migration dirty tracking
>> bits, is used to determine which pages should be sent and how many
>> should be sent.
>>
>> The live save code "converges" when the number of dirty pages reaches
>> a fixed amount. Currently, this is 10 pages. This is something that
>> should eventually be derived from whatever the bandwidth limitation
>> is.
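A minimal sketch of the (address, page) record idea described above.
The byte layout here is an assumption for illustration; the patch
actually writes through QEMUFile, whose encoding may differ.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define PAGE_SIZE 4096

/* Serialize one dirty page as (address, contents). */
static size_t put_page(uint8_t *buf, uint64_t addr, const uint8_t *page)
{
    memcpy(buf, &addr, sizeof(addr));
    memcpy(buf + sizeof(addr), page, PAGE_SIZE);
    return sizeof(addr) + PAGE_SIZE;
}

/* Read one record back. Because every page carries its address, the
 * loader can apply pages in any order, and a page sent twice in later
 * iterations simply overwrites the stale copy. */
static size_t get_page(const uint8_t *buf, uint64_t *addr, uint8_t *page)
{
    memcpy(addr, buf, sizeof(*addr));
    memcpy(page, buf + sizeof(*addr), PAGE_SIZE);
    return sizeof(*addr) + PAGE_SIZE;
}
```

That overwrite property is what makes resending re-dirtied pages safe:
the last copy of a page in the stream always wins.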
> Does this mean that a snapshot could record the same page many times,
> perhaps even unbounded, while the guest is dirtying pages at a high
> rate? Or is the guest dirtying rate limited too to ensure the file
> writer will converge in bounded time?
With synchronous savevm (non-live), it's all deterministic. Everything
starts out dirty, and nothing gets dirtied again because the guest
isn't running. With asynchronous savevm, it's nondeterministic.

In general, you can't avoid the nondeterminism. In practice, you
usually converge quickly, so simply having a maximum iteration count,
after which you stop the guest and revert to a synchronous savevm, is
completely reasonable.
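The iterate-until-converged loop with that fallback can be sketched as
below. The 10-page threshold is the fixed amount mentioned above;
`MAX_ITERATIONS` and the guest re-dirty model are assumptions for
illustration, not values from the patch.

```c
#include <assert.h>
#include <stdbool.h>

#define CONVERGE_THRESHOLD 10  /* fixed dirty-page count from the patch */
#define MAX_ITERATIONS     30  /* assumed cap; not a value from the patch */

/* Toy model of the iterative phase: each pass sends every currently
 * dirty page, but the still-running guest dirties 'redirty_rate' pages
 * in the meantime. If the dirty set never drops below the threshold,
 * fall back to stopping the guest and finishing synchronously. */
static int run_live_save(int dirty, int redirty_rate, bool *stopped_guest)
{
    int iterations = 0;
    *stopped_guest = false;
    while (dirty > CONVERGE_THRESHOLD) {
        if (++iterations > MAX_ITERATIONS) {
            *stopped_guest = true;  /* pause guest: final pass is synchronous */
            break;
        }
        dirty = redirty_rate;  /* pages re-dirtied while we were copying */
    }
    return iterations;
}
```

With the guest stopped, nothing is re-dirtied, so the final pass is
bounded by the remaining dirty set, answering the unbounded-time
concern.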
The other options would be to fail after a certain number of iterations,
or to punt entirely to the management tools and provide a mechanism to
cancel an existing live migration if it takes too long. This
functionality exists in KVM; I simply need to add it to this patch
series. It's quite simple, really.
Regards,
Anthony Liguori
> Thanks,
> -- Jamie