On 25/10/2013 06:58, Lei Li wrote:
> Right now I only have rough numbers without the new vmsplice. Based on
> the results from 'info migrate', as the guest RAM size increases, the
> 'total time' is several times lower than with the current live
> migration, but the 'downtime' is bad.

Of course.
> 
> For a 1GB ram guest,
> 
> total time: 702 milliseconds
> downtime: 692 milliseconds
> 
> As the guest RAM size increases exponentially, those numbers grow
> proportionally.
>  
> I will post performance numbers with the new vmsplice later; I am
> sure they will be much better than this, at least.

Yes, please.  Is the memory usage still 2x without vmsplice?

I think you have a nice proof of concept, but on the other hand this
probably needs to be coupled with some kind of postcopy live migration,
that is:

* the source starts sending data

* but the destination starts running immediately

* if the machine needs a page that is missing, the destination asks the
source to send it

* as soon as it arrives, the destination can restart
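To make the three steps above concrete, here is a toy simulation of the
postcopy flow. It is only a sketch under simplifying assumptions, not QEMU
code: guest RAM is a dict of page number to contents, "sending" a page is a
function call, and the names Source, Destination and migrate are
hypothetical. The destination starts executing guest accesses before any
page has arrived; a missing page triggers an urgent fetch from the source,
while the remaining pages are pushed in the background.

```python
from collections import deque


class Source:
    """Hypothetical source side: holds guest RAM not yet sent."""

    def __init__(self, ram):
        self.ram = dict(ram)
        # Order in which pages are pushed in the background.
        self.pending = deque(sorted(self.ram))

    def next_background_page(self):
        # Skip pages that were already sent on demand.
        while self.pending:
            page = self.pending.popleft()
            if page in self.ram:
                return page, self.ram.pop(page)
        return None

    def fetch(self, page):
        # The destination faulted on this page: send it out of order.
        return self.ram.pop(page)


class Destination:
    """Hypothetical destination side: runs the guest immediately."""

    def __init__(self, source):
        self.source = source
        self.ram = {}
        self.faults = 0

    def receive_background(self):
        item = self.source.next_background_page()
        if item is not None:
            page, data = item
            self.ram[page] = data
        return item is not None

    def read(self, page):
        if page not in self.ram:
            # Missing page: the guest waits until the source sends it,
            # then restarts.
            self.faults += 1
            self.ram[page] = self.source.fetch(page)
        return self.ram[page]


def migrate(ram, access_pattern):
    src = Source(ram)
    dst = Destination(src)
    values = []
    for page in access_pattern:
        values.append(dst.read(page))  # guest runs on the destination
        dst.receive_background()       # one background page per guest step
    while dst.receive_background():    # drain whatever is left
        pass
    return dst, values


ram = {i: "page%d" % i for i in range(8)}
dst, values = migrate(ram, [5, 0, 5, 7])
print(values, dst.faults)
```

In this model every page is moved rather than copied, so once the guest has
run on the destination, only the destination holds the complete memory
image; that mirrors why a destination failure loses the VM in postcopy.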

Using postcopy is problematic for reliability: if the destination fails,
the virtual machine is lost because the source doesn't have the latest
content of memory.  However, this is a much, much smaller problem for
live QEMU upgrade where the network cannot fail.

If you do this, you can achieve pretty much instantaneous live upgrade,
well within your original 200 ms goal.  But the page-flipping code with
vmsplice would be needed anyway to avoid doubling memory usage, and
it's looking pretty good in this version already!  I'm relieved that the
RDMA code was designed right!

Paolo
