>> The single precopy lazy pass would consist of clearing the dirty
>> bitmap and starting precopy; if a page is found dirty by the time
>> precopy tries to send it, we skip it. We only send the pages that
>> haven't been modified yet by the time precopy reaches them.
>>
>> Heavily modified pages will be sent purely through
>> postcopy. Ultimately postcopy will be a page-sorting feature that
>> massively decreases the downtime latency and caps the amount of data
>> transferred on the network at 2*ramsize, without having to slow down
>> the guest artificially. We'll also know in advance the exact maximum
>> time it takes to migrate a large host, no matter the load on it
>> (2*ramsize divided by the network bandwidth available at migration
>> time). It'll be totally deterministic; no black magic slowdowns
>> anymore.
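
To make sure I read the proposed lazy pass right, here is a toy sketch
of it (struct guest, send_page and the page counts are invented for
illustration, not QEMU's actual migration code):

#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

#define NR_PAGES 16

struct guest {
    bool dirty[NR_PAGES];   /* stand-in for the real dirty bitmap */
};

static void send_page(size_t pfn)
{
    printf("precopy: sending page %zu\n", pfn);
}

/* One pass over guest RAM: send only pages still clean when reached.
 * The dirty bitmap is assumed cleared when write tracking was armed,
 * just before this pass starts. */
static void lazy_precopy_pass(struct guest *g)
{
    for (size_t pfn = 0; pfn < NR_PAGES; pfn++) {
        if (g->dirty[pfn]) {
            continue;   /* modified since the pass began: leave it
                           for postcopy to demand-fetch */
        }
        send_page(pfn);
    }
    /* At most ramsize goes out in this pass and at most ramsize again
       during postcopy, hence the 2*ramsize bound mentioned above. */
}

int main(void)
{
    struct guest g = { 0 };  /* bitmap cleared: tracking armed */
    g.dirty[3] = true;       /* pretend the guest wrote page 3 */
    lazy_precopy_pass(&g);
    return 0;
}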
> 
> There is a trade-off: killing precopy does reduce network bandwidth
> usage, but the other side is that you would incur more postcopy round
> trips, so your average latency will probably increase.
> 

I agree with David on the latency issue. My colleague and I have tried
the idea of a single-iteration precopy followed by postcopy (with our
own pre+post implementation). For workloads with a huge writable
working set, the VM remains somewhat unresponsive because of the page
transfers. We coined a term for this, "perceivable downtime", which can
be measured for workloads running network-intensive tasks.
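
For reference, the measurement idea is roughly the probe loop below (a
simplified sketch with error checks omitted; the guest address, echo
port, probe count and intervals are placeholders, not our actual
harness):

#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdio.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <time.h>
#include <unistd.h>

#define PROBES      1000
#define INTERVAL_US 10000          /* 10 ms between probes */

static double now_sec(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec + ts.tv_nsec / 1e9;
}

int main(void)
{
    int fd = socket(AF_INET, SOCK_DGRAM, 0);
    struct sockaddr_in dst = { .sin_family = AF_INET,
                               .sin_port   = htons(7) }; /* echo */
    inet_pton(AF_INET, "192.0.2.10", &dst.sin_addr);     /* guest IP */
    connect(fd, (struct sockaddr *)&dst, sizeof(dst));

    /* Give up on a probe after 5 ms so a stalled guest shows up as a
       gap between successful replies rather than a blocked recv(). */
    struct timeval tv = { .tv_sec = 0, .tv_usec = 5000 };
    setsockopt(fd, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof(tv));

    double last_ok = now_sec(), worst_gap = 0.0;
    char buf[16] = "ping";

    for (int i = 0; i < PROBES; i++) {
        send(fd, buf, sizeof(buf), 0);
        if (recv(fd, buf, sizeof(buf), 0) > 0) {
            double t = now_sec();
            if (t - last_ok > worst_gap)
                worst_gap = t - last_ok;
            last_ok = t;
        }
        usleep(INTERVAL_US);
    }
    printf("perceivable downtime ~ %.3f s\n", worst_gap);
    close(fd);
    return 0;
}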

The multiple postcopy round trips will certainly worsen the performance
of memory-intensive workloads, e.g. when a guest running mcf from SPEC
CPU2006 or memcached is migrated (some of the workloads on which we
tested our prototype).
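
A back-of-envelope estimate shows why: every page the workload touches
before it has arrived costs roughly one network round trip of stall.
The fault rate and RTT below are invented numbers, not measurements:

#include <stdio.h>

int main(void)
{
    double faults_per_sec = 1000;    /* touches of not-yet-arrived pages */
    double rtt_sec        = 0.0005;  /* 0.5 ms network round trip */

    /* Fraction of each second the vCPU spends waiting on remote pages.
       As faults_per_sec approaches 1.0 / rtt_sec the guest is
       effectively stopped, which is the perceivable-downtime effect. */
    printf("stall fraction ~ %.0f%%\n", faults_per_sec * rtt_sec * 100);
    return 0;
}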

Currently, I don't know how David's postcopy implementation handles
multiple pages; I will try to investigate that soon.

--

Sanidhya
