>> The single precopy lazy pass would consist of clearing the dirty
>> bitmap, starting precopy, then if any page is found dirty by the time
>> precopy tries to send it, we skip it. We only send those pages in
>> precopy that haven't been modified yet by the time we reach them in
>> precopy.
>>
>> Pages heavily modified will be sent purely through
>> postcopy. Ultimately postcopy will be a page sorting feature to
>> massively decrease the downtime latency, and to reduce to 2*ramsize
>> the maximum amount of data transferred on the network without having
>> to slow down the guest artificially. We'll also know exactly the
>> maximum time in advance that it takes to migrate a large host no
>> matter the load in it (2*ramsize divided by the network bandwidth
>> available at the migration time). It'll be totally deterministic, no
>> black magic slowdowns anymore.
>
> There is a trade off; killing the precopy does reduce network bandwidth,
> but the other side of it is that you would incur more postcopy round trips,
> so your average latency will probably increase.
>
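The single lazy precopy pass quoted above can be sketched as follows. This is only an illustration of the idea, not QEMU's actual migration code; the helper names `is_dirty` and `send_page` are assumptions standing in for the real dirty-bitmap lookup and page-transfer paths:

```python
def single_pass_precopy(num_pages, is_dirty, send_page):
    """Walk guest RAM exactly once, after the dirty bitmap was cleared.

    Pages still clean when precopy reaches them are sent; pages the
    guest has dirtied in the meantime are skipped and left for postcopy.
    Returns the set of page indices deferred to postcopy.
    """
    deferred = set()
    for page in range(num_pages):
        if is_dirty(page):
            deferred.add(page)  # modified since the pass began: skip it
        else:
            send_page(page)     # unmodified: precopy transfers it now
    return deferred


# Toy run: pages 1 and 3 were dirtied after the bitmap was cleared,
# so precopy sends 0, 2, 4 and defers 1 and 3 to postcopy.
sent = []
dirty = {1, 3}
left_for_postcopy = single_pass_precopy(5, dirty.__contains__, sent.append)
```

The 2*ramsize bound follows directly: the single precopy pass sends each page at most once (at most ramsize), and postcopy then fetches each remaining page at most once (again at most ramsize), so total transfer never exceeds 2*ramsize no matter how heavily the guest writes.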
I agree with David on the latency issue. My colleague and I have tried the idea of a single-iteration precopy followed by postcopy (with our own version of a pre+post implementation). For workloads with a huge writable working set, the VM remains somewhat inactive because of the page transfers. We coined a new term for this, "perceivable downtime", which can be measured for workloads running network-intensive tasks.

The multiple postcopy round trips will certainly worsen the performance of memory-intensive workloads, for example when mcf from SPEC CPU2006 or even a memcached-based guest is migrated (some of the workloads on which we tested our prototype).

Currently, I don't know how David's postcopy implementation handles multiple pages, which I will try to investigate soon.

--
Sanidhya