On 06/19/2012 06:42 PM, Chegu Vinod wrote:
Hello,

Wanted to share some preliminary data from live migration experiments on a setup
that is perhaps one of the larger ones.

We used Juan's "huge_memory" patches (without the separate migration thread) and
measured the total migration time and the time taken for stage 3 ("downtime").
Note: we didn't change the default maximum "downtime" (30ms?). We had a private
10Gig back-to-back link between the two hosts, and we set the migration speed to
10Gig.

The "workloads" chosen were ones that we could easily set up. All experiments
were done without using virsh/virt-manager (i.e. by interacting directly with the
qemu monitor prompt). Please see the data below.

As the guest size increased (and for the busier workloads) we observed that
network connections were getting dropped not only during the "downtime" (i.e.
stage 3) but also, at times, during the iterative pre-copy phase (i.e. stage
2). Perhaps some of this will get fixed when we have the migration thread
implemented.

We had also briefly tried the proposed delta compression changes (easier to say
than XBZRLE :)) on a smaller configuration. For the simple workloads (perhaps
there was not much temporal locality in them) it didn't show any improvement;
instead, migration took much longer (high cache-miss penalty?). We are waiting
for the updated version of XBZRLE for further experiments, to see how well it
scales on this larger setup...
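For readers unfamiliar with XBZRLE (XOR Binary Zero Run-Length Encoding), here is a minimal Python sketch of the idea, assuming a simplified wire format. The old and new copies of a page are XORed, and the result is sent as alternating (zero-run length, non-zero-run length, new bytes) records with ULEB128-encoded lengths. It also shows where a cache-miss penalty can come from: a page that is not yet in the sender-side cache goes out in full. The helpers `xbzrle_encode`, `xbzrle_decode` and `send_page` are hypothetical illustrations, not QEMU's implementation; cache sizing, eviction and other encoding details are omitted.

```python
from __future__ import annotations

PAGE_SIZE = 4096


def uleb128(n: int) -> bytes:
    """Encode a non-negative integer as unsigned LEB128 (7 bits per byte)."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def read_uleb128(buf: bytes, pos: int) -> tuple[int, int]:
    """Decode one ULEB128 integer at buf[pos]; return (value, next position)."""
    value = shift = 0
    while True:
        b = buf[pos]
        pos += 1
        value |= (b & 0x7F) << shift
        shift += 7
        if not b & 0x80:
            return value, pos


def xbzrle_encode(old: bytes, new: bytes) -> bytes:
    """Encode `new` as a delta against `old` (both the same length)."""
    if len(old) != len(new):
        raise ValueError("pages must have the same length")
    delta = bytes(a ^ b for a, b in zip(old, new))  # zero where unchanged
    out = bytearray()
    i, n = 0, len(delta)
    while i < n:
        start = i
        while i < n and delta[i] == 0:
            i += 1
        zrun = i - start  # unchanged bytes: the receiver keeps its copy
        start = i
        while i < n and delta[i] != 0:
            i += 1
        if i == start:  # only a trailing zero run is left: nothing to send
            break
        out += uleb128(zrun) + uleb128(i - start) + new[start:i]
    return bytes(out)


def xbzrle_decode(old: bytes, encoded: bytes) -> bytes:
    """Apply an encoded delta to the receiver's copy of the old page."""
    page = bytearray(old)
    pos = i = 0
    while pos < len(encoded):
        zrun, pos = read_uleb128(encoded, pos)
        i += zrun
        nzrun, pos = read_uleb128(encoded, pos)
        page[i:i + nzrun] = encoded[pos:pos + nzrun]
        pos += nzrun
        i += nzrun
    return bytes(page)


def send_page(cache: dict[int, bytes], addr: int, data: bytes) -> tuple[str, bytes]:
    """Decide what goes on the wire for one dirty page (hypothetical helper)."""
    old = cache.get(addr)
    cache[addr] = data
    if old is None:
        return "full", data  # cache miss: the whole page is sent
    encoded = xbzrle_encode(old, data)
    if len(encoded) >= len(data):
        return "full", data  # the delta would not save anything
    return "xbzrle", encoded
```

Only pages already in the cache benefit. With little temporal locality, most dirty pages are misses, so the encoding work is mostly overhead.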

FYI
Vinod

---
10VCPUs/128G
---
1) Idle guest
Total migration time : 124585 ms,
Stage_3_time : 941 ms,
Total MB transferred : 2720

2) AIM7-compute (2000 users)
Total migration time : 123540 ms,
Stage_3_time : 726 ms,
Total MB transferred : 3580

3) SpecJBB (modified to run 10 warehouse threads for a long time)
Total migration time : 165720 ms,
Stage_3_time : 6851 ms,
Total MB transferred : 19656

A 6.8s downtime may be unacceptable for some applications. Does it converge with a maximum downtime of 1 sec? In theory, this is where post-copy can shine. But what is missing from the (good) performance data is how the application performs during live migration. This is exactly where the live migration thread and the dirty-bit optimization should help us.
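Whether a run converges under a given maximum downtime depends mostly on how fast the guest dirties memory relative to the migration bandwidth. A toy Python model of pre-copy makes this explicit. It assumes a constant dirty rate and a constant bandwidth: each round resends what was dirtied during the previous round, and stage 3 starts once the remainder could be sent within the downtime limit. `simulate_precopy` and its `max_iters` cap are hypothetical; the cap only keeps the loop finite and does not reproduce QEMU's actual logic. The parameters in the demo are illustrative, not measurements from these runs.

```python
from dataclasses import dataclass


@dataclass
class PrecopyResult:
    total_time_s: float
    downtime_s: float
    transferred_mb: float
    converged: bool


def simulate_precopy(mem_mb: float, dirty_mb_s: float, bw_mb_s: float,
                     max_downtime_s: float, max_iters: int = 30) -> PrecopyResult:
    """Toy model: iterative pre-copy (stage 2), then stop-and-copy (stage 3)."""
    to_send = mem_mb  # the first round sends all of guest RAM
    elapsed = transferred = 0.0
    for _ in range(max_iters):
        if to_send / bw_mb_s <= max_downtime_s:
            break  # the remainder fits in the downtime budget
        round_s = to_send / bw_mb_s
        elapsed += round_s
        transferred += to_send
        # memory dirtied while this round was on the wire must be resent
        to_send = min(mem_mb, dirty_mb_s * round_s)
    downtime = to_send / bw_mb_s  # the guest is paused for this part
    return PrecopyResult(elapsed + downtime, downtime,
                         transferred + to_send, downtime <= max_downtime_s)


if __name__ == "__main__":
    # Illustrative numbers only: a 128 GiB guest on a ~1 GB/s link.
    for dirty in (50, 500, 1500):
        for limit in (0.03, 1.0):
            r = simulate_precopy(131072, dirty, 1000, limit)
            print(f"dirty={dirty:5} MB/s limit={limit:4} s -> "
                  f"converged={r.converged} downtime={r.downtime_s:.2f} s "
                  f"total={r.total_time_s:.0f} s")
```

When the dirty rate exceeds the link bandwidth, the remaining set never shrinks and no downtime limit is met. Post-copy avoids this problem because each page is transferred only once.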

Our 'friends' have some nice, if old, analysis of live migration performance:
- http://www.cl.cam.ac.uk/research/srg/netos/papers/2005-migration-nsdi-pre.pdf
- http://www.vmware.com/files/pdf/techpaper/VMW_Netioc_BestPractices.pdf

Cheers,
Dor


4) Google SAT (-s 3600 -C 5 -i 5)
Total migration time : 411827 ms,
Stage_3_time : 77807 ms,
Total MB transferred : 142136



---
20VCPUs /256G
---

1) Idle guest
Total migration time : 259938 ms,
Stage_3_time : 1998 ms,
Total MB transferred : 5114

2) AIM7-compute (2000 users)
Total migration time : 261336 ms,
Stage_3_time : 2107 ms,
Total MB transferred : 5473

3) SpecJBB (modified to run 20 warehouse threads for a long time)
Total migration time : 390548 ms,
Stage_3_time : 19596 ms,
Total MB transferred : 48109

4) Google SAT (-s 3600 -C 10 -i 10)
Total migration time : 780150 ms,
Stage_3_time : 90346 ms,
Total MB transferred : 251287

---
30VCPUs/384G
---

1) Idle guest
Total migration time : 501704 ms,
Stage_3_time : 2835 ms,
Total MB transferred : 15731

2) AIM7-compute (2000 users)
Total migration time : 496001 ms,
Stage_3_time : 3884 ms,
Total MB transferred : 9375

3) SpecJBB (modified to run 30 warehouse threads for a long time)
Total migration time : 611075 ms,
Stage_3_time : 17107 ms,
Total MB transferred : 48862

4) Google SAT (-s 3600 -C 15 -i 15) (look at /tmp/kvm_30w_Goog)
Total migration time : 1348102 ms,
Stage_3_time : 128531 ms,
Total MB transferred : 367524



---
40VCPUs/512G
---

1) Idle guest
Total migration time : 780257 ms,
Stage_3_time : 3770 ms,
Total MB transferred : 13330

2) AIM7-compute (2000 users)
Total migration time : 720963 ms,
Stage_3_time : 3966 ms,
Total MB transferred : 10595

3) SpecJBB (modified to run 40 warehouse threads for a long time)
Total migration time : 863577 ms,
Stage_3_time : 25149 ms,
Total MB transferred : 54685

4) Google SAT (-s 3600 -C 20 -i 20)
Total migration time : 2585039 ms,
Stage_3_time : 177625 ms,
Total MB transferred : 493575
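To compare the runs above at a glance, a small Python script can derive two ratios from the reported numbers: the average data rate on the wire (MB transferred / total migration time) and the share of total time spent in stage 3. It assumes "MB" means 2^20 bytes and "10Gig" means 10^10 bit/s. The results are averages over each whole run and say nothing about where the time went.

```python
# (total migration time in ms, stage 3 time in ms, total MB transferred),
# copied from the runs reported above.
RUNS = {
    ("10VCPUs/128G", "Idle"): (124585, 941, 2720),
    ("10VCPUs/128G", "AIM7"): (123540, 726, 3580),
    ("10VCPUs/128G", "SpecJBB"): (165720, 6851, 19656),
    ("10VCPUs/128G", "SAT"): (411827, 77807, 142136),
    ("20VCPUs/256G", "Idle"): (259938, 1998, 5114),
    ("20VCPUs/256G", "AIM7"): (261336, 2107, 5473),
    ("20VCPUs/256G", "SpecJBB"): (390548, 19596, 48109),
    ("20VCPUs/256G", "SAT"): (780150, 90346, 251287),
    ("30VCPUs/384G", "Idle"): (501704, 2835, 15731),
    ("30VCPUs/384G", "AIM7"): (496001, 3884, 9375),
    ("30VCPUs/384G", "SpecJBB"): (611075, 17107, 48862),
    ("30VCPUs/384G", "SAT"): (1348102, 128531, 367524),
    ("40VCPUs/512G", "Idle"): (780257, 3770, 13330),
    ("40VCPUs/512G", "AIM7"): (720963, 3966, 10595),
    ("40VCPUs/512G", "SpecJBB"): (863577, 25149, 54685),
    ("40VCPUs/512G", "SAT"): (2585039, 177625, 493575),
}

# Assumption: "10Gig" = 10**10 bit/s and "MB" = 2**20 bytes (about 1192 MB/s).
LINK_MB_S = 10e9 / 8 / 2**20


def summarize(runs):
    """Return (config, workload, MB/s on the wire, fraction of link, stage-3 share)."""
    rows = []
    for (config, workload), (total_ms, stage3_ms, mb) in runs.items():
        mb_per_s = mb / (total_ms / 1000)
        rows.append((config, workload, mb_per_s, mb_per_s / LINK_MB_S,
                     stage3_ms / total_ms))
    return rows


if __name__ == "__main__":
    for config, workload, mb_s, link_frac, stage3_frac in summarize(RUNS):
        print(f"{config:13} {workload:8} {mb_s:6.1f} MB/s "
              f"({link_frac:5.1%} of link)  stage 3: {stage3_frac:5.1%}")
```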


--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


