Re: [Qemu-devel] KVM call agenda for Tuesday, June 19th

2012-07-11 Thread Dor Laor

On 06/19/2012 06:42 PM, Chegu Vinod wrote:

Hello,

Wanted to share some preliminary data from live migration experiments on a setup
that is perhaps one of the larger ones.

We used Juan's huge_memory patches (without the separate migration thread) and
measured the total migration time and the time taken for stage 3 (downtime).
Note: We didn't change the default downtime (30ms?). We had a private 10Gig
back-to-back link between the two hosts..and we set the migration speed to
10Gig.

The workloads chosen were ones that we could easily set up. All experiments
were done without using virsh/virt-manager (i.e. by direct interaction with
the qemu monitor prompt).  Please see the data below.

As the guest size increased (and for the busier workloads) we observed that
network connections were getting dropped not only during the downtime (i.e.
stage 3) but also at times during the iterative pre-copy phase (i.e. stage
2).  Perhaps some of this will get fixed when we have the migration thread
implemented.

We had also briefly tried the proposed delta compression changes (easier to say
than XBZRLE :)) on a smaller configuration. For the simple workloads (perhaps
there was not much temporal locality in them) it didn't seem to show
improvements; instead it took much longer to migrate (high cache miss
penalty?). We are waiting for the updated version of XBZRLE for further
experiments to see how well it scales on this larger setup...

FYI
Vinod

---
10VCPUs/128G
---
1) Idle guest
Total migration time : 124585 ms,
Stage_3_time : 941 ms ,
Total MB transferred : 2720


2) AIM7-compute (2000 users)
Total migration time : 123540 ms,
Stage_3_time : 726 ms ,
Total MB transferred : 3580

3) SpecJBB (modified to run 10 warehouse threads for a long duration of time)
Total migration time : 165720 ms,
Stage_3_time : 6851 ms ,
Total MB transferred : 19656


A 6.8s downtime may be unacceptable for some applications. Does it
converge with a maximum downtime of 1 sec?
In theory this is where post copy can shine. But what we're missing in
the (good) performance data is how the application performs during live
migration. This is exactly where the live migration thread and dirtybit
optimization should help us.
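
To make the convergence question concrete, here is a rough sketch of the usual pre-copy model: each round resends whatever the guest dirtied while the previous round was on the wire, and the migration can stop once the leftover can be sent within the allowed downtime. The function name, its parameters and the constant dirty rate are all assumptions for illustration; this is not QEMU code and none of the inputs were measured in the runs above.

```python
def precopy_converges(ram_mb, dirty_mb_per_s, link_mb_per_s,
                      max_downtime_s, max_rounds=30):
    """Return (converged, rounds, expected_downtime_s) for a simple pre-copy model.

    Hypothetical model: constant dirty rate, constant link throughput.
    """
    remaining_mb = float(ram_mb)  # the first round sends all of guest RAM
    for rounds in range(1, max_rounds + 1):
        expected_downtime_s = remaining_mb / link_mb_per_s
        if expected_downtime_s <= max_downtime_s:
            # The leftover fits in the downtime budget: stop the guest and send it.
            return True, rounds, expected_downtime_s
        # While this round is being sent, the guest keeps dirtying memory.
        round_time_s = remaining_mb / link_mb_per_s
        remaining_mb = min(float(ram_mb), dirty_mb_per_s * round_time_s)
    return False, max_rounds, remaining_mb / link_mb_per_s


if __name__ == "__main__":
    # Made-up numbers: a guest dirtying memory faster than the link can send it
    # never converges, no matter how many rounds are allowed.
    print(precopy_converges(4000, 100, 1000, 0.1))
    print(precopy_converges(4000, 2000, 1000, 1.0))
```

In this model the outcome depends only on the ratio of dirty rate to link throughput, which is why a busy workload such as SpecJBB may never reach a 1 sec target with pre-copy alone.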


Our 'friends' have some nice older analyses of live migration performance:
 - http://www.cl.cam.ac.uk/research/srg/netos/papers/2005-migration-nsdi-pre.pdf
 - http://www.vmware.com/files/pdf/techpaper/VMW_Netioc_BestPractices.pdf

Cheers,
Dor



4) Google SAT  (-s 3600 -C 5 -i 5)
Total migration time : 411827 ms,
Stage_3_time : 77807 ms ,
Total MB transferred : 142136



---
20VCPUs /256G
---

1) Idle  guest
Total migration time : 259938 ms,
Stage_3_time : 1998 ms ,
Total MB transferred : 5114

2) AIM7-compute (2000 users)
Total migration time : 261336 ms,
Stage_3_time : 2107 ms ,
Total MB transferred : 5473

3) SpecJBB (modified to run 20 warehouse threads for a long duration of time)
Total migration time : 390548 ms,
Stage_3_time : 19596 ms ,
Total MB transferred : 48109

4) Google SAT  (-s 3600 -C 10 -i 10)
Total migration time : 780150 ms,
Stage_3_time : 90346 ms ,
Total MB transferred : 251287


30VCPUs/384G
---

1) Idle guest
(qemu) Total migration time : 501704 ms,
Stage_3_time : 2835 ms ,
Total MB transferred : 15731


2) AIM7-compute (2000 users)
Total migration time : 496001 ms,
Stage_3_time : 3884 ms ,
Total MB transferred : 9375


3) SpecJBB (modified to run 30 warehouse threads for a long duration of time)
Total migration time : 611075 ms,
Stage_3_time : 17107 ms ,
Total MB transferred : 48862


4) Google SAT  (-s 3600 -C 15 -i 15)  (look at /tmp/kvm_30w_Goog)
Total migration time : 1348102 ms,
Stage_3_time : 128531 ms ,
Total MB transferred : 367524



---
40VCPUs/512G
---

1) Idle guest
Total migration time : 780257 ms,
Stage_3_time : 3770 ms ,
Total MB transferred : 13330


2) AIM7-compute (2000 users)
Total migration time : 720963 ms,
Stage_3_time : 3966 ms ,
Total MB transferred : 10595

3) SpecJBB (modified to run 40 warehouse threads for a long duration of time)
Total migration time : 863577 ms,
Stage_3_time : 25149 ms ,
Total MB transferred : 54685

4) Google SAT  (-s 3600 -C 20 -i 20)
Total migration time : 2585039 ms,
Stage_3_time : 177625 ms ,
Total MB transferred : 493575


--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html






Re: [Qemu-devel] KVM call agenda for Tuesday, June 19th

2012-07-11 Thread Dor Laor

On 06/19/2012 08:22 PM, Michael Roth wrote:

On Tue, Jun 19, 2012 at 11:34:42PM +0900, Takuya Yoshikawa wrote:

On Tue, 19 Jun 2012 09:01:36 -0500
Anthony Liguori anth...@codemonkey.ws wrote:


I'm not at all convinced that postcopy is a good idea.  There needs to be a
clear expression of what the value proposition is, backed by benchmarks.  Those
benchmarks need to include latency measurements of downtime, which so far I've
not seen.

I don't want to take any postcopy patches until this discussion happens.


FWIW:

I rather see postcopy as a way of migrating guests forcibly and I know
a service in which such a way is needed: emergency migration.  There is
also a product which does live migration when some hardware problems are
detected (as a semi-FT solution) -- in such cases, we cannot wait until
the guest becomes calm.


Ignoring max downtime values when we've determined that the target is no
longer converging would be another option. Essentially having a
use_strict_max_downtime that can be set on a per-migration basis, where
if not set we can give up on maintaining the max_downtime when it's
been determined that progress is no longer being made.


There is no need for a new parameter. Management software like
ovirt/virt-manager can track the amount of pages-to-migrate left and, if
the number starts rising, realize that the current max limit won't
converge and either increase the limit or cancel the migration.
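
A minimal sketch of that management-side check, assuming the management layer already polls the remaining-pages (or remaining-bytes) figure at a fixed interval. The function name and the `window` parameter are hypothetical; how the samples are collected is left out on purpose.

```python
def should_intervene(remaining_samples, window=3):
    """Return True if the remaining amount rose on each of the last `window` polls.

    `remaining_samples` is a list of remaining-to-migrate readings, oldest first.
    Hypothetical helper: the caller decides whether to raise the downtime limit
    or cancel the migration.
    """
    if len(remaining_samples) < window + 1:
        return False  # not enough history to judge a trend
    recent = remaining_samples[-(window + 1):]
    return all(later > earlier for earlier, later in zip(recent, recent[1:]))


if __name__ == "__main__":
    print(should_intervene([100, 90, 95, 97, 99]))  # rising for three polls
    print(should_intervene([100, 90, 80, 85]))      # only one rise so far
```

Requiring several consecutive rises avoids reacting to a single noisy sample; a real policy would probably also look at how long the migration has been running.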






Although I am not certain whether QEMU can be used for such products,
it may be worth thinking about.

Thanks,
Takuya




RE: [Qemu-devel] KVM call agenda for Tuesday, June 19th

2012-07-11 Thread Vinod, Chegu


-Original Message-
From: Dor Laor [mailto:dl...@redhat.com] 
Sent: Wednesday, July 11, 2012 2:59 AM
To: Vinod, Chegu
Cc: kvm@vger.kernel.org
Subject: Re: [Qemu-devel] KVM call agenda for Tuesday, June 19th

On 06/19/2012 06:42 PM, Chegu Vinod wrote:
 Hello,

 Wanted to share some preliminary data from live migration experiments 
 on a setup that is perhaps one of the larger ones.

 We used Juan's huge_memory patches (without the separate migration 
 thread) and measured the total migration time and the time taken for stage 3 
 (downtime).
 Note: We didn't change the default downtime (30ms?). We had a 
 private 10Gig back-to-back link between the two hosts..and we set the 
 migration speed to 10Gig.

 The workloads chosen were ones that we could easily setup. All 
 experiments were done without using virsh/virt-manager (i.e. direct 
 interaction with the qemu monitor prompt).  Pl. see the data below.

 As the guest size increased (and for busier the workloads) we observed 
 that network connections were getting dropped not only during the downtime 
 (i.e.
 stage 3) but also during at times during iterative pre-copy phase 
 (i.e. stage 2).  Perhaps some of this will get fixed when we have the 
 migration thread implemented.

 We had also briefly tried the proposed delta compression changes 
 (easier to say than XBZRLE :)) on a smaller configuration. For the 
 simple workloads (perhaps there was not much temporal locality in 
 them) it didn't seem to show improvements instead took much longer 
 time to migrate (high cache miss penalty?). Waiting for the updated 
 version of the XBZRLE for further experiments to see how well it scales on 
 this larger set up...

 FYI
 Vinod

 ---
 10VCPUs/128G
 ---
 1) Idle guest
 Total migration time : 124585 ms,
 Stage_3_time : 941 ms ,
 Total MB transferred : 2720


 2) AIM7-compute (2000 users)
 Total migration time : 123540 ms,
 Stage_3_time : 726 ms ,
 Total MB transferred : 3580

 3) SpecJBB (modified to run 10 warehouse threads for a long duration 
 of time) Total migration time : 165720 ms, Stage_3_time : 6851 ms , 
 Total MB transferred : 19656

A 6.8s downtime may be unacceptable for some applications. Does it converge
with a maximum downtime of 1 sec?
In theory this is where post copy can shine. But what we're missing in the
(good) performance data is how the application performs during live migration.
This is exactly where the live migration thread and dirtybit optimization
should help us.

Our 'friends' have some nice older analyses of live migration performance:
  - http://www.cl.cam.ac.uk/research/srg/netos/papers/2005-migration-nsdi-pre.pdf
  - http://www.vmware.com/files/pdf/techpaper/VMW_Netioc_BestPractices.pdf

Cheers,
Dor





There have been some recent fixes (from Juan) that are supposed to honor the
user-requested downtime. I am in the middle of redoing some of my
experiments and will share the results when they are ready (in about 3-4 days).
Initial observations are that the total migration time increases
considerably, but there are no observed stalls, ping timeouts, etc. I will know
more after I finish my experiments (i.e. the non-XBZRLE ones).

As expected, the 10G [back-to-back] connection is not really getting saturated
with the migration traffic, so there is some other layer that is
consuming time (possibly the overhead of tracking dirty pages).
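
As a rough cross-check on link utilization, the average throughput can be worked out from the totals reported in the June runs quoted in this thread. This is only a sketch: it assumes "MB" means 10^6 bytes and that the link delivers a full 10 Gbit/s, and it averages over the whole migration, so it says nothing about peaks.

```python
# 10 Gbit/s expressed in MB/s (assumption: 1 MB = 10^6 bytes)
LINK_MB_PER_S = 10_000 / 8

# (label, total migration time in ms, total MB transferred), copied from the
# 10VCPUs/128G runs quoted in this thread
RUNS = [
    ("10VCPUs/128G idle", 124585, 2720),
    ("10VCPUs/128G SpecJBB", 165720, 19656),
    ("10VCPUs/128G Google SAT", 411827, 142136),
]


def effective_mb_per_s(total_ms, total_mb):
    """Average throughput over the whole migration, in MB/s."""
    return total_mb / (total_ms / 1000.0)


for label, total_ms, total_mb in RUNS:
    rate = effective_mb_per_s(total_ms, total_mb)
    share = 100 * rate / LINK_MB_PER_S
    print(f"{label}: {rate:.1f} MB/s, {share:.1f}% of the link")
```

Even the busiest of these three runs averages well under the nominal link rate, which is consistent with the time going somewhere other than the wire.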

I haven't yet had the time to try to quantify the performance degradation on
the workload during the live migration (stage 2); I need to look at that next.

Thanks for the pointers to the old articles.

Thanks
Vinod



 4) Google SAT  (-s 3600 -C 5 -i 5)
 Total migration time : 411827 ms,
 Stage_3_time : 77807 ms ,
 Total MB transferred : 142136



 ---
 20VCPUs /256G
 ---

 1) Idle  guest
 Total migration time : 259938 ms,
 Stage_3_time : 1998 ms ,
 Total MB transferred : 5114

 2) AIM7-compute (2000 users)
 Total migration time : 261336 ms,
 Stage_3_time : 2107 ms ,
 Total MB transferred : 5473

 3) SpecJBB (modified to run 20 warehouse threads for a long duration 
 of time) Total migration time : 390548 ms, Stage_3_time : 19596 ms , 
 Total MB transferred : 48109

 4) Google SAT  (-s 3600 -C 10 -i 10)
 Total migration time : 780150 ms,
 Stage_3_time : 90346 ms ,
 Total MB transferred : 251287

 
 30VCPUs/384G
 ---

 1) Idle guest
 (qemu) Total migration time : 501704 ms, Stage_3_time : 2835 ms , 
 Total MB transferred : 15731


 2) AIM7-compute (2000 users)
 Total migration time : 496001 ms,
 Stage_3_time : 3884 ms ,
 Total MB transferred : 9375


 3) SpecJBB (modified to run 30 warehouse threads for a long duration 
 of time) Total migration time : 611075 ms, Stage_3_time : 17107 ms , 
 Total MB transferred : 48862


 4) Google SAT  (-s 3600 -C 15 -i 15)  (look at /tmp/kvm_30w_Goog) 
 Total migration time : 1348102 ms, Stage_3_time : 128531 ms , Total MB 
 transferred : 367524



 ---
 40VCPUs/512G
 ---

 1) Idle guest
 Total migration time : 780257 ms,
 Stage_3_time

Re: [Qemu-devel] KVM call agenda for Tuesday, June 19th

2012-07-11 Thread Takuya Yoshikawa
On Thu, 12 Jul 2012 02:02:24 +0100
Vinod, Chegu chegu_vi...@hp.com wrote:

 There have been some recent fixes (from Juan) that are supposed to honor the 
 user requested downtime. I am in the middle of redoing some of my 
 experiments...and will share when they are ready (in about 3-4 days).  
 Initial observations are that the time take for the total migration 
 considerably increases but there are no observed stalls or ping timeouts etc. 
 Will know more after I finish my experiments (i.e. the non-XBZRLE ones).
 
 As expected the 10G [back -to-back] connection is not really getting 
 saturated with the migration traffic... so the there is some other layer that 
 is consuming time (possibly the overhead of  tracking dirty pages).  
 
 I haven't yet  had the time to try to quantify the performance degradation on 
 the workload during the live migration (stage 2)... need to look at that 
 next. 
 

I recommend that you also try the latest kvm.git next branch, since
it now has Xiao's fast (lock-less) page fault handling work.

Although I am still testing that branch, it seems to be working well here.

Thanks,
Takuya

 Thanks for the pointers to the old artcles. 
 
 Thanks
 Vinod


Re: KVM call agenda for Tuesday, June 19th

2012-06-19 Thread Juan Quintela
Juan Quintela quint...@redhat.com wrote:
 Hi

 Please send in any agenda items you are interested in covering.

 Anthony suggested for last week:
 - multithreading vhost (and general vhost improvements)

 I suggest:
 - status of migration: post-copy, IDL, XBZRLE, huge memory, ...
   Will send an email with a status before tomorrow's call.

XBZRLE: v12 is coming today or so.


These three patches should be a no-brainer (just refactoring code).
The first one is shared with postcopy.

[PATCH v11 1/9] Add MigrationParams structure
[PATCH v11 5/9] Add uleb encoding/decoding functions
[PATCH v11 6/9] Add save_block_hdr function
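
For readers unfamiliar with the "uleb" in patch 5/9: ULEB128 is the general variable-length encoding for unsigned integers, seven payload bits per byte with the high bit set on every byte except the last. The sketch below shows the general scheme only; the function names are made up here, and the functions in the patch may well restrict the value range.

```python
def uleb128_encode(value):
    """Encode a non-negative integer as ULEB128 bytes (general scheme)."""
    if value < 0:
        raise ValueError("ULEB128 encodes unsigned integers only")
    out = bytearray()
    while True:
        byte = value & 0x7F      # low seven bits
        value >>= 7
        if value:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)         # last byte, high bit clear
            return bytes(out)


def uleb128_decode(data, offset=0):
    """Decode one ULEB128 value from data; return (value, next_offset)."""
    result = 0
    shift = 0
    while True:
        byte = data[offset]
        offset += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, offset
        shift += 7


if __name__ == "__main__":
    encoded = uleb128_encode(624485)
    print(encoded.hex(), uleb128_decode(encoded))
```

Small values cost a single byte, which is the point when most encoded lengths are short.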

These are the ones that we can discuss.

[PATCH v11 2/9] Add migration capabilites
[PATCH v11 3/9] Add XBZRLE documentation
[PATCH v11 4/9] Add cache handling functions
[PATCH v11 7/9] Add XBZRLE to ram_save_block and ram_save_live
[PATCH v11 8/9] Add set_cachesize command

Postcopy:  This is just refactoring that can be integrated.

[PATCH v2 01/41] arch_init: export sort_ram_list() and ram_save_block()
[PATCH v2 02/41] arch_init: export RAM_SAVE_xxx flags for postcopy
[PATCH v2 03/41] arch_init/ram_save: introduce constant for ram save version = 4
[PATCH v2 04/41] arch_init: refactor host_from_stream_offset()
[PATCH v2 05/41] arch_init/ram_save_live: factor out RAM_SAVE_FLAG_MEM_SIZE case
[PATCH v2 06/41] arch_init: refactor ram_save_block()
[PATCH v2 07/41] arch_init/ram_save_live: factor out ram_save_limit
[PATCH v2 08/41] arch_init/ram_load: refactor ram_load
[PATCH v2 09/41] arch_init: introduce helper function to find ram block with id 
string
[PATCH v2 10/41] arch_init: simplify a bit by ram_find_block()
[PATCH v2 11/41] arch_init: factor out counting transferred bytes
[PATCH v2 12/41] arch_init: factor out setting last_block, last_offset
[PATCH v2 13/41] exec.c: factor out qemu_get_ram_ptr()
[PATCH v2 14/41] exec.c: export last_ram_offset()
[PATCH v2 15/41] savevm: export qemu_peek_buffer, qemu_peek_byte, qemu_file_skip
[PATCH v2 16/41] savevm: qemu_pending_size() to return pending buffered size
[PATCH v2 17/41] savevm, buffered_file: introduce method to drain buffer of 
buffered file
[PATCH v2 18/41] QEMUFile: add qemu_file_fd() for later use
[PATCH v2 19/41] savevm/QEMUFile: drop qemu_stdio_fd
[PATCH v2 20/41] savevm/QEMUFileSocket: drop duplicated member fd
[PATCH v2 21/41] savevm: rename QEMUFileSocket to QEMUFileFD, socket_close to 
fd_close
[PATCH v2 22/41] savevm/QEMUFile: introduce qemu_fopen_fd
[PATCH v2 23/41] migration.c: remove redundant line in migrate_init()
[PATCH v2 24/41] migration: export migrate_fd_completed() and 
migrate_fd_cleanup()
[PATCH v2 25/41] migration: factor out parameters into MigrationParams
[PATCH v2 26/41] buffered_file: factor out buffer management logic
[PATCH v2 27/41] buffered_file: Introduce QEMUFileNonblock for nonblock write
[PATCH v2 28/41] buffered_file: add qemu_file to read/write to buffer in memory

This is postcopy proper.  From this point on, postcopy needs the issues
raised in the previous review to be addressed, and after that probably (at
least) another review.  One thing to take into account is that umem (or
whatever you want to call it) should be able to work over RDMA.  Can anyone
who knows about RDMA comment on this?

[PATCH v2 29/41] umem.h: import Linux umem.h
[PATCH v2 30/41] update-linux-headers.sh: teach umem.h to 
update-linux-headers.sh
[PATCH v2 31/41] configure: add CONFIG_POSTCOPY option
[PATCH v2 32/41] savevm: add new section that is used by postcopy
[PATCH v2 33/41] postcopy: introduce -postcopy and -postcopy-flags option
[PATCH v2 34/41] postcopy outgoing: add -p and -n option to migrate command
[PATCH v2 35/41] postcopy: introduce helper functions for postcopy
[PATCH v2 36/41] postcopy: implement incoming part of postcopy live migration
[PATCH v2 37/41] postcopy: implement outgoing part of postcopy live migration
[PATCH v2 38/41] postcopy/outgoing: add forward, backward option to specify the 
size of prefault
[PATCH v2 39/41] postcopy/outgoing: implement prefault
[PATCH v2 40/41] migrate: add -m (movebg) option to migrate command
[PATCH v2 41/41] migration/postcopy: add movebg mode

Huge memory migration.
These should be trivial, and should be integrated.

[PATCH 1/7] Add spent time for migration
[PATCH 2/7] Add tracepoints for savevm section start/end
[PATCH 3/7] No need to iterate if we already are over the limit
[PATCH 4/7] Only TCG needs TLB handling
[PATCH 5/7] Only calculate expected_time for stage 2

This one is also trivial, but Anthony on previous reviews wanted to have
migration-thread before we integrated this one.

[PATCH 6/7] Exit loop if we have been there too long

For this one, Anthony wanted a different approach that improves bitmap
handling.  Not done yet.

[PATCH 7/7] Maintaing number of dirty pages

IDL patchset.  I am not against generating the VMState information, but
I am trying to understand how the patch works.  Note that I don't grok
Python; this is one of the reasons it is taking so long.

This was 

Re: [Qemu-devel] KVM call agenda for Tuesday, June 19th

2012-06-19 Thread Anthony Liguori

On 06/19/2012 08:54 AM, Juan Quintela wrote:

Juan Quintelaquint...@redhat.com  wrote:

Hi

Please send in any agenda items you are interested in covering.

Anthony suggested for last week:
- multithreading vhost (and general vhost improvements)

I suggest:
- status of migration: post-copy, IDL, XBZRLE, huge memory, ...
   Will send an email with a status before tomorrow's call.


XBZRLE: v12 is coming today or so.


These three patches should be a no-brainer (just refactoring code).
The first one is shared with postcopy.

[PATCH v11 1/9] Add MigrationParams structure
[PATCH v11 5/9] Add uleb encoding/decoding functions
[PATCH v11 6/9] Add save_block_hdr function

These are the ones that we can discuss.

[PATCH v11 2/9] Add migration capabilites
[PATCH v11 3/9] Add XBZRLE documentation
[PATCH v11 4/9] Add cache handling functions
[PATCH v11 7/9] Add XBZRLE to ram_save_block and ram_save_live
[PATCH v11 8/9] Add set_cachesize command

Postcopy:  This is just refactoring that can be integrated.

[PATCH v2 01/41] arch_init: export sort_ram_list() and ram_save_block()
[PATCH v2 02/41] arch_init: export RAM_SAVE_xxx flags for postcopy
[PATCH v2 03/41] arch_init/ram_save: introduce constant for ram save version = 4
[PATCH v2 04/41] arch_init: refactor host_from_stream_offset()
[PATCH v2 05/41] arch_init/ram_save_live: factor out RAM_SAVE_FLAG_MEM_SIZE case
[PATCH v2 06/41] arch_init: refactor ram_save_block()
[PATCH v2 07/41] arch_init/ram_save_live: factor out ram_save_limit
[PATCH v2 08/41] arch_init/ram_load: refactor ram_load
[PATCH v2 09/41] arch_init: introduce helper function to find ram block with id 
string
[PATCH v2 10/41] arch_init: simplify a bit by ram_find_block()
[PATCH v2 11/41] arch_init: factor out counting transferred bytes
[PATCH v2 12/41] arch_init: factor out setting last_block, last_offset
[PATCH v2 13/41] exec.c: factor out qemu_get_ram_ptr()
[PATCH v2 14/41] exec.c: export last_ram_offset()
[PATCH v2 15/41] savevm: export qemu_peek_buffer, qemu_peek_byte, qemu_file_skip
[PATCH v2 16/41] savevm: qemu_pending_size() to return pending buffered size
[PATCH v2 17/41] savevm, buffered_file: introduce method to drain buffer of 
buffered file
[PATCH v2 18/41] QEMUFile: add qemu_file_fd() for later use
[PATCH v2 19/41] savevm/QEMUFile: drop qemu_stdio_fd
[PATCH v2 20/41] savevm/QEMUFileSocket: drop duplicated member fd
[PATCH v2 21/41] savevm: rename QEMUFileSocket to QEMUFileFD, socket_close to 
fd_close
[PATCH v2 22/41] savevm/QEMUFile: introduce qemu_fopen_fd
[PATCH v2 23/41] migration.c: remove redundant line in migrate_init()
[PATCH v2 24/41] migration: export migrate_fd_completed() and 
migrate_fd_cleanup()
[PATCH v2 25/41] migration: factor out parameters into MigrationParams
[PATCH v2 26/41] buffered_file: factor out buffer management logic
[PATCH v2 27/41] buffered_file: Introduce QEMUFileNonblock for nonblock write
[PATCH v2 28/41] buffered_file: add qemu_file to read/write to buffer in memory

This is postcopy proper.  From this point on, postcopy needs the issues
raised in the previous review to be addressed, and after that probably (at
least) another review.  One thing to take into account is that umem (or
whatever you want to call it) should be able to work over RDMA.  Can anyone
who knows about RDMA comment on this?

[PATCH v2 29/41] umem.h: import Linux umem.h
[PATCH v2 30/41] update-linux-headers.sh: teach umem.h to 
update-linux-headers.sh
[PATCH v2 31/41] configure: add CONFIG_POSTCOPY option
[PATCH v2 32/41] savevm: add new section that is used by postcopy
[PATCH v2 33/41] postcopy: introduce -postcopy and -postcopy-flags option
[PATCH v2 34/41] postcopy outgoing: add -p and -n option to migrate command
[PATCH v2 35/41] postcopy: introduce helper functions for postcopy
[PATCH v2 36/41] postcopy: implement incoming part of postcopy live migration
[PATCH v2 37/41] postcopy: implement outgoing part of postcopy live migration
[PATCH v2 38/41] postcopy/outgoing: add forward, backward option to specify the 
size of prefault
[PATCH v2 39/41] postcopy/outgoing: implement prefault
[PATCH v2 40/41] migrate: add -m (movebg) option to migrate command
[PATCH v2 41/41] migration/postcopy: add movebg mode


I'm not at all convinced that postcopy is a good idea.  There needs to be a
clear expression of what the value proposition is, backed by benchmarks.  Those
benchmarks need to include latency measurements of downtime, which so far I've
not seen.


I don't want to take any postcopy patches until this discussion happens.

Regards,

Anthony Liguori



Huge memory migration.
These should be trivial, and should be integrated.

[PATCH 1/7] Add spent time for migration
[PATCH 2/7] Add tracepoints for savevm section start/end
[PATCH 3/7] No need to iterate if we already are over the limit
[PATCH 4/7] Only TCG needs TLB handling
[PATCH 5/7] Only calculate expected_time for stage 2

This one is also trivial, but Anthony on previous reviews wanted to have
migration-thread before we integrated this one.


Re: [Qemu-devel] KVM call agenda for Tuesday, June 19th

2012-06-19 Thread Takuya Yoshikawa
On Tue, 19 Jun 2012 09:01:36 -0500
Anthony Liguori anth...@codemonkey.ws wrote:

 I'm not at all convinced that postcopy is a good idea.  There needs a clear 
 expression of what the value proposition is that's backed by benchmarks.  
 Those 
 benchmarks need to include latency measurements of downtime which so far, 
 I've 
 not seen.
 
 I don't want to take any postcopy patches until this discussion happens.

FWIW:

I rather see postcopy as a way of migrating guests forcibly and I know
a service in which such a way is needed: emergency migration.  There is
also a product which does live migration when some hardware problems are
detected (as a semi-FT solution) -- in such cases, we cannot wait until
the guest becomes calm.

Although I am not certain whether QEMU can be used for such products,
it may be worth thinking about.

Thanks,
Takuya


Re: [Qemu-devel] KVM call agenda for Tuesday, June 19th

2012-06-19 Thread Chegu Vinod
Hello,

Wanted to share some preliminary data from live migration experiments on a
setup that is perhaps one of the larger ones.

We used Juan's huge_memory patches (without the separate migration thread)
and measured the total migration time and the time taken for stage 3
(downtime). Note: We didn't change the default downtime (30ms?). We had a
private 10Gig back-to-back link between the two hosts, and we set the
migration speed to 10Gig.

The workloads chosen were ones that we could easily set up. All experiments
were done without using virsh/virt-manager (i.e. by direct interaction with
the qemu monitor prompt).  Please see the data below.

As the guest size increased (and for the busier workloads) we observed that
network connections were getting dropped not only during the downtime (i.e.
stage 3) but also at times during the iterative pre-copy phase (i.e. stage
2).  Perhaps some of this will get fixed when we have the migration thread
implemented.

We had also briefly tried the proposed delta compression changes (easier to say
than XBZRLE :)) on a smaller configuration. For the simple workloads (perhaps
there was not much temporal locality in them) it didn't seem to show
improvements; instead it took much longer to migrate (high cache miss
penalty?). We are waiting for the updated version of XBZRLE for further
experiments to see how well it scales on this larger setup...
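
For context on why the cache matters, here is a toy sketch of the delta-compression idea: XOR the new page against a cached old copy, skip the runs that did not change, and send only the bytes that did. This illustrates the general technique under simplifying assumptions; it is not the XBZRLE wire format, and the function names are made up. Without a cached old copy (a cache miss) there is nothing to diff against, so the full page has to be sent anyway.

```python
def delta_encode(old_page, new_page):
    """Encode new_page relative to old_page as (unchanged_run, changed_bytes) pairs."""
    assert len(old_page) == len(new_page)
    xored = bytes(a ^ b for a, b in zip(old_page, new_page))
    runs = []
    i = 0
    n = len(xored)
    while i < n:
        start = i
        while i < n and xored[i] == 0:   # bytes that did not change
            i += 1
        unchanged_run = i - start
        start = i
        while i < n and xored[i] != 0:   # bytes that changed
            i += 1
        runs.append((unchanged_run, new_page[start:i]))
    return runs


def delta_decode(old_page, runs):
    """Rebuild the new page from the old page and the encoded runs."""
    out = bytearray(old_page)
    pos = 0
    for unchanged_run, changed in runs:
        pos += unchanged_run
        out[pos:pos + len(changed)] = changed
        pos += len(changed)
    return bytes(out)


if __name__ == "__main__":
    old = bytes(4096)
    new = bytearray(old)
    new[100:104] = b"abcd"
    print(delta_encode(old, bytes(new)))
```

A page with a few changed bytes shrinks to a handful of runs, while a page that changed everywhere gains nothing, which fits the guess above about workloads with little temporal locality.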

FYI
Vinod

---
10VCPUs/128G
---
1) Idle guest
Total migration time : 124585 ms, 
Stage_3_time : 941 ms , 
Total MB transferred : 2720


2) AIM7-compute (2000 users)
Total migration time : 123540 ms, 
Stage_3_time : 726 ms , 
Total MB transferred : 3580

3) SpecJBB (modified to run 10 warehouse threads for a long duration of time)
Total migration time : 165720 ms, 
Stage_3_time : 6851 ms , 
Total MB transferred : 19656


4) Google SAT  (-s 3600 -C 5 -i 5)
Total migration time : 411827 ms, 
Stage_3_time : 77807 ms , 
Total MB transferred : 142136



---
20VCPUs /256G
---

1) Idle  guest
Total migration time : 259938 ms, 
Stage_3_time : 1998 ms , 
Total MB transferred : 5114

2) AIM7-compute (2000 users)
Total migration time : 261336 ms, 
Stage_3_time : 2107 ms , 
Total MB transferred : 5473

3) SpecJBB (modified to run 20 warehouse threads for a long duration of time)
Total migration time : 390548 ms, 
Stage_3_time : 19596 ms , 
Total MB transferred : 48109

4) Google SAT  (-s 3600 -C 10 -i 10)
Total migration time : 780150 ms, 
Stage_3_time : 90346 ms , 
Total MB transferred : 251287


30VCPUs/384G
---

1) Idle guest
(qemu) Total migration time : 501704 ms, 
Stage_3_time : 2835 ms , 
Total MB transferred : 15731


2) AIM7-compute (2000 users)
Total migration time : 496001 ms, 
Stage_3_time : 3884 ms , 
Total MB transferred : 9375


3) SpecJBB (modified to run 30 warehouse threads for a long duration of time)
Total migration time : 611075 ms, 
Stage_3_time : 17107 ms , 
Total MB transferred : 48862


4) Google SAT  (-s 3600 -C 15 -i 15)  (look at /tmp/kvm_30w_Goog)
Total migration time : 1348102 ms, 
Stage_3_time : 128531 ms , 
Total MB transferred : 367524



---
40VCPUs/512G
---

1) Idle guest
Total migration time : 780257 ms, 
Stage_3_time : 3770 ms , 
Total MB transferred : 13330


2) AIM7-compute (2000 users)
Total migration time : 720963 ms, 
Stage_3_time : 3966 ms , 
Total MB transferred : 10595

3) SpecJBB (modified to run 40 warehouse threads for a long duration of time)
Total migration time : 863577 ms, 
Stage_3_time : 25149 ms , 
Total MB transferred : 54685

4) Google SAT  (-s 3600 -C 20 -i 20)
Total migration time : 2585039 ms, 
Stage_3_time : 177625 ms , 
Total MB transferred : 493575




Re: [Qemu-devel] KVM call agenda for Tuesday, June 19th

2012-06-19 Thread Michael Roth
On Tue, Jun 19, 2012 at 03:54:23PM +0200, Juan Quintela wrote:
 Juan Quintela quint...@redhat.com wrote:
  Hi
 
  Please send in any agenda items you are interested in covering.
 
  Anthony suggested for last week:
  - multithreading vhost (and general vhost improvements)
 
  I suggest:
  - status of migration: post-copy, IDL, XBRLE, huge memory, ...
Will send an email with an status before tomorrow call.
 
 XBRLE: v12 is coming today or so.
 
 
 This three patches should be a no-brainer (just refactoring code).
 1st one is shared with postcopy.
 
 [PATCH v11 1/9] Add MigrationParams structure
 [PATCH v11 5/9] Add uleb encoding/decoding functions
 [PATCH v11 6/9] Add save_block_hdr function
 
 This ones can be be the ones that we can discuss.
 
 [PATCH v11 2/9] Add migration capabilites
 [PATCH v11 3/9] Add XBZRLE documentation
 [PATCH v11 4/9] Add cache handling functions
 [PATCH v11 7/9] Add XBZRLE to ram_save_block and ram_save_live
 [PATCH v11 8/9] Add set_cachesize command
 
 Postcopy:  This is just refactoring that can be integrated.
 
 [PATCH v2 01/41] arch_init: export sort_ram_list() and ram_save_block()
 [PATCH v2 02/41] arch_init: export RAM_SAVE_xxx flags for postcopy
 [PATCH v2 03/41] arch_init/ram_save: introduce constant for ram save version 
 = 4
 [PATCH v2 04/41] arch_init: refactor host_from_stream_offset()
 [PATCH v2 05/41] arch_init/ram_save_live: factor out RAM_SAVE_FLAG_MEM_SIZE 
 case
 [PATCH v2 06/41] arch_init: refactor ram_save_block()
 [PATCH v2 07/41] arch_init/ram_save_live: factor out ram_save_limit
 [PATCH v2 08/41] arch_init/ram_load: refactor ram_load
 [PATCH v2 09/41] arch_init: introduce helper function to find ram block with 
 id string
 [PATCH v2 10/41] arch_init: simplify a bit by ram_find_block()
 [PATCH v2 11/41] arch_init: factor out counting transferred bytes
 [PATCH v2 12/41] arch_init: factor out setting last_block, last_offset
 [PATCH v2 13/41] exec.c: factor out qemu_get_ram_ptr()
 [PATCH v2 14/41] exec.c: export last_ram_offset()
 [PATCH v2 15/41] savevm: export qemu_peek_buffer, qemu_peek_byte, 
 qemu_file_skip
 [PATCH v2 16/41] savevm: qemu_pending_size() to return pending buffered size
 [PATCH v2 17/41] savevm, buffered_file: introduce method to drain buffer of 
 buffered file
 [PATCH v2 18/41] QEMUFile: add qemu_file_fd() for later use
 [PATCH v2 19/41] savevm/QEMUFile: drop qemu_stdio_fd
 [PATCH v2 20/41] savevm/QEMUFileSocket: drop duplicated member fd
 [PATCH v2 21/41] savevm: rename QEMUFileSocket to QEMUFileFD, socket_close to 
 fd_close
 [PATCH v2 22/41] savevm/QEMUFile: introduce qemu_fopen_fd
 [PATCH v2 23/41] migration.c: remove redundant line in migrate_init()
 [PATCH v2 24/41] migration: export migrate_fd_completed() and 
 migrate_fd_cleanup()
 [PATCH v2 25/41] migration: factor out parameters into MigrationParams
 [PATCH v2 26/41] buffered_file: factor out buffer management logic
 [PATCH v2 27/41] buffered_file: Introduce QEMUFileNonblock for nonblock write
 [PATCH v2 28/41] buffered_file: add qemu_file to read/write to buffer in 
 memory
 
 This is postcopy properly.  From this one, postcopy needs to be the
 things addressed on previous review, and from there probably (at least)
 another review.  Thing to have in account is that the umem (or whatever
 you want to call it), should be able to work over RDMA.  Anyone that
 knows anything about RDMA to comment on this?
 
 [PATCH v2 29/41] umem.h: import Linux umem.h
 [PATCH v2 30/41] update-linux-headers.sh: teach umem.h to 
 update-linux-headers.sh
 [PATCH v2 31/41] configure: add CONFIG_POSTCOPY option
 [PATCH v2 32/41] savevm: add new section that is used by postcopy
 [PATCH v2 33/41] postcopy: introduce -postcopy and -postcopy-flags option
 [PATCH v2 34/41] postcopy outgoing: add -p and -n option to migrate command
 [PATCH v2 35/41] postcopy: introduce helper functions for postcopy
 [PATCH v2 36/41] postcopy: implement incoming part of postcopy live migration
 [PATCH v2 37/41] postcopy: implement outgoing part of postcopy live migration
 [PATCH v2 38/41] postcopy/outgoing: add forward, backward option to specify 
 the size of prefault
 [PATCH v2 39/41] postcopy/outgoing: implement prefault
 [PATCH v2 40/41] migrate: add -m (movebg) option to migrate command
 [PATCH v2 41/41] migration/postcopy: add movebg mode
 
 Huge memory migration.
 This ones should be trivial, and integrated.
 
 [PATCH 1/7] Add spent time for migration
 [PATCH 2/7] Add tracepoints for savevm section start/end
 [PATCH 3/7] No need to iterate if we already are over the limit
 [PATCH 4/7] Only TCG needs TLB handling
 [PATCH 5/7] Only calculate expected_time for stage 2
 
 This one is also trivial, but Anthony on previous reviews wanted to have
 migration-thread before we integrated this one.
 
 [PATCH 6/7] Exit loop if we have been there too long
 
 This one, Anthony wanted a different approach improving bitmap
 handling.  Not done yet.
 
 [PATCH 7/7] Maintaing number of dirty pages
 
 IDL patchset.  I am not against 

Re: [Qemu-devel] KVM call agenda for Tuesday, June 19th

2012-06-19 Thread Michael Roth
On Tue, Jun 19, 2012 at 11:34:42PM +0900, Takuya Yoshikawa wrote:
 On Tue, 19 Jun 2012 09:01:36 -0500
 Anthony Liguori anth...@codemonkey.ws wrote:
 
  I'm not at all convinced that postcopy is a good idea.  There needs a clear 
  expression of what the value proposition is that's backed by benchmarks.  
  Those 
  benchmarks need to include latency measurements of downtime which so far, 
  I've 
  not seen.
  
  I don't want to take any postcopy patches until this discussion happens.
 
 FWIW:
 
 I rather see postcopy as a way of migrating guests forcibly and I know
 a service in which such a way is needed: emergency migration.  There is
 also a product which does live migration when some hardware problems are
 detected (as a semi-FT solution) -- in such cases, we cannot wait until
 the guest becomes calm.

Ignoring max downtime values when we've determined that the target is no
longer converging would be another option. Essentially having a
use_strict_max_downtime that can be set on a per-migration basis, where
if not set we can give up on maintaining the max_downtime when it's
been determined that progress is no longer being made.
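
A rough sketch of that decision, to make the proposal concrete. Only the name use_strict_max_downtime comes from the text above; the function, its other parameters and the idea of a precomputed "still converging" flag are assumptions for illustration, not an existing QEMU option or API.

```python
def may_enter_stage3(expected_downtime_ms, max_downtime_ms,
                     still_converging, use_strict_max_downtime):
    """Decide whether to stop the guest and do the final copy (hypothetical).

    still_converging: True while the remaining dirty memory keeps shrinking.
    """
    if expected_downtime_ms <= max_downtime_ms:
        return True   # the normal case: the downtime target can be met
    if use_strict_max_downtime:
        return False  # never exceed the target; keep iterating
    # Not strict: give up on the target once progress has stopped.
    return not still_converging


if __name__ == "__main__":
    # Target missed and no more progress: the non-strict mode completes anyway.
    print(may_enter_stage3(6851, 30, still_converging=False,
                           use_strict_max_downtime=False))
```

With the flag set, the behaviour stays as it is today; with it cleared, a stalled migration completes with a longer downtime instead of iterating forever.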

 
 Although I am not certain whether QEMU can be used for such products,
 it may be worth thinking about.
 
 Thanks,
   Takuya
 


KVM call agenda for Tuesday, June 19th

2012-06-18 Thread Juan Quintela

Hi

Please send in any agenda items you are interested in covering.

Anthony suggested for last week:
- multithreading vhost (and general vhost improvements)

I suggest:
- status of migration: post-copy, IDL, XBZRLE, huge memory, ...
  Will send an email with a status before tomorrow's call.

Thanks, Juan.