Re: [RFC PATCH 00/17] COarse-grain LOck-stepping(COLO) Virtual Machines for Non-stop Service

2014-08-01 Thread Dr. David Alan Gilbert
* Yang Hongyang (yan...@cn.fujitsu.com) wrote:
 Virtual machine (VM) replication is a well-known technique for
 providing application-agnostic, software-implemented hardware fault
 tolerance ("non-stop service"). COLO is a high availability solution.
 Both the primary VM (PVM) and the secondary VM (SVM) run in parallel.
 They receive the same requests from the client and generate responses
 in parallel too. If the response packets from the PVM and SVM are
 identical, they are released immediately. Otherwise, a VM checkpoint
 (on demand) is conducted. The idea was presented at Xen Summit 2012
 and 2013, and in an academic paper at SoCC 2013. It was also presented
 at KVM Forum 2013:
 http://www.linux-kvm.org/wiki/images/1/1d/Kvm-forum-2013-COLO.pdf
 Please refer to the above document for detailed information.
 Please also refer to the previously posted RFC proposal:
 http://lists.nongnu.org/archive/html/qemu-devel/2014-06/msg05567.html

Hi Yang,
  Thanks for this set of patches (and I've replied to many individually).

 The patchset is also hosted on github:
 https://github.com/macrosheep/qemu/tree/colo_v0.1
 
 This patchset is an RFC. It implements the COLO framework, without
 failover and NIC/disk replication, but it is ready to demo
 the COLO idea on top of QEMU-KVM.
 Steps to get an overview of COLO using this patchset:
 1. configure the source with the --enable-colo option
 2. compile
 3. just like QEMU's normal migration, run two QEMU VMs:
    - Primary VM
    - Secondary VM with the -incoming tcp:[IP]:[PORT] option
 4. on the Primary VM's QEMU monitor, run the following commands:
    migrate_set_capability colo on
    migrate tcp:[IP]:[PORT]
 5. done
 You will see two running VMs; whenever you make changes to the PVM,
 the SVM will be synced to the PVM's state.
 
 TODO list:
 1. failover
 2. nic replication
 3. disk replication [COLO Disk manager]

I wonder if there are any parts that can be borrowed from other code
to get it going; I notice that the reverse execution patchset
has a network packet record/replay mode:

https://lists.gnu.org/archive/html/qemu-devel/2014-07/msg00157.html

What was used for the nic comparison in the 2013 kvm forum paper?

Dave

 
 Any comments and feedback are warmly welcome.
 
 Thanks,
 Yang
 
 Yang Hongyang (17):
   configure: add CONFIG_COLO to switch COLO support
   COLO: introduce an api colo_supported() to indicate COLO support
   COLO migration: add a migration capability 'colo'
   COLO info: use colo info to tell migration target colo is enabled
   COLO save: integrate COLO checkpointed save into qemu migration
   COLO restore: integrate COLO checkpointed restore into qemu restore
   COLO buffer: implement colo buffer as well as QEMUFileOps based on it
   COLO: disable qdev hotplug
   COLO ctl: implement API's that communicate with colo agent
   COLO ctl: introduce is_slave() and is_master()
   COLO ctl: implement colo checkpoint protocol
   COLO ctl: add a RunState RUN_STATE_COLO
   COLO ctl: implement colo save
   COLO ctl: implement colo restore
   COLO save: reuse migration bitmap under colo checkpoint
   COLO ram cache: implement colo ram cache on slaver
   HACK: trigger checkpoint every 500ms
 
  Makefile.objs                      |   2 +
  arch_init.c                        | 174 +-
  configure                          |  14 +
  include/exec/cpu-all.h             |   1 +
  include/migration/migration-colo.h |  36 +++
  include/migration/migration.h      |  13 +
  include/qapi/qmp/qerror.h          |   3 +
  migration-colo-comm.c              |  78 +
  migration-colo.c                   | 643 +
  migration.c                        |  45 ++-
  qapi-schema.json                   |   9 +-
  stubs/Makefile.objs                |   1 +
  stubs/migration-colo.c             |  34 ++
  vl.c                               |  12 +
  14 files changed, 1044 insertions(+), 21 deletions(-)
  create mode 100644 include/migration/migration-colo.h
  create mode 100644 migration-colo-comm.c
  create mode 100644 migration-colo.c
  create mode 100644 stubs/migration-colo.c
 
 -- 
 1.9.1
 
--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[RFC PATCH 00/17] COarse-grain LOck-stepping(COLO) Virtual Machines for Non-stop Service

2014-07-23 Thread Yang Hongyang
Virtual machine (VM) replication is a well-known technique for
providing application-agnostic, software-implemented hardware fault
tolerance ("non-stop service"). COLO is a high availability solution.
Both the primary VM (PVM) and the secondary VM (SVM) run in parallel.
They receive the same requests from the client and generate responses
in parallel too. If the response packets from the PVM and SVM are
identical, they are released immediately. Otherwise, a VM checkpoint
(on demand) is conducted. The idea was presented at Xen Summit 2012
and 2013, and in an academic paper at SoCC 2013. It was also presented
at KVM Forum 2013:
http://www.linux-kvm.org/wiki/images/1/1d/Kvm-forum-2013-COLO.pdf
Please refer to the above document for detailed information.
Please also refer to the previously posted RFC proposal:
http://lists.nongnu.org/archive/html/qemu-devel/2014-06/msg05567.html
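
The release-or-checkpoint rule described above can be sketched as
follows. This is only an illustration and not code from the patchset
(which does not yet include NIC replication): the function name
colo_compare, the pairwise byte-for-byte comparison, and the checkpoint
callback are all assumptions made for the example.

```python
from typing import Callable, List


def colo_compare(pvm_packets: List[bytes],
                 svm_packets: List[bytes],
                 checkpoint: Callable[[], None]) -> List[bytes]:
    """Return the PVM response packets that may be released at once.

    Hypothetical helper: packets are compared pairwise, in order.
    Identical packets are released immediately; on the first mismatch
    an on-demand checkpoint is requested and the comparison stops.
    What happens to the packets still held back after the checkpoint
    is deliberately left out of this sketch.
    """
    released = []
    for pvm_pkt, svm_pkt in zip(pvm_packets, svm_packets):
        if pvm_pkt != svm_pkt:
            checkpoint()  # outputs diverged: resync the SVM to the PVM
            break
        released.append(pvm_pkt)
    return released
```

As long as both VMs produce the same output, no checkpoint is taken at
all, which is what makes the lock-stepping "coarse-grain".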

The patchset is also hosted on github:
https://github.com/macrosheep/qemu/tree/colo_v0.1

This patchset is an RFC. It implements the COLO framework, without
failover and NIC/disk replication, but it is ready to demo
the COLO idea on top of QEMU-KVM.
Steps to get an overview of COLO using this patchset:
1. configure the source with the --enable-colo option
2. compile
3. just like QEMU's normal migration, run two QEMU VMs:
   - Primary VM
   - Secondary VM with the -incoming tcp:[IP]:[PORT] option
4. on the Primary VM's QEMU monitor, run the following commands:
   migrate_set_capability colo on
   migrate tcp:[IP]:[PORT]
5. done
You will see two running VMs; whenever you make changes to the PVM,
the SVM will be synced to the PVM's state.
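
For scripting the steps above, the strings from steps 3 and 4 can be
generated from one address so that both sides agree on it. This is a
small sketch only: the helper names are hypothetical, the host and port
in the usage note are placeholders, and the option and monitor commands
are exactly the ones listed in the steps.

```python
from typing import List


def secondary_incoming_option(host: str, port: int) -> List[str]:
    """Extra QEMU arguments for the Secondary VM (step 3)."""
    return ["-incoming", f"tcp:{host}:{port}"]


def primary_monitor_commands(host: str, port: int) -> List[str]:
    """Monitor commands to run on the Primary VM, in order (step 4)."""
    return [
        "migrate_set_capability colo on",  # enable COLO before migrating
        f"migrate tcp:{host}:{port}",      # then start the migration
    ]
```

For example, primary_monitor_commands("192.0.2.10", 4444) yields the two
monitor lines with the placeholder address filled in. The capability
must be set before the migrate command is issued, which is why the
helper returns the commands as an ordered list.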

TODO list:
1. failover
2. nic replication
3. disk replication [COLO Disk manager]

Any comments and feedback are warmly welcome.

Thanks,
Yang

Yang Hongyang (17):
  configure: add CONFIG_COLO to switch COLO support
  COLO: introduce an api colo_supported() to indicate COLO support
  COLO migration: add a migration capability 'colo'
  COLO info: use colo info to tell migration target colo is enabled
  COLO save: integrate COLO checkpointed save into qemu migration
  COLO restore: integrate COLO checkpointed restore into qemu restore
  COLO buffer: implement colo buffer as well as QEMUFileOps based on it
  COLO: disable qdev hotplug
  COLO ctl: implement API's that communicate with colo agent
  COLO ctl: introduce is_slave() and is_master()
  COLO ctl: implement colo checkpoint protocol
  COLO ctl: add a RunState RUN_STATE_COLO
  COLO ctl: implement colo save
  COLO ctl: implement colo restore
  COLO save: reuse migration bitmap under colo checkpoint
  COLO ram cache: implement colo ram cache on slaver
  HACK: trigger checkpoint every 500ms

 Makefile.objs                      |   2 +
 arch_init.c                        | 174 +-
 configure                          |  14 +
 include/exec/cpu-all.h             |   1 +
 include/migration/migration-colo.h |  36 +++
 include/migration/migration.h      |  13 +
 include/qapi/qmp/qerror.h          |   3 +
 migration-colo-comm.c              |  78 +
 migration-colo.c                   | 643 +
 migration.c                        |  45 ++-
 qapi-schema.json                   |   9 +-
 stubs/Makefile.objs                |   1 +
 stubs/migration-colo.c             |  34 ++
 vl.c                               |  12 +
 14 files changed, 1044 insertions(+), 21 deletions(-)
 create mode 100644 include/migration/migration-colo.h
 create mode 100644 migration-colo-comm.c
 create mode 100644 migration-colo.c
 create mode 100644 stubs/migration-colo.c

-- 
1.9.1



Re: [Qemu-devel] [RFC PATCH 00/17] COarse-grain LOck-stepping(COLO) Virtual Machines for Non-stop Service

2014-07-23 Thread Eric Blake
On 07/23/2014 08:25 AM, Yang Hongyang wrote:
 Virtual machine (VM) replication is a well-known technique for
 providing application-agnostic, software-implemented hardware fault
 tolerance ("non-stop service"). COLO is a high availability solution.
 Both the primary VM (PVM) and the secondary VM (SVM) run in parallel.
 They receive the same requests from the client and generate responses
 in parallel too. If the response packets from the PVM and SVM are
 identical, they are released immediately. Otherwise, a VM checkpoint
 (on demand) is conducted. The idea was presented at Xen Summit 2012
 and 2013, and in an academic paper at SoCC 2013. It was also presented
 at KVM Forum 2013:
 http://www.linux-kvm.org/wiki/images/1/1d/Kvm-forum-2013-COLO.pdf
 Please refer to the above document for detailed information.
 Please also refer to the previously posted RFC proposal:
 http://lists.nongnu.org/archive/html/qemu-devel/2014-06/msg05567.html
 
 The patchset is also hosted on github:
 https://github.com/macrosheep/qemu/tree/colo_v0.1
 
 This patchset is an RFC. It implements the COLO framework, without
 failover and NIC/disk replication, but it is ready to demo
 the COLO idea on top of QEMU-KVM.
 Steps to get an overview of COLO using this patchset:
 1. configure the source with the --enable-colo option

Code that has to be opt-in tends to bitrot, because people don't
configure their build-bots to opt in.  What sort of penalties does
opting in cause to the code if colo is not used?  I'd much rather make
compiling colo the default unless configured with --disable-colo.  Are there
any pre-req libraries required for it to work?  That would be the only
reason to make the default of on or off conditional, rather than
defaulting to on.
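
The default-on behaviour suggested above can be modelled as follows.
This is a sketch only: QEMU's configure is a shell script, so the
Python function below (colo_enabled, a hypothetical name) just
illustrates the intended flag resolution. --enable-colo is the option
from the cover letter; --disable-colo is the option proposed here.

```python
from typing import Iterable


def colo_enabled(configure_args: Iterable[str], default: bool = True) -> bool:
    """Resolve whether COLO gets compiled in.

    Illustrative model: COLO is on unless explicitly disabled, and
    when both flags are given the last one on the command line wins.
    """
    enabled = default
    for arg in configure_args:
        if arg == "--enable-colo":
            enabled = True
        elif arg == "--disable-colo":
            enabled = False
    return enabled
```

With default=True, a build-bot that passes no COLO flag at all still
compiles the code, which is the point of the suggestion.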


-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org





Re: [Qemu-devel] [RFC PATCH 00/17] COarse-grain LOck-stepping(COLO) Virtual Machines for Non-stop Service

2014-07-23 Thread Hongyang Yang

On 07/23/2014 11:44 PM, Eric Blake wrote:

On 07/23/2014 08:25 AM, Yang Hongyang wrote:

Virtual machine (VM) replication is a well-known technique for
providing application-agnostic, software-implemented hardware fault
tolerance ("non-stop service"). COLO is a high availability solution.
Both the primary VM (PVM) and the secondary VM (SVM) run in parallel.
They receive the same requests from the client and generate responses
in parallel too. If the response packets from the PVM and SVM are
identical, they are released immediately. Otherwise, a VM checkpoint
(on demand) is conducted. The idea was presented at Xen Summit 2012
and 2013, and in an academic paper at SoCC 2013. It was also presented
at KVM Forum 2013:
http://www.linux-kvm.org/wiki/images/1/1d/Kvm-forum-2013-COLO.pdf
Please refer to the above document for detailed information.
Please also refer to the previously posted RFC proposal:
http://lists.nongnu.org/archive/html/qemu-devel/2014-06/msg05567.html

The patchset is also hosted on github:
https://github.com/macrosheep/qemu/tree/colo_v0.1

This patchset is an RFC. It implements the COLO framework, without
failover and NIC/disk replication, but it is ready to demo
the COLO idea on top of QEMU-KVM.
Steps to get an overview of COLO using this patchset:
1. configure the source with the --enable-colo option


Code that has to be opt-in tends to bitrot, because people don't
configure their build-bots to opt in.  What sort of penalties does
opting in cause to the code if colo is not used?  I'd much rather make
compiling colo the default unless configured with --disable-colo.  Are there
any pre-req libraries required for it to work?  That would be the only
reason to make the default of on or off conditional, rather than
defaulting to on.


Thanks for all your comments on this patchset; I will address them.
On this one: compiling COLO in does not affect the rest of the code when
it is not used, and it does not require any pre-req libraries for now, so
we can make COLO support default to on in the next version.






--
Thanks,
Yang.