This is the initial design of block replication.
The blkcolo block driver enables disk replication for continuous
checkpoints. It is designed for COLO, in which the Secondary VM is
running. It can also be applied to FT/HA scenarios in which the
Secondary VM is not running.

Signed-off-by: Wen Congyang <we...@cn.fujitsu.com>
Signed-off-by: Lai Jiangshan <la...@cn.fujitsu.com>
Signed-off-by: Yang Hongyang <yan...@cn.fujitsu.com>
---
 docs/blkcolo.txt | 85 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 85 insertions(+)
 create mode 100644 docs/blkcolo.txt

diff --git a/docs/blkcolo.txt b/docs/blkcolo.txt
new file mode 100644
index 0000000..41c2a05
--- /dev/null
+++ b/docs/blkcolo.txt
@@ -0,0 +1,85 @@
+Disk replication using blkcolo
+----------------------------------------
+Copyright Fujitsu, Corp. 2014
+
+This work is licensed under the terms of the GNU GPL, version 2 or later.
+See the COPYING file in the top-level directory.
+
+The blkcolo block driver enables disk replication for continuous checkpoints.
+It is designed for COLO, in which the Secondary VM is running. It can also be
+applied to FT/HA scenarios in which the Secondary VM is not running.
+
+This document gives an overview of blkcolo's design.
+
+== Background ==
+High availability solutions such as micro checkpointing and COLO take
+consecutive checkpoints. The states of the Primary VM and the Secondary VM
+are identical right after a checkpoint, but diverge as the VMs execute
+until the next checkpoint. To checkpoint disk contents, the disk
+modifications made in the Secondary VM must be buffered, and are only
+dropped at the next checkpoint. To reduce the amount of data sent over the
+network at checkpoint time, write operations on the Primary disk are
+forwarded asynchronously to the Secondary node.
+
+== Disk Buffer ==
+The following diagram shows the Disk buffer:
+
+        +----------------------+            +------------------------+
+        |Primary Write Requests|            |Secondary Write Requests|
+        +----------------------+            +------------------------+
+                  |                                       |
+                  |                                      (4)
+                  |                                       V
+                  |                              /-------------\
+                  |      Copy and Forward        |             |
+                  |---------(1)----------+       | Disk Buffer |
+                  |                      |       |             |
+                  |                     (3)      \-------------/
+                  |                 speculative      ^
+                  |                write through    (2)
+                  |                      |           |
+                  V                      V           |
+           +--------------+           +----------------+
+           | Primary Disk |           | Secondary Disk |
+           +--------------+           +----------------+
+    1) Primary write requests are copied and forwarded to the Secondary
+       QEMU.
+    2) Before a Primary write request is written to the Secondary disk, the
+       original sector content is read from the Secondary disk and saved in
+       the Disk buffer. If the Disk buffer already holds content for that
+       sector, that content is kept and not overwritten.
+    3) Primary write requests are written to the Secondary disk.
+    4) Secondary write requests are buffered in the Disk buffer and
+       overwrite any existing sector content in the buffer.
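The four rules above can be modelled in a few lines of C. This is a minimal sketch, assuming fixed 512-byte sectors, whole-sector requests and a tiny disk held in memory; ColoSecondary, colo_primary_write() and the other names are hypothetical and are not taken from the blkcolo driver. The drop and flush helpers mirror what the Checkpoint & failover section describes for bdrv_do_checkpoint() and bdrv_stop_replication().

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define SECTOR_SIZE 512  /* assumed sector size in bytes */
#define NB_SECTORS  16   /* assumed size of the toy disk, in sectors */

/* Hypothetical model of the Secondary node: its disk plus the Disk buffer. */
typedef struct {
    unsigned char disk[NB_SECTORS][SECTOR_SIZE]; /* Secondary disk */
    unsigned char buf[NB_SECTORS][SECTOR_SIZE];  /* Disk buffer contents */
    bool buffered[NB_SECTORS];                   /* sector present in buffer? */
} ColoSecondary;

/* Steps (2) and (3): a forwarded Primary write. The original sector is
 * saved only if the buffer does not hold that sector yet; then the new
 * data is written through to the Secondary disk. */
static void colo_primary_write(ColoSecondary *s, int sector,
                               const unsigned char *data)
{
    assert(sector >= 0 && sector < NB_SECTORS);
    if (!s->buffered[sector]) {
        memcpy(s->buf[sector], s->disk[sector], SECTOR_SIZE);
        s->buffered[sector] = true;
    }
    memcpy(s->disk[sector], data, SECTOR_SIZE);
}

/* Step (4): a Secondary VM write only goes to the buffer and overwrites
 * whatever is already buffered for that sector. */
static void colo_secondary_write(ColoSecondary *s, int sector,
                                 const unsigned char *data)
{
    assert(sector >= 0 && sector < NB_SECTORS);
    memcpy(s->buf[sector], data, SECTOR_SIZE);
    s->buffered[sector] = true;
}

/* A Secondary VM read sees the buffered sector if there is one,
 * otherwise the Secondary disk. */
static void colo_secondary_read(const ColoSecondary *s, int sector,
                                unsigned char *out)
{
    assert(sector >= 0 && sector < NB_SECTORS);
    memcpy(out, s->buffered[sector] ? s->buf[sector] : s->disk[sector],
           SECTOR_SIZE);
}

/* Checkpoint: the disk already holds the Primary's data, so the buffer
 * is simply dropped. */
static void colo_drop_buffer(ColoSecondary *s)
{
    memset(s->buffered, 0, sizeof(s->buffered));
}

/* Failover: the buffer holds the Secondary VM's view of the disk, so it
 * is written back to the disk before replication stops. */
static void colo_flush_buffer(ColoSecondary *s)
{
    for (int i = 0; i < NB_SECTORS; i++) {
        if (s->buffered[i]) {
            memcpy(s->disk[i], s->buf[i], SECTOR_SIZE);
        }
    }
    colo_drop_buffer(s);
}
```

The asymmetry between the two write paths is what keeps the Secondary VM consistent: a Primary write never replaces buffered content, so the buffer always reflects what the Secondary VM has seen since the last checkpoint.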
+
+== Capture I/O request ==
+Because blkcolo is implemented as a new block driver protocol, all I/O
+requests can be captured in its bdrv_co_readv()/bdrv_co_writev() interfaces.
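Being a protocol driver places blkcolo in the I/O path, so every request passes through its read and write callbacks before reaching the image. A minimal sketch of that interception on the Primary side, assuming a hypothetical two-layer stack (Layer, repl_write() and the rest are illustrative names, not QEMU's BlockDriver structure), with the network forwarding reduced to a counter:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SECTOR_SIZE 512  /* assumed sector size in bytes */

/* Hypothetical per-layer callbacks, standing in for a driver's
 * read/write entry points. */
typedef struct Layer Layer;
struct Layer {
    int (*read)(Layer *l, long sector, unsigned char *buf);
    int (*write)(Layer *l, long sector, const unsigned char *buf);
    Layer *below;        /* next layer down, NULL for the image itself */
    unsigned char *mem;  /* image contents (bottom layer only) */
    int forwarded;       /* writes copied to the peer (replication layer) */
};

/* Bottom layer: a toy in-memory image. */
static int image_read(Layer *l, long sector, unsigned char *buf)
{
    memcpy(buf, l->mem + sector * SECTOR_SIZE, SECTOR_SIZE);
    return 0;
}

static int image_write(Layer *l, long sector, const unsigned char *buf)
{
    memcpy(l->mem + sector * SECTOR_SIZE, buf, SECTOR_SIZE);
    return 0;
}

/* Replication layer: reads pass straight through. */
static int repl_read(Layer *l, long sector, unsigned char *buf)
{
    return l->below->read(l->below, sector, buf);
}

/* Replication layer: every write is captured here, "forwarded" (only
 * counted in this sketch) and then passed down to the image. */
static int repl_write(Layer *l, long sector, const unsigned char *buf)
{
    l->forwarded++;  /* a real driver would queue sector + data for the peer */
    return l->below->write(l->below, sector, buf);
}
```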
+
+== Checkpoint & failover ==
+blkcolo buffers write requests in the Secondary QEMU. The buffer must be
+dropped at each checkpoint, or flushed to the Secondary disk on failover.
+We add four block driver interfaces to do this:
+a. bdrv_prepare_checkpoint()
+   This interface may block; it returns once all Primary write requests
+   have been forwarded to the Secondary QEMU.
+b. bdrv_do_checkpoint()
+   This interface is called after all VM state has been transferred to
+   the Secondary QEMU. It drops the Disk buffer.
+c. bdrv_get_sent_data_size()
+   This is used on the Primary node. The migration/checkpoint thread
+   should call it to decide whether to start a new checkpoint. If the
+   amount of data being sent is too large, a new checkpoint should be
+   started.
+d. bdrv_stop_replication()
+   It is called on failover. It flushes the Disk buffer into the
+   Secondary disk and stops disk replication.
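On the Primary side, bdrv_prepare_checkpoint() and bdrv_get_sent_data_size() come down to bookkeeping about forwarded writes. The sketch below is a minimal single-threaded model under stated assumptions: the sent-data counter is assumed to restart at each checkpoint, the threshold is a caller-chosen tunable, and the blocking wait of bdrv_prepare_checkpoint() is reduced to a non-blocking readiness check. PrimaryRepl and all repl_* names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical Primary-side replication state. */
typedef struct {
    unsigned queued;      /* Primary writes copied but not yet sent */
    uint64_t sent_bytes;  /* bytes sent since the last checkpoint (assumed) */
} PrimaryRepl;

/* A Primary write has been copied and queued for asynchronous forwarding. */
static void repl_queue_write(PrimaryRepl *p)
{
    p->queued++;
}

/* One queued write of len bytes has been sent to the Secondary. */
static void repl_write_sent(PrimaryRepl *p, uint64_t len)
{
    assert(p->queued > 0);
    p->queued--;
    p->sent_bytes += len;
}

/* Analogue of bdrv_get_sent_data_size(). */
static uint64_t repl_sent_data_size(const PrimaryRepl *p)
{
    return p->sent_bytes;
}

/* Checkpoint thread policy: start a new checkpoint once too much data
 * has been sent. */
static bool repl_need_checkpoint(const PrimaryRepl *p, uint64_t threshold)
{
    return repl_sent_data_size(p) >= threshold;
}

/* Analogue of bdrv_prepare_checkpoint(): the real call blocks until this
 * condition holds; this sketch only reports whether it holds. */
static bool repl_all_forwarded(const PrimaryRepl *p)
{
    return p->queued == 0;
}

/* After the checkpoint completes, counting restarts (assumption). */
static void repl_checkpoint_done(PrimaryRepl *p)
{
    assert(repl_all_forwarded(p));
    p->sent_bytes = 0;
}
```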
+
+== Usage ==
+On both the Primary and Secondary hosts, start QEMU with the following option:
+    "-drive file=blkcolo:host:port:/path/to/image"
+a. host
+   Hostname or IP address of the Secondary host.
+b. port
+   The Secondary QEMU listens on this port, and the Primary QEMU
+   connects to it.
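The filename above packs three fields into one string. A minimal parsing sketch, assuming a hypothetical helper blkcolo_parse() and a numeric port in the range 1-65535; it splits only on the first two colons so the image path may itself contain colons, does not handle IPv6 literals, and is not the driver's actual parser:

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical result of parsing "blkcolo:host:port:/path/to/image". */
typedef struct {
    char host[256];
    int port;
    char path[4096];
} BlkcoloTarget;

/* Returns 0 on success, -1 on a malformed filename. */
static int blkcolo_parse(const char *filename, BlkcoloTarget *t)
{
    static const char prefix[] = "blkcolo:";
    const char *host, *colon1, *colon2;
    char *end;
    long port;

    if (strncmp(filename, prefix, strlen(prefix)) != 0) {
        return -1;
    }
    host = filename + strlen(prefix);

    colon1 = strchr(host, ':');             /* end of host */
    if (colon1 == NULL || colon1 == host ||
        (size_t)(colon1 - host) >= sizeof(t->host)) {
        return -1;
    }
    colon2 = strchr(colon1 + 1, ':');       /* end of port */
    if (colon2 == NULL || colon2 == colon1 + 1) {
        return -1;
    }

    errno = 0;
    port = strtol(colon1 + 1, &end, 10);
    if (errno != 0 || end != colon2 || port < 1 || port > 65535) {
        return -1;
    }
    if (colon2[1] == '\0' || strlen(colon2 + 1) >= sizeof(t->path)) {
        return -1;                          /* empty or overlong path */
    }

    memcpy(t->host, host, (size_t)(colon1 - host));
    t->host[colon1 - host] = '\0';
    t->port = (int)port;
    strcpy(t->path, colon2 + 1);            /* length checked above */
    return 0;
}
```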
-- 
1.9.1
