On 2016/11/28 14:00, Changlong Xie wrote:
On 11/28/2016 01:13 PM, Hailiang Zhang wrote:
On 2016/10/25 17:03, Changlong Xie wrote:
On 10/20/2016 09:57 PM, zhanghailiang wrote:
Introduce the scenario of shared-disk block replication
and how to use it.
Signed-off-by: zhanghailiang <zhang.zhanghaili...@huawei.com>
Signed-off-by: Wen Congyang <we...@cn.fujitsu.com>
Signed-off-by: Zhang Chen <zhangchen.f...@cn.fujitsu.com>
---
docs/block-replication.txt | 131 +++++++++++++++++++++++++++++++++++++++++++--
1 file changed, 127 insertions(+), 4 deletions(-)
diff --git a/docs/block-replication.txt b/docs/block-replication.txt
index 6bde673..97fcfc1 100644
--- a/docs/block-replication.txt
+++ b/docs/block-replication.txt
@@ -24,7 +24,7 @@ only dropped at next checkpoint time. To reduce the network transportation
effort during a vmstate checkpoint, the disk modification operations of
the Primary disk are asynchronously forwarded to the Secondary node.
-== Workflow ==
+== Non-shared disk workflow ==
The following is the image of block replication workflow:
+----------------------+
+------------------------+
@@ -57,7 +57,7 @@ The following is the image of block replication workflow:
4) Secondary write requests will be buffered in the Disk buffer and it
will overwrite the existing sector content in the buffer.
-== Architecture ==
+== None-shared disk architecture ==
s/None-shared/Non-shared/g
We are going to implement block replication from many basic
blocks that are already in QEMU.
@@ -106,6 +106,74 @@ any state that would otherwise be lost by the speculative write-through
of the NBD server into the secondary disk. So before block replication,
the primary disk and secondary disk should contain the same data.
+== Shared Disk Mode Workflow ==
+The following is the image of block replication workflow:
+
+ +----------------------+ +------------------------+
+ |Primary Write Requests| |Secondary Write Requests|
+ +----------------------+ +------------------------+
+ | |
+ | (4)
+ | V
+ | /-------------\
+ | (2)Forward and write through | |
+ | +--------------------------> | Disk Buffer |
+ | | | |
+ | | \-------------/
+ | |(1)read |
+ | | |
+ (3)write | | | backing file
+ V | |
+ +-----------------------------+ |
+ | Shared Disk | <-----+
+ +-----------------------------+
+
+ 1) Primary writes will read original data and forward it to Secondary
+ QEMU.
+ 2) Before Primary write requests are written to Shared disk, the
+ original sector content will be read from Shared disk and
+ forwarded and buffered in the Disk buffer on the secondary site,
+ but it will not overwrite the existing
extra spaces at the end of line
+ sector content(it could be from either "Secondary Write Requests" or
Need a space before "(" for better style.
+ previous COW of "Primary Write Requests") in the Disk buffer.
+ 3) Primary write requests will be written to Shared disk.
+ 4) Secondary write requests will be buffered in the Disk buffer and it
+ will overwrite the existing sector content in the buffer.
+
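As a rough illustration of steps 1) to 4) above, here is a toy Python model of the shared-disk workflow. It is only a sketch: the class and method names are made up for this example, sectors are plain dictionary entries, and nothing here corresponds to actual QEMU code.

```python
class SharedDiskModel:
    """Toy model of the shared-disk workflow (hypothetical, not QEMU code)."""

    def __init__(self, shared_disk):
        # sector -> data; the single disk that both nodes can see
        self.shared_disk = dict(shared_disk)
        # Secondary-side Disk buffer (COW data plus Secondary writes)
        self.disk_buffer = {}

    def primary_write(self, sector, data):
        # (1) read the original content from the shared disk
        original = self.shared_disk.get(sector)
        # (2) forward it to the Disk buffer, never overwriting an
        #     entry that is already buffered for this sector
        self.disk_buffer.setdefault(sector, original)
        # (3) write the new data through to the shared disk
        self.shared_disk[sector] = data

    def secondary_write(self, sector, data):
        # (4) Secondary writes always overwrite the buffered content
        self.disk_buffer[sector] = data

    def secondary_read(self, sector):
        # The buffer wins; otherwise fall back to the backing file
        if sector in self.disk_buffer:
            return self.disk_buffer[sector]
        return self.shared_disk.get(sector)
```

In this model the Secondary keeps seeing the pre-write content of any sector the Primary has modified, which is the point of forwarding the original data before step 3).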
+== Shared Disk Mode Architecture ==
+We are going to implement block replication from many basic
+blocks that are already in QEMU.
+             virtio-blk                  ||                          .----------
+               /                         ||                          | Secondary
+              /                          ||                          '----------
+             /                           ||                               virtio-blk
+            /                            ||                                    |
+           |                             ||                             replication(5)
+           |                  NBD  -------->  NBD            (2)               |
+           |                client       ||  server ---> hidden disk <-- active disk(4)
+           |                   ^         ||                   |
+           |            replication(1)   ||                   |
+           |                   |         ||                   |
+           | +-----------------'         ||                   |
+         (3) |drive-backup sync=none     ||                   |
+--------.  | +-----------------+         ||                   |
+Primary |  |                   |         ||           backing |
+--------'  |                   |         ||                   |
+           V                   |                              |
+   +-------------------------------------------+              |
+   |                shared disk                |   <----------+
+   +-------------------------------------------+
+
+
+ 1) Primary writes will read original data and forward it to Secondary
+ QEMU.
+ 2) The hidden-disk buffers the original content that is modified by the
+ primary VM. It should also be an empty disk, and
extra spaces at end of line
+ the driver supports bdrv_make_empty() and backing file.
+ 3) Primary write requests will be written to Shared disk.
+ 4) Secondary write requests will be buffered in the active disk and it
+ will overwrite the existing sector content in the buffer.
+
== Failure Handling ==
There are 7 internal errors when block replication is running:
1. I/O error on primary disk
@@ -145,7 +213,7 @@ d. replication_stop_all()
things except failover. The caller must hold the I/O mutex lock if it is
in migration/checkpoint thread.
-== Usage ==
+== Non-shared disk usage ==
Primary:
-drive if=xxx,driver=quorum,read-pattern=fifo,id=colo1,vote-threshold=1,\
children.0.file.filename=1.raw,\
@@ -234,6 +302,61 @@ Secondary:
The primary host is down, so we should do the following thing:
{ 'execute': 'nbd-server-stop' }
+== Shared disk usage ==
Keeping the same coding style as the "== Non-shared disk usage ==" part would be
good.
+Primary:
+ -drive if=virtio,id=primary_disk0,file.filename=1.raw,driver=raw
+
+Issue qmp command:
+ {'execute': 'human-monitor-command',
two space indentation for the whole "{...}" part
+ 'arguments': {
+ 'command-line': 'drive_add-nbuddydriver=replication,
missing spaces
+ mode=primary,
+ file.driver=nbd,
+ file.host=9.42.3.17,
+ file.port=9998,
+ file.export=hidden_disk0,
+ shared-disk-id=primary_disk0,
+ shared-disk=on,
+ node-name=rep'
Keep the whole command after "command-line" on one line, or you can't
execute it correctly, IIRC.
Hmm, I will change this HMP command to the QMP 'blockdev-add' command in the next
version, because it is supported now, though it is not ready for production.
It's a good start, but I'm not sure here.
http://lists.nongnu.org/archive/html/qemu-devel/2016-11/msg01062.html
Yes, I noticed that, but for COLO, it is not ready for production either.
So I think it is OK to use it here ...
Thanks
-Xie
+ }
+ }
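To make the one-line requirement for "command-line" concrete, here is a small Python sketch that assembles the Primary command from a list of options so the result never contains a line break. The helper name is hypothetical, the `drive_add -n buddy` spacing is assumed from the reviewer's "missing spaces" note, and `shared-disk-id` / `shared-disk` are the options proposed by this patch, not established QEMU options.

```python
import json


def build_drive_add(host, port, export, shared_disk_id, node_name="rep"):
    """Build the human-monitor-command for the Primary (hypothetical helper)."""
    opts = [
        "driver=replication",
        "mode=primary",
        "file.driver=nbd",
        f"file.host={host}",
        f"file.port={port}",
        f"file.export={export}",
        # Options as proposed in this patch; assumed, not verified
        f"shared-disk-id={shared_disk_id}",
        "shared-disk=on",
        f"node-name={node_name}",
    ]
    # Joining with "," keeps the whole HMP command on a single line
    command_line = "drive_add -n buddy " + ",".join(opts)
    return {
        "execute": "human-monitor-command",
        "arguments": {"command-line": command_line},
    }


cmd = build_drive_add("9.42.3.17", 9998, "hidden_disk0", "primary_disk0")
print(json.dumps(cmd))
```

Generating the string this way avoids the wrapped multi-line form shown in the patch, at the cost of a longer line in the documentation.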
Secondary:
+ -drive if=none,driver=qcow2,file.filename=/mnt/ramfs/hidden_disk.img,id=hidden_disk0,\
+ backing.driver=raw,backing.file.filename=1.raw \
+ -drive if=virtio,id=active-disk0,driver=replication,mode=secondary,\
+ file.driver=qcow2,top-id=active-disk0,\
+ file.file.filename=/mnt/ramfs/active_disk.img,\
+ file.backing=hidden_disk0,shared-disk=on
+
+Issue qmp command:
+1. {'execute': 'nbd-server-start',
+ 'arguments': {
+ 'addr': {
+ 'type': 'inet',
+ 'data': {
+ 'host': '0',
s/0/9.42.3.17/g, since you use designated ip address above
+ 'port': '9998'
+ }
+ }
+ }
+ }
+2. {
+ 'execute': 'nbd-server-add',
+ 'arguments': {
+ 'device': 'hidden_disk0',
+ 'writable': true
+ }
+ }
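Regarding the host address comment above: one way to keep the Primary's NBD client options and the Secondary's NBD server commands consistent is to derive both from the same (host, port, export) values. The following Python sketch does that; the function name is hypothetical and the command layout simply mirrors the examples in this patch.

```python
import json


def replication_endpoints(host, port, export):
    """Derive both sides' NBD settings from one triple (hypothetical helper)."""
    # Fragment for the Primary's replication drive options
    primary_opts = (
        f"file.driver=nbd,file.host={host},"
        f"file.port={port},file.export={export}"
    )
    # QMP commands to issue on the Secondary, in order
    secondary_cmds = [
        {
            "execute": "nbd-server-start",
            "arguments": {
                "addr": {
                    "type": "inet",
                    # The port is passed as a string, as in the patch
                    "data": {"host": host, "port": str(port)},
                }
            },
        },
        {
            "execute": "nbd-server-add",
            "arguments": {"device": export, "writable": True},
        },
    ]
    return primary_opts, secondary_cmds


opts, cmds = replication_endpoints("9.42.3.17", 9998, "hidden_disk0")
for c in cmds:
    print(json.dumps(c))
```

With a single source for the address, the Primary cannot end up pointing at a host or export that the Secondary never started.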
+
+After Failover:
+Primary:
+{'execute': 'human-monitor-command',
+ 'arguments': {
+ 'command-line': 'drive_delrep'
drive_del rep
I'll use the qmp command instead here.
+ }
+}
+
+Secondary:
+ {'execute': 'nbd-server-stop' }
+
TODO:
1. Continuous block replication
-2. Shared disk
I will fix all the above problems in the next version, thanks.