Re: [Qemu-devel] [Patch v12 resend 05/10] docs: block replication's description

2016-01-26 Thread Stefan Hajnoczi
On Mon, Jan 04, 2016 at 02:03:16PM +0800, Wen Congyang wrote:
> On 12/23/2015 05:26 PM, Stefan Hajnoczi wrote:
> > On Wed, Dec 02, 2015 at 01:31:46PM +0800, Wen Congyang wrote:
> >> +== Failure Handling ==
> >> +There are 6 internal errors when block replication is running:
> >> +1. I/O error on primary disk
> >> +2. Forwarding primary write requests failed
> >> +3. Backup failed
> >> +4. I/O error on secondary disk
> >> +5. I/O error on active disk
> >> +6. Making active disk or hidden disk empty failed
> >> +In case 1 and 5, we just report the error to the disk layer. In case 2, 3,
> >> +4 and 6, we just report block replication's error to FT/HA manager (which
> >> +decides when to do a new checkpoint, when to do failover).
> >> +There is no internal error when doing failover.
> > 
> > Not sure this is true.
> > 
> > Below it says the following for failover: "We will flush the Disk buffer
> > into Secondary Disk and stop block replication".  Flushing the disk
> > buffer can result in I/O errors.  This means that failover operations
> > are not guaranteed to succeed.
> 
> We don't use the mirror job now; we may use it in the next version.
> Is there any way to detect an I/O error while the mirror job is running?
> By getting the job's status?

Block jobs have an error status which is exposed via QMP.  The block job
emits a QMP event notifying the client.  If the client issues
query-block-jobs it will also see the iostatus field.
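
As a rough sketch of what a QMP client could do with those two sources of
information: the code below assumes the job entries returned by
query-block-jobs carry an "io-status" field and that the error event is
named BLOCK_JOB_ERROR with "device", "operation" and "action" in its data.
The sample messages and the helper names failed_jobs() and describe_event()
are made up for illustration, and no real QMP socket is opened.

```python
import json


def failed_jobs(query_reply):
    """Return the devices of block jobs whose io-status is not "ok"."""
    return [job["device"]
            for job in query_reply.get("return", [])
            if job.get("io-status", "ok") != "ok"]


def describe_event(event):
    """Summarise a BLOCK_JOB_ERROR event; return None for any other event."""
    if event.get("event") != "BLOCK_JOB_ERROR":
        return None
    data = event["data"]
    return "%s: %s error, action=%s" % (
        data["device"], data["operation"], data["action"])


# Hand-written sample messages (assumed shapes, not captured from QEMU).
sample_reply = json.loads("""
{"return": [
  {"device": "drive0", "type": "mirror", "io-status": "failed"},
  {"device": "drive1", "type": "mirror", "io-status": "ok"}
]}
""")
sample_event = json.loads("""
{"event": "BLOCK_JOB_ERROR",
 "data": {"device": "drive0", "operation": "write", "action": "report"}}
""")

print(failed_jobs(sample_reply))     # ['drive0']
print(describe_event(sample_event))  # drive0: write error, action=report
```

Polling query-block-jobs and listening for the event are complementary: the
event tells the client when something went wrong, the query tells it the
current state if an event was missed.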

I'm not aware of an internal API to monitor QMP events.  It would be
possible to add it but first I wonder why you want to use mirror?

> > In practice I think this is similar to a successful failover followed by
> > immediately getting I/O errors on the new Primary Disk.  It means that
> > right after failover there is another failure and the system may not be
> > able to continue.
> 
> Block replication is not designed for such a case. For example, we don't do
> failover on a primary disk failure. In that case, we just report the error
> to the disk layer (this is case 1 in the Failure Handling section above).
> 
> Sorry for the late reply. Your mail was sent on 2015-12-23, but I received
> it on 2016-01-04.

What is supposed to happen when flushing the Disk Buffer into the
Secondary Disk fails?

Stefan




Re: [Qemu-devel] [Patch v12 resend 05/10] docs: block replication's description

2016-01-04 Thread Dr. David Alan Gilbert
* Stefan Hajnoczi (stefa...@redhat.com) wrote:
> On Wed, Dec 02, 2015 at 01:31:46PM +0800, Wen Congyang wrote:
> > +== Failure Handling ==
> > +There are 6 internal errors when block replication is running:
> > +1. I/O error on primary disk
> > +2. Forwarding primary write requests failed
> > +3. Backup failed
> > +4. I/O error on secondary disk
> > +5. I/O error on active disk
> > +6. Making active disk or hidden disk empty failed
> > +In case 1 and 5, we just report the error to the disk layer. In case 2, 3,
> > +4 and 6, we just report block replication's error to FT/HA manager (which
> > +decides when to do a new checkpoint, when to do failover).
> > +There is no internal error when doing failover.
> 
> Not sure this is true.
> 
> Below it says the following for failover: "We will flush the Disk buffer
> into Secondary Disk and stop block replication".  Flushing the disk
> buffer can result in I/O errors.  This means that failover operations
> are not guaranteed to succeed.
> 
> In practice I think this is similar to a successful failover followed by
> immediately getting I/O errors on the new Primary Disk.  It means that
> right after failover there is another failure and the system may not be
> able to continue.

Yes, I think that's true.

> So this really only matters in the case where there is a new Secondary
> ready after failover.  In that case the user might expect failover to
> continue to the new Secondary (Host 3):
> 
>    [X]        [X]
>   Host 1 <-> Host 2 <-> Host 3

Since COLO is just doing a 1+1 redundancy, I think it's not expecting to
cope with a double host failure; it's going to take some time (seconds?) to
sync Host 3 back in when you add it after a failover and the aim would
be not to have disturbed the application for that long, so it should
already be running on Host 2 during that resync.

Dave
--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK



Re: [Qemu-devel] [Patch v12 resend 05/10] docs: block replication's description

2016-01-03 Thread Stefan Hajnoczi
On Wed, Dec 02, 2015 at 01:31:46PM +0800, Wen Congyang wrote:
> +== Failure Handling ==
> +There are 6 internal errors when block replication is running:
> +1. I/O error on primary disk
> +2. Forwarding primary write requests failed
> +3. Backup failed
> +4. I/O error on secondary disk
> +5. I/O error on active disk
> +6. Making active disk or hidden disk empty failed
> +In case 1 and 5, we just report the error to the disk layer. In case 2, 3,
> +4 and 6, we just report block replication's error to FT/HA manager (which
> +decides when to do a new checkpoint, when to do failover).
> +There is no internal error when doing failover.

Not sure this is true.

Below it says the following for failover: "We will flush the Disk buffer
into Secondary Disk and stop block replication".  Flushing the disk
buffer can result in I/O errors.  This means that failover operations
are not guaranteed to succeed.

In practice I think this is similar to a successful failover followed by
immediately getting I/O errors on the new Primary Disk.  It means that
right after failover there is another failure and the system may not be
able to continue.

So this really only matters in the case where there is a new Secondary
ready after failover.  In that case the user might expect failover to
continue to the new Secondary (Host 3):

   [X]        [X]
  Host 1 <-> Host 2 <-> Host 3




Re: [Qemu-devel] [Patch v12 resend 05/10] docs: block replication's description

2016-01-03 Thread Wen Congyang
On 12/23/2015 05:26 PM, Stefan Hajnoczi wrote:
> On Wed, Dec 02, 2015 at 01:31:46PM +0800, Wen Congyang wrote:
>> +== Failure Handling ==
>> +There are 6 internal errors when block replication is running:
>> +1. I/O error on primary disk
>> +2. Forwarding primary write requests failed
>> +3. Backup failed
>> +4. I/O error on secondary disk
>> +5. I/O error on active disk
>> +6. Making active disk or hidden disk empty failed
>> +In case 1 and 5, we just report the error to the disk layer. In case 2, 3,
>> +4 and 6, we just report block replication's error to FT/HA manager (which
>> +decides when to do a new checkpoint, when to do failover).
>> +There is no internal error when doing failover.
> 
> Not sure this is true.
> 
> Below it says the following for failover: "We will flush the Disk buffer
> into Secondary Disk and stop block replication".  Flushing the disk
> buffer can result in I/O errors.  This means that failover operations
> are not guaranteed to succeed.

We don't use the mirror job now; we may use it in the next version.
Is there any way to detect an I/O error while the mirror job is running?
By getting the job's status?

> 
> In practice I think this is similar to a successful failover followed by
> immediately getting I/O errors on the new Primary Disk.  It means that
> right after failover there is another failure and the system may not be
> able to continue.

Block replication is not designed for such a case. For example, we don't do
failover on a primary disk failure. In that case, we just report the error
to the disk layer (this is case 1 in the Failure Handling section above).

Sorry for the late reply. Your mail was sent on 2015-12-23, but I received
it on 2016-01-04.

> 
> So this really only matters in the case where there is a new Secondary
> ready after failover.  In that case the user might expect failover to
> continue to the new Secondary (Host 3):
> 
>    [X]        [X]
>   Host 1 <-> Host 2 <-> Host 3
> 






[Qemu-devel] [Patch v12 resend 05/10] docs: block replication's description

2015-12-01 Thread Wen Congyang
Signed-off-by: Wen Congyang 
Signed-off-by: zhanghailiang 
Signed-off-by: Gonglei 
---
 docs/block-replication.txt | 227 +
 1 file changed, 227 insertions(+)
 create mode 100644 docs/block-replication.txt

diff --git a/docs/block-replication.txt b/docs/block-replication.txt
new file mode 100644
index 000..c7bad0e
--- /dev/null
+++ b/docs/block-replication.txt
@@ -0,0 +1,227 @@
+Block replication
+
+Copyright Fujitsu, Corp. 2015
+Copyright (c) 2015 Intel Corporation
+Copyright (c) 2015 HUAWEI TECHNOLOGIES CO., LTD.
+
+This work is licensed under the terms of the GNU GPL, version 2 or later.
+See the COPYING file in the top-level directory.
+
+Block replication is used for continuous checkpoints. It is designed
+for COLO (COarse-grain LOck-stepping) where the Secondary VM is running.
+It can also be applied to FT/HA (Fault-tolerance/High Availability) scenarios,
+where the Secondary VM is not running.
+
+This document gives an overview of block replication's design.
+
+== Background ==
+High availability solutions such as micro checkpoint and COLO will do
+consecutive checkpoints. The VM state of Primary VM and Secondary VM is
+identical right after a VM checkpoint, but becomes different as the VM
+executes till the next checkpoint. To support disk contents checkpoint,
+the modified disk contents in the Secondary VM must be buffered, and are
+only dropped at next checkpoint time. To reduce the network transportation
+effort at the time of checkpoint, the disk modification operations of
+Primary disk are asynchronously forwarded to the Secondary node.
+
+== Workflow ==
+The following is the image of block replication workflow:
+
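++----------------------+            +------------------------+
+|Primary Write Requests|            |Secondary Write Requests|
++----------------------+            +------------------------+
+          |                                     |
+          |                                    (4)
+          |                                     V
+          |                              /-------------\
+          |      Copy and Forward        |             |
+          |---------(1)----------+       | Disk Buffer |
+          |                      |       |             |
+          |                     (3)      \-------------/
+          |                 speculative      ^
+          |                write through    (2)
+          |                      |           |
+          V                      V           |
+   +--------------+           +----------------+
+   | Primary Disk |           | Secondary Disk |
+   +--------------+           +----------------+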
+
+1) Primary write requests will be copied and forwarded to Secondary
+   QEMU.
+2) Before Primary write requests are written to Secondary disk, the
+   original sector content will be read from Secondary disk and
+   buffered in the Disk buffer, but it will not overwrite the existing
+   sector content (it could be from either "Secondary Write Requests" or
+   previous COW of "Primary Write Requests") in the Disk buffer.
+3) Primary write requests will be written to Secondary disk.
+4) Secondary write requests will be buffered in the Disk buffer and it
+   will overwrite the existing sector content in the buffer.
+
+== Architecture ==
+We are going to implement block replication from many basic
+blocks that are already in QEMU.
+
+         virtio-blk       ||
+             ^            ||                            .----------
+             |            ||                            | Secondary
+        1 Quorum          ||                            '----------
+         /      \         ||
+        /        \        ||
+   Primary    2 filter
+     disk         ^                                                             virtio-blk
+                  |                                                                  ^
+                3 NBD  ------->  3 NBD                                               |
+                client    ||     server                                          2 filter
+                          ||        ^                                                ^
+--------.                 ||        |                                                |
+Primary |                 ||  Secondary disk <--------- hidden-disk 5 <--------- active-disk 4
+--------'                 ||        |          backing        ^       backing
+                          ||        |                         |
+                          ||        |                         |
+                          ||        '-------------------------'
+                          ||