On 02/04/2016 05:07 PM, Dr. David Alan Gilbert wrote:
* Changlong Xie (xiecl.f...@cn.fujitsu.com) wrote:
On 02/01/2016 09:18 AM, Wen Congyang wrote:
On 01/29/2016 06:47 PM, Dr. David Alan Gilbert wrote:
* Wen Congyang (we...@cn.fujitsu.com) wrote:
On 01/29/2016 06:07 PM, Dr. David Alan Gilbert wrote:
* Wen Congyang (we...@cn.fujitsu.com) wrote:
On 01/27/2016 07:03 PM, Dr. David Alan Gilbert wrote:
Hi,
   I've got a block error if I kill the secondary.

Start both primary & secondary
kill -9 secondary qemu
x_colo_lost_heartbeat on primary

The guest sees a block error and the ext4 root switches to read-only.

I gdb'd the primary with a breakpoint on quorum_report_bad; see
backtrace below.
(This is based on colo-v2.4-periodic-mode of the framework
code with the block and network proxy merged in; so it could be my
merging but I don't think so ?)


(gdb) where
#0  quorum_report_bad (node_name=0x7f2946a0892c "node0", ret=-5, 
acb=0x7f2946cb3910, acb=0x7f2946cb3910)
     at /root/colo/jan-2016/qemu/block/quorum.c:222
#1  0x00007f2943b23058 in quorum_aio_cb (opaque=<optimized out>, ret=<optimized 
out>)
     at /root/colo/jan-2016/qemu/block/quorum.c:315
#2  0x00007f2943b311be in bdrv_co_complete (acb=0x7f2946cb3f60) at 
/root/colo/jan-2016/qemu/block/io.c:2122
#3  0x00007f2943ae777d in aio_bh_call (bh=<optimized out>) at 
/root/colo/jan-2016/qemu/async.c:64
#4  aio_bh_poll (ctx=ctx@entry=0x7f2945b771d0) at 
/root/colo/jan-2016/qemu/async.c:92
#5  0x00007f2943af5090 in aio_dispatch (ctx=0x7f2945b771d0) at 
/root/colo/jan-2016/qemu/aio-posix.c:305
#6  0x00007f2943ae756e in aio_ctx_dispatch (source=<optimized out>, 
callback=<optimized out>,
     user_data=<optimized out>) at /root/colo/jan-2016/qemu/async.c:231
#7  0x00007f293b84a79a in g_main_context_dispatch () from 
/lib64/libglib-2.0.so.0
#8  0x00007f2943af3a00 in glib_pollfds_poll () at 
/root/colo/jan-2016/qemu/main-loop.c:211
#9  os_host_main_loop_wait (timeout=<optimized out>) at 
/root/colo/jan-2016/qemu/main-loop.c:256
#10 main_loop_wait (nonblocking=<optimized out>) at 
/root/colo/jan-2016/qemu/main-loop.c:504
#11 0x00007f29438529ee in main_loop () at /root/colo/jan-2016/qemu/vl.c:1945
#12 main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>) at 
/root/colo/jan-2016/qemu/vl.c:4707

(gdb) p s->num_children
$1 = 2
(gdb) p acb->success_count
$2 = 0
(gdb) p acb->is_read
$5 = false

Sorry for the late reply.

No problem.

What is the value of acb->count?

(gdb) p acb->count
$1 = 1

Note, the count is 1, not 2. The write to children.0 is still in flight. If
the write to children.0 succeeds, the guest never sees this error.
If the secondary host is down, you should remove quorum's children.1.
Otherwise, you will get an I/O error event.

Is that safe?  If the secondary fails, do you always have time to issue the
command to remove children.1 before the guest sees the error?

We write to both children and expect the write to children.0 to succeed. If it
does, the guest never sees the error; you just get the I/O error event.

I think children.0 is the disk, and that should be OK - so only the 
children.1/replication should
be failing - so in that case why do I see the error?

I don't know, and I will check the codes.

The 'node0' in the backtrace above is the name of the replication, so it does 
look like the error
is coming from the replication.

No, the backtrace above is just reporting an I/O error event to the management
application.


Anyway, I tried removing children.1 but it segfaults now, I guess the 
replication is unhappy:

(qemu) x_block_change colo-disk0 -d children.1
(qemu) x_colo_lost_heartbeat

Hmm, you should not remove the child before failover. I will check how to
avoid this in the code.

  But you said 'If secondary host is down, you should remove quorum's 
children.1' - is that not
what you meant?

Yes, you should execute 'x_colo_lost_heartbeat' first, and then execute
'x_block_change ... -d ...'.

Hi David,

Hi Xie,
   Thanks for the response.

It seems we missed the 'drive_del' command, and we will document it in the
next version. Here is the correct command order:

{ "execute": "x-colo-lost-heartbeat" }
{ 'execute': 'x-blockdev-change', 'arguments': {'parent': 'colo-disk',
'child': 'children.1'}}
{ 'execute': 'human-monitor-command', 'arguments': {'command-line':
'drive_del xxxxx'}}

OK, however you should fix the segfault that occurs if you don't issue the
drive_del; qemu should never crash.
(Also, I still get the I/O error in the guest if I do the
x-colo-lost-heartbeat.)


Here is a quick fix; I have tested it several times and it works well for me.

    bugfix

    Signed-off-by: Changlong Xie <xiecl.f...@cn.fujitsu.com>

diff --git a/block/quorum.c b/block/quorum.c
index e5a7e4f..f4f1d28 100644
--- a/block/quorum.c
+++ b/block/quorum.c
@@ -458,6 +458,11 @@ static QuorumVoteVersion *quorum_get_vote_winner(QuorumVotes *votes)
         if (candidate->vote_count > max) {
             max = candidate->vote_count;
             winner = candidate;
+            continue;
+        }
+        if (candidate->vote_count == max &&
+                    candidate->value.l > winner->value.l) {
+            winner = candidate;
         }
     }


Dave

Thanks
        -Xie

12973 Segmentation fault      (core dumped) 
./try/x86_64-softmmu/qemu-system-x86_64 -enable-kvm $console_param -S -boot c 
-m 4080 -smp 4 -machine pc-i440fx-2.5,accel=kvm -name debug-threads=on -trace 
events=trace-file -device virtio-rng-pci $block_param $net_param

#0  0x00007f0a398a864c in bdrv_stop_replication (bs=0x7f0a3b0a8430, 
failover=true, errp=0x7fff6a5c3420)
     at /root/colo/jan-2016/qemu/block.c:4426

(gdb) p drv
$1 = (BlockDriver *) 0x5d2a

   it looks like the whole of bs is bogus.

#1  0x00007f0a398d87f6 in quorum_stop_replication (bs=<optimized out>, 
failover=<optimized out>,
     errp=<optimized out>) at /root/colo/jan-2016/qemu/block/quorum.c:1213

(gdb) p s->replication_index
$3 = 1

I guess quorum_del_child needs to stop replication before it removes the child?

Yes, but in the newest version quorum doesn't know about block replication,
and I think we should add a reference to the bs when starting block
replication.

Do you have a new version ready to test?  I'm interested in trying it (and
also in trying the latest version of the colo-proxy).

I think we can post the newest version this week.

Thanks
Wen Congyang


Dave

Thanks
Wen Congyang

(although it would have to be careful not to block on the dead nbd).

#2  0x00007f0a398a8901 in bdrv_stop_replication_all 
(failover=failover@entry=true, errp=errp@entry=0x7fff6a5c3478)
     at /root/colo/jan-2016/qemu/block.c:4504
#3  0x00007f0a3984b0af in primary_vm_do_failover () at 
/root/colo/jan-2016/qemu/migration/colo.c:144
#4  colo_do_failover (s=<optimized out>) at 
/root/colo/jan-2016/qemu/migration/colo.c:162
#5  0x00007f0a3989d7fd in aio_bh_call (bh=<optimized out>) at 
/root/colo/jan-2016/qemu/async.c:64
#6  aio_bh_poll (ctx=ctx@entry=0x7f0a3a6c21d0) at 
/root/colo/jan-2016/qemu/async.c:92
#7  0x00007f0a398ab110 in aio_dispatch (ctx=0x7f0a3a6c21d0) at 
/root/colo/jan-2016/qemu/aio-posix.c:305
#8  0x00007f0a3989d5ee in aio_ctx_dispatch (source=<optimized out>, 
callback=<optimized out>,
     user_data=<optimized out>) at /root/colo/jan-2016/qemu/async.c:231
#9  0x00007f0a3160079a in g_main_context_dispatch () from 
/lib64/libglib-2.0.so.0
#10 0x00007f0a398a9a80 in glib_pollfds_poll () at 
/root/colo/jan-2016/qemu/main-loop.c:211
#11 os_host_main_loop_wait (timeout=<optimized out>) at 
/root/colo/jan-2016/qemu/main-loop.c:256
#12 main_loop_wait (nonblocking=<optimized out>) at 
/root/colo/jan-2016/qemu/main-loop.c:504
#13 0x00007f0a396089ee in main_loop () at /root/colo/jan-2016/qemu/vl.c:1945
#14 main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>) at 
/root/colo/jan-2016/qemu/vl.c:4707

Dave

Thanks
Wen Congyang


(qemu) info block
colo-disk0 (#block080): json:{"children": [{"driver": "raw", "file": {"driver": "file", "filename": "/root/colo/bugzilla.raw"}}, {"driver": "replication", "mode": "primary", "file": {"port": 
"8889", "host": "ibpair", "driver": "nbd", "export": "colo-disk0"}}], "driver": "quorum", "blkverify": false, "rewrite-corrupted": false, "vote-threshold": 1} (quorum)
     Cache mode:       writeback, direct

Dave

* Changlong Xie (xiecl.f...@cn.fujitsu.com) wrote:
Block replication is a very important feature which is used for
continuous checkpoints (for example: COLO).

You can get the detailed information about block replication from here:
http://wiki.qemu.org/Features/BlockReplication

Usage:
Please refer to docs/block-replication.txt

This patch series is based on the following patch series:
1. http://lists.nongnu.org/archive/html/qemu-devel/2015-12/msg04570.html

You can get the patch here:
https://github.com/Pating/qemu/tree/changlox/block-replication-v13

You can get the patch with framework here:
https://github.com/Pating/qemu/tree/changlox/colo_framework_v12

TODO:
1. Continuous block replication. It will be started after basic functions
    are accepted.

Changs Log:
V13:
1. Rebase to the newest codes
2. Remove redundant macros and semicolons in replication.c
3. Fix typos in block-replication.txt
V12:
1. Rebase to the newest codes
2. Use backing reference to replace 'allow-write-backing-file'
V11:
1. Reopen the backing file when starting block replication if it is not
    opened in R/W mode
2. Unblock BLOCK_OP_TYPE_BACKUP_SOURCE and BLOCK_OP_TYPE_BACKUP_TARGET
    when opening backing file
3. Block the top BDS so there is only one block job for the top BDS and
    its backing chain.
V10:
1. Use blockdev-remove-medium and blockdev-insert-medium to replace backing
    reference.
2. Address the comments from Eric Blake
V9:
1. Update the error messages
2. Rebase to the newest qemu
3. Split child add/delete support. These patches are sent in another patchset.
V8:
1. Address Alberto Garcia's comments
V7:
1. Implement adding/removing quorum child. Remove the option non-connect.
2. Simplify the backing reference option according to Stefan Hajnoczi's
suggestion
V6:
1. Rebase to the newest qemu.
V5:
1. Address the comments from Gong Lei
2. Speed the failover up. The secondary vm can take over very quickly even
    if there are too many I/O requests.
V4:
1. Introduce a new driver replication to avoid touch nbd and qcow2.
V3:
1. Use error_setg() instead of error_set()
2. Add a new block job API
3. Active disk, hidden disk and nbd target uses the same AioContext
4. Add a testcase to test new hbitmap API
V2:
1. Redesign the secondary qemu (use image-fleecing)
2. Use Error objects to return error message
3. Address the comments from Max Reitz and Eric Blake

Wen Congyang (10):
   unblock backup operations in backing file
   Store parent BDS in BdrvChild
   Backup: clear all bitmap when doing block checkpoint
   Allow creating backup jobs when opening BDS
   docs: block replication's description
   Add new block driver interfaces to control block replication
   quorum: implement block driver interfaces for block replication
   Implement new driver for block replication
   support replication driver in blockdev-add
   Add a new API to start/stop replication, do checkpoint to all BDSes

  block.c                    | 145 ++++++++++++
  block/Makefile.objs        |   3 +-
  block/backup.c             |  14 ++
  block/quorum.c             |  78 +++++++
  block/replication.c        | 545 +++++++++++++++++++++++++++++++++++++++++++++
  blockjob.c                 |  11 +
  docs/block-replication.txt | 227 +++++++++++++++++++
  include/block/block.h      |   9 +
  include/block/block_int.h  |  15 ++
  include/block/blockjob.h   |  12 +
  qapi/block-core.json       |  33 ++-
  11 files changed, 1089 insertions(+), 3 deletions(-)
  create mode 100644 block/replication.c
  create mode 100644 docs/block-replication.txt

--
1.9.3



--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK