Re: [Qemu-devel] COLO: how to flip a secondary to a primary?

2016-01-25 Thread Dr. David Alan Gilbert
* Wen Congyang (we...@cn.fujitsu.com) wrote:
> On 01/23/2016 03:35 AM, Dr. David Alan Gilbert wrote:
> > Hi,
> >   I've been looking at what's needed to add a new secondary after
> > a primary failed; from the block side it doesn't look as hard
> > as I'd expected, perhaps you can tell me if I'm missing something!
> > 
> > The normal primary setup is:
> > 
> >quorum
> >   Real disk
> >   nbd client
> 
> quorum
>real disk
>replication
>   nbd client
> 
> > 
> > The normal secondary setup is:
> >replication
> >   active-disk
> >   hidden-disk
> >   Real-disk
> 
> IIRC, we can do it like this:
> quorum
>replication
>   active-disk
>   hidden-disk
>   real-disk

Yes.

> > With a couple of minor code hacks; I changed the secondary to be:
> > 
> >quorum
> >   replication
> > active-disk
> > hidden-disk
> > Real-disk
> >   dummy-disk
> 
> after failover,
> quorum
> replication (old, mode is secondary)
>  active-disk
>  hidden-disk*
>  real-disk*
>replication(new, mode is primary)
>  nbd-client

Do you need to keep the old secondary-replication?
Does that just pass straight through?

> In the newest version, we active commit active-disk to real-disk.
> So it will be:
> quorum
> replication (old, mode is secondary)
>  active-disk(it is real disk now)
>replication(new, mode is primary)
>  nbd-client

How does that active-commit work?  I didn't think you
could change the real disk until you had the full checkpoint,
since you don't know whether the primary or secondaries
changes need to be written?

> > and then after the primary fails, I start a new secondary
> > on another host and then on the old secondary do:
> > 
> >   nbd_server_stop
> >   stop
> >   x_block_change top-quorum -d children.0 # deletes use of real 
> > disk, leaves dummy
> >   drive_del active-disk0
> >   x_block_change top-quorum -a node-real-disk
> >   x_block_change top-quorum -d children.1 # Seems to have deleted 
> > the dummy?!, the disk is now child 0
> >   drive_add buddy 
> > driver=replication,mode=primary,file.driver=nbd,file.host=ibpair,file.port=8889,file.export=colo-disk0,node-name=nbd-client,if=none,cache=none
> >   x_block_change top-quorum -a nbd-client
> >   c
> >   migrate_set_capability x-colo on
> >   migrate -d -b tcp:ibpair:
> > 
> > and I think that means what was the secondary now has the same disk
> > structure as a normal primary.
> > That's not quite happy yet, and I've not figured out why - but the
> > order/structure of the block devices looks right?
> > 
> > Notes:
> >a) The dummy serves two purposes, 1) it works around the segfault
> >   I reported in the other mail, 2) when I delete the real disk in the
> >   first x_block_change it means the quorum still has 1 disk so doesn't
> >   get upset.
> 
> I don't understand the purpose 2.

quorum won't allow you to delete all its members ('The number of children
cannot be lower than the vote threshold 1'),
and it's very tricky to get the order correct with add/delete; for example
I tried:

drive_add buddy 
driver=replication,mode=primary,file.driver=nbd,file.host=ibpair,file.port=8889,file.export=colo-disk0,node-name=nbd-client,if=none,cache=none
# gets children.1
x_block_change top-quorum -a nbd-client
# deletes the secondary replication
x_block_change top-quorum -d children.0
drive_del active-disk0
# ends up as children.0 but in the 2nd slot
x_block_change top-quorum -a node-real-disk

info block shows me:
top-quorum (#block615): json:{"children": [
{"driver": "replication", "mode": "primary", "file": {"port": "8889", 
"host": "ibpair", "driver": "nbd", "export": "colo-disk0"}},
{"driver": "raw", "file": {"driver": "file", "filename": 
"/home/localvms/bugzilla.raw"}}
   ],
   "driver": "quorum", "blkverify": false, "rewrite-corrupted": false, 
"vote-threshold": 1} (quorum)
Cache mode:   writeback

that has the replication first and the file second; that's the opposite
of the normal primary startup - does it matter?

I can't add node-real-disk until I drive_del active-disk0 (which
previously used it);  and I can't drive_del until I remove
it from the quorum; but I can't remove that from the quorum first,
because that leaves an empty quorum.
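
The dance above can be sketched with a toy model (hypothetical Python, not
QEMU code) of the child-count constraint that forces the add-before-delete
ordering:

```python
class Quorum:
    """Toy model (hypothetical; not QEMU code) of quorum child management,
    mirroring the constraint quoted above."""

    def __init__(self, children, vote_threshold=1):
        self.children = list(children)
        self.vote_threshold = vote_threshold

    def add_child(self, node):
        self.children.append(node)

    def del_child(self, node):
        if len(self.children) - 1 < self.vote_threshold:
            raise RuntimeError("The number of children cannot be lower "
                               "than the vote threshold %d"
                               % self.vote_threshold)
        self.children.remove(node)

# Start as the modified secondary: old replication + dummy.
q = Quorum(["secondary-replication", "dummy-disk"])
q.del_child("secondary-replication")   # fine: one child remains

try:                                   # deleting the last child fails,
    q.del_child("dummy-disk")          # which is why the dummy is needed
except RuntimeError:
    pass

q.add_child("nbd-client")              # add before delete keeps it legal
q.del_child("dummy-disk")
assert q.children == ["nbd-client"]
```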

> >b) I had to remove the restriction in quorum_start_replication
> >   on which mode it would run in. 
> 
> IIRC, this check will be removed.
> 
> >c) I'm not really sure everything knows it's in secondary mode yet, and
> >   I'm not convinced whether the replication is doing the right thing.
> >d) The migrate -d -b   eventually fails on the destination, not worked 
> > out why
> >   yet.
> 
> Can you give me the error message?

I need to repeat it to check; it was something like a bad flag from the block 
migration
code; it happened after the block migration hit 100%.

> >e) Adding/deleting children on quorum is hard having to use the
> > children.0/1 notation when you've added children using node names -
> > it's worrying which number is which; is there a way to give them a name?

Re: [Qemu-devel] COLO: how to flip a secondary to a primary?

2016-01-25 Thread Dr. David Alan Gilbert
* Li Zhijian (lizhij...@cn.fujitsu.com) wrote:
> 
> 
> On 01/25/2016 09:32 AM, Wen Congyang wrote:
> >>>f) I've not thought about the colo-proxy that much yet - I guess that
> >>>   existing connections need to keep their sequence number offset but
> 
> Strictly speaking, after failover, we only need to keep servicing the TCP
> connections which were established after the last checkpoint, not all
> existing connections. Because after a checkpoint (when the primary and
> secondary nodes both work well), the primary VM and secondary VM are the
> same, which means the existing TCP connections have the same sequence
> numbers.
> 
> >>>   new connections made by what is now the primary don't need to do
> >>>   anything special.
> Yes, you are right.

I wonder whether we need to do something special to the new-secondary;
consider this:

   1 primary (P1) & secondary (S1) run together
   2 New connection opened
   3 secondary records an offset
   4 
   5 primary (P1) fails; do failover to secondary
   6 secondary (S1) still rewrites sequence for connection opened at (2)
   7 Start new-secondary (S2), send checkpoint from S1->S2
   8 S2 has same guest contents as S1; so the
 sequence numbers are still offset compared to the outside world.

So S2 needs to be sent the offsets for existing connections; otherwise,
if S1 were then to fail, S2 would send the wrong output on the existing
connection?
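
A toy illustration (hypothetical Python, not the actual colo-proxy code) of
that failure mode: the proxy adds a recorded per-connection offset to
outgoing sequence numbers, so an S2 that never received S1's offset table
puts the guest's raw value on the wire.

```python
# Toy model (hypothetical; not colo-proxy code) of per-connection
# sequence rewriting on the outgoing path.

def rewrite_out(guest_seq, offsets, conn_id):
    # A missing entry means "no rewrite": the raw guest sequence
    # number goes on the wire.
    return guest_seq + offsets.get(conn_id, 0)

conn = ("10.0.0.9", 41000, "10.0.0.2", 80)   # made-up connection tuple
s1_offsets = {conn: 1000}                    # recorded at step (3)
s2_offsets = {}                              # S2 never got S1's table

assert rewrite_out(5000, s1_offsets, conn) == 6000   # correct on S1
assert rewrite_out(5000, s2_offsets, conn) == 5000   # wrong on S2
```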

Dave
 
> 
> 
> >Hailiang or Zhijian can answer this question.
> 
> 
> Thanks
> Li Zhijian
> 
> 
--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK



Re: [Qemu-devel] COLO: how to flip a secondary to a primary?

2016-01-25 Thread Li Zhijian



On 01/26/2016 04:20 AM, Dr. David Alan Gilbert wrote:

* Li Zhijian (lizhij...@cn.fujitsu.com) wrote:



On 01/25/2016 09:32 AM, Wen Congyang wrote:

f) I've not thought about the colo-proxy that much yet - I guess that
   existing connections need to keep their sequence number offset but


Strictly speaking, after failover, we only need to keep servicing the TCP
connections which were established after the last checkpoint, not all
existing connections. Because after a checkpoint (when the primary and
secondary nodes both work well), the primary VM and secondary VM are the
same, which means the existing TCP connections have the same sequence
numbers.


   new connections made by what is now the primary don't need to do anything
   special.

Yes, you are right.


I wonder whether we need to do something special to the new-secondary;
consider this:

1 primary (P1) & secondary (S1) run together
2 New connection opened
3 secondary records an offset
4 
5 primary (P1) fails; do failover to secondary
6 secondary (S1) still rewrites sequence for connection opened at (2)
7 Start new-secondary (S2), send checkpoint from S1->S2
8 S2 has same guest contents as S1; so the
  sequence numbers are still offset compared to the outside world.

So S2 needs to be sent the offsets for existing connections; otherwise,
if S1 were then to fail, S2 would send the wrong output on the existing
connection?


Thanks for the example.
Sure, if we support continuous FT, the colo proxy needs to implement
migration_save and migration_load.
At the beginning of (7), we need to save the colo_proxy info (including
connection info and sequence offsets) on S1 and load it on S2. S1/S2 need
to keep doing the TCP rewriting for the connections opened at (2)
until they are closed.
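
The save/load idea could be sketched like this (hypothetical Python; the
names `proxy_save`/`proxy_load` are mine, not real colo-proxy functions):

```python
import json

# Hypothetical sketch of the migration_save/migration_load idea: at the
# start of step (7), S1 serializes its per-connection state and S2 loads
# it, so both keep rewriting until the connections close.

def proxy_save(connections):
    """Serialize connection tuples and sequence offsets (S1 side)."""
    return json.dumps([
        {"conn": list(conn), "seq_offset": off}
        for conn, off in connections.items()
    ])

def proxy_load(blob):
    """Rebuild the offset table on S2 from S1's snapshot."""
    return {tuple(e["conn"]): e["seq_offset"] for e in json.loads(blob)}

s1_state = {("10.0.0.2", 80, "10.0.0.9", 41000): 1000}
blob = proxy_save(s1_state)          # sent S1 -> S2 with the checkpoint
assert proxy_load(blob) == s1_state  # round-trips intact
```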

Thanks
Li Zhijian



Dave





Hailiang or Zhijian can answer this question.



Thanks
Li Zhijian



--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK


.



--
Best regards.
Li Zhijian (8555)





Re: [Qemu-devel] COLO: how to flip a secondary to a primary?

2016-01-25 Thread Wen Congyang
On 01/26/2016 02:59 AM, Dr. David Alan Gilbert wrote:
> * Wen Congyang (we...@cn.fujitsu.com) wrote:
>> On 01/23/2016 03:35 AM, Dr. David Alan Gilbert wrote:
>>> Hi,
>>>   I've been looking at what's needed to add a new secondary after
>>> a primary failed; from the block side it doesn't look as hard
>>> as I'd expected, perhaps you can tell me if I'm missing something!
>>>
>>> The normal primary setup is:
>>>
>>>quorum
>>>   Real disk
>>>   nbd client
>>
>> quorum
>>real disk
>>replication
>>   nbd client
>>
>>>
>>> The normal secondary setup is:
>>>replication
>>>   active-disk
>>>   hidden-disk
>>>   Real-disk
>>
>> IIRC, we can do it like this:
>> quorum
>>replication
>>   active-disk
>>   hidden-disk
>>   real-disk
> 
> Yes.
> 
>>> With a couple of minor code hacks; I changed the secondary to be:
>>>
>>>quorum
>>>   replication
>>> active-disk
>>> hidden-disk
>>> Real-disk
>>>   dummy-disk
>>
>> after failover,
>> quorum
>> replication (old, mode is secondary)
>>  active-disk
>>  hidden-disk*
>>  real-disk*
>>replication(new, mode is primary)
>>  nbd-client
> 
> Do you need to keep the old secondary-replication?
> Does that just pass straight through?

Yes, the old secondary replication can keep working in its current mode.
For example, if we don't start COLO again after failover, we do nothing.

> 
>> In the newest version, we active commit active-disk to real-disk.
>> So it will be:
>> quorum
>> replication (old, mode is secondary)
>>  active-disk(it is real disk now)
>>replication(new, mode is primary)
>>  nbd-client
> 
> How does that active-commit work?  I didn't think you
> could change the real disk until you had the full checkpoint,
> since you don't know whether the primary or secondaries
> changes need to be written?

I start the active commit when doing failover. After failover,
the primary's changes since the last checkpoint should be dropped (how do
we cancel the in-progress write ops?).
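
As a rough, hypothetical sketch (not QEMU's actual block code) of the
layering behind that active commit: the secondary's guest reads through
active-disk, then hidden-disk, then real-disk, and committing the guest's
view down at failover discards the primary's post-checkpoint writes.

```python
# Rough toy model (hypothetical; not QEMU code) of the secondary's
# active/hidden/real chain.  Layer names follow the thread; the exact
# commit semantics in QEMU differ in detail.
real = {0: "primary-post-checkpoint"}   # primary's replicated write
hidden = {0: "checkpoint"}              # backup of real's old contents
active = {0: "secondary-write"}         # secondary guest's own writes

def read(sector):
    # Topmost layer that has the sector wins, like a backing chain.
    for layer in (active, hidden, real):
        if sector in layer:
            return layer[sector]
    return None

def active_commit():
    # Failover: commit the guest's view down; the primary's changes
    # since the last checkpoint (sitting in `real`) are overridden.
    merged = dict(real)
    merged.update(hidden)
    merged.update(active)
    return merged

assert read(0) == "secondary-write"
assert active_commit()[0] == "secondary-write"
```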

> 
>>> and then after the primary fails, I start a new secondary
>>> on another host and then on the old secondary do:
>>>
>>>   nbd_server_stop
>>>   stop
>>>   x_block_change top-quorum -d children.0 # deletes use of real 
>>> disk, leaves dummy
>>>   drive_del active-disk0
>>>   x_block_change top-quorum -a node-real-disk
>>>   x_block_change top-quorum -d children.1 # Seems to have deleted 
>>> the dummy?!, the disk is now child 0
>>>   drive_add buddy 
>>> driver=replication,mode=primary,file.driver=nbd,file.host=ibpair,file.port=8889,file.export=colo-disk0,node-name=nbd-client,if=none,cache=none
>>>   x_block_change top-quorum -a nbd-client
>>>   c
>>>   migrate_set_capability x-colo on
>>>   migrate -d -b tcp:ibpair:
>>>
>>> and I think that means what was the secondary, has the same disk
>>> structure as a normal primary.
>>> That's not quite happy yet, and I've not figured out why - but the
>>> order/structure of the block devices looks right?
>>>
>>> Notes:
>>>a) The dummy serves two purposes, 1) it works around the segfault
>>>   I reported in the other mail, 2) when I delete the real disk in the
>>>   first x_block_change it means the quorum still has 1 disk so doesn't
>>>   get upset.
>>
>> I don't understand the purpose 2.
> 
> quorum won't allow you to delete all its members ('The number of children
> cannot be lower than the vote threshold 1'),
> and it's very tricky to get the order correct with add/delete; for example
> I tried:
> 
> drive_add buddy 
> driver=replication,mode=primary,file.driver=nbd,file.host=ibpair,file.port=8889,file.export=colo-disk0,node-name=nbd-client,if=none,cache=none
> # gets children.1
> x_block_change top-quorum -a nbd-client
> # deletes the secondary replication
> x_block_change top-quorum -d children.0
> drive_del active-disk0

The active-disk0 contains some data, and you should not delete it.
If we do active-commit after failover, the active-disk0 is the real disk.

> # ends up as children.0 but in the 2nd slot
> x_block_change top-quorum -a node-real-disk
> 
> info block shows me:
> top-quorum (#block615): json:{"children": [
> {"driver": "replication", "mode": "primary", "file": {"port": "8889", 
> "host": "ibpair", "driver": "nbd", "export": "colo-disk0"}},
> {"driver": "raw", "file": {"driver": "file", "filename": 
> "/home/localvms/bugzilla.raw"}}
>],
>"driver": "quorum", "blkverify": false, "rewrite-corrupted": false, 
> "vote-threshold": 1} (quorum)
> Cache mode:   writeback
> 
> that has the replication first and the file second; that's the opposite
> from the normal primary startup - does it matter?

It is OK. But reading from children.0 always fails, so the data will be
read from children.1.
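
A toy model of that fallback (hypothetical Python; QEMU's quorum driver
with a fifo read pattern behaves similarly in spirit, trying children in
order until one can serve the read):

```python
# Toy sketch (hypothetical; not QEMU's quorum driver) of fallback reads:
# children.0 (replication in primary mode) cannot serve reads, so the
# data comes from children.1 (the real disk).

def quorum_read(children, sector):
    for child in children:
        try:
            return child(sector)
        except IOError:
            continue
    raise IOError("all children failed")

def primary_replication(sector):
    # replication in primary mode has no local data to serve
    raise IOError("read not supported")

def real_disk(sector):
    return b"data-%d" % sector

assert quorum_read([primary_replication, real_disk], 7) == b"data-7"
```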

> 
> I can't add node-real-disk until I drive_del active-disk0 (which
> previously used it);  and I can't drive_del until I remove
> it from the quorum; but I can't remove that from the quorum first,
> because that leaves an empty quorum.

Re: [Qemu-devel] COLO: how to flip a secondary to a primary?

2016-01-24 Thread Li Zhijian



On 01/25/2016 09:32 AM, Wen Congyang wrote:

>f) I've not thought about the colo-proxy that much yet - I guess that
>   existing connections need to keep their sequence number offset but


Strictly speaking, after failover, we only need to keep servicing the TCP
connections which were established after the last checkpoint, not all
existing connections. Because after a checkpoint (when the primary and
secondary nodes both work well), the primary VM and secondary VM are the
same, which means the existing TCP connections have the same sequence
numbers.


>   new connections made by what is now the primary don't need to do anything
>   special.

Yes, you are right.



Hailiang or Zhijian can answer this question.



Thanks
Li Zhijian





Re: [Qemu-devel] COLO: how to flip a secondary to a primary?

2016-01-24 Thread Wen Congyang
On 01/23/2016 03:35 AM, Dr. David Alan Gilbert wrote:
> Hi,
>   I've been looking at what's needed to add a new secondary after
> a primary failed; from the block side it doesn't look as hard
> as I'd expected, perhaps you can tell me if I'm missing something!
> 
> The normal primary setup is:
> 
>quorum
>   Real disk
>   nbd client

quorum
   real disk
   replication
  nbd client

> 
> The normal secondary setup is:
>replication
>   active-disk
>   hidden-disk
>   Real-disk

IIRC, we can do it like this:
quorum
   replication
  active-disk
  hidden-disk
  real-disk

> 
> With a couple of minor code hacks; I changed the secondary to be:
> 
>quorum
>   replication
> active-disk
> hidden-disk
> Real-disk
>   dummy-disk

after failover,
quorum
   replication (old, mode is secondary)
 active-disk
 hidden-disk*
 real-disk*
   replication(new, mode is primary)
 nbd-client

In the newest version, we active commit active-disk to real-disk.
So it will be:
quorum
   replication (old, mode is secondary)
 active-disk(it is real disk now)
   replication(new, mode is primary)
 nbd-client

> 
> and then after the primary fails, I start a new secondary
> on another host and then on the old secondary do:
> 
>   nbd_server_stop
>   stop
>   x_block_change top-quorum -d children.0 # deletes use of real disk, 
> leaves dummy
>   drive_del active-disk0
>   x_block_change top-quorum -a node-real-disk
>   x_block_change top-quorum -d children.1 # Seems to have deleted the 
> dummy?!, the disk is now child 0
>   drive_add buddy 
> driver=replication,mode=primary,file.driver=nbd,file.host=ibpair,file.port=8889,file.export=colo-disk0,node-name=nbd-client,if=none,cache=none
>   x_block_change top-quorum -a nbd-client
>   c
>   migrate_set_capability x-colo on
>   migrate -d -b tcp:ibpair:
> 
> and I think that means what was the secondary, has the same disk
> structure as a normal primary.
> That's not quite happy yet, and I've not figured out why - but the
> order/structure of the block devices looks right?
> 
> Notes:
>a) The dummy serves two purposes, 1) it works around the segfault
>   I reported in the other mail, 2) when I delete the real disk in the
>   first x_block_change it means the quorum still has 1 disk so doesn't
>   get upset.

I don't understand the purpose 2.

>b) I had to remove the restriction in quorum_start_replication
>   on which mode it would run in. 

IIRC, this check will be removed.

>c) I'm not really sure everything knows it's in secondary mode yet, and
>   I'm not convinced whether the replication is doing the right thing.
>d) The migrate -d -b   eventually fails on the destination, not worked out 
> why
>   yet.

Can you give me the error message?

>e) Adding/deleting children on quorum is hard having to use the 
> children.0/1
>   notation when you've added children using node names - it's worrying
>   which number is which; is there a way to give them a name?

No. I think we can improve 'info block' output.

>f) I've not thought about the colo-proxy that much yet - I guess that
>   existing connections need to keep their sequence number offset but
>   new connections made by what is now the primary don't need to do anything
>   special.

Hailiang or Zhijian can answer this question.

Thanks
Wen Congyang

> 
> Dave
> --
> Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
> 
> 
> .
> 






[Qemu-devel] COLO: how to flip a secondary to a primary?

2016-01-22 Thread Dr. David Alan Gilbert
Hi,
  I've been looking at what's needed to add a new secondary after
a primary failed; from the block side it doesn't look as hard
as I'd expected, perhaps you can tell me if I'm missing something!

The normal primary setup is:

   quorum
  Real disk
  nbd client

The normal secondary setup is:
   replication
  active-disk
  hidden-disk
  Real-disk

With a couple of minor code hacks; I changed the secondary to be:

   quorum
  replication
active-disk
hidden-disk
Real-disk
  dummy-disk

and then after the primary fails, I start a new secondary
on another host and then on the old secondary do:

  nbd_server_stop
  stop
  x_block_change top-quorum -d children.0 # deletes use of real disk, 
leaves dummy
  drive_del active-disk0
  x_block_change top-quorum -a node-real-disk
  x_block_change top-quorum -d children.1 # Seems to have deleted the 
dummy?!, the disk is now child 0
  drive_add buddy 
driver=replication,mode=primary,file.driver=nbd,file.host=ibpair,file.port=8889,file.export=colo-disk0,node-name=nbd-client,if=none,cache=none
  x_block_change top-quorum -a nbd-client
  c
  migrate_set_capability x-colo on
  migrate -d -b tcp:ibpair:

and I think that means what was the secondary now has the same disk
structure as a normal primary.
That's not quite happy yet, and I've not figured out why - but the
order/structure of the block devices looks right?

Notes:
   a) The dummy serves two purposes, 1) it works around the segfault
  I reported in the other mail, 2) when I delete the real disk in the
  first x_block_change it means the quorum still has 1 disk so doesn't
  get upset.
   b) I had to remove the restriction in quorum_start_replication
  on which mode it would run in. 
   c) I'm not really sure everything knows it's in secondary mode yet, and
  I'm not convinced whether the replication is doing the right thing.
   d) The migrate -d -b   eventually fails on the destination, not worked out 
why
  yet.
   e) Adding/deleting children on quorum is hard having to use the children.0/1
  notation when you've added children using node names - it's worrying
  which number is which; is there a way to give them a name?
   f) I've not thought about the colo-proxy that much yet - I guess that
  existing connections need to keep their sequence number offset but
  new connections made by what is now the primary don't need to do anything
  special.

Dave
--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK