Re: [Qemu-devel] COLO: how to flip a secondary to a primary?
* Wen Congyang (we...@cn.fujitsu.com) wrote:
> On 01/23/2016 03:35 AM, Dr. David Alan Gilbert wrote:
> > Hi,
> >   I've been looking at what's needed to add a new secondary after
> > a primary failed; from the block side it doesn't look as hard
> > as I'd expected, perhaps you can tell me if I'm missing something!
> >
> > The normal primary setup is:
> >
> >    quorum
> >      Real disk
> >      nbd client
>
> quorum
>   real disk
>   replication
>     nbd client
>
> >
> > The normal secondary setup is:
> >    replication
> >      active-disk
> >        hidden-disk
> >          Real-disk
>
> IIRC, we can do it like this:
> quorum
>   replication
>     active-disk
>       hidden-disk
>         real-disk

Yes.

> > With a couple of minor code hacks; I changed the secondary to be:
> >
> >    quorum
> >      replication
> >        active-disk
> >          hidden-disk
> >            Real-disk
> >      dummy-disk
>
> after failover,
> quorum
>   replication (old, mode is secondary)
>     active-disk
>       hidden-disk*
>         real-disk*
>   replication (new, mode is primary)
>     nbd-client

Do you need to keep the old secondary-replication?
Does that just pass straight through?

> In the newest version, we active-commit active-disk to real-disk.
> So it will be:
> quorum
>   replication (old, mode is secondary)
>     active-disk (it is the real disk now)
>   replication (new, mode is primary)
>     nbd-client

How does that active-commit work?  I didn't think you could change the
real disk until you had the full checkpoint, since you don't know
whether the primary's or the secondary's changes need to be written.
> > and then after the primary fails, I start a new secondary
> > on another host and then on the old secondary do:
> >
> >   nbd_server_stop
> >   stop
> >   x_block_change top-quorum -d children.0    # deletes use of real disk, leaves dummy
> >   drive_del active-disk0
> >   x_block_change top-quorum -a node-real-disk
> >   x_block_change top-quorum -d children.1    # Seems to have deleted the dummy?!, the disk is now child 0
> >   drive_add buddy driver=replication,mode=primary,file.driver=nbd,file.host=ibpair,file.port=8889,file.export=colo-disk0,node-name=nbd-client,if=none,cache=none
> >   x_block_change top-quorum -a nbd-client
> >   c
> >   migrate_set_capability x-colo on
> >   migrate -d -b tcp:ibpair:
> >
> > and I think that means what was the secondary has the same disk
> > structure as a normal primary.
> > That's not quite happy yet, and I've not figured out why - but the
> > order/structure of the block devices looks right?
> >
> > Notes:
> >   a) The dummy serves two purposes: 1) it works around the segfault
> >      I reported in the other mail, 2) when I delete the real disk in
> >      the first x_block_change it means the quorum still has 1 disk so
> >      doesn't get upset.
>
> I don't understand the purpose 2.
quorum won't allow you to delete all its members ('The number of
children cannot be lower than the vote threshold 1'), and it's very
tricky getting the order correct with add/delete; for example I tried:

  drive_add buddy driver=replication,mode=primary,file.driver=nbd,file.host=ibpair,file.port=8889,file.export=colo-disk0,node-name=nbd-client,if=none,cache=none
  # gets children.1
  x_block_change top-quorum -a nbd-client
  # deletes the secondary replication
  x_block_change top-quorum -d children.0
  drive_del active-disk0
  # ends up as children.0 but in the 2nd slot
  x_block_change top-quorum -a node-real-disk

info block shows me:

  top-quorum (#block615): json:{"children": [
    {"driver": "replication", "mode": "primary", "file": {"port": "8889", "host": "ibpair", "driver": "nbd", "export": "colo-disk0"}},
    {"driver": "raw", "file": {"driver": "file", "filename": "/home/localvms/bugzilla.raw"}}
  ],
  "driver": "quorum", "blkverify": false, "rewrite-corrupted": false, "vote-threshold": 1} (quorum)
  Cache mode: writeback

that has the replication first and the file second; that's the opposite
from the normal primary startup - does it matter?

I can't add node-real-disk until I drive_del active-disk0 (which
previously used it); and I can't drive_del until I remove it from the
quorum; but I can't remove it from the quorum first, because that
leaves an empty quorum.

> >   b) I had to remove the restriction in quorum_start_replication
> >      on which mode it would run in.
>
> IIRC, this check will be removed.
>
> >   c) I'm not really sure everything knows it's in secondary mode yet,
> >      and I'm not convinced whether the replication is doing the right
> >      thing.
> >   d) The migrate -d -b eventually fails on the destination, not
> >      worked out why yet.
>
> Can you give me the error message?

I need to repeat it to check; it was something like a bad flag from the
block migration code; it happened after the block migration hit 100%.

> >   e) Adding/deleting children on quorum is hard having to
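The add/delete ordering trap described above can be seen in a toy model. This is hypothetical illustration code, not QEMU's quorum driver: a child list where deletion is refused below the vote threshold, so the only workable order is add-then-delete, and the new child ends up in the "wrong" slot.

```python
# Toy model (hypothetical, not QEMU code) of quorum child management:
# deletion is refused when it would drop the child count below the
# vote threshold, forcing an add-before-delete order that flips the
# child ordering relative to a normal primary startup.

class Quorum:
    def __init__(self, children, vote_threshold=1):
        self.children = list(children)      # ordered: index == children.N
        self.vote_threshold = vote_threshold

    def add_child(self, name):
        # New children are appended, i.e. they get the highest index.
        self.children.append(name)

    def del_child(self, index):
        if len(self.children) - 1 < self.vote_threshold:
            raise RuntimeError(
                "The number of children cannot be lower than the "
                f"vote threshold {self.vote_threshold}")
        # Deleting shifts later children down, which is why the
        # children.N indices are hard to track across operations.
        del self.children[index]


q = Quorum(["secondary-replication"])
try:
    q.del_child(0)               # refused: would leave an empty quorum
except RuntimeError:
    pass
q.add_child("nbd-client")        # becomes children.1
q.del_child(0)                   # now the delete is allowed
q.add_child("node-real-disk")    # lands after nbd-client: order flipped
assert q.children == ["nbd-client", "node-real-disk"]
```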
Re: [Qemu-devel] COLO: how to flip a secondary to a primary?
* Li Zhijian (lizhij...@cn.fujitsu.com) wrote:
>
> On 01/25/2016 09:32 AM, Wen Congyang wrote:
> >>>   f) I've not thought about the colo-proxy that much yet - I guess
> >>>      that existing connections need to keep their sequence number
> >>>      offset but
>
> Strictly speaking, after failover, we only need to keep servicing the
> tcp connections which are established after the last checkpoint, not
> all existing connections. Because after a checkpoint (primary and
> secondary node working well), the primary vm and secondary vm are the
> same; that means an existing tcp connection has the same sequence.
>
> >>>      new connections made by what is now the primary don't need to
> >>>      do anything special.
>
> Yes, you are right.

I wonder whether we need to do something special for the new-secondary;
consider this:

  1 primary (P1) & secondary (S1) run together
  2 New connection opened
  3 secondary records an offset
  4
  5 primary (P1) fails; do failover to secondary
  6 secondary (S1) still rewrites sequence for connection opened at (2)
  7 Start new-secondary (S2), send checkpoint from S1->S2
  8 S2 has same guest contents as S1; so the sequence numbers are still
    offset compared to the outside world.

So S2 needs to be sent the offsets for existing connections, otherwise
if S1 was then to fail, S2 would send the wrong output on the existing
connection?

Dave

> > Hailiang or Zhijian can answer this question.
>
> Thanks
> Li Zhijian

--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
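The scenario in steps 1-8 can be sketched in code. This is a toy model with invented names, not the real colo-proxy: S1 rewrites TCP sequence numbers by a per-connection offset recorded at step (3); unless that offset table reaches S2 with the checkpoint at step (7), S2's output would disagree with what the outside world expects.

```python
# Sketch (hypothetical names, not the real colo-proxy code) of why the
# sequence-offset table must be shipped to the new secondary S2.

SEQ_MOD = 2 ** 32  # TCP sequence numbers wrap at 2^32

class ColoProxy:
    def __init__(self):
        self.offsets = {}  # connection 4-tuple -> sequence offset

    def record_offset(self, conn, offset):
        self.offsets[conn] = offset % SEQ_MOD

    def rewrite_seq(self, conn, guest_seq):
        # Translate the guest's sequence number into what the outside
        # world expects to see on this connection.
        return (guest_seq + self.offsets.get(conn, 0)) % SEQ_MOD


conn = ("10.0.0.1", 5000, "10.0.0.2", 80)   # connection opened at (2)

s1 = ColoProxy()
s1.record_offset(conn, 1000)                 # step (3): S1 records offset

s2 = ColoProxy()                             # step (7): fresh S2, no table
assert s1.rewrite_seq(conn, 42) != s2.rewrite_seq(conn, 42)   # the bug

s2.offsets.update(s1.offsets)                # ship offsets with checkpoint
assert s1.rewrite_seq(conn, 42) == s2.rewrite_seq(conn, 42)   # agree again
```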
Re: [Qemu-devel] COLO: how to flip a secondary to a primary?
On 01/26/2016 04:20 AM, Dr. David Alan Gilbert wrote:
> * Li Zhijian (lizhij...@cn.fujitsu.com) wrote:
> > On 01/25/2016 09:32 AM, Wen Congyang wrote:
> > > >   f) I've not thought about the colo-proxy that much yet - I
> > > >      guess that existing connections need to keep their sequence
> > > >      number offset but
> >
> > Strictly speaking, after failover, we only need to keep servicing the
> > tcp connections which are established after the last checkpoint, not
> > all existing connections. Because after a checkpoint (primary and
> > secondary node working well), the primary vm and secondary vm are the
> > same; that means an existing tcp connection has the same sequence.
> >
> > > >      new connections made by what is now the primary don't need
> > > >      to do anything special.
> >
> > Yes, you are right.
>
> I wonder whether we need to do something special for the new-secondary;
> consider this:
>
>   1 primary (P1) & secondary (S1) run together
>   2 New connection opened
>   3 secondary records an offset
>   4
>   5 primary (P1) fails; do failover to secondary
>   6 secondary (S1) still rewrites sequence for connection opened at (2)
>   7 Start new-secondary (S2), send checkpoint from S1->S2
>   8 S2 has same guest contents as S1; so the sequence numbers are still
>     offset compared to the outside world.
>
> So S2 needs to be sent the offsets for existing connections, otherwise
> if S1 was then to fail, S2 would send the wrong output on the existing
> connection?

Thanks for the example.
Sure, if we support continuous FT, the colo proxy needs to implement
migration_save and migration_load. At the beginning of (7), we need to
save the colo_proxy info (including connection info and sequence offset)
at S1 and load the colo_proxy at S2.
S1/S2 need to keep doing tcp re-writing for the connections opened at
(2) until they are closed.

Thanks
Li Zhijian

> Dave
>
> > > Hailiang or Zhijian can answer this question.
>
> --
> Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK

--
Best regards.
Li Zhijian
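The migration_save/migration_load idea above can be sketched minimally. The wire format here is invented (JSON) purely for illustration; the real colo-proxy would use QEMU's migration stream, and all function names are assumptions:

```python
# Minimal sketch of saving the per-connection offset table on S1 and
# restoring it on S2 at the start of step (7). Hypothetical code; the
# JSON format stands in for QEMU's actual migration stream.
import json

def migration_save(offsets):
    # Keys are connection 4-tuples; JSON requires string keys, so join
    # the tuple fields with '|'.
    return json.dumps({"|".join(map(str, k)): v for k, v in offsets.items()})

def migration_load(blob):
    out = {}
    for key, v in json.loads(blob).items():
        src, sport, dst, dport = key.split("|")
        out[(src, int(sport), dst, int(dport))] = v
    return out


s1_offsets = {("10.0.0.1", 5000, "10.0.0.2", 80): 1000}
blob = migration_save(s1_offsets)            # saved at S1 ...
assert migration_load(blob) == s1_offsets    # ... and restored at S2
```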
Re: [Qemu-devel] COLO: how to flip a secondary to a primary?
On 01/26/2016 02:59 AM, Dr. David Alan Gilbert wrote:
> * Wen Congyang (we...@cn.fujitsu.com) wrote:
>> On 01/23/2016 03:35 AM, Dr. David Alan Gilbert wrote:
>>> Hi,
>>>   I've been looking at what's needed to add a new secondary after
>>> a primary failed; from the block side it doesn't look as hard
>>> as I'd expected, perhaps you can tell me if I'm missing something!
>>>
>>> The normal primary setup is:
>>>
>>>    quorum
>>>      Real disk
>>>      nbd client
>>
>> quorum
>>   real disk
>>   replication
>>     nbd client
>>
>>>
>>> The normal secondary setup is:
>>>    replication
>>>      active-disk
>>>        hidden-disk
>>>          Real-disk
>>
>> IIRC, we can do it like this:
>> quorum
>>   replication
>>     active-disk
>>       hidden-disk
>>         real-disk
>
> Yes.
>
>>> With a couple of minor code hacks; I changed the secondary to be:
>>>
>>>    quorum
>>>      replication
>>>        active-disk
>>>          hidden-disk
>>>            Real-disk
>>>      dummy-disk
>>
>> after failover,
>> quorum
>>   replication (old, mode is secondary)
>>     active-disk
>>       hidden-disk*
>>         real-disk*
>>   replication (new, mode is primary)
>>     nbd-client
>
> Do you need to keep the old secondary-replication?
> Does that just pass straight through?

Yes, the old secondary-replication can still work in that mode.
For example, if we don't start colo again after failover, we do nothing.

>
>> In the newest version, we active-commit active-disk to real-disk.
>> So it will be:
>> quorum
>>   replication (old, mode is secondary)
>>     active-disk (it is the real disk now)
>>   replication (new, mode is primary)
>>     nbd-client
>
> How does that active-commit work?  I didn't think you could change the
> real disk until you had the full checkpoint, since you don't know
> whether the primary's or the secondary's changes need to be written.

I start the active-commit when doing failover. After failover, the
primary's changes after the last checkpoint should be dropped (how do we
cancel the in-progress write ops?).
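The active-commit step discussed above can be illustrated with a toy model. This is not QEMU's block layer; it only shows the idea that active-disk is a copy-on-write overlay over real-disk, and committing merges every allocated overlay block down into the base, after which the overlay's view is the real disk:

```python
# Toy illustration (hypothetical, not QEMU code) of "active commit":
# merge the COW overlay (active-disk) down into its base (real-disk).

def active_commit(overlay, base):
    # overlay: {block_index: data}, holding only blocks written since
    # the overlay was created; base: full block map of the real disk.
    for idx, data in overlay.items():
        base[idx] = data
    overlay.clear()   # overlay now adds nothing on top of the base
    return base


base = {0: b"AAAA", 1: b"BBBB"}          # real-disk contents
overlay = {1: b"bbbb"}                   # secondary's write since checkpoint
active_commit(overlay, base)
assert base == {0: b"AAAA", 1: b"bbbb"}
assert overlay == {}
```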
>
>>> and then after the primary fails, I start a new secondary
>>> on another host and then on the old secondary do:
>>>
>>>   nbd_server_stop
>>>   stop
>>>   x_block_change top-quorum -d children.0    # deletes use of real disk, leaves dummy
>>>   drive_del active-disk0
>>>   x_block_change top-quorum -a node-real-disk
>>>   x_block_change top-quorum -d children.1    # Seems to have deleted the dummy?!, the disk is now child 0
>>>   drive_add buddy driver=replication,mode=primary,file.driver=nbd,file.host=ibpair,file.port=8889,file.export=colo-disk0,node-name=nbd-client,if=none,cache=none
>>>   x_block_change top-quorum -a nbd-client
>>>   c
>>>   migrate_set_capability x-colo on
>>>   migrate -d -b tcp:ibpair:
>>>
>>> and I think that means what was the secondary has the same disk
>>> structure as a normal primary.
>>> That's not quite happy yet, and I've not figured out why - but the
>>> order/structure of the block devices looks right?
>>>
>>> Notes:
>>>   a) The dummy serves two purposes: 1) it works around the segfault
>>>      I reported in the other mail, 2) when I delete the real disk in
>>>      the first x_block_change it means the quorum still has 1 disk so
>>>      doesn't get upset.
>>
>> I don't understand the purpose 2.
>
> quorum won't allow you to delete all its members ('The number of
> children cannot be lower than the vote threshold 1'), and it's very
> tricky getting the order correct with add/delete; for example I tried:
>
>   drive_add buddy driver=replication,mode=primary,file.driver=nbd,file.host=ibpair,file.port=8889,file.export=colo-disk0,node-name=nbd-client,if=none,cache=none
>   # gets children.1
>   x_block_change top-quorum -a nbd-client
>   # deletes the secondary replication
>   x_block_change top-quorum -d children.0
>   drive_del active-disk0

The active-disk0 contains some data, and you should not delete it.
If we do active-commit after failover, the active-disk0 is the real
disk.
>   # ends up as children.0 but in the 2nd slot
>   x_block_change top-quorum -a node-real-disk
>
> info block shows me:
>
>   top-quorum (#block615): json:{"children": [
>     {"driver": "replication", "mode": "primary", "file": {"port": "8889", "host": "ibpair", "driver": "nbd", "export": "colo-disk0"}},
>     {"driver": "raw", "file": {"driver": "file", "filename": "/home/localvms/bugzilla.raw"}}
>   ],
>   "driver": "quorum", "blkverify": false, "rewrite-corrupted": false, "vote-threshold": 1} (quorum)
>   Cache mode: writeback
>
> that has the replication first and the file second; that's the opposite
> from the normal primary startup - does it matter?

It is OK. But reading from children.0 always fails, and then it will
read the data from children.1.

>
> I can't add node-real-disk until I drive_del active-disk0 (which
> previously used it); and I can't drive_del until I remove
> it from the quorum; but I can't remove that from the quorum
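The fall-through read behaviour described above (children.0 always fails, so data comes from children.1) can be sketched as follows. This is hypothetical illustration code, not QEMU's quorum driver: with vote-threshold 1, a read can be served by trying children in order and returning the first successful result, so a child whose reads always fail simply pushes reads on to the next child.

```python
# Sketch (hypothetical, not QEMU code) of quorum reads falling through
# to the next child when the first one fails.

def quorum_read(children, offset):
    errors = []
    for name, read_fn in children:
        try:
            return read_fn(offset)       # first child that succeeds wins
        except IOError as e:
            errors.append((name, e))
    raise IOError(f"all children failed: {errors}")

def failing_read(offset):
    # Stands in for the old replication child, which cannot serve reads.
    raise IOError("replication child cannot serve reads")

children = [("replication", failing_read),
            ("real-disk", lambda offset: b"data@%d" % offset)]
assert quorum_read(children, 512) == b"data@512"
```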
Re: [Qemu-devel] COLO: how to flip a secondary to a primary?
On 01/25/2016 09:32 AM, Wen Congyang wrote:
> >   f) I've not thought about the colo-proxy that much yet - I guess
> >      that existing connections need to keep their sequence number
> >      offset but

Strictly speaking, after failover, we only need to keep servicing the
tcp connections which are established after the last checkpoint, not
all existing connections. Because after a checkpoint (primary and
secondary node working well), the primary vm and secondary vm are the
same; that means an existing tcp connection has the same sequence.

> >      new connections made by what is now the primary don't need to
> >      do anything special.

Yes, you are right.

> Hailiang or Zhijian can answer this question.

Thanks
Li Zhijian
Re: [Qemu-devel] COLO: how to flip a secondary to a primary?
On 01/23/2016 03:35 AM, Dr. David Alan Gilbert wrote:
> Hi,
>   I've been looking at what's needed to add a new secondary after
> a primary failed; from the block side it doesn't look as hard
> as I'd expected, perhaps you can tell me if I'm missing something!
>
> The normal primary setup is:
>
>    quorum
>      Real disk
>      nbd client

quorum
  real disk
  replication
    nbd client

>
> The normal secondary setup is:
>    replication
>      active-disk
>        hidden-disk
>          Real-disk

IIRC, we can do it like this:
quorum
  replication
    active-disk
      hidden-disk
        real-disk

>
> With a couple of minor code hacks; I changed the secondary to be:
>
>    quorum
>      replication
>        active-disk
>          hidden-disk
>            Real-disk
>      dummy-disk

after failover,
quorum
  replication (old, mode is secondary)
    active-disk
      hidden-disk*
        real-disk*
  replication (new, mode is primary)
    nbd-client

In the newest version, we active-commit active-disk to real-disk.
So it will be:
quorum
  replication (old, mode is secondary)
    active-disk (it is the real disk now)
  replication (new, mode is primary)
    nbd-client

>
> and then after the primary fails, I start a new secondary
> on another host and then on the old secondary do:
>
>   nbd_server_stop
>   stop
>   x_block_change top-quorum -d children.0    # deletes use of real disk, leaves dummy
>   drive_del active-disk0
>   x_block_change top-quorum -a node-real-disk
>   x_block_change top-quorum -d children.1    # Seems to have deleted the dummy?!, the disk is now child 0
>   drive_add buddy driver=replication,mode=primary,file.driver=nbd,file.host=ibpair,file.port=8889,file.export=colo-disk0,node-name=nbd-client,if=none,cache=none
>   x_block_change top-quorum -a nbd-client
>   c
>   migrate_set_capability x-colo on
>   migrate -d -b tcp:ibpair:
>
> and I think that means what was the secondary has the same disk
> structure as a normal primary.
> That's not quite happy yet, and I've not figured out why - but the
> order/structure of the block devices looks right?
>
> Notes:
>   a) The dummy serves two purposes: 1) it works around the segfault
>      I reported in the other mail, 2) when I delete the real disk in
>      the first x_block_change it means the quorum still has 1 disk so
>      doesn't get upset.

I don't understand the purpose 2.

>   b) I had to remove the restriction in quorum_start_replication
>      on which mode it would run in.

IIRC, this check will be removed.

>   c) I'm not really sure everything knows it's in secondary mode yet,
>      and I'm not convinced whether the replication is doing the right
>      thing.
>   d) The migrate -d -b eventually fails on the destination, not worked
>      out why yet.

Can you give me the error message?

>   e) Adding/deleting children on quorum is hard having to use the
>      children.0/1 notation when you've added children using node names
>      - it's worrying which number is which; is there a way to give
>      them a name?

No. I think we can improve the 'info block' output.

>   f) I've not thought about the colo-proxy that much yet - I guess
>      that existing connections need to keep their sequence number
>      offset but new connections made by what is now the primary don't
>      need to do anything special.

Hailiang or Zhijian can answer this question.

Thanks
Wen Congyang

>
> Dave
> --
> Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
[Qemu-devel] COLO: how to flip a secondary to a primary?
Hi,
  I've been looking at what's needed to add a new secondary after
a primary failed; from the block side it doesn't look as hard
as I'd expected, perhaps you can tell me if I'm missing something!

The normal primary setup is:

   quorum
     Real disk
     nbd client

The normal secondary setup is:

   replication
     active-disk
       hidden-disk
         Real-disk

With a couple of minor code hacks; I changed the secondary to be:

   quorum
     replication
       active-disk
         hidden-disk
           Real-disk
     dummy-disk

and then after the primary fails, I start a new secondary
on another host and then on the old secondary do:

  nbd_server_stop
  stop
  x_block_change top-quorum -d children.0    # deletes use of real disk, leaves dummy
  drive_del active-disk0
  x_block_change top-quorum -a node-real-disk
  x_block_change top-quorum -d children.1    # Seems to have deleted the dummy?!, the disk is now child 0
  drive_add buddy driver=replication,mode=primary,file.driver=nbd,file.host=ibpair,file.port=8889,file.export=colo-disk0,node-name=nbd-client,if=none,cache=none
  x_block_change top-quorum -a nbd-client
  c
  migrate_set_capability x-colo on
  migrate -d -b tcp:ibpair:

and I think that means what was the secondary has the same disk
structure as a normal primary.
That's not quite happy yet, and I've not figured out why - but the
order/structure of the block devices looks right?

Notes:
  a) The dummy serves two purposes: 1) it works around the segfault
     I reported in the other mail, 2) when I delete the real disk in
     the first x_block_change it means the quorum still has 1 disk so
     doesn't get upset.
  b) I had to remove the restriction in quorum_start_replication
     on which mode it would run in.
  c) I'm not really sure everything knows it's in secondary mode yet,
     and I'm not convinced whether the replication is doing the right
     thing.
  d) The migrate -d -b eventually fails on the destination, not worked
     out why yet.
  e) Adding/deleting children on quorum is hard having to use the
     children.0/1 notation when you've added children using node names
     - it's worrying which number is which; is there a way to give
     them a name?
  f) I've not thought about the colo-proxy that much yet - I guess
     that existing connections need to keep their sequence number
     offset but new connections made by what is now the primary don't
     need to do anything special.

Dave
--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
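The reconfiguration sequence in the message above can be traced against a toy model of the quorum's child list. This is purely illustrative (hypothetical code, not QEMU), and it models the *observed* behaviour, including the surprising "-d children.1" step that appeared to remove the dummy:

```python
# Trace (hypothetical model) of the old secondary's quorum children as
# the failover commands run, ending in the shape of a normal primary.

quorum = ["secondary-replication", "dummy-disk"]   # hacked secondary

del quorum[0]                     # x_block_change top-quorum -d children.0
quorum.append("node-real-disk")   # x_block_change top-quorum -a node-real-disk
# x_block_change top-quorum -d children.1: observed to remove the dummy
# (the surprising behaviour noted in the message), leaving the real
# disk as child 0:
quorum.remove("dummy-disk")
quorum.append("nbd-client")       # drive_add buddy ... + -a nbd-client

assert quorum == ["node-real-disk", "nbd-client"]  # normal-primary shape
```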