On 01/23/2016 03:35 AM, Dr. David Alan Gilbert wrote:
> Hi,
>   I've been looking at what's needed to add a new secondary after
> a primary failed; from the block side it doesn't look as hard
> as I'd expected, perhaps you can tell me if I'm missing something!
>
> The normal primary setup is:
>
>    quorum
>       Real disk
>       nbd client

quorum
   real disk
   replication
      nbd client

> The normal secondary setup is:
>
>    replication
>       active-disk
>          hidden-disk
>             Real-disk

IIRC, we can do it like this:

quorum
   replication
      active-disk
         hidden-disk
            real-disk

> With a couple of minor code hacks; I changed the secondary to be:
>
>    quorum
>       replication
>          active-disk
>             hidden-disk
>                Real-disk
>       dummy-disk

After failover:

quorum
   replication (old, mode is secondary)
      active-disk
         hidden-disk*
            real-disk*
   replication (new, mode is primary)
      nbd-client

In the newest version, we do an active commit of active-disk into
real-disk, so it will be:

quorum
   replication (old, mode is secondary)
      active-disk (it is the real disk now)
   replication (new, mode is primary)
      nbd-client

> and then after the primary fails, I start a new secondary
> on another host and then on the old secondary do:
>
>   nbd_server_stop
>   stop
>   x_block_change top-quorum -d children.0   # deletes use of real disk, leaves dummy
>   drive_del active-disk0
>   x_block_change top-quorum -a node-real-disk
>   x_block_change top-quorum -d children.1   # Seems to have deleted the dummy?!, the disk is now child 0
>   drive_add buddy driver=replication,mode=primary,file.driver=nbd,file.host=ibpair,file.port=8889,file.export=colo-disk0,node-name=nbd-client,if=none,cache=none
>   x_block_change top-quorum -a nbd-client
>   c
>   migrate_set_capability x-colo on
>   migrate -d -b tcp:ibpair:8888
>
> and I think that means what was the secondary, has the same disk
> structure as a normal primary.
> That's not quite happy yet, and I've not figured out why - but the
> order/structure of the block devices looks right?
>
> Notes:
>    a) The dummy serves two purposes, 1) it works around the segfault
>       I reported in the other mail, 2) when I delete the real disk in the
>       first x_block_change it means the quorum still has 1 disk so doesn't
>       get upset.

I don't understand purpose 2.

>    b) I had to remove the restriction in quorum_start_replication
>       on which mode it would run in.

IIRC, this check will be removed.

>    c) I'm not really sure everything knows it's in secondary mode yet, and
>       I'm not convinced whether the replication is doing the right thing.
>    d) The migrate -d -b eventually fails on the destination, not worked out
>       why yet.

Can you give me the error message?

>    e) Adding/deleting children on quorum is hard having to use the
>       children.0/1 notation when you've added children using node names -
>       it's worrying which number is which; is there a way to give them a name?

No. I think we can improve the 'info block' output.

>    f) I've not thought about the colo-proxy that much yet - I guess that
>       existing connections need to keep their sequence number offset but
>       new connections made by what is now the primary don't need to do
>       anything special.

Hailiang or Zhijian can answer this question.

Thanks
Wen Congyang

>
> Dave
> --
> Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
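
P.S. To make the comparison between the two layouts easier to eyeball, here is a small Python sketch of the trees discussed above. It is only an illustration, not QEMU code: the (name, children) tuple layout and the render() and leaves() helpers are made up for this mail, and the nesting is my reading of the trees in this thread rather than anything taken from 'info block' output.

```python
# Illustrative model only, not QEMU code.
# Each node is (name, [children]); node names come from this thread,
# the nesting is an assumption based on the trees described in the mail.

def render(node, depth=0):
    """Return the tree as a list of indented text lines."""
    name, children = node
    lines = ["   " * depth + name]
    for child in children:
        lines.extend(render(child, depth + 1))
    return lines

def leaves(node):
    """Return the names of the leaf nodes, left to right."""
    name, children = node
    if not children:
        return [name]
    found = []
    for child in children:
        found.extend(leaves(child))
    return found

# Normal primary: quorum over the real disk and a replication node
# whose file is the nbd client.
normal_primary = (
    "quorum",
    [
        ("real-disk", []),
        ("replication", [("nbd-client", [])]),
    ],
)

# Old secondary after failover, once active-disk has been committed
# into the real disk (the "newest version" case).
after_failover = (
    "quorum",
    [
        ("replication (old, secondary)", [("active-disk", [])]),
        ("replication (new, primary)", [("nbd-client", [])]),
    ],
)

if __name__ == "__main__":
    print("\n".join(render(normal_primary)))
    print()
    print("\n".join(render(after_failover)))
```

Both trees end up with two quorum children and an nbd client as the last leaf; the difference is that after failover the local disk still sits under the old replication node instead of being a direct child of quorum.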