Hi,
  I've been looking at what's needed to add a new secondary after
a primary failed; from the block side it doesn't look as hard as I'd
expected, but perhaps you can tell me if I'm missing something!

The normal primary setup is:

   quorum
      Real disk
      nbd client

The normal secondary setup is:

   replication
      active-disk
        hidden-disk
          Real-disk

With a couple of minor code hacks; I changed the secondary to be:

   quorum
      replication
        active-disk
          hidden-disk
            Real-disk
      dummy-disk

and then after the primary fails, I start a new secondary on another
host and then on the old secondary do:

   nbd_server_stop
   stop
   x_block_change top-quorum -d children.0   # deletes use of real disk, leaves dummy
   drive_del active-disk0
   x_block_change top-quorum -a node-real-disk
   x_block_change top-quorum -d children.1   # Seems to have deleted the dummy?!, the disk is now child 0
   drive_add buddy driver=replication,mode=primary,file.driver=nbd,file.host=ibpair,file.port=8889,file.export=colo-disk0,node-name=nbd-client,if=none,cache=none
   x_block_change top-quorum -a nbd-client
   c
   migrate_set_capability x-colo on
   migrate -d -b tcp:ibpair:8888

and I think that means what was the secondary, has the same disk
structure as a normal primary.
That's not quite happy yet, and I've not figured out why - but the
order/structure of the block devices looks right?

Notes:
   a) The dummy serves two purposes, 1) it works around the segfault
      I reported in the other mail, 2) when I delete the real disk in
      the first x_block_change it means the quorum still has 1 disk so
      doesn't get upset.
   b) I had to remove the restriction in quorum_start_replication on
      which mode it would run in.
   c) I'm not really sure everything knows it's in secondary mode yet,
      and I'm not convinced whether the replication is doing the right
      thing.
   d) The migrate -d -b eventually fails on the destination, not worked
      out why yet.
   e) Adding/deleting children on quorum is hard having to use the
      children.0/1 notation when you've added children using node
      names - it's worrying which number is which; is there a way to
      give them a name?
   f) I've not thought about the colo-proxy that much yet - I guess
      that existing connections need to keep their sequence number
      offset but new connections made by what is now the primary
      don't need to do anything special.

Dave
--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
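
For anyone checking the structure by eye: the drive_add line in the command list packs a nested layout into dotted keys, so file.driver=nbd describes the "file" child underneath the replication driver. A rough Python sketch of that nesting follows. It is only an illustration and not QEMU's own option parser; the names nest_options and DRIVE_ADD_OPTS are made up here, and it assumes no escaped commas (",,") in the option string.

```python
def nest_options(optstr):
    """Turn 'a=1,b.c=2' into {'a': '1', 'b': {'c': '2'}}.

    Illustrative only: assumes plain key=value pairs separated by
    commas, with no ',,' escapes inside values.
    """
    tree = {}
    for item in optstr.split(","):
        key, _, value = item.partition("=")
        *parents, leaf = key.split(".")
        node = tree
        for part in parents:
            # Descend into (or create) the sub-dict for each dotted prefix.
            node = node.setdefault(part, {})
        node[leaf] = value
    return tree


# The option string from the drive_add command in the mail, copied as-is.
DRIVE_ADD_OPTS = (
    "driver=replication,mode=primary,file.driver=nbd,file.host=ibpair,"
    "file.port=8889,file.export=colo-disk0,node-name=nbd-client,"
    "if=none,cache=none"
)

opts = nest_options(DRIVE_ADD_OPTS)

if __name__ == "__main__":
    for key, value in opts.items():
        print(key, "=", value)
```

Read this way, the new node is a replication driver in primary mode sitting on top of an nbd client that points at ibpair:8889, which is what gets added to top-quorum under the node name nbd-client.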