The core principle of Ganeti is that running VMs continue to function regardless of what Ganeti itself is doing, so you do not have to worry about data loss for the time being. Just do not issue any commands that affect instances. Also, back up the entire /var/lib/ganeti/ directory on all three nodes; it may prove useful later.
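The backup advice above can be carried out with a few lines of Python on each node. This is only an illustrative sketch: the function name `backup_ganeti_dir` and the default destination `/root` are hypothetical choices, not part of Ganeti. It archives the whole /var/lib/ganeti/ directory into a timestamped .tar.gz whose name includes the node's hostname, so the copies from all three nodes can be told apart once collected in one place. Run it as root, and only while no Ganeti commands are being issued.

```python
import socket
import tarfile
import time
from pathlib import Path


def backup_ganeti_dir(src="/var/lib/ganeti", dest_dir="/root"):
    """Archive the whole Ganeti state directory into a timestamped .tar.gz.

    Hypothetical helper: the defaults are illustrative assumptions.
    The archive name contains the hostname so that backups taken on
    different nodes do not overwrite or get confused with each other.
    """
    src_path = Path(src)
    if not src_path.is_dir():
        raise FileNotFoundError(f"{src} is not a directory")

    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)

    stamp = time.strftime("%Y%m%d-%H%M%S")
    host = socket.gethostname()
    archive = dest / f"ganeti-backup-{host}-{stamp}.tar.gz"

    with tarfile.open(archive, "w:gz") as tar:
        # arcname stores entries as "ganeti/..." rather than absolute
        # paths, so extracting the archive never overwrites live files.
        tar.add(str(src_path), arcname=src_path.name)
    return archive
```

For example, call `backup_ganeti_dir()` on each of the three nodes and copy the resulting archive somewhere off the cluster, so a copy survives even if a node's disk does not.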
First off, you say that both nodes are masters. How did you ascertain this? If no actions have been taken since the failed master-failover, you are likely still in a good state. Now, check which daemons are alive with "ps aux | grep ganeti" and post the output here. Also, run the following on every node and tell me the result:

cat /var/lib/ganeti/config.data | python -mjson.tool | grep master_node

Thanks,
Riba

On Thu, Dec 17, 2015 at 4:08 PM, Chencho Tshering <[email protected]> wrote:
> Dear Hrvoje,
>
> It is OK, but with my very limited knowledge I have ended up creating a
> dual master. Now I can see two masters. To give you more detail, all of
> the nodes run Debian 8.1 (jessie) and Ganeti 2.12.4 ("gnt-cluster
> (ganeti v2.12.4) 2.12.4"). The details of the 3 nodes are:
>
> 1) Hostname : master.ilcs.edu.bt
>    Type     : master node
>    RAM      : 32 GB
>    Model    : IBM x3630 M4
>
> 2) Hostname : slave1.ilcs.edu.bt
>    Type     : master node
>    RAM      : 24 GB
>    Model    : IBM x3500 M3
>
> 3) Hostname : slave2.ilcs.edu.bt
>    Type     : slave node
>    RAM      : 8 GB
>    Model    : Dell OptiPlex 790
>
> On both master nodes I get the same error when I try to verify the
> cluster (i.e. gnt-cluster verify), list instances or start an instance.
> The error message is "Timeout while talking to the master daemon. Jobs
> might have been submitted and will continue to run even if the call
> timed out. Useful commands in this situation are 'gnt-job list',
> 'gnt-job cancel' and 'gnt-job watch'. Error: Connect timed out". All of
> the nodes can ping each other, but Ganeti clustering is not working. Can
> you please tell me what the problem is? I am really confused and tense
> because data might get erased or lost.
>
>
> Regards,
>
> Chencho Tshering,
> ICT Officer,
> Institute of Language and Culture Studies, Taktse
> Royal University of Bhutan
>
> On Thu, Dec 17, 2015 at 8:53 PM, Hrvoje Ribicic <[email protected]> wrote:
>
>> Hi Chencho,
>>
>> Sorry for the delayed response.
>> I believe you've been hit by the following bug:
>>
>> https://code.google.com/p/ganeti/issues/detail?id=1159
>>
>> To prevent this problem from occurring repeatedly, you can manually apply
>> the attached patch (this is for 2.12, so you might have to fiddle around).
>> Make sure you know the basics of Python and programming before attempting
>> this :)
>>
>> To fix the current situation:
>>
>> 1. Check that no node thinks it is the master: commands like
>>    gnt-cluster info should fail or time out everywhere. If you skip this
>>    check and still execute the commands below, you could end up in a
>>    dual-master situation, and you do not want this to happen.
>> 2. Find the node that was supposed to become the master and on which
>>    gnt-cluster master-failover failed, and modify /etc/default/ganeti on
>>    that node to contain the following lines:
>>
>>    WCONFD_ARGS="--no-voting --yes-do-it"
>>    LUXID_ARGS="--no-voting --yes-do-it"
>>
>> 3. Restart Ganeti on that node, either by running "service ganeti
>>    restart" or "/etc/init.d/ganeti restart".
>> 4. Ganeti should be working again. If not, stop here and reply to this
>>    mail.
>> 5. Remove the added lines from /etc/default/ganeti.
>> 6. Run gnt-cluster verify, and if errors occur, gnt-cluster redist-conf.
>>
>> Ping me if more help is needed!
>>
>> Cheers,
>> Riba
>>
>> On Tue, Dec 15, 2015 at 3:51 PM, Chencho Tshering <[email protected]> wrote:
>>
>>> My Ganeti cluster version is
>>> gnt-cluster (ganeti v2.12.4) 2.12.4
>>>
>>> Chencho Tshering,
>>> ICT Officer,
>>> Institute of Language and Culture Studies, Taktse
>>> Royal University of Bhutan
>>>
>>> On Tue, Dec 15, 2015 at 8:12 PM, Chencho Tshering <[email protected]> wrote:
>>>
>>>>
>>>> Chencho Tshering,
>>>> ICT Officer,
>>>> Institute of Language and Culture Studies, Taktse
>>>> Royal University of Bhutan
>>>>
>>>> ---------- Forwarded message ----------
>>>> From: Chencho Tshering <[email protected]>
>>>> Date: Tue, Dec 15, 2015 at 8:08 PM
>>>> Subject: Urgent!!! Ganeti Verify couldn't be done
>>>> To: [email protected]
>>>>
>>>>
>>>> Hi,
>>>> I am very new to Ganeti clustering. My friend installed it on our
>>>> server before I took up this job; he has since left and I cannot
>>>> contact him. Please help me with the issues I am facing right now.
>>>>
>>>> I have 3 nodes with 4 instances running on them. But suddenly, after
>>>> being powered off for a long time, my master node is not responding,
>>>> in the sense that I cannot verify the Ganeti cluster (i.e. gnt-cluster
>>>> verify). Instead I always get this error message: "Timeout while
>>>> talking to the master daemon. Jobs might have been submitted and will
>>>> continue to run even if the call timed out. Useful commands in this
>>>> situation are 'gnt-job list', 'gnt-job cancel' and 'gnt-job watch'.
>>>> Error: Connect timed out". I am using Debian on the master node as
>>>> well as on the 2 slave nodes. I am not sure about the version of
>>>> Ganeti because I don't know how to check it.
>>>>
>>>> I tried a master failover using only 2 nodes (the master node and 1
>>>> slave) with the command "gnt-cluster master-failover --no-voting", and
>>>> it didn't help. While executing this command, the master node was shut
>>>> down as suggested. I am attaching my error message below.
>>>>
>>>>
>>>> Regards,
>>>>
>>>> Chencho Tshering,
>>>> ICT Officer,
>>>> Institute of Language and Culture Studies, Taktse
>>>> Royal University of Bhutan
>>>>
>>>>
>>>
>>
>> Hrvoje Ribicic
>> Ganeti Engineering
>> Google Germany GmbH
>> Dienerstr. 12, 80331, München
>>
>> Managing Directors: Matthew Scott Sucherman, Paul Terence Manicle
>> Registry court and number: Hamburg, HRB 86891
>> Registered office: Hamburg
>>
>> This e-mail is confidential.
>> If you are not the right addressee, please do not forward it, please
>> inform the sender, and please erase this e-mail including any
>> attachments. Thanks.
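The config.data check Riba asks for at the top of this thread (cat /var/lib/ganeti/config.data | python -mjson.tool | grep master_node) can also be sketched in Python, assuming, as that pipeline implies, that config.data is a plain JSON file. The helper names `find_keys` and `master_node_entries` are hypothetical. Rather than assume where in the file the key lives, the sketch reports every `master_node` entry together with its path, which makes the output from the three nodes easier to compare side by side. It only reads the file, so it does not touch the cluster state.

```python
import json


def find_keys(obj, key, path=""):
    """Yield (path, value) for every occurrence of `key` in nested JSON data.

    Like `grep master_node` on the pretty-printed file, but each match
    also reports where in the document it was found.
    """
    if isinstance(obj, dict):
        for k, v in obj.items():
            child = f"{path}/{k}"
            if k == key:
                yield child, v
            yield from find_keys(v, key, child)
    elif isinstance(obj, list):
        for i, item in enumerate(obj):
            yield from find_keys(item, key, f"{path}[{i}]")


def master_node_entries(config_path="/var/lib/ganeti/config.data"):
    """Return all master_node entries in the config (read-only)."""
    with open(config_path, encoding="utf-8") as f:
        config = json.load(f)
    return list(find_keys(config, "master_node"))
```

On each node, `for where, value in master_node_entries(): print(where, value)` prints the matches; if the nodes report different values, their configurations disagree about which node is the master.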
