Hi Chencho,

Sorry for the delayed response. I believe you've been hit by the following bug:
https://code.google.com/p/ganeti/issues/detail?id=1159

To prevent this problem from recurring, you can manually apply the attached patch (it was written for 2.12, so you might have to fiddle around). Make sure you know the basics of Python and programming before attempting this :)

To fix the current situation:

1. Check that no node thinks it is the master: commands like gnt-cluster info should fail or time out everywhere. If you skip this check and carry on with the steps below, you could end up in a dual-master situation, and you do not want this to happen.
2. Find the node which was supposed to become the master and on which gnt-cluster master-failover failed, and modify /etc/default/ganeti to contain the following lines:
   WCONFD_ARGS="--no-voting --yes-do-it"
   LUXID_ARGS="--no-voting --yes-do-it"
3. Restart Ganeti on that node, either by running "service ganeti restart" or "/etc/init.d/ganeti restart".
4. Ganeti should be working again. If not, stop here and reply to this mail.
5. Remove the lines added in step 2 from /etc/default/ganeti.
6. Run gnt-cluster verify and, if it reports errors, gnt-cluster redist-conf.

Ping me if more help is needed!

Cheers,
Riba

On Tue, Dec 15, 2015 at 3:51 PM, Chencho Tshering <[email protected]> wrote:

> My ganeti-cluster version is
> gnt-cluster (ganeti v2.12.4) 2.12.4
>
> Chencho Tshering,
> ICT Officer,
> Institute of Language and Culture Studies, Taktse
> Royal University of Bhutan
>
> On Tue, Dec 15, 2015 at 8:12 PM, Chencho Tshering <[email protected]> wrote:
>
>> Chencho Tshering,
>> ICT Officer,
>> Institute of Language and Culture Studies, Taktse
>> Royal University of Bhutan
>>
>> ---------- Forwarded message ----------
>> From: Chencho Tshering <[email protected]>
>> Date: Tue, Dec 15, 2015 at 8:08 PM
>> Subject: Urgent!!! Ganeti Verify couldn't be done
>> To: [email protected]
>>
>> Hi,
>> I am very new to Ganeti clustering. My friend installed it on our
>> server before I took up this job; he has since left and cannot be contacted.
>> Please help me with the issues that I am facing right now.
>>
>> I have 3 nodes and 4 instances running on them. But suddenly, after a long
>> power-off, my master node is not responding, in the sense that I cannot
>> verify the Ganeti cluster (i.e. gnt-cluster verify). I always get this
>> error message instead: "Timeout while talking to the master daemon. Jobs
>> might have been submitted and will continue to run even if the call timed
>> out. Useful commands in this situation are 'gnt-job list', 'gnt-job cancel'
>> and 'gnt-job watch'. Error: Connect timed out". I am using Debian on the
>> master node as well as on the 2 slave nodes. I am not sure about the version
>> of Ganeti because I don't know how to check it.
>>
>> I tried a master failover using only 2 nodes (the master node and 1 slave)
>> with this command: "gnt-cluster master-failover -no--voting", and it didn't
>> help. While executing this command the master node was shut down, as
>> suggested. I am attaching my error message below.
>>
>> regards,
>>
>> Chencho Tshering,
>> ICT Officer,
>> Institute of Language and Culture Studies, Taktse
>> Royal University of Bhutan

Hrvoje Ribicic
Ganeti Engineering
Google Germany GmbH
Dienerstr. 12, 80331, München

Managing Directors: Matthew Scott Sucherman, Paul Terence Manicle
Registration court and number: Hamburg, HRB 86891
Registered office: Hamburg

This e-mail is confidential. If you are not the right addressee please do not forward it, please inform the sender, and please erase this e-mail including any attachments. Thanks.
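For steps 2 and 5 of the procedure above, here is a minimal Python sketch of adding and later removing the two override lines. The helper names are hypothetical and not part of Ganeti; only the two variable lines and the file path /etc/default/ganeti come from the mail. The helpers work on the file's text rather than on the file itself, and they assume the file has no existing WCONFD_ARGS or LUXID_ARGS settings that need preserving.

```python
# Hypothetical helpers; only the two variable lines below come from the mail.
OVERRIDE_LINES = (
    'WCONFD_ARGS="--no-voting --yes-do-it"',
    'LUXID_ARGS="--no-voting --yes-do-it"',
)


def add_overrides(text):
    """Return text with the override lines appended (step 2).

    Lines that are already present are not added a second time.
    """
    lines = text.splitlines()
    for override in OVERRIDE_LINES:
        if override not in lines:
            lines.append(override)
    return "\n".join(lines) + "\n"


def remove_overrides(text):
    """Return text with the override lines removed again (step 5)."""
    kept = [line for line in text.splitlines() if line not in OVERRIDE_LINES]
    return "\n".join(kept) + "\n" if kept else ""
```

To apply this for real you would read /etc/default/ganeti as root, write the result of add_overrides back, restart Ganeti as in step 3, and write the result of remove_overrides back once the cluster works again.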
commit ba923a1df309b073e18cd68167180c9f559b1c3e
Author: Hrvoje Ribicic <[email protected]>
Date:   Thu Dec 17 00:18:50 2015 +0000

    Pass arguments to correct daemons during master-failover

    A master-failover can be executed with the --no-voting flag, making
    Ganeti start daemons despite a lack of votes. This is necessary to
    fail over a cluster reduced to two nodes. The feature has not been
    working since 2.12 daemon refactoring, as the daemon parameters were
    passed through environmental variables that were not updated. This
    commit passes the parameters correctly, and fixes issue 1159.

    Signed-off-by: Hrvoje Ribicic <[email protected]>
    Reviewed-by: Helga Velroyen <[email protected]>

diff --git a/lib/backend.py b/lib/backend.py
index 5a04c7f..eafb5d1 100644
--- a/lib/backend.py
+++ b/lib/backend.py
@@ -434,12 +434,13 @@ def StartMasterDaemons(no_voting):
   """
 
   if no_voting:
-    masterd_args = "--no-voting --yes-do-it"
+    daemon_args = "--no-voting --yes-do-it"
   else:
-    masterd_args = ""
+    daemon_args = ""
 
   env = {
-    "EXTRA_MASTERD_ARGS": masterd_args,
+    "EXTRA_LUXID_ARGS": daemon_args,
+    "EXTRA_WCONFD_ARGS": daemon_args,
   }
 
   result = utils.RunCmd([pathutils.DAEMON_UTIL, "start-master"], env=env)
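The effect of the patch can be seen in isolation with a small standalone sketch. The function name build_master_daemon_env is hypothetical and does not exist in Ganeti; it only mirrors the patched lines of StartMasterDaemons, which in the real code pass the resulting dictionary to utils.RunCmd as env.

```python
def build_master_daemon_env(no_voting):
    """Build the extra environment for the "start-master" command.

    Hypothetical helper mirroring the patched part of StartMasterDaemons.
    """
    # The same flags are handed to both daemons named in the patch.
    daemon_args = "--no-voting --yes-do-it" if no_voting else ""
    return {
        "EXTRA_LUXID_ARGS": daemon_args,
        "EXTRA_WCONFD_ARGS": daemon_args,
    }


if __name__ == "__main__":
    # Show the variables that would be set for a --no-voting failover.
    for name, value in sorted(build_master_daemon_env(True).items()):
        print("%s=%r" % (name, value))
```

Before the patch only EXTRA_MASTERD_ARGS was set, so per the commit message the flags never reached the daemons after the 2.12 refactoring.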
