Dear Riba,

Thank you very much for your information; it relieves me of this unbearable
tension. I will make sure not to do anything dangerous.

Regarding ending up with dual masters: as far as I remember, after repeatedly
doing master failover I found that slave2 was master, while my assumption was
that it is a slave node. When I run gnt-cluster getmaster on the slave1
machine I get slave1.ilcs.edu.bt, and when I run gnt-cluster getmaster on the
other two nodes I get master.ilcs.edu.bt. Since I did the master failover on
slave1, the expected output should be slave1.ilcs.edu.bt, I guess.

I have run the two commands you mentioned, and the output is attached here.
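
For reference, the master_node check can also be done without grep. The
following is only a rough sketch: it assumes that config.data is a JSON
document with master_node stored under a top-level "cluster" object, and the
helper names master_node_of and nodes_disagree are made up for illustration.
The recorded value may be a node UUID rather than a hostname, depending on
the Ganeti version.

```python
import json


def master_node_of(config_text):
    """Return the master node recorded in a config.data JSON string.

    Assumption: the value lives at config["cluster"]["master_node"].
    Returns None when that key is missing.
    """
    config = json.loads(config_text)
    return config.get("cluster", {}).get("master_node")


def nodes_disagree(masters_by_node):
    """Return True when the nodes do not all record the same master."""
    return len(set(masters_by_node.values())) > 1


if __name__ == "__main__":
    # Read-only check against the local copy of the configuration.
    with open("/var/lib/ganeti/config.data") as handle:
        print(master_node_of(handle.read()))
```

Running this on each of the three nodes and passing the collected values to
nodes_disagree would show whether the nodes still hold different views of who
the master is. It only reads the file and changes nothing.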

regards,



Chencho Tshering,
ICT Officer,
Institute of Language and Culture Studies, Taktse
Royal University of Bhutan

On Fri, Dec 18, 2015 at 12:01 AM, Hrvoje Ribicic <[email protected]> wrote:

> The core principle of Ganeti is that VMs will continue to function
> regardless of Ganeti's behavior, so you do not have to worry about data
> loss for the time being. Just do not issue instance-affecting commands.
> Also, back up the /var/lib/ganeti/ directory in its entirety on all three
> nodes - it may come to be useful later.
>
> First off, you say that both the nodes are masters - how did you ascertain
> this?
>
> If no actions have been undertaken since the failed master-failover, you
> are likely to still be in a good state.
>
> Now, check which daemons are alive with "ps aux | grep ganeti" and post it
> here, and also execute this on every node and tell me the result:
> "cat /var/lib/ganeti/config.data | python -mjson.tool | grep master_node"
>
> Thanks,
> Riba
>
>
> On Thu, Dec 17, 2015 at 4:08 PM, Chencho Tshering <
> [email protected]> wrote:
>
>> Dear Hrvoje,
>>
>> It is ok, but with my very limited knowledge I have ended up creating a
>> dual master. Now I can see two masters. To give you more detail, all of
>> the nodes have Debian 8.1 (jessie) and gnt-cluster (ganeti v2.12.4)
>> 2.12.4 running on them. The details of the 3 nodes are:
>>
>>  1) Hostname : master.ilcs.edu.bt
>>     Type        : master node
>>     RAM        : 32 GB
>>     IBM x3630 M4
>>
>>  2) Hostname : slave1.ilcs.edu.bt
>>     Type        : master node
>>     RAM        : 24 GB
>>     IBM x3500 M3
>>
>>  3) Hostname : slave2.ilcs.edu.bt
>>     Type        : slave node
>>     RAM        : 8 GB
>>     Dell Optiplex 790
>>
>> On both master nodes I get the same error when I try to verify the
>> cluster (i.e. gnt-cluster verify), list instances, or start an instance.
>> The error message is "Timeout while talking to the master daemon. Jobs
>> might have been submitted and will continue to run even if the call timed
>> out. Useful commands in this situation are 'gnt-job list', 'gnt-job
>> cancel' and 'gnt-job watch'. Error: Connect timed out". All of the nodes
>> can ping each other, but Ganeti clustering is not working. Can you please
>> tell me what the problem is? I am really confused and tense now because
>> data might get erased or lost.
>>
>>
>> regards,
>>
>> Chencho Tshering,
>> ICT Officer,
>> Institute of Language and Culture Studies, Taktse
>> Royal University of Bhutan
>>
>> On Thu, Dec 17, 2015 at 8:53 PM, Hrvoje Ribicic <[email protected]> wrote:
>>
>>> Hi Chencho,
>>>
>>> Sorry for the delayed response. I believe you've been hit by the
>>> following bug:
>>>
>>> https://code.google.com/p/ganeti/issues/detail?id=1159
>>>
>>> To prevent this problem from occurring repeatedly, you can manually
>>> apply the attached patch (this is for 2.12, so you might have to fiddle
>>> around).
>>> Make sure you know the basics of Python and programming before
>>> attempting this :)
>>>
>>> To fix the current situation:
>>>
>>> 1. Check that no node thinks it is the master daemon: commands like
>>> gnt-cluster info should fail or time out everywhere. If you do not do this
>>> and still execute the commands, you could end up in a dual-master
>>> situation, and you do not want this to happen.
>>> 2. Find the node which was supposed to become the master and on which
>>> gnt-cluster master-failover failed, and modify /etc/default/ganeti to
>>> contain the following lines:
>>>
>>> WCONFD_ARGS="--no-voting --yes-do-it"
>>> LUXID_ARGS="--no-voting --yes-do-it"
>>>
>>> 3. Restart Ganeti on the node - either by running "service ganeti
>>> restart" or "/etc/init.d/ganeti restart".
>>> 4. Ganeti should be working again. If not, stop here and reply to this
>>> mail.
>>> 5. Remove the modified lines from /etc/default/ganeti
>>> 6. Run gnt-cluster verify, and if errors occur, gnt-cluster redist-conf.
>>>
>>> Ping me if more help is needed!
>>>
>>> Cheers,
>>> Riba
>>>
>>> On Tue, Dec 15, 2015 at 3:51 PM, Chencho Tshering <
>>> [email protected]> wrote:
>>>
>>>> My ganeti-cluster version is
>>>> gnt-cluster (ganeti v2.12.4) 2.12.4
>>>>
>>>> Chencho Tshering,
>>>> ICT Officer,
>>>> Institute of Language and Culture Studies, Taktse
>>>> Royal University of Bhutan
>>>>
>>>> On Tue, Dec 15, 2015 at 8:12 PM, Chencho Tshering <
>>>> [email protected]> wrote:
>>>>
>>>>>
>>>>> Chencho Tshering,
>>>>> ICT Officer,
>>>>> Institute of Language and Culture Studies, Taktse
>>>>> Royal University of Bhutan
>>>>>
>>>>> ---------- Forwarded message ----------
>>>>> From: Chencho Tshering <[email protected]>
>>>>> Date: Tue, Dec 15, 2015 at 8:08 PM
>>>>> Subject: Urgent!!! Ganeti Verify couldn't be done
>>>>> To: [email protected]
>>>>>
>>>>>
>>>>> Hi,
>>>>> I am very new to Ganeti clustering. My friend installed it on our
>>>>> servers before I took up this job; he is gone and I cannot contact him.
>>>>> Please help me with the issues that I am facing right now.
>>>>>
>>>>> I have 3 nodes and 4 instances running on them. But suddenly, after
>>>>> being powered off for a long time, my master node is not responding, in
>>>>> the sense that I cannot verify the cluster (i.e. gnt-cluster verify).
>>>>> Instead I always get this error message: "Timeout while talking to the
>>>>> master daemon. Jobs might have been submitted and will continue to run
>>>>> even if the call timed out. Useful commands in this situation are
>>>>> 'gnt-job list', 'gnt-job cancel' and 'gnt-job watch'. Error: Connect
>>>>> timed out". I am using Debian on the master node as well as on the 2
>>>>> slave nodes. I am not sure about the version of Ganeti because I don't
>>>>> know how to check it.
>>>>>
>>>>> I tried a master failover using only 2 nodes (the master node and 1
>>>>> slave) with the command "gnt-cluster master-failover --no-voting", and
>>>>> it did not help. While executing this command the master node was shut
>>>>> down, as suggested. I am attaching my error message below.
>>>>>
>>>>>
>>>>> regards,
>>>>>
>>>>> Chencho Tshering,
>>>>> ICT Officer,
>>>>> Institute of Language and Culture Studies, Taktse
>>>>> Royal University of Bhutan
>>>>>
>>>>>
>>>>
>>> Hrvoje Ribicic
>>> Ganeti Engineering
>>> Google Germany GmbH
>>> Dienerstr. 12, 80331, München
>>>
>>> Geschäftsführer: Matthew Scott Sucherman, Paul Terence Manicle
>>> Registergericht und -nummer: Hamburg, HRB 86891
>>> Sitz der Gesellschaft: Hamburg
>>>
>>> Diese E-Mail ist vertraulich. Wenn Sie nicht der richtige Adressat sind,
>>> leiten Sie diese bitte nicht weiter, informieren Sie den Absender und
>>> löschen Sie die E-Mail und alle Anhänge. Vielen Dank.
>>>
>>> This e-mail is confidential. If you are not the right addressee please
>>> do not forward it, please inform the sender, and please erase this e-mail
>>> including any attachments. Thanks.
>>>
>>>
>>
