Re: [Linux-HA] Replacing Existing Node

Alan Robertson Tue, 27 Mar 2007 11:54:22 -0800

Andrew Beekhof wrote:
> 
> On Mar 21, 2007, at 12:55 PM, Alan Robertson wrote:
> 
>> Andrew Beekhof wrote:
>>> On 3/20/07, Alan Robertson <[EMAIL PROTECTED]> wrote:
>>>
>>>> Max Hofer wrote:
>>>>> OK,
>>>>>
>>>>> I lost a day just trying to figure out how to replace a cluster node with
>>>>> a spare part. I just thought someone else might need this info, or maybe
>>>>> knows a better way than how I did it.
>>>>>
>>>>> Situation:
>>>>> - cluster with 2 nodes (routing1, routing2)
>>>>> - routing2 should be replaced with a spare part
>>>>> - routing1 and routing2 use a file system on a drbd to share
>>>>>   common data
>>>>>
>>>>> Precondition:
>>>>> - routing2 crashed and hb_uuid is not recoverable
>>>>
>>>> FYI: It's in the CIB, and also in the hb_uuid files on every machine.
>>>>
>>>>> - spare part is configured to not start heartbeat after power-on
>>>>>
>>>>> Steps I did:
>>>>> * replaced crashed routing2 with spare part (cabling etc.)
>>>>> * powered on routing2
>>>>> * on routing2 invalidate data on drbd device (---> sync from routing1
>>>>>   to routing2)
>>>>> * on routing1 delete routing2 (I found a bug that pingd resets to 0
>>>>>   when calling hb_delnode ---> see bug #1535)
>>>>>   # /usr/lib/heartbeat/hb_delnode routing2 && killall pingd
>>>>>   (!!!NOTE: if your cluster configuration triggers a failover on a pingd
>>>>>   failure, set the cluster in unmanaged mode, stop pingd, delete
>>>>>   the node and then restart pingd, setting the cluster in managed mode
>>>>>   again)
>>>>> * on routing1 delete the removed-host cache (I'm not sure if this step is
>>>>>   necessary, but someone on the mailing list explained it has to be
>>>>>   done)
>>>>>   # rm /var/lib/heartbeat/delhostcache
>>>>> * on routing1 add routing2 again
>>>>>   # /usr/lib/heartbeat/hb_addnode routing2
>>>>> * start heartbeat on routing2
>>>>>
>>>>> Finished .....
>>>>>
>>>>> What I really find stupid about the whole procedure:
>>>>> * the assumption that the UUID file (/var/lib/heartbeat/hb_uuid) can
>>>>>   be reused on the spare part is probably never true (unless you
>>>>>   perform a planned replacement ... )
>>>>
>>>> See note above...
>>>>
>>>>> * this assumption does not work well if the spare part is installed to
>>>>>   be a replacement for different cluster nodes. The UUID is created
>>>>>   on the very first install of heartbeat (and thus is not part of my
>>>>>   configuration data). It would be configuration hell to "save all
>>>>>   UUIDs of all clusters after cluster activation" on a system with a
>>>>>   couple of nodes
>>>>
>>>> It's already saved for you - in two places on every machine...
>>>>
>>>> What's missing is the conversion from ASCII to binary.  Could you make a
>>>> bugzilla for that and assign it to me?
>>>
>>> been there, done that:
>>>  crm_uuid -w
>>
>> Andrew:  Is there a man page or other documentation outside the command
>> for this?
> 
> it will be in the set Novell is making available to us


Does "us" mean Novell customers?

As a note, there does need to be a man page specifically, and it needs
to be created with the man page macros through roff.  Man is the
UNIX/Linux documentation standard, unfortunately.
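For illustration, the ASCII-to-binary conversion discussed above might look like the following Python sketch. It rests on two assumptions not confirmed in this thread: (1) that the lost node's UUID can be read from the `id` attribute of its `<node>` entry in a CIB dump taken on a surviving node, and (2) that /var/lib/heartbeat/hb_uuid holds the UUID as 16 raw bytes. The function names are hypothetical; `crm_uuid -w`, as Andrew points out, is the tool intended for this job.

```python
import uuid
import xml.etree.ElementTree as ET


def node_uuid_from_cib(cib_xml: str, uname: str) -> str:
    """Return the ASCII UUID of the <node> entry whose uname matches.

    Assumption: the CIB lists nodes as <node id="<uuid>" uname="<hostname>" .../>.
    """
    root = ET.fromstring(cib_xml)
    for node in root.iter("node"):  # matches <node> elements only, not <node_state>
        if node.get("uname") == uname:
            node_id = node.get("id")
            if node_id is None:
                raise ValueError(f"<node uname={uname!r}> has no id attribute")
            return node_id
    raise KeyError(f"node {uname!r} not found in CIB")


def ascii_uuid_to_binary(ascii_uuid: str) -> bytes:
    """Parse the textual UUID and return its 16 bytes in string (big-endian) order."""
    return uuid.UUID(ascii_uuid).bytes


def write_hb_uuid(path: str, ascii_uuid: str) -> None:
    """Write the 16-byte binary UUID to path (assumed hb_uuid layout)."""
    with open(path, "wb") as f:
        f.write(ascii_uuid_to_binary(ascii_uuid))


if __name__ == "__main__":
    # Hypothetical CIB fragment; in practice it would be dumped on a surviving node.
    sample_cib = """<cib><configuration><nodes>
  <node id="f3a1c2d4-5b6e-4f70-8a9b-0c1d2e3f4a5b" uname="routing2" type="normal"/>
</nodes></configuration></cib>"""
    node_id = node_uuid_from_cib(sample_cib, "routing2")
    raw = ascii_uuid_to_binary(node_id)
    print(node_id, "->", len(raw), "bytes:", raw.hex())
    # On the spare part, with heartbeat not running, one would then do something like:
    # write_hb_uuid("/var/lib/heartbeat/hb_uuid", node_id)
```

`uuid.UUID(...).bytes` keeps the bytes in the same order as the printed string, which matches what libuuid's `uuid_parse` produces.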


-- 
    Alan Robertson <[EMAIL PROTECTED]>

"Openness is the foundation and preservative of friendship...  Let me
claim from you at all times your undisguised opinions." - William
Wilberforce
_______________________________________________
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
