Andrew Beekhof wrote:
>
> On Mar 21, 2007, at 12:55 PM, Alan Robertson wrote:
>
>> Andrew Beekhof wrote:
>>> On 3/20/07, Alan Robertson <[EMAIL PROTECTED]> wrote:
>>>> Max Hofer wrote:
>>>>> OK,
>>>>>
>>>>> I lost a day just trying to figure out how to replace a cluster
>>>>> node with a spare part. I just thought someone else needs this
>>>>> info or maybe knows a better way than how I did it.
>>>>>
>>>>> Situation:
>>>>> - cluster with 2 nodes (routing1, routing2)
>>>>> - routing2 should be replaced with a spare part
>>>>> - routing1 and routing2 use a file system on a drbd to share
>>>>>   common data
>>>>>
>>>>> Precondition:
>>>>> - routing2 crashed and hb_uuid is not recoverable
>>>>
>>>> FYI: It's in the CIB, and also in the hb_uuid files on every machine.
>>>>
>>>>> - spare part is configured to not start heartbeat after power-on
>>>>>
>>>>> Steps I did:
>>>>> * replaced crashed routing2 with spare part (cabling etc.)
>>>>> * powered on routing2
>>>>> * on routing2 invalidate data on drbd device (---> sync from
>>>>>   routing1 to routing2)
>>>>> * on routing1 delete routing2 (I found a bug that pingd resets to 0
>>>>>   when calling hb_delnode ---> see bug #1535)
>>>>>   # /usr/lib/heartbeat/hb_delnode routing2 && killall pingd
>>>>>   (!!!NOTE: if your cluster configuration triggers a failover on a
>>>>>   pingd failure, set the cluster in unmanaged mode, stop pingd,
>>>>>   delete the node and then restart pingd, setting the cluster in
>>>>>   managed mode again)
>>>>> * on routing1 delete removed hostcache (I'm not sure if this step
>>>>>   is necessary but someone in the mailing list explained it has to
>>>>>   be done)
>>>>>   # rm /var/lib/heartbeat/delhostcache
>>>>> * on routing1 add routing2 again
>>>>>   # /usr/lib/heartbeat/hb_addnode routing2
>>>>> * start heartbeat on routing2
>>>>>
>>>>> Finished .....
>>>>>
>>>>> What I really find stupid about the whole procedure:
>>>>> * the assumption that the UUID file (/var/lib/heartbeat/hb_uuid)
>>>>>   can be used on the spare part is probably never the case (except
>>>>>   if you perform a planned replacement ... )
>>>>
>>>> See note above...
>>>>
>>>>> * this assumption does not work well if the spare part is installed
>>>>>   to be a replacement for different cluster nodes. The UUID is
>>>>>   created on the very first install of heartbeat (and thus is not
>>>>>   part of my configuration data). It would be a configuration hell
>>>>>   to "save all UUID of all clusters after cluster activation" on a
>>>>>   system with a couple nodes
>>>>
>>>> It's already saved for you - in two places on every machine...
>>>>
>>>> What's missing is the conversion from ASCII to binary. Could you
>>>> make a bugzilla for that and assign it to me?
>>>
>>> been there done that:
>>>   crm_uuid -w
>>
>> Andrew: Is there a man page or other documentation outside the command
>> for this?
>
> it will be in the set novell is making available to us

Does "us" mean novell customers?

As a note, there does need to be a man page specifically, and it needs
to be created with the man page macros through roff. Man is the
UNIX/Linux documentation standard, unfortunately.

-- 
    Alan Robertson <[EMAIL PROTECTED]>

"Openness is the foundation and preservative of friendship...  Let me
claim from you at all times your undisguised opinions." - William
Wilberforce
_______________________________________________
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
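
For anyone wanting to script the routing1-side steps from the quoted procedure, here is a minimal sketch, not a tested tool. It only wraps the three commands quoted in the thread (hb_delnode plus killall pingd, removing delhostcache, hb_addnode). The function names run and readd_node and the variables HB_LIB, HB_VAR and DRY_RUN are made up for illustration. By default it only prints what it would do. The routing2-side steps (invalidating the drbd data, starting heartbeat) and the unmanaged-mode precaution from the NOTE are deliberately left out and still have to be done by hand.

```shell
#!/bin/sh
# Sketch of the routing1-side steps quoted in the thread.
# DRY_RUN=1 (the default) only prints each command; DRY_RUN=0 executes it.
# HB_LIB / HB_VAR are hypothetical variable names; the paths are the ones
# quoted in the original post and may differ on other installations.
HB_LIB=/usr/lib/heartbeat
HB_VAR=/var/lib/heartbeat

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

# Delete and re-add a node, in the order given in the post.
readd_node() {
    node=$1
    # Delete the old node entry, then kill pingd (see bug #1535 in the post).
    run "$HB_LIB/hb_delnode" "$node" && run killall pingd
    # Remove the deleted-host cache (the poster was unsure this is required).
    run rm "$HB_VAR/delhostcache"
    # Add the node again.
    run "$HB_LIB/hb_addnode" "$node"
}
```

Calling `readd_node routing2` with the default settings prints the four commands without touching the cluster; setting DRY_RUN=0 would execute them, which should only be done after considering the pingd failover note in the quoted steps.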