Hi Martijn, I want to know how to setup a 3-node cluster environment can you suggested me some good book or could you provide some of the technical articles which i can follow and try it on to my test server.
On Tue, Aug 17, 2010 at 12:03 AM, Martijn <[email protected]> wrote: > > > -origineel bericht- > Onderwerp: [Linux-cluster] what happens during orderly shutdown ? > Van: "Martin Waite" <[email protected]> > Datum: 16-08-2010 19:05 > > Hi, > > > > We have hit a snag during testing of a 3-node cluster, where only nodes > 1 and 2 are allowed to run the service. > > > > If we shutdown the node running the service, the service first starts on > the surviving node, but is then stopped. This is not an activity we > would normally expect to do, but some of the environment monitoring in > our servers can trigger this situation - temperature getting too high > etc. > > > > What I think happens is the following: > > > > 1. host 1: shutdown calls on rgmanager to stop: this takes several > seconds, stopping the service. > > 2. host 2: rgmanager starts to relocate the service to the surviving > node > > 3. host 1: shutdown calls on cman to stop > > 4. host 2: while relocating the service, cman enters gather state > because host 1 has dropped out. The service is still starting and > triggers "Failed changing service status" in log below. > > 5. host 1: halts > > 6. host 2: the service completes start up. The cluster > reconfiguration completes. > > 7. host 2: the service is stopped. > > > > Is this a bug in the cluster system, or just an unfortunate sequence of > events that the cluster cannot cope with, i.e. node failure while a > service is relocating ? > > > > regards, > > Martin > > > > Here is the log from node 2: > > > > Aug 16 16:58:40 gmhcluster2 clurgmgrd[4132]: <notice> Member 1 shutting > down > > Aug 16 16:58:40 gmhcluster2 openais[4084]: [TOTEM] Retransmit List: 86 > > Aug 16 16:58:46 gmhcluster2 clurgmgrd[4132]: <notice> Starting stopped > service service:MASTERVIP > > Aug 16 16:58:55 gmhcluster2 openais[4084]: [TOTEM] The token was lost in > the OPERATIONAL state. > > Aug 16 16:58:55 gmhcluster2 openais[4084]: [TOTEM] Receive multicast > socket recv buffer size (320000 bytes). > > Aug 16 16:58:55 gmhcluster2 openais[4084]: [TOTEM] Transmit multicast > socket send buffer size (262142 bytes). > > Aug 16 16:58:55 gmhcluster2 openais[4084]: [TOTEM] entering GATHER state > from 2. > > Aug 16 16:59:01 gmhcluster2 clurgmgrd[4132]: <err> #47: Failed changing > service status > > Aug 16 16:59:01 gmhcluster2 clurgmgrd[4132]: <warning> #71: Relocating > failed service service:MASTERVIP > > Aug 16 16:59:01 gmhcluster2 clurgmgrd[4132]: <warning> #70: Failed to > relocate service:MASTERVIP; restarting locally > > Aug 16 16:59:01 gmhcluster2 clurgmgrd[4132]: <notice> Starting stopped > service service:MASTERVIP > > Aug 16 16:59:15 gmhcluster2 openais[4084]: [TOTEM] entering GATHER state > from 0. > > Aug 16 16:59:15 gmhcluster2 openais[4084]: [TOTEM] Creating commit token > because I am the rep. > > Aug 16 16:59:15 gmhcluster2 openais[4084]: [TOTEM] Saving state aru 90 > high seq received 90 > > Aug 16 16:59:15 gmhcluster2 openais[4084]: [TOTEM] Storing new sequence > id for ring 1e4 > > Aug 16 16:59:15 gmhcluster2 openais[4084]: [TOTEM] entering COMMIT > state. > > Aug 16 16:59:15 gmhcluster2 openais[4084]: [TOTEM] entering RECOVERY > state. > > Aug 16 16:59:15 gmhcluster2 openais[4084]: [TOTEM] position [0] member > 172.30.31.4: > > Aug 16 16:59:15 gmhcluster2 openais[4084]: [TOTEM] previous ring seq 480 > rep 172.30.31.3 > > Aug 16 16:59:15 gmhcluster2 openais[4084]: [TOTEM] aru 90 high delivered > 90 received flag 1 > > Aug 16 16:59:15 gmhcluster2 openais[4084]: [TOTEM] position [1] member > 172.30.31.5: > > Aug 16 16:59:15 gmhcluster2 openais[4084]: [TOTEM] previous ring seq 480 > rep 172.30.31.3 > > Aug 16 16:59:15 gmhcluster2 openais[4084]: [TOTEM] aru 90 high delivered > 90 received flag 1 > > Aug 16 16:59:15 gmhcluster2 openais[4084]: [TOTEM] Did not need to > originate any messages in recovery. > > Aug 16 16:59:15 gmhcluster2 kernel: dlm: closing connection to node 1 > > Aug 16 16:59:15 gmhcluster2 openais[4084]: [TOTEM] Sending initial ORF > token > > Aug 16 16:59:15 gmhcluster2 clurgmgrd: [4132]: <info> Executing > /usr/sbin/dc-cluster-resourceagent start > > Aug 16 16:59:15 gmhcluster2 openais[4084]: [CLM ] CLM CONFIGURATION > CHANGE > > Aug 16 16:59:15 gmhcluster2 openais[4084]: [CLM ] New Configuration: > > Aug 16 16:59:15 gmhcluster2 openais[4084]: [CLM ] r(0) > ip(172.30.31.4) > > Aug 16 16:59:15 gmhcluster2 openais[4084]: [CLM ] r(0) > ip(172.30.31.5) > > Aug 16 16:59:15 gmhcluster2 openais[4084]: [CLM ] Members Left: > > Aug 16 16:59:15 gmhcluster2 openais[4084]: [CLM ] r(0) > ip(172.30.31.3) > > Aug 16 16:59:15 gmhcluster2 openais[4084]: [CLM ] Members Joined: > > Aug 16 16:59:15 gmhcluster2 openais[4084]: [CLM ] CLM CONFIGURATION > CHANGE > > Aug 16 16:59:15 gmhcluster2 openais[4084]: [CLM ] New Configuration: > > Aug 16 16:59:15 gmhcluster2 openais[4084]: [CLM ] r(0) > ip(172.30.31.4) > > Aug 16 16:59:15 gmhcluster2 /usr/sbin/dc-cluster-resourceagent: [2492] > INFO - ACTIVESITE in state started > > Aug 16 16:59:15 gmhcluster2 openais[4084]: [CLM ] r(0) > ip(172.30.31.5) > > Aug 16 16:59:15 gmhcluster2 /usr/sbin/dc-cluster-resourceagent: [2492] > INFO - ACTIVESITE service is running - ACTIVESITE in state started > > Aug 16 16:59:15 gmhcluster2 openais[4084]: [CLM ] Members Left: > > Aug 16 16:59:15 gmhcluster2 /usr/sbin/dc-cluster-resourceagent: [2492] > INFO - Service MASTERVIP: Performing precautionary updates to databases > before setting up VIP > > Aug 16 16:59:15 gmhcluster2 openais[4084]: [CLM ] Members Joined: > > Aug 16 16:59:15 gmhcluster2 /usr/sbin/dc-cluster-resourceagent: [2492] > INFO - Service MASTERVIP: running precautionary update [touch > /tmp/dc-set-dr-counters] > > Aug 16 16:59:15 gmhcluster2 openais[4084]: [SYNC ] This node is within > the primary component and will provide service. > > Aug 16 16:59:15 gmhcluster2 /usr/sbin/dc-cluster-resourceagent: [2492] > INFO - Service MASTERVIP: running precautionary update [touch > /tmp/dc-set-dr-auto-increments] > > Aug 16 16:59:15 gmhcluster2 openais[4084]: [TOTEM] entering OPERATIONAL > state. > > Aug 16 16:59:16 gmhcluster2 openais[4084]: [CLM ] got nodejoin message > 172.30.31.4 > > Aug 16 16:59:16 gmhcluster2 openais[4084]: [CLM ] got nodejoin message > 172.30.31.5 > > Aug 16 16:59:16 gmhcluster2 openais[4084]: [CPG ] got joinlist message > from node 3 > > Aug 16 16:59:16 gmhcluster2 clurgmgrd: [4132]: <info> Adding IPv4 > address 172.30.30.93/24 to bond0 > > Aug 16 16:59:16 gmhcluster2 openais[4084]: [CPG ] got joinlist message > from node 2 > > Aug 16 16:59:17 gmhcluster2 clurgmgrd[4132]: <notice> Service > service:MASTERVIP started > > Aug 16 16:59:17 gmhcluster2 clurgmgrd[4132]: <notice> Stopping service > service:MASTERVIP > > Aug 16 16:59:17 gmhcluster2 clurgmgrd: [4132]: <info> Removing IPv4 > address 172.30.30.93/24 from bond0 > > Aug 16 16:59:27 gmhcluster2 clurgmgrd: [4132]: <info> Executing > /usr/sbin/dc-cluster-resourceagent stop > > Aug 16 16:59:27 gmhcluster2 clurgmgrd[4132]: <notice> Service > service:MASTERVIP is stopped > > > > > > > > > > > > > -- > Linux-cluster mailing list > [email protected] > https://www.redhat.com/mailman/listinfo/linux-cluster > > > -- > Linux-cluster mailing list > [email protected] > https://www.redhat.com/mailman/listinfo/linux-cluster >
-- Linux-cluster mailing list [email protected] https://www.redhat.com/mailman/listinfo/linux-cluster
