Hi Yan,

Thanks a lot for your help. I took the evmsSCC resource out of the
scenario, but I did not see any difference in the system behavior. I then
followed your suggestion and manually tested the EVMS commands from the
CLI while both nodes were in standby, and I realized that this command was
failing:

modify: gwcont,type=private,node=CZVLabNode2

It was somehow not recognized as a valid command. The really weird thing
is that the same command, with the host name written without capital
letters, succeeded:

modify: gwcont,type=private,node=czvlabnode2

This was true on both nodes, so I changed both hostnames:

CZVLabNode1 --> czvlabnode1
CZVLabNode2 --> czvlabnode2

and now failover is working properly, like everything else.

The reason I changed the host names to avoid capital letters is that I
noticed that, even though my host names were a mixture of lower-case and
capital letters, hb_gui displayed them entirely in lower case.
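In case anyone wants to check for the same mismatch, this is roughly what
I ran (a quick sketch, untested as pasted here; the ha.cf path is the one
from a stock SLES 10 SP1 install):

  # node name as the kernel reports it
  uname -n

  # node names as declared to Heartbeat
  grep -i '^node' /etc/ha.d/ha.cf

  # as Yan mentioned, a device named after the container shows up under
  # /dev/evms/.nodes on the node that currently has it imported
  ls -l /dev/evms/.nodes

If the casing differs between any of these, you may be hitting the same
problem I did.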
As soon as I have time, I will do some further tests to verify whether I
can reproduce this from scratch, to confirm whether Heartbeat 2.1.2 and/or
EVMS 2.5.5-24.52 really have an issue with partially capitalized node
names; I will update the list afterwards. It could also be that I changed
something else in the system that I'm not fully aware of, or that I simply
forgot about, as I ran many different tests on the same boxes.

Thanks again,
Chris

On Nov 21, 2007 9:23 PM, Yan Fitterer <[EMAIL PROTECTED]> wrote:
> Andrew Beekhof wrote:
> >
> > On Nov 21, 2007, at 10:11 AM, Christian Zemella wrote:
> >
> >> Hi All,
> >> Has anybody out there managed to get EVMS container resources to
> >> fail over properly in a 2-node Heartbeat 2 cluster running on SLES
> >> 10 SP1?
> >
> > I believe so... have you read the documentation below?
> > http://wiki.novell.com/images/3/37/Exploring_HASF.pdf
> >
> >> In my lab I can only start and stop the resource on the node that
> >> has the container assigned within EVMS. If I shut that node down,
> >> failover does not occur, as the evms_failover resource times out;
> >> as soon as the other node comes back up, it takes the resource
> >> back properly.
>
> This would indicate that the evms_failover RA cannot assign the container
> to the new node. Do you see the resource failing? Have you checked the
> failcount for the resources on that node?
>
> Some clues (from the EVMS perspective): take a look in /dev/evms/.nodes.
> When the private container is present on the node, a device file named
> after the container should appear there.
>
> To test manually, the easiest approach is to start Heartbeat, then put
> both nodes on standby, then manipulate the EVMS devices manually.
>
> To deport the container (on resource stop), evms_failover issues these
> commands to the evms command-line tool:
>
> modify:"$1",type=deported
> save
> exit
>
> where $1 is the value of the "1" parameter you've passed to
> evms_failover.
>
> You can try this yourself manually, to verify where the issue is (i.e.
> with EVMS or elsewhere).
>
> To import the container (when starting the resource), evms_failover does:
>
> modify:"$1",node="$HOSTNAME",type=private
> save
> exit
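>
> As a rough sketch of driving the CLI by hand (I'm assuming here that
> your evms binary reads commands from standard input when redirected;
> check evms(8) on your system, as you may need to feed it a command file
> instead):
>
> # on the node that currently owns the container: deport it
> evms <<EOF
> modify:gwcont,type=deported
> save
> exit
> EOF
>
> # then on the node that should take it over: import it
> evms <<EOF
> modify:gwcont,node=$(uname -n),type=private
> save
> exit
> EOF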
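>
> To verify the outcome of each step, check the EVMS view of the node (as
> above, the container's device should appear under /dev/evms/.nodes once
> imported) and the CRM failcount - again a sketch, since the exact
> crm_failcount options may vary with your Heartbeat version:
>
> ls -l /dev/evms/.nodes
> crm_failcount -G -U <node-uname> -r <resource-id>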
>
> >> In my environment I created the following:
> >>
> >> I'm working with 2 VMware boxes sharing one 4GB plain disk that acts
> >> as a SAN.
> >>
> >> EVMS:
> >>
> >> I created a private container (gwcont) on the shared disk using the
> >> CSM plug-in, and inside it an EVMS volume (gwvol);
> >> on the volume I made a reiserfs file system;
> >> I verified that the HA plug-in was working and that the node assigned
> >> to the container can mount it manually.
> >>
> >> HB_GUI:
> >>
> >> I created a group, ordered and colocated;
> >> inside the group I created the following resources:
> >> - evmsSCC --> no attributes, no parameters;
> >> - evms_failover --> Parameter: 1, Value: gwcont (name of the EVMS
> >>   container);
> >> - Filesystem --> Parameter: fstype, Value: reiserfs; Parameter:
> >>   device, Value: /dev/evms/gwcont/gwvol; Parameter: directory,
> >>   Value: /gw;
> >> - IPAddr --> Parameter: ip, Value: xxx.xxx.xxx.xxx
> >>
> >> I then created a Location constraint assigning a score of 100 for the
> >> group to run on Node1, and a second Location constraint assigning a
> >> score of 50 for the group to run on Node2.
> >>
> >> ha.cf:
> >>
> >> ***
> >> autojoin any
> >> crm true
> >> ucast eth1 xxx.xxx.xxx.xxx (IP address of eth1 on the other node)
> >> auto_failback off
> >> node CZVLabNode1
> >> node CZVLabNode2
> >> respawn hacluster /usr/lib/heartbeat/ccm
> >> respawn root /sbin/evmsd
> >> apiauth evms uid=hacluster,root
> >> apiauth ccm uid=hacluster,root
> >> apiauth crm uid=hacluster,root
> >> ***
> >>
> >> On both nodes I issued:
> >>
> >> chkconfig boot.evms on
> >>
> >> My feeling is that I'm doing something wrong in the configuration.
> >> Can anybody point me to the error I may be making here?
>
> There's certainly no use for the evmsSCC resource (SCC stands for...
> Shared Cluster Container). For private containers, you need to use
> evms_failover exclusively.
>
> You may need to start evms as well (/etc/init.d/evms), not just
> boot.evms.
>
> Finally, yes, it _does_ work, but it's not flawless, in my experience.
>
> HTH
> Yan

_______________________________________________
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems