Hi Yan,

Thanks a lot for your help. I took the evmsSCC resource out of the
scenario, but I did not see any difference in the system behavior. I then
followed your suggestion and manually tested the EVMS commands from the
CLI while both nodes were in standby, and I realized that this command was
failing:

modify: gwcont,type=private,node=CZVLabNode2

It was somehow not recognized as a valid command. The really weird thing
is that the same command, with the host name written without capital
letters, succeeded:

modify: gwcont,type=private,node=czvlabnode2

This was true on both nodes, so I changed both hostnames:

CZVLabNode1 --> czvlabnode1
CZVLabNode2 --> czvlabnode2

and now failover is working properly, like everything else.

The reason I changed the host names to avoid capital letters is that I
noticed that, even though my host names were a mixture of lower-case and
capital letters, hb_gui displayed them entirely in lower case.
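In case anyone wants to check for the same mismatch, this is roughly what
I ran (a quick sketch, untested as pasted here; the ha.cf path is the one
from a stock SLES 10 SP1 install):

  # node name as the kernel reports it
  uname -n

  # node names as declared to Heartbeat
  grep -i '^node' /etc/ha.d/ha.cf

  # as Yan mentioned, a device named after the container shows up under
  # /dev/evms/.nodes on the node that currently has it imported
  ls -l /dev/evms/.nodes

If the casing differs between any of these, you may be hitting the same
problem I did.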
As soon as I have time, I will do some further tests to verify whether I
can reproduce this from scratch, to confirm whether Heartbeat 2.1.2 and/or
EVMS 2.5.5-24.52 really have an issue with partially capitalized node
names; I will update the list afterwards. It could also be that I changed
something else in the system that I'm not fully aware of, or that I simply
forgot about, as I ran many different tests on the same boxes.

Thanks again,
Chris

On Nov 21, 2007 9:23 PM, Yan Fitterer <[EMAIL PROTECTED]> wrote:
> Andrew Beekhof wrote:
> >
> > On Nov 21, 2007, at 10:11 AM, Christian Zemella wrote:
> >
> >> Hi All,
> >> Has anybody out there managed to get EVMS container resources to
> >> fail over properly in a 2-node Heartbeat 2 cluster running on SLES
> >> 10 SP1?
> >
> > I believe so... have you read the documentation below?
> > http://wiki.novell.com/images/3/37/Exploring_HASF.pdf
> >
> >> In my lab I can only start and stop the resource on the node that
> >> has the container assigned within EVMS. If I shut that node down,
> >> failover does not occur, as the evms_failover resource times out;
> >> as soon as the other node comes back up, it takes the resource
> >> back properly.
>
> This would indicate that the evms_failover RA cannot assign the container
> to the new node. Do you see the resource failing? Have you checked the
> failcount for the resources on that node?
>
> Some clues (from the EVMS perspective): take a look in /dev/evms/.nodes.
> When the private container is present on the node, a device file named
> after the container should appear there.
>
> To test manually, the easiest approach is to start Heartbeat, then put
> both nodes on standby, then manipulate the EVMS devices manually.
>
> To deport the container (on resource stop), evms_failover issues these
> commands to the evms command-line tool:
>
> modify:"$1",type=deported
> save
> exit
>
> where $1 is the value of the "1" parameter you've passed to
> evms_failover.
>
> You can try this yourself manually, to verify where the issue is (i.e.
> with EVMS or elsewhere).
>
> To import the container (when starting the resource), evms_failover does:
>
> modify:"$1",node="$HOSTNAME",type=private
> save
> exit
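>
> As a rough sketch of driving the CLI by hand (I'm assuming here that
> your evms binary reads commands from standard input when redirected;
> check evms(8) on your system, as you may need to feed it a command file
> instead):
>
> # on the node that currently owns the container: deport it
> evms <<EOF
> modify:gwcont,type=deported
> save
> exit
> EOF
>
> # then on the node that should take it over: import it
> evms <<EOF
> modify:gwcont,node=$(uname -n),type=private
> save
> exit
> EOF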
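>
> To verify the outcome of each step, check the EVMS view of the node (as
> above, the container's device should appear under /dev/evms/.nodes once
> imported) and the CRM failcount - again a sketch, since the exact
> crm_failcount options may vary with your Heartbeat version:
>
> ls -l /dev/evms/.nodes
> crm_failcount -G -U <node-uname> -r <resource-id>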
>
> >> In my environment I created the following:
> >>
> >> I'm working with 2 VMware boxes sharing one 4GB plain disk that acts
> >> as a SAN.
> >>
> >> EVMS:
> >>
> >> I created a private container (gwcont) on the shared disk using the
> >> CSM plug-in, and inside it an EVMS volume (gwvol);
> >> on the volume I made a reiserfs file system;
> >> I verified that the HA plug-in was working and that the node assigned
> >> to the container can mount it manually.
> >>
> >> HB_GUI:
> >>
> >> I created a group, ordered and colocated;
> >> inside the group I created the following resources:
> >> - evmsSCC --> no attributes, no parameters;
> >> - evms_failover --> Parameter: 1, Value: gwcont (name of the EVMS
> >>   container);
> >> - Filesystem --> Parameter: fstype, Value: reiserfs; Parameter:
> >>   device, Value: /dev/evms/gwcont/gwvol; Parameter: directory,
> >>   Value: /gw;
> >> - IPAddr --> Parameter: ip, Value: xxx.xxx.xxx.xxx
> >>
> >> I then created a Location constraint assigning a score of 100 for the
> >> group to run on Node1, and a second Location constraint assigning a
> >> score of 50 for the group to run on Node2.
> >>
> >> ha.cf:
> >>
> >> ***
> >> autojoin any
> >> crm true
> >> ucast eth1 xxx.xxx.xxx.xxx (IP address of eth1 on the other node)
> >> auto_failback off
> >> node CZVLabNode1
> >> node CZVLabNode2
> >> respawn hacluster /usr/lib/heartbeat/ccm
> >> respawn root /sbin/evmsd
> >> apiauth evms uid=hacluster,root
> >> apiauth ccm uid=hacluster,root
> >> apiauth crm uid=hacluster,root
> >> ***
> >>
> >> On both nodes I issued:
> >>
> >> chkconfig boot.evms on
> >>
> >> My feeling is that I'm doing something wrong in the configuration.
> >> Can anybody point me to the error I may be making here?
>
> There's certainly no use for the evmsSCC resource (SCC stands for...
> Shared Cluster Container). For private containers, you need to use
> evms_failover exclusively.
>
> You may need to start evms as well (/etc/init.d/evms), not just
> boot.evms.
>
> Finally, yes, it _does_ work, but it's not flawless, in my experience.
>
> HTH
> Yan

_______________________________________________
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems