Thanks Dejan, I'll try the kill -9. One thing I'm seeing is that I can easily move the resources between nodes using the <location> constraint, but if I shut down heartbeat on one node (/etc/init.d/heartbeat stop) I run into problems. If I shut down the node with the active resources, heartbeat migrates the DRBD Master to the other node, but the colocated group does not migrate (it remains stopped on the active node). I'm digging into that now. If I shut down the node that does not have the active resources, the following happens:

(State: DC on active node1, running drbd master and group resources)
shutdown node2
demote attempted on node1 for drbd master, no attempt at halting groups resources that depend on drbd
demote of drbd master fails due to "device held open" error, filesystem still has it mounted
loops through continuously trying to demote drbd (spin condition)
shutdown command never completes, control-C, then kill -9 main heartbeat on node1
drbd:0 goes stopped, :1 Master goes FAILED, group resources all still show started
startup command executed on node1, Bad Things Happen, eventually drbd goes unmanaged
after node1 heartbeat startup completes, stop group and drbd, restart resources, everything comes up fine

I'm going to try a similar test, but using kill -9 right off the bat instead of the controlled shutdown. If there's any info I need to provide to make this clearer, please, anybody, just let me know.

Doug

On Thu, 2007-05-03 at 13:14 +0200, Dejan Muhamedagic wrote:
> On Fri, Apr 27, 2007 at 03:10:22PM -0400, Doug Knight wrote:
> > I now have a working configuration with DRBD master/slave, and a
> > filesystem/pgsql/ipaddr group following it around. So far, I've been
> > using a Place constraint and modifying its uname value to test the "fail
> > over" of the resources. Can someone suggest a reasonable set of tests
> > that most do to verify other possible error conditions (short of pulling
> > the plug on one of the servers)?
>
> You can run CTS with your configuration. Otherwise, stopping
> heartbeat in a way that it doesn't notice being stopped (kill -9)
> simulates the "pull power plug" condition. You'd also want to
> make various resources fail.
>
> > Also, the Place constraint is on the
> > DRBD master/slave, does that make sense or should it be placed on one of
> > the "higher level" resources like the file system or pgsql?
>
> I don't think it matters, you can go with either, given that the
> resources are collocated.
>
> > Thanks,
> > Doug
> >
> > On Thu, 2007-04-26 at 09:45 -0400, Doug Knight wrote:
> >
> > > Hi Alastair,
> > > Have you encountered a situation where when you first start up the drbd
> > > master/slave resource, crm_mon and/or the GUI indicate Master status on
> > > one node, and Started status on the other (as opposed to Slave)? If so,
> > > how did you correct it?
> > >
> > > Doug
> > > p.s. Thanks for the scripts and xml, they're a big help!
> > >
_______________________________________________
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems