Thanks Dejan, I'll try the kill -9. One thing I'm seeing is that I can
easily move the resources between nodes using the <location> constraint,
but if I shut down heartbeat on one node (/etc/init.d/heartbeat stop) I
run into problems. If I shut down the node with the active resources,
heartbeat migrates the DRBD Master to the other node, but the colocated
group does not migrate (it remains stopped on the active node). I'm
digging into that now. If I shut down the node that does not have the
active resources, the following happens:

(State: DC on active node1, running drbd master and group resources)

1. shut down node2
2. demote attempted on node1 for drbd master, no attempt at halting the
   group's resources that depend on drbd
3. demote of drbd master fails due to "device held open" error, filesystem
   still has it mounted
4. loops through continuously trying to demote drbd (spin condition)
5. shutdown command never completes, control-C, then kill -9 main heartbeat
   on node1
6. drbd:0 goes stopped, :1 Master goes FAILED, group resources all still
   show started
7. startup command executed on node1, Bad Things Happen, eventually drbd
   goes unmanaged
8. after node1 heartbeat startup completes, stop group and drbd, restart
   resources, everything comes up fine
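
To make the ordering problem in the sequence above concrete, here is a toy
Python model of it. This is only a sketch and not heartbeat or CRM code: the
class, the function names, and the assumption that the group starts in the
order filesystem, pgsql, ipaddr are all made up for illustration. It shows
why a demote attempted while the filesystem is still mounted can only fail
repeatedly, and what I would have expected instead (stop the dependent group
first, then demote).

```python
# Toy model only: NOT heartbeat/CRM code. All names here are hypothetical.


class DeviceHeldOpen(Exception):
    """Raised when a demote is attempted while the device is still in use."""


class Node:
    def __init__(self):
        self.drbd_role = "Master"
        # Assumed start order of the group: filesystem, pgsql, ipaddr.
        self.group_running = ["filesystem", "pgsql", "ipaddr"]

    def stop_group(self):
        # Stop in reverse start order, so the filesystem is unmounted last.
        stopped = []
        while self.group_running:
            stopped.append(self.group_running.pop())
        return stopped

    def demote_drbd(self):
        # A mounted filesystem holds the DRBD device open, so demote fails.
        if "filesystem" in self.group_running:
            raise DeviceHeldOpen("device held open")
        self.drbd_role = "Slave"


def observed_sequence(node, max_attempts=3):
    """Demote without stopping the group first: every attempt fails."""
    failures = 0
    for _ in range(max_attempts):
        try:
            node.demote_drbd()
        except DeviceHeldOpen:
            failures += 1
    return failures


def expected_sequence(node):
    """Stop the dependent group first, then demote."""
    stopped = node.stop_group()
    node.demote_drbd()
    return stopped
```

In the real cluster the retry loop has no attempt limit, which matches the
spin condition I saw; the limit here only keeps the sketch from looping
forever.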

I'm going to try a similar test, but using kill -9 right off the bat
instead of the controlled shutdown. If there's any info I need to
provide to make this clearer, please, anybody, just let me know.

Doug

On Thu, 2007-05-03 at 13:14 +0200, Dejan Muhamedagic wrote:

> On Fri, Apr 27, 2007 at 03:10:22PM -0400, Doug Knight wrote:
> > I now have a working configuration with DRBD master/slave, and a
> > filesystem/pgsql/ipaddr group following it around. So far, I've been
> > using a Place constraint and modifying its uname value to test the "fail
> > over" of the resources. Can someone suggest a reasonable set of tests
> > that most do to verify other possible error conditions (short of pulling
> > the plug on one of the servers)?
> 
> You can run CTS with your configuration. Otherwise, stopping
> heartbeat in a way that it doesn't notice being stopped (kill -9)
> simulates the "pull power plug" condition. You'd also want to
> make various resources fail.
> 
> > Also, the Place constraint is on the
> > DRBD master/slave, does that make sense or should it be placed on one of
> > the "higher level" resources like the file system or pgsql?
> 
> I don't think it matters, you can go with either, given that the
> resources are collocated.
> 
> > Thanks,
> > Doug
> > 
> > On Thu, 2007-04-26 at 09:45 -0400, Doug Knight wrote:
> > 
> > > Hi Alastair,
> > > Have you encountered a situation where when you first start up the drbd
> > > master/slave resource, crm_mon and/or the GUI indicate Master status on
> > > one node, and Started status on the other (as opposed to Slave)? If so,
> > > how did you correct it?
> > > 
> > > Doug
> > > p.s. Thanks for the scripts and xml, they're a big help!
> > > 


_______________________________________________
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
