I have recently assumed the responsibility for maintaining code on one of my 
company's products that uses Pacemaker/Heartbeat.  I'm still coming up to speed 
on this code, and would like to solicit comments about some particular 
behavior.  For reference, the Pacemaker version is 1.0.9.1, and Heartbeat is 
version 3.0.3.

This product uses two host systems, each of which supports several disk 
enclosures, operating in an "active/passive" mode.  The two hosts are connected 
by redundant, dedicated 10Gb Ethernet links, which are used for messaging 
between them.  The disks in each enclosure are controlled by an instance of an 
application called SS.  If an "active" host's SS application fails for some 
reason, then the corresponding application on the "passive" host will assume 
control of the disks.  Each SS instance is managed as a Pacemaker master/slave 
resource, and its resource agent communicates with the corresponding SS 
instance.  For reference, here's a sample crm_mon output, followed by a sketch 
of what one of these resource definitions looks like:

============
Last updated: Tue Mar  5 06:10:22 2013
Stack: Heartbeat
Current DC: mgraid-12241530rn01433-0 (f4e5e15c-d06b-4e37-89b9-4621af05128f) - 
partition with quorum
Version: 1.0.9-89bd754939df5150de7cd76835f98fe90851b677
2 Nodes configured, unknown expected votes
9 Resources configured.
============

Online: [ mgraid-12241530rn01433-0 mgraid-12241530rn01433-1 ]

Clone Set: Fencing
     Started: [ mgraid-12241530rn01433-0 mgraid-12241530rn01433-1 ]
Clone Set: cloneIcms
     Started: [ mgraid-12241530rn01433-0 mgraid-12241530rn01433-1 ]
Clone Set: cloneOmserver
     Started: [ mgraid-12241530rn01433-0 mgraid-12241530rn01433-1 ]
Master/Slave Set: ms-SS11451532RN01389
     Masters: [ mgraid-12241530rn01433-1 ]
     Slaves: [ mgraid-12241530rn01433-0 ]
Master/Slave Set: ms-SS11481532RN01465
     Masters: [ mgraid-12241530rn01433-0 ]
     Slaves: [ mgraid-12241530rn01433-1 ]
Master/Slave Set: ms-SS12171532RN01613
     Masters: [ mgraid-12241530rn01433-0 ]
     Slaves: [ mgraid-12241530rn01433-1 ]
Master/Slave Set: ms-SS12241530RN01433
     Masters: [ mgraid-12241530rn01433-0 ]
     Slaves: [ mgraid-12241530rn01433-1 ]
Master/Slave Set: ms-SS12391532RN01768
     Masters: [ mgraid-12241530rn01433-0 ]
     Slaves: [ mgraid-12241530rn01433-1 ]
Master/Slave Set: ms-SS12391532RN01772
     Masters: [ mgraid-12241530rn01433-0 ]
     Slaves: [ mgraid-12241530rn01433-1 ]
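
For orientation, here is a rough sketch of one of the master/slave resource 
definitions in crm shell syntax.  The agent name (ocf:heartbeat:SS) and the 
enclosure parameter are placeholders rather than our exact configuration; the 
monitor intervals reflect the 3-second master monitor and the 10-second slave 
monitor referred to below:

  primitive SS12241530RN01433 ocf:heartbeat:SS \
          params enclosure="12241530RN01433" \
          op monitor interval="3s" role="Master" \
          op monitor interval="10s" role="Slave"
  ms ms-SS12241530RN01433 SS12241530RN01433 \
          meta master-max="1" master-node-max="1" clone-max="2" \
               clone-node-max="1" notify="true"

The notify="true" meta attribute is what causes the pre/post notifications 
described below to be sent to the resource agents.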

I've been investigating the system's behavior when one or more master SS 
instances crash (simulated with a kill command).  I've noticed two behaviors 
of interest.

First, in a simple case where one master SS is killed, it takes about 10-12 
seconds for the slave to complete the failover.  According to the log files, 
the DC issues the following notifications to the slave SS:

* Pre_notify_demote
* Post_notify_demote
* Pre_notify_stop
* Post_notify_stop
* Pre_notify_promote
* Promote
* Post_notify_promote
* Monitor_3000
* Pre_notify_start
* Post_notify_start

These notifications and their confirmations each appear to take about 1-2 
seconds, which raises the following questions:

* Is this sequence of notifications expected?
* Is the 10-12 second timeframe expected?
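
For context, the names above correspond to the notify "type" (pre/post) and 
"operation" that are passed to the resource agent's notify action.  Here is a 
minimal sketch of how a shell OCF agent typically dispatches them (the ss_* 
helpers are placeholders, not our actual code):

  ss_notify() {
      local type="${OCF_RESKEY_CRM_meta_notify_type}"      # "pre" or "post"
      local op="${OCF_RESKEY_CRM_meta_notify_operation}"   # start/stop/promote/demote
      case "${type}-${op}" in
          pre-promote)
              # e.g. prepare the local SS instance for promotion
              ss_prepare_for_promotion ;;
          post-demote)
              # e.g. react to the peer SS instance having been demoted
              ss_handle_peer_demote ;;
          *)
              ;;                                           # other combinations ignored here
      esac
      return 0    # OCF_SUCCESS
  }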

Second, in a more complex case, where the master SS instance for every 
resource is assigned to the same host, and each master SS is killed in turn 
with roughly a 10-second delay between kill commands, there appear to be very 
long delays in processing the notifications.  These delays appear to be 
associated with the following factors:

* After an SS instance is killed, the next 10-second monitor operation 
  detects the failure, which causes a new SS instance to be launched to 
  replace the one that was killed.

* It takes about 30 seconds for an SS instance to complete its startup 
  process.  The resource agent waits for that startup to complete before 
  returning to crmd (see the sketch after the next paragraph).

* Until the resource agent returns, crmd does not process notifications for 
  any other SS resource.

The net effect of these delays varies from one SS instance to another.  In some 
cases, the "normal" failover occurs, taking 10-12 seconds.  In other cases, 
there is no failover to the other host's SS instance, and there is no 
master/active SS instance for 1-2 minutes (until an SS instance is re-launched 
following the kill), depending upon the number of disk enclosures and thus the 
number of SS instances.
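
To illustrate the blocking behavior, the start path of our resource agent is 
effectively a loop along the following lines (the paths, commands, and 
parameter name are stand-ins rather than the real code):

  ss_start() {
      # Launch the SS daemon for this enclosure (placeholder command).
      /opt/ss/bin/ss_daemon --enclosure "${OCF_RESKEY_enclosure}" &

      # Block until SS reports that startup is complete (~30 seconds).
      # lrmd receives no result, and crmd processes no further notifications
      # for the other SS resources, until this loop exits.
      while ! /opt/ss/bin/ss_status --ready; do
          sleep 1
      done
      return 0    # OCF_SUCCESS
  }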

My first question in this case is simply whether the serialization of 
notifications among the various SS resources is expected.  In other words, 
transition notifications for one resource are delayed until earlier 
notifications have completed.  Is this the expected behavior?  Secondly, once 
the killed SS instance has been restarted, there is apparently no attempt to 
complete the failover; the restarted SS instance simply resumes the 
active/master role.  Is that also expected?

Finally, a couple of general questions:

* Is there any reason to believe that a later version of Pacemaker would 
  behave differently?

* Is there a mechanism by which the crmd (and lrmd) debug levels can be 
  increased at run time (allowing more debug messages in the log output)?

Thanks very much for your help,
Michael Powell

    Michael Powell
    Staff Engineer

    15220 NW Greenbrier Pkwy
        Suite 290
    Beaverton, OR   97006
    T 503-372-7327    M 503-789-3019   H 503-625-5332

    www.harmonicinc.com
