I can confirm this: the master monitor operation is broken in the 1.1.10 release.
Once the slave monitor action is started, the master monitor action is no longer called.
 
I created a two-node setup with a Stateful resource.
I then switched the Pacemaker installation between different versions without changing the configuration section of the CIB.
 
Result:
1.1.10-rc5, 1.1.10-rc6 and 1.1.10-rc7 do not have this error
1.1.10-1 release has the error
 
Installation order (so that everyone knows how the test was done):
1.1.10-1 -> error
1.1.10-rc5 -> no error
1.1.10-rc6 -> no error
1.1.10-rc7 -> no error
1.1.10-1 -> error
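
For anyone who wants to repeat this comparison across versions, here is a minimal sketch of one way to list the recurring monitors recorded in the CIB status section. It assumes the status XML (for example from `cibadmin -Q`) contains `lrm_rsc_op` entries with `operation` and `interval` attributes, and that those entries reflect which monitors are currently active; the logs remain the authoritative check. The sample XML and the function name `recorded_monitors` are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily trimmed CIB status section (illustration only).
SAMPLE_CIB = """
<cib>
  <status>
    <node_state uname="16-sl6">
      <lrm><lrm_resources>
        <lrm_resource id="stateful">
          <lrm_rsc_op id="stateful_last_0" operation="promote" interval="0"/>
          <lrm_rsc_op id="stateful_monitor_2000" operation="monitor" interval="2000"/>
        </lrm_resource>
      </lrm_resources></lrm>
    </node_state>
    <node_state uname="17-sl6">
      <lrm><lrm_resources>
        <lrm_resource id="stateful">
          <lrm_rsc_op id="stateful_monitor_3000" operation="monitor" interval="3000"/>
        </lrm_resource>
      </lrm_resources></lrm>
    </node_state>
  </status>
</cib>
"""


def recorded_monitors(cib_xml):
    """Return sorted (node, resource, interval_ms) tuples for recurring monitors."""
    root = ET.fromstring(cib_xml)
    found = []
    for node_state in root.iter("node_state"):
        node = node_state.get("uname")
        for rsc in node_state.iter("lrm_resource"):
            for op in rsc.iter("lrm_rsc_op"):
                interval = op.get("interval")
                # interval "0" (or a missing interval) means a one-shot operation
                if op.get("operation") == "monitor" and interval not in (None, "0"):
                    found.append((node, rsc.get("id"), int(interval)))
    return sorted(found)


if __name__ == "__main__":
    for entry in recorded_monitors(SAMPLE_CIB):
        print(entry)
```

Running this against the real CIB on each installed version, before and after the slave starts, would show whether the 2000 ms entry for the master node is still recorded.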
 
Rainer
Sent: Friday, 26 July 2013 at 09:32
From: "Takatoshi MATSUO" <matsuo....@gmail.com>
To: "The Pacemaker cluster resource manager" <pacemaker@oss.clusterlabs.org>
Subject: Re: [Pacemaker] Announce: Pacemaker 1.1.10 now available
Hi

I used the Stateful RA and hit the same issue.

1. before starting slave

# crm_simulate -VVV -S -x /var/lib/pacemaker/pengine/pe-input-1543.bz2 | grep "Resource action"
* Resource action: stateful monitor=2000 on 16-sl6

2. starting slave
# crm_simulate -VVV -S -x /var/lib/pacemaker/pengine/pe-input-1544.bz2 | grep "Resource action"
* Resource action: stateful monitor on 17-sl6
* Resource action: stateful notify on 16-sl6
* Resource action: stateful start on 17-sl6
* Resource action: stateful notify on 16-sl6
* Resource action: stateful notify on 17-sl6
* Resource action: stateful monitor=3000 on 17-sl6

3. after
# crm_simulate -VVV -S -x /var/lib/pacemaker/pengine/pe-input-1545.bz2 | grep "Resource action"
* Resource action: stateful monitor=3000 on 17-sl6

The monitor=2000 operation (the monitor for the Master role) is gone.
Is this correct?
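
To compare the three transitions without reading the output by eye, the "Resource action" lines can be parsed into tuples. This is only a sketch: the line format is inferred from the output quoted above, and the names `Action`, `parse_actions` and `recurring_monitors` are my own, not part of any Pacemaker tool.

```python
import re
from typing import NamedTuple, Optional


class Action(NamedTuple):
    resource: str
    operation: str
    interval_ms: Optional[int]  # None when no "=<interval>" suffix is printed
    node: str


# Matches lines such as "* Resource action: stateful monitor=2000 on 16-sl6"
ACTION_RE = re.compile(
    r"Resource action:\s+(?P<rsc>\S+)\s+(?P<op>[A-Za-z_-]+)"
    r"(?:=(?P<ms>\d+))?\s+on\s+(?P<node>\S+)"
)

# Output of step 2 ("starting slave"), copied from this mail.
STEP2 = """\
* Resource action: stateful monitor on 17-sl6
* Resource action: stateful notify on 16-sl6
* Resource action: stateful start on 17-sl6
* Resource action: stateful notify on 16-sl6
* Resource action: stateful notify on 17-sl6
* Resource action: stateful monitor=3000 on 17-sl6
"""


def parse_actions(text):
    """Parse crm_simulate 'Resource action' lines into Action tuples."""
    actions = []
    for line in text.splitlines():
        m = ACTION_RE.search(line)
        if m:
            ms = m.group("ms")
            actions.append(
                Action(m.group("rsc"), m.group("op"),
                       int(ms) if ms else None, m.group("node"))
            )
    return actions


def recurring_monitors(actions):
    """Return the set of (node, interval_ms) for recurring monitor actions."""
    return {
        (a.node, a.interval_ms)
        for a in actions
        if a.operation == "monitor" and a.interval_ms is not None
    }


if __name__ == "__main__":
    print(recurring_monitors(parse_actions(STEP2)))
```

Note that crm_simulate only lists the actions scheduled in each transition, so this helps to see what is (re)scheduled per step rather than proving on its own which monitors are still running.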


My settings
--------
property \
no-quorum-policy="ignore" \
stonith-enabled="false"

rsc_defaults \
resource-stickiness="INFINITY" \
migration-threshold="1"

ms msStateful stateful \
meta \
master-max="1" \
master-node-max="1" \
clone-max="2" \
clone-node-max="1" \
notify="true"

primitive stateful ocf:heartbeat:Stateful \
op start timeout="60s" interval="0s" on-fail="restart" \
op monitor timeout="60s" interval="3s" on-fail="restart" \
op monitor timeout="60s" interval="2s" on-fail="restart" role="Master" \
op promote timeout="60s" interval="0s" on-fail="restart" \
op demote timeout="60s" interval="0s" on-fail="stop" \
op stop timeout="60s" interval="0s" on-fail="block"
--------
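
To make clear which monitor belongs to which role, here is a small sketch that extracts the `op` clauses from the primitive definition above and maps each monitor to its interval in milliseconds. The parsing is a simplification of the crm shell syntax (it only understands `op <name> key="value" ...`), and the helper names are hypothetical. Bare numbers are treated as seconds, which I believe matches Pacemaker's default unit.

```python
import re
import shlex

# The primitive definition from the settings above (raw string keeps the backslashes).
PRIMITIVE = r"""
primitive stateful ocf:heartbeat:Stateful \
op start timeout="60s" interval="0s" on-fail="restart" \
op monitor timeout="60s" interval="3s" on-fail="restart" \
op monitor timeout="60s" interval="2s" on-fail="restart" role="Master" \
op promote timeout="60s" interval="0s" on-fail="restart" \
op demote timeout="60s" interval="0s" on-fail="stop" \
op stop timeout="60s" interval="0s" on-fail="block"
"""


def parse_ops(primitive_text):
    """Return one dict per 'op' clause, e.g. {'name': 'monitor', 'interval': '3s'}."""
    # Join backslash-continued lines before tokenizing.
    tokens = shlex.split(primitive_text.replace("\\\n", " "))
    ops = []
    i = 0
    while i < len(tokens):
        if tokens[i] == "op" and i + 1 < len(tokens):
            op = {"name": tokens[i + 1]}
            i += 2
            while i < len(tokens) and "=" in tokens[i]:
                key, value = tokens[i].split("=", 1)
                op[key] = value
                i += 1
            ops.append(op)
        else:
            i += 1
    return ops


def interval_ms(value):
    """Convert '3s', '2000ms' or a bare number of seconds to milliseconds."""
    m = re.fullmatch(r"(\d+)(ms|s)?", value)
    if not m:
        raise ValueError("unsupported interval: %r" % value)
    number, unit = int(m.group(1)), m.group(2) or "s"
    return number if unit == "ms" else number * 1000


def monitors_by_role(ops):
    """Map role (or '(no role)') to the monitor interval in milliseconds."""
    return {
        op.get("role", "(no role)"): interval_ms(op["interval"])
        for op in ops
        if op["name"] == "monitor"
    }


if __name__ == "__main__":
    print(monitors_by_role(parse_ops(PRIMITIVE)))
```

With these settings the Master monitor is the 2000 ms one and the monitor without a role is the 3000 ms one, which corresponds to monitor=2000 and monitor=3000 in the crm_simulate output.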

Regards,
Takatoshi MATSUO

2013/7/26 Takatoshi MATSUO <matsuo....@gmail.com>:
> Hi
>
> My report comes too late for 1.1.10 :(
>
> I am using pacemaker 1.1.10-0.1.ab2e209.git.
> It seems that the master's monitor is stopped when the slave is started.
>
> Has anyone else encountered the same problem?
> I have attached a log and my settings.
>
>
> Thanks,
> Takatoshi MATSUO
>
> 2013/7/26 Digimer <li...@alteeve.ca>:
>> Congrats!! I know this was a long time in the making.
>>
>> digimer
>>
>>
>> On 25/07/13 20:43, Andrew Beekhof wrote:
>>>
>>> Announcing the release of Pacemaker 1.1.10
>>>
>>> https://github.com/ClusterLabs/pacemaker/releases/Pacemaker-1.1.10
>>>
>>> There were three changes of note since rc7:
>>>
>>> + Bug cl#5161 - crmd: Prevent memory leak in operation cache
>>> + cib: Correctly read back archived configurations if the primary is
>>> corrupted
>>> + cman: Do not pretend we know the state of nodes we've never seen
>>>
>>> Along with assorted bug fixes, the major topics for this release were:
>>>
>>> - stonithd fixes
>>> - fixing memory leaks, often caused by incorrect use of glib reference
>>> counting
>>> - supportability improvements (code cleanup and deduplication,
>>> standardized error codes)
>>>
>>> Release candidates for the next Pacemaker release (1.1.11) can be
>>> expected some time around November.
>>>
>>> A big thank you to everyone who spent time testing the release
>>> candidates and/or contributed patches. However now that Pacemaker is
>>> perfect, anyone reporting bugs will be shot :-)
>>>
>>> To build `rpm` packages:
>>>
>>> 1. Clone the current sources:
>>>
>>> # git clone --depth 0 git://github.com/ClusterLabs/pacemaker.git
>>> # cd pacemaker
>>>
>>> 1. Install dependencies (if you haven't already)
>>>
>>> [Fedora] # sudo yum install -y yum-utils
>>> [ALL] # make rpm-dep
>>>
>>> 1. Build Pacemaker
>>>
>>> # make release
>>>
>>> 1. Copy and deploy as needed
>>>
>>> ## Details - 1.1.10 - final
>>>
>>> Changesets: 602
>>> Diff: 143 files changed, 8162 insertions(+), 5159 deletions(-)
>>>
>>> ## Highlights
>>>
>>> ### Features added since Pacemaker-1.1.9
>>>
>>> + Core: Convert all exit codes to positive errno values
>>> + crm_error: Add the ability to list and print error symbols
>>> + crm_resource: Allow individual resources to be reprobed
>>> + crm_resource: Allow options to be set recursively
>>> + crm_resource: Implement --ban for moving resources away from nodes
>>> and --clear (replaces --unmove)
>>> + crm_resource: Support OCF tracing when using
>>> --force-(check|start|stop)
>>> + PE: Allow active nodes in our current membership to be fenced without
>>> quorum
>>> + PE: Suppress meaningless IDs when displaying anonymous clone status
>>> + Turn off auto-respawning of systemd services when the cluster starts
>>> them
>>> + Bug cl#5128 - pengine: Support maintenance mode for a single node
>>>
>>> ### Changes since Pacemaker-1.1.9
>>>
>>> + crmd: cib: stonithd: Memory leaks resolved and improved use of glib
>>> reference counting
>>> + attrd: Fixes deleted attributes during dc election
>>> + Bug cf#5153 - Correctly display clone failcounts in crm_mon
>>> + Bug cl#5133 - pengine: Correctly observe on-fail=block for failed
>>> demote operation
>>> + Bug cl#5148 - legacy: Correctly remove a node that used to have a
>>> different nodeid
>>> + Bug cl#5151 - Ensure node names are consistently compared without
>>> case
>>> + Bug cl#5152 - crmd: Correctly clean up fenced nodes during membership
>>> changes
>>> + Bug cl#5154 - Do not expire failures when on-fail=block is present
>>> + Bug cl#5155 - pengine: Block the stop of resources if any depending
>>> resource is unmanaged
>>> + Bug cl#5157 - Allow migration in the absence of some colocation
>>> constraints
>>> + Bug cl#5161 - crmd: Prevent memory leak in operation cache
>>> + Bug cl#5164 - crmd: Fixes crash when using pacemaker-remote
>>> + Bug cl#5164 - pengine: Fixes segfault when calculating transition
>>> with remote-nodes.
>>> + Bug cl#5167 - crm_mon: Only print "stopped" node list for incomplete
>>> clone sets
>>> + Bug cl#5168 - Prevent clones from being bounced around the cluster
>>> due to location constraints
>>> + Bug cl#5170 - Correctly support on-fail=block for clones
>>> + cib: Correctly read back archived configurations if the primary is
>>> corrupted
>>> + cib: The result is not valid when diffs fail to apply cleanly for CLI
>>> tools
>>> + cib: Restore the ability to embed comments in the configuration
>>> + cluster: Detect and warn about node names with capitals
>>> + cman: Do not pretend we know the state of nodes we've never seen
>>> + cman: Do not unconditionally start cman if it is already running
>>> + cman: Support non-blocking CPG calls
>>> + Core: Ensure the blackbox is saved on abnormal program termination
>>> + corosync: Detect the loss of members for which we only know the
>>> nodeid
>>> + corosync: Do not pretend we know the state of nodes we've never seen
>>> + corosync: Ensure removed peers are erased from all caches
>>> + corosync: Nodes that can persist in sending CPG messages must be
>>> alive after all
>>> + crmd: Do not get stuck in S_POLICY_ENGINE if a node we couldn't fence
>>> returns
>>> + crmd: Do not update fail-count and last-failure for old failures
>>> + crmd: Ensure all membership operations can complete while trying to
>>> cancel a transition
>>> + crmd: Ensure operations for cleaned up resources don't block recovery
>>> + crmd: Ensure we return to a stable state if there have been too many
>>> fencing failures
>>> + crmd: Initiate node shutdown if another node claims to have
>>> successfully fenced us
>>> + crmd: Prevent messages for remote crmd clients from being relayed to
>>> wrong daemons
>>> + crmd: Properly handle recurring monitor operations for remote-node
>>> agent
>>> + crmd: Store last-run and last-rc-change for all operations
>>> + crm_mon: Ensure stale pid files are updated when a new process is
>>> started
>>> + crm_report: Correctly collect logs when 'uname -n' reports fully
>>> qualified names
>>> + fencing: Fail the operation once all peers have been exhausted
>>> + fencing: Restore the ability to manually confirm that fencing
>>> completed
>>> + ipc: Allow unprivileged clients to clean up after server failures
>>> + ipc: Restore the ability for members of the haclient group to connect
>>> to the cluster
>>> + legacy: Support "crm_node --remove" with a node name for corosync
>>> plugin (bnc#805278)
>>> + lrmd: Default to the upstream location for resource agent scratch
>>> directory
>>> + lrmd: Pass errors from lsb metadata generation back to the caller
>>> + pengine: Correctly handle resources that recover before we operate on
>>> them
>>> + pengine: Delete the old resource state on every node whenever the
>>> resource type is changed
>>> + pengine: Detect constraints with inappropriate actions (ie. promote
>>> for a clone)
>>> + pengine: Ensure per-node resource parameters are used during probes
>>> + pengine: If fencing is unavailable or disabled, block further
>>> recovery for resources that fail to stop
>>> + pengine: Implement the rest of get_timet_now() and rename to
>>> get_effective_time
>>> + pengine: Re-initiate _active_ recurring monitors that previously
>>> failed but have timed out
>>> + remote: Workaround for inconsistent tls handshake behavior between
>>> gnutls versions
>>> + systemd: Ensure we get shut down correctly by systemd
>>> + systemd: Reload systemd after adding/removing override files for
>>> cluster services
>>> + xml: Check for and replace non-printing characters with their octal
>>> equivalent while exporting xml text
>>> + xml: Prevent lockups by setting a more reliable buffer allocation
>>> strategy
>>>
>>>
>>> _______________________________________________
>>> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
>>> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>>>
>>> Project Home: http://www.clusterlabs.org
>>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>>> Bugs: http://bugs.clusterlabs.org
>>>
>>
>>
>> --
>> Digimer
>> Papers and Projects: https://alteeve.ca/w/
>> What if the cure for cancer is trapped in the mind of a person without
>> access to education?
>>

_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org