On 01/08/2013, at 6:53 PM, Rainer Brestan <rainer.bres...@gmx.net> wrote:
> I can also agree the patch is working.
>
> To be sure that it had to do with notify, I created a clone resource with notify=true and it happened the same way: after the notify, monitor was not called again.
>
> With the patch applied it also works for clone resources.
> And from the output of the modified resource agents I can see from the timestamps of the calls that there is no interruption of the monitor operation.
> Thu Aug 1 10:34:53 CEST 2013 resABC: operation monitor, type , operation
> Thu Aug 1 10:34:59 CEST 2013 resABC: operation notify, type pre, operation start
> Thu Aug 1 10:34:59 CEST 2013 resABC: operation notify, type post, operation start
> Thu Aug 1 10:35:13 CEST 2013 resABC: operation monitor, type , operation
> The monitor interval is set to 20 seconds and it is called at that interval even if a notify happens in between.
>
> Some hint for the check of sufficiency:
> On the original 1.1.10 version (without the patch) I tried a resource configuration change on a clone resource with notify=true, which results in a "reload" call of the resource agent.
> After logging the reload, monitor starts again on both nodes.
> Thu Aug 1 09:28:31 CEST 2013 resX: operation monitor, type , operation
> Thu Aug 1 09:28:48 CEST 2013 resX: operation notify, type pre, operation start
> Thu Aug 1 09:28:48 CEST 2013 resX: operation notify, type post, operation start
> Thu Aug 1 09:38:47 CEST 2013 resX: operation reload, type , operation
> Thu Aug 1 09:38:47 CEST 2013 resX: operation monitor, type , operation
>
> Will there be a new tag (like 1.1.10-2) for version 1.1.10 with the patch applied?

No, in due course there will be 1.1.11, as happens for any other bug that is fixed after a release.
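Anyone who needs the fix before then can rebuild their own packages with the patch applied. Roughly, reusing the build steps from the release announcement quoted further down (the patch file name is only a placeholder):

    # git clone --depth 0 git://github.com/ClusterLabs/pacemaker.git
    # cd pacemaker
    # git apply /path/to/lrm-notify.patch   # the hunk quoted below
    # make rpm-dep                          # install the build dependencies
    # make release                          # build the RPMs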
>
> Rainer
> Sent: Thursday, 1 August 2013 at 05:56
> From: "Takatoshi MATSUO" <matsuo....@gmail.com>
> To: "The Pacemaker cluster resource manager" <pacemaker@oss.clusterlabs.org>
> Subject: Re: [Pacemaker] Announce: Pacemaker 1.1.10 now available
> Hi Andrew
>
> This patch works fine.
>
> 2013/8/1 Andrew Beekhof <and...@beekhof.net>:
> >
> > On 01/08/2013, at 10:18 AM, Takatoshi MATSUO <matsuo....@gmail.com> wrote:
> >
> >> Hi Andrew
> >>
> >> I'm about to collect logs with crm_report, but Rainer has already provided them.
> >>
> >> Could you look at his reports?
> >
> > I had just written:
> >
> > "I can but they're insufficiently helpful."
> >
> > when a thought struck me....
> >
> > Can you try the following patch?
> > It would explain why I couldn't reproduce it locally earlier today.
> >
> > diff --git a/crmd/lrm.c b/crmd/lrm.c
> > index d6b0dd0..4bce39a 100644
> > --- a/crmd/lrm.c
> > +++ b/crmd/lrm.c
> > @@ -1744,7 +1744,9 @@ do_lrm_rsc_op(lrm_state_t * lrm_state, lrmd_rsc_info_t * rsc, const char *operat
> >      CRM_CHECK(op != NULL, return);
> >
> >      /* stop any previous monitor operations before changing the resource state */
> > -    if (op->interval == 0 && strcmp(operation, CRMD_ACTION_STATUS) != 0) {
> > +    if (op->interval == 0
> > +        && strcmp(operation, CRMD_ACTION_STATUS) != 0
> > +        && strcmp(operation, CRMD_ACTION_NOTIFY) != 0) {
> >          guint removed = 0;
> >          struct stop_recurring_action_s data;
> >
> >> Thanks,
> >> Takatoshi MATSUO
> >>
> >> 2013/8/1 Rainer Brestan <rainer.bres...@gmx.net>:
> >>> Base situation for the logs:
> >>> Pacemaker stopped on int2node1 and int2node2
> >>> Master/slave resource msABC already configured.
> >>> Included in the crm_report is also, per node, a file "a"; this is the file to which the modified Stateful RA logs each action performed.
> >>>
> >>> 1.) 19:22:25 start Pacemaker on int2node1
> >>> https://www.dropbox.com/s/ftbdl71ol2iyi42/step1.log.tar.bz2
> >>> monitor on the master is called
> >>>
> >>> 2.) 19:32:14 start Pacemaker on int2node2
> >>> https://www.dropbox.com/s/s3jnxqvod9mlyz1/step2.log.tar.bz2
> >>> monitor on the master is not called any more
> >>>
> >>> 3.) 19:37:14 stop Pacemaker on int2node2
> >>> https://www.dropbox.com/s/w75myab6fxh7mak/step3.log.tar.bz2
> >>> monitor on the master is still not called
> >>>
> >>> 4.) 19:42:14 start Pacemaker on int2node2
> >>> https://www.dropbox.com/s/p00wl9kx4vwhilh/step4.log.tar.bz2
> >>> monitor on the master is called normally
> >>>
> >>> Hope this gives a clearer picture of which component has forgotten the monitor action.
> >>>
> >>> Rainer
> >>> Sent: Wednesday, 31 July 2013 at 14:19
> >>> From: "Andrew Beekhof" <and...@beekhof.net>
> >>> To: "The Pacemaker cluster resource manager" <pacemaker@oss.clusterlabs.org>
> >>> Subject: Re: [Pacemaker] Announce: Pacemaker 1.1.10 now available
> >>>
> >>> On 31/07/2013, at 5:17 PM, Rainer Brestan <rainer.bres...@gmx.net> wrote:
> >>>
> >>>> I modified the RA to log each action call performed, and from this log there is no call of the monitor action.
> >>>>
> >>>> From the logs I do not think it is the policy engine; it might be the LRM part of crmd (that is the only relevant change to be seen in a git diff between 1.1.10-rc7 and 1.1.10).
> >>>
> >>> Ok. Can you still send me a crm_report though?
> >>> Even if the PE isn't at fault, it shows me what the cib looked like at the time, which can be surprisingly helpful.
> >>> And it would have all the logs...
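For reference, the step logs Rainer uploaded were gathered with crm_report, which bundles the CIB, the pe-input files and the logs from every node. A minimal invocation covering a window like the one discussed below would look something like this (the time range and report name are only placeholders):

    # crm_report -f "2013-07-31 08:25:00" -t "2013-07-31 09:00:00" resABC-monitor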
> >>>>
> >>>> Explanation of the log below:
> >>>> primitive resABC ocf:heartbeat:Stateful \
> >>>>   op start interval="0s" timeout="60s" on-fail="restart" \
> >>>>   op monitor interval="30s" timeout="60s" on-fail="restart" \
> >>>>   op promote interval="0s" timeout="60s" on-fail="restart" \
> >>>>   op demote interval="0" timeout="60s" on-fail="restart" \
> >>>>   op stop interval="0" timeout="60s" on-fail="restart" \
> >>>>   op monitor interval="20" role="Master" timeout="60"
> >>>> ms msABC resABC \
> >>>>   meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="true"
> >>>>
> >>>> crm_mon at the beginning of the log:
> >>>> Last updated: Wed Jul 31 08:30:57 2013
> >>>> Last change: Tue Jul 30 13:01:36 2013 via crmd on int2node1
> >>>> Stack: corosync
> >>>> Current DC: int2node1 (1743917066) - partition with quorum
> >>>> Version: 1.1.10-1.el6-368c726
> >>>> 2 Nodes configured
> >>>> 5 Resources configured
> >>>> Online: [ int2node1 int2node2 ]
> >>>> Master/Slave Set: msABC [resABC]
> >>>>     Masters: [ int2node1 ]
> >>>>     Slaves: [ int2node2 ]
> >>>>
> >>>> crm_mon at the end of the log:
> >>>> Last updated: Wed Jul 31 08:55:29 2013
> >>>> Last change: Tue Jul 30 13:01:36 2013 via crmd on int2node1
> >>>> Stack: corosync
> >>>> Current DC: int2node1 (1743917066) - partition with quorum
> >>>> Version: 1.1.10-1.el6-368c726
> >>>> 2 Nodes configured
> >>>> 5 Resources configured
> >>>> Online: [ int2node1 ]
> >>>> OFFLINE: [ int2node2 ]
> >>>> Master/Slave Set: msABC [resABC]
> >>>>     Masters: [ int2node1 ]
> >>>>
> >>>> int2node1 is running, int2node2 is being started:
> >>>> 2013-07-31T08:30:52.631+02:00 int2node1 pengine[16443] notice: notice: LogActions: Start resABC:1 (int2node2)
> >>>> 2013-07-31T08:30:52.638+02:00 int2node1 crmd[16444] notice: notice: te_rsc_command: Initiating action 9: monitor resABC:1_monitor_0 on int2node2
> >>>> 2013-07-31T08:30:52.638+02:00 int2node1 crmd[16444] notice: notice: te_rsc_command: Initiating action 54: notify resABC_pre_notify_start_0 on int2node1 (local)
> >>>> 2013-07-31T08:30:52.681+02:00 int2node1 crmd[16444] notice: notice: process_lrm_event: LRM operation resABC_notify_0 (call=64, rc=0, cib-update=0, confirmed=true) ok
> >>>> 2013-07-31T08:30:52.780+02:00 int2node1 crmd[16444] notice: notice: te_rsc_command: Initiating action 25: start resABC:1_start_0 on int2node2
> >>>> 2013-07-31T08:30:52.940+02:00 int2node1 crmd[16444] notice: notice: te_rsc_command: Initiating action 55: notify resABC_post_notify_start_0 on int2node1 (local)
> >>>> 2013-07-31T08:30:52.943+02:00 int2node1 crmd[16444] notice: notice: te_rsc_command: Initiating action 56: notify resABC:1_post_notify_start_0 on int2node2
> >>>> 2013-07-31T08:30:52.982+02:00 int2node1 crmd[16444] notice: notice: process_lrm_event: LRM operation resABC_notify_0 (call=67, rc=0, cib-update=0, confirmed=true) ok
> >>>> 2013-07-31T08:30:52.992+02:00 int2node1 crmd[16444] notice: notice: te_rsc_command: Initiating action 24: monitor resABC_monitor_20000 on int2node1 (local)
> >>>> 2013-07-31T08:30:52.996+02:00 int2node1 crmd[16444] notice: notice: te_rsc_command: Initiating action 26: monitor resABC:1_monitor_30000 on int2node2
> >>>> 2013-07-31T08:30:53.035+02:00 int2node1 crmd[16444] notice: notice: process_lrm_event: LRM operation resABC_monitor_20000 (call=70, rc=8, cib-update=149, confirmed=false) master
> >>>>
> >>>> At this point int2node2 is stopped.
> >>>> 2013-07-31T08:37:51.457+02:00 int2node1 crmd[16444] notice: notice: do_state_transition: State transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_FSA_INTERNAL origin=abort_transition_graph ]
> >>>> 2013-07-31T08:37:51.462+02:00 int2node1 pengine[16443] notice: notice: unpack_config: On loss of CCM Quorum: Ignore
> >>>> 2013-07-31T08:37:51.465+02:00 int2node1 pengine[16443] notice: notice: stage6: Scheduling Node int2node2 for shutdown
> >>>> 2013-07-31T08:37:51.466+02:00 int2node1 pengine[16443] notice: notice: LogActions: Stop resABC:1 (int2node2)
> >>>> 2013-07-31T08:37:51.469+02:00 int2node1 pengine[16443] notice: notice: process_pe_message: Calculated Transition 86: /var/lib/pacemaker/pengine/pe-input-125.bz2
> >>>> 2013-07-31T08:37:51.471+02:00 int2node1 crmd[16444] notice: notice: te_rsc_command: Initiating action 56: notify resABC_pre_notify_stop_0 on int2node1 (local)
> >>>> 2013-07-31T08:37:51.474+02:00 int2node1 crmd[16444] notice: notice: te_rsc_command: Initiating action 58: notify resABC_pre_notify_stop_0 on int2node2
> >>>> 2013-07-31T08:37:51.512+02:00 int2node1 crmd[16444] notice: notice: process_lrm_event: LRM operation resABC_notify_0 (call=74, rc=0, cib-update=0, confirmed=true) ok
> >>>> 2013-07-31T08:37:51.514+02:00 int2node1 crmd[16444] notice: notice: te_rsc_command: Initiating action 23: stop resABC_stop_0 on int2node2
> >>>> 2013-07-31T08:37:51.654+02:00 int2node1 crmd[16444] notice: notice: te_rsc_command: Initiating action 57: notify resABC_post_notify_stop_0 on int2node1 (local)
> >>>> 2013-07-31T08:37:51.699+02:00 int2node1 crmd[16444] notice: notice: process_lrm_event: LRM operation resABC_notify_0 (call=78, rc=0, cib-update=0, confirmed=true) ok
> >>>> 2013-07-31T08:37:51.699+02:00 int2node1 crmd[16444] notice: notice: run_graph: Transition 86 (Complete=13, Pending=0, Fired=0, Skipped=2, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-input-125.bz2): Stopped
> >>>> 2013-07-31T08:37:51.705+02:00 int2node1 pengine[16443] notice: notice: unpack_config: On loss of CCM Quorum: Ignore
> >>>> 2013-07-31T08:37:51.705+02:00 int2node1 pengine[16443] notice: notice: stage6: Scheduling Node int2node2 for shutdown
> >>>> 2013-07-31T08:37:51.706+02:00 int2node1 pengine[16443] notice: notice: process_pe_message: Calculated Transition 87: /var/lib/pacemaker/pengine/pe-input-126.bz2
> >>>> 2013-07-31T08:37:51.707+02:00 int2node1 crmd[16444] notice: notice: run_graph: Transition 87 (Complete=1, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-input-126.bz2): Complete
> >>>> 2013-07-31T08:37:51.707+02:00 int2node1 crmd[16444] notice: notice: do_state_transition: State transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS cause=C_FSA_INTERNAL origin=notify_crmd ]
> >>>> 2013-07-31T08:37:51.720+02:00 int2node1 crmd[16444] notice: notice: peer_update_callback: do_shutdown of int2node2 (op 45) is complete
> >>>>
> >>>> Output from the RA on int2node1:
> >>>> Wed Jul 31 08:30:52 CEST 2013 resABC: operation notify, type pre, operation start
> >>>> Wed Jul 31 08:30:52 CEST 2013 resABC: operation notify, type post, operation start
> >>>> Wed Jul 31 08:30:53 CEST 2013 resABC: operation monitor, type , operation
> >>>> Wed Jul 31 08:31:13 CEST 2013 resABC: operation monitor, type , operation
> >>>> Wed Jul 31 08:31:33 CEST 2013 resABC: operation monitor, type , operation
> >>>> Wed Jul 31 08:31:53 CEST 2013 resABC: operation monitor, type , operation
> >>>> Wed Jul 31 08:32:13 CEST 2013 resABC: operation monitor, type , operation
> >>>> Wed Jul 31 08:32:33 CEST 2013 resABC: operation monitor, type , operation
> >>>> Wed Jul 31 08:32:53 CEST 2013 resABC: operation monitor, type , operation
> >>>> Wed Jul 31 08:33:13 CEST 2013 resABC: operation monitor, type , operation
> >>>> Wed Jul 31 08:33:33 CEST 2013 resABC: operation monitor, type , operation
> >>>> Wed Jul 31 08:33:53 CEST 2013 resABC: operation monitor, type , operation
> >>>> Wed Jul 31 08:34:13 CEST 2013 resABC: operation monitor, type , operation
> >>>> Wed Jul 31 08:34:33 CEST 2013 resABC: operation monitor, type , operation
> >>>> Wed Jul 31 08:34:53 CEST 2013 resABC: operation monitor, type , operation
> >>>> Wed Jul 31 08:35:13 CEST 2013 resABC: operation monitor, type , operation
> >>>> Wed Jul 31 08:35:33 CEST 2013 resABC: operation monitor, type , operation
> >>>> Wed Jul 31 08:35:53 CEST 2013 resABC: operation monitor, type , operation
> >>>> Wed Jul 31 08:36:13 CEST 2013 resABC: operation monitor, type , operation
> >>>> Wed Jul 31 08:36:33 CEST 2013 resABC: operation monitor, type , operation
> >>>> Wed Jul 31 08:36:53 CEST 2013 resABC: operation monitor, type , operation
> >>>> Wed Jul 31 08:37:13 CEST 2013 resABC: operation monitor, type , operation
> >>>> Wed Jul 31 08:37:33 CEST 2013 resABC: operation monitor, type , operation
> >>>> Wed Jul 31 08:37:51 CEST 2013 resABC: operation notify, type pre, operation stop
> >>>> Wed Jul 31 08:37:51 CEST 2013 resABC: operation notify, type post, operation stop
> >>>>
> >>>> After 08:37:51 there is no log output from Pacemaker for resABC, nor any output from the RA on int2node1.
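The per-action trace above comes from a small logging hook in the agent. One minimal way to produce that kind of trace is a single line near the top of the Stateful script; the log path below is only an example and not necessarily Rainer's exact change:

    # log every invocation: timestamp, resource name, action, and the notify meta variables
    echo "$(date) ${OCF_RESOURCE_INSTANCE%%:*}: operation $1, type ${OCF_RESKEY_CRM_meta_notify_type}, operation ${OCF_RESKEY_CRM_meta_notify_operation}" >> /var/lib/pacemaker/ra-trace.log

Every start/stop/monitor/notify call then lands in the file in the same format as the output above.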
> >>>>
> >>>> Sent: Wednesday, 31 July 2013 at 02:10
> >>>> From: "Andrew Beekhof" <and...@beekhof.net>
> >>>> To: "The Pacemaker cluster resource manager" <pacemaker@oss.clusterlabs.org>
> >>>> Subject: Re: [Pacemaker] Announce: Pacemaker 1.1.10 now available
> >>>>
> >>>> On 30/07/2013, at 9:13 PM, Rainer Brestan <rainer.bres...@gmx.net> wrote:
> >>>>
> >>>>> I can agree, the Master monitor operation is broken in the 1.1.10 release.
> >>>>> When the slave monitor action is started, the master monitor action is not called any more.
> >>>>
> >>>> Based on?
> >>>>
> >>>>> I have created a setup with the Stateful resource on two nodes.
> >>>>> Then the Pacemaker installation is changed to different versions without changing the configuration part of the CIB.
> >>>>>
> >>>>> Result:
> >>>>> 1.1.10-rc5, 1.1.10-rc6 and 1.1.10-rc7 do not have this error
> >>>>> The 1.1.10-1 release has the error
> >>>>>
> >>>>> Installation order (just so anybody knows how it was done):
> >>>>> 1.1.10-1 -> error
> >>>>> 1.1.10-rc5 -> no error
> >>>>> 1.1.10-rc6 -> no error
> >>>>> 1.1.10-rc7 -> no error
> >>>>> 1.1.10-1 -> error
> >>>>>
> >>>>> Rainer
> >>>>> Sent: Friday, 26 July 2013 at 09:32
> >>>>> From: "Takatoshi MATSUO" <matsuo....@gmail.com>
> >>>>> To: "The Pacemaker cluster resource manager" <pacemaker@oss.clusterlabs.org>
> >>>>> Subject: Re: [Pacemaker] Announce: Pacemaker 1.1.10 now available
> >>>>> Hi
> >>>>>
> >>>>> I used the Stateful RA and caught the same issue.
> >>>>>
> >>>>> 1. before starting the slave
> >>>>>
> >>>>> # crm_simulate -VVV -S -x /var/lib/pacemaker/pengine/pe-input-1543.bz2 | grep "Resource action"
> >>>>>  * Resource action: stateful monitor=2000 on 16-sl6
> >>>>>
> >>>>> 2. starting the slave
> >>>>> # crm_simulate -VVV -S -x /var/lib/pacemaker/pengine/pe-input-1544.bz2 | grep "Resource action"
> >>>>>  * Resource action: stateful monitor on 17-sl6
> >>>>>  * Resource action: stateful notify on 16-sl6
> >>>>>  * Resource action: stateful start on 17-sl6
> >>>>>  * Resource action: stateful notify on 16-sl6
> >>>>>  * Resource action: stateful notify on 17-sl6
> >>>>>  * Resource action: stateful monitor=3000 on 17-sl6
> >>>>>
> >>>>> 3. after
> >>>>> # crm_simulate -VVV -S -x /var/lib/pacemaker/pengine/pe-input-1545.bz2 | grep "Resource action"
> >>>>>  * Resource action: stateful monitor=3000 on 17-sl6
> >>>>>
> >>>>> Monitor=2000 is deleted.
> >>>>> Is this correct?
> >>>>>
> >>>>> My settings
> >>>>> --------
> >>>>> property \
> >>>>>   no-quorum-policy="ignore" \
> >>>>>   stonith-enabled="false"
> >>>>>
> >>>>> rsc_defaults \
> >>>>>   resource-stickiness="INFINITY" \
> >>>>>   migration-threshold="1"
> >>>>>
> >>>>> ms msStateful stateful \
> >>>>>   meta \
> >>>>>     master-max="1" \
> >>>>>     master-node-max="1" \
> >>>>>     clone-max="2" \
> >>>>>     clone-node-max="1" \
> >>>>>     notify="true"
> >>>>>
> >>>>> primitive stateful ocf:heartbeat:Stateful \
> >>>>>   op start timeout="60s" interval="0s" on-fail="restart" \
> >>>>>   op monitor timeout="60s" interval="3s" on-fail="restart" \
> >>>>>   op monitor timeout="60s" interval="2s" on-fail="restart" role="Master" \
> >>>>>   op promote timeout="60s" interval="0s" on-fail="restart" \
> >>>>>   op demote timeout="60s" interval="0s" on-fail="stop" \
> >>>>>   op stop timeout="60s" interval="0s" on-fail="block"
> >>>>> --------
> >>>>>
> >>>>> Regards,
> >>>>> Takatoshi MATSUO
> >>>>>
> >>>>> 2013/7/26 Takatoshi MATSUO <matsuo....@gmail.com>:
> >>>>>> Hi
> >>>>>>
> >>>>>> My report is late for 1.1.10 :(
> >>>>>>
> >>>>>> I am using pacemaker 1.1.10-0.1.ab2e209.git.
> >>>>>> It seems that the master's monitor is stopped when the slave is started.
> >>>>>>
> >>>>>> Has anyone encountered the same problem?
> >>>>>> I attach a log and settings.
> >>>>>>
> >>>>>> Thanks,
> >>>>>> Takatoshi MATSUO
> >>>>>>
> >>>>>> 2013/7/26 Digimer <li...@alteeve.ca>:
> >>>>>>> Congrats!! I know this was a long time in the making.
> >>>>>>>
> >>>>>>> digimer
> >>>>>>>
> >>>>>>> On 25/07/13 20:43, Andrew Beekhof wrote:
> >>>>>>>>
> >>>>>>>> Announcing the release of Pacemaker 1.1.10
> >>>>>>>>
> >>>>>>>> https://github.com/ClusterLabs/pacemaker/releases/Pacemaker-1.1.10
> >>>>>>>>
> >>>>>>>> There were three changes of note since rc7:
> >>>>>>>>
> >>>>>>>> + Bug cl#5161 - crmd: Prevent memory leak in operation cache
> >>>>>>>> + cib: Correctly read back archived configurations if the primary is corrupted
> >>>>>>>> + cman: Do not pretend we know the state of nodes we've never seen
> >>>>>>>>
> >>>>>>>> Along with assorted bug fixes, the major topics for this release were:
> >>>>>>>>
> >>>>>>>> - stonithd fixes
> >>>>>>>> - fixing memory leaks, often caused by incorrect use of glib reference counting
> >>>>>>>> - supportability improvements (code cleanup and deduplication, standardized error codes)
> >>>>>>>>
> >>>>>>>> Release candidates for the next Pacemaker release (1.1.11) can be expected some time around November.
> >>>>>>>>
> >>>>>>>> A big thank you to everyone who spent time testing the release candidates and/or contributed patches. However, now that Pacemaker is perfect, anyone reporting bugs will be shot :-)
> >>>>>>>>
> >>>>>>>> To build `rpm` packages:
> >>>>>>>>
> >>>>>>>> 1. Clone the current sources:
> >>>>>>>>
> >>>>>>>>    # git clone --depth 0 git://github.com/ClusterLabs/pacemaker.git
> >>>>>>>>    # cd pacemaker
> >>>>>>>>
> >>>>>>>> 2. Install dependencies (if you haven't already)
> >>>>>>>>
> >>>>>>>>    [Fedora] # sudo yum install -y yum-utils
> >>>>>>>>    [ALL] # make rpm-dep
> >>>>>>>>
> >>>>>>>> 3. Build Pacemaker
> >>>>>>>>
> >>>>>>>>    # make release
> >>>>>>>>
> >>>>>>>> 4. Copy and deploy as needed
> >>>>>>>>
> >>>>>>>> ## Details - 1.1.10 - final
> >>>>>>>>
> >>>>>>>> Changesets: 602
> >>>>>>>> Diff: 143 files changed, 8162 insertions(+), 5159 deletions(-)
> >>>>>>>>
> >>>>>>>> ## Highlights
> >>>>>>>>
> >>>>>>>> ### Features added since Pacemaker-1.1.9
> >>>>>>>>
> >>>>>>>> + Core: Convert all exit codes to positive errno values
> >>>>>>>> + crm_error: Add the ability to list and print error symbols
> >>>>>>>> + crm_resource: Allow individual resources to be reprobed
> >>>>>>>> + crm_resource: Allow options to be set recursively
> >>>>>>>> + crm_resource: Implement --ban for moving resources away from nodes and --clear (replaces --unmove)
> >>>>>>>> + crm_resource: Support OCF tracing when using --force-(check|start|stop)
> >>>>>>>> + PE: Allow active nodes in our current membership to be fenced without quorum
> >>>>>>>> + PE: Suppress meaningless IDs when displaying anonymous clone status
> >>>>>>>> + Turn off auto-respawning of systemd services when the cluster starts them
> >>>>>>>> + Bug cl#5128 - pengine: Support maintenance mode for a single node
> >>>>>>>>
> >>>>>>>> ### Changes since Pacemaker-1.1.9
> >>>>>>>>
> >>>>>>>> + crmd: cib: stonithd: Memory leaks resolved and improved use of glib reference counting
> >>>>>>>> + attrd: Fixes deleted attributes during dc election
> >>>>>>>> + Bug cf#5153 - Correctly display clone failcounts in crm_mon
> >>>>>>>> + Bug cl#5133 - pengine: Correctly observe on-fail=block for failed demote operation
> >>>>>>>> + Bug cl#5148 - legacy: Correctly remove a node that used to have a different nodeid
> >>>>>>>> + Bug cl#5151 - Ensure node names are consistently compared without case
> >>>>>>>> + Bug cl#5152 - crmd: Correctly clean up fenced nodes during membership changes
> >>>>>>>> + Bug cl#5154 - Do not expire failures when on-fail=block is present
> >>>>>>>> + Bug cl#5155 - pengine: Block the stop of resources if any depending resource is unmanaged
> >>>>>>>> + Bug cl#5157 - Allow migration in the absence of some colocation constraints
> >>>>>>>> + Bug cl#5161 - crmd: Prevent memory leak in operation cache
> >>>>>>>> + Bug cl#5164 - crmd: Fixes crash when using pacemaker-remote
> >>>>>>>> + Bug cl#5164 - pengine: Fixes segfault when calculating transition with remote-nodes.
> >>>>>>>> + Bug cl#5167 - crm_mon: Only print "stopped" node list for incomplete clone sets
> >>>>>>>> + Bug cl#5168 - Prevent clones from being bounced around the cluster due to location constraints
> >>>>>>>> + Bug cl#5170 - Correctly support on-fail=block for clones
> >>>>>>>> + cib: Correctly read back archived configurations if the primary is corrupted
> >>>>>>>> + cib: The result is not valid when diffs fail to apply cleanly for CLI tools
> >>>>>>>> + cib: Restore the ability to embed comments in the configuration
> >>>>>>>> + cluster: Detect and warn about node names with capitals
> >>>>>>>> + cman: Do not pretend we know the state of nodes we've never seen
> >>>>>>>> + cman: Do not unconditionally start cman if it is already running
> >>>>>>>> + cman: Support non-blocking CPG calls
> >>>>>>>> + Core: Ensure the blackbox is saved on abnormal program termination
> >>>>>>>> + corosync: Detect the loss of members for which we only know the nodeid
> >>>>>>>> + corosync: Do not pretend we know the state of nodes we've never seen
> >>>>>>>> + corosync: Ensure removed peers are erased from all caches
> >>>>>>>> + corosync: Nodes that can persist in sending CPG messages must be alive after all
> >>>>>>>> + crmd: Do not get stuck in S_POLICY_ENGINE if a node we couldn't fence returns
> >>>>>>>> + crmd: Do not update fail-count and last-failure for old failures
> >>>>>>>> + crmd: Ensure all membership operations can complete while trying to cancel a transition
> >>>>>>>> + crmd: Ensure operations for cleaned up resources don't block recovery
> >>>>>>>> + crmd: Ensure we return to a stable state if there have been too many fencing failures
> >>>>>>>> + crmd: Initiate node shutdown if another node claims to have successfully fenced us
> >>>>>>>> + crmd: Prevent messages for remote crmd clients from being relayed to wrong daemons
> >>>>>>>> + crmd: Properly handle recurring monitor operations for remote-node agent
> >>>>>>>> + crmd: Store last-run and last-rc-change for all operations
> >>>>>>>> + crm_mon: Ensure stale pid files are updated when a new process is started
> >>>>>>>> + crm_report: Correctly collect logs when 'uname -n' reports fully qualified names
> >>>>>>>> + fencing: Fail the operation once all peers have been exhausted
> >>>>>>>> + fencing: Restore the ability to manually confirm that fencing completed
> >>>>>>>> + ipc: Allow unprivileged clients to clean up after server failures
> >>>>>>>> + ipc: Restore the ability for members of the haclient group to connect to the cluster
> >>>>>>>> + legacy: Support "crm_node --remove" with a node name for corosync plugin (bnc#805278)
> >>>>>>>> + lrmd: Default to the upstream location for resource agent scratch directory
> >>>>>>>> + lrmd: Pass errors from lsb metadata generation back to the caller
> >>>>>>>> + pengine: Correctly handle resources that recover before we operate on them
> >>>>>>>> + pengine: Delete the old resource state on every node whenever the resource type is changed
> >>>>>>>> + pengine: Detect constraints with inappropriate actions (ie. promote for a clone)
> >>>>>>>> + pengine: Ensure per-node resource parameters are used during probes
> >>>>>>>> + pengine: If fencing is unavailable or disabled, block further recovery for resources that fail to stop
> >>>>>>>> + pengine: Implement the rest of get_timet_now() and rename to get_effective_time
> >>>>>>>> + pengine: Re-initiate _active_ recurring monitors that previously failed but have timed out
> >>>>>>>> + remote: Workaround for inconsistent tls handshake behavior between gnutls versions
> >>>>>>>> + systemd: Ensure we get shut down correctly by systemd
> >>>>>>>> + systemd: Reload systemd after adding/removing override files for cluster services
> >>>>>>>> + xml: Check for and replace non-printing characters with their octal equivalent while exporting xml text
> >>>>>>>> + xml: Prevent lockups by setting a more reliable buffer allocation strategy
> >>>>>>>
> >>>>>>> --
> >>>>>>> Digimer
> >>>>>>> Papers and Projects: https://alteeve.ca/w/
> >>>>>>> What if the cure for cancer is trapped in the mind of a person without access to education?
_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org