On 05/20/2016 06:20 AM, Felix Zachlod (Lists) wrote:
> Pacemaker version: 1.1.13-10.el7_2.2-44eb2dd
> 
> Hello!
> 
> I am currently developing a master/slave resource agent. So far it is 
> working just fine, but this resource agent implements reload(), and reload 
> does not work as expected when running as Master:
> The reload action is invoked and it succeeds, returning 0. The resource is 
> still Master, and monitor will return $OCF_RUNNING_MASTER.
> 
> But Pacemaker considers the instance to be Slave afterwards. Only reload is 
> invoked; no monitor, no demote, etc.
> 
> I first thought that reload should possibly return $OCF_RUNNING_MASTER too, 
> but that leads to the resource failing on reload. It seems 0 is the only 
> valid return code.
> 
> I can recover the cluster state by running "resource $resourcename promote", 
> which will call
> 
> notify
> promote
> notify
> 
> Afterwards my resource is considered Master again. Once the PEngine Recheck 
> Timer (I_PE_CALC) pops (900000ms, i.e. 15 minutes), the cluster manager 
> will promote the resource by itself.
> But this can lead to unexpected results: it could promote the resource on 
> the wrong node, so that both sides are actually running as master. The 
> cluster will not even notice, since it does not call monitor either.
> 
> Is this a bug?
> 
> regards, Felix

I think it depends on your point of view :)

Reload is implemented as an alternative to stop-then-start. For
master/slave (m/s) clones, start leaves the resource in slave state.

So on the one hand, it makes sense that Pacemaker would expect an m/s
reload to end up in slave state, regardless of the initial state, since
it should be equivalent to stop-then-start.
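
As a minimal sketch (a hypothetical agent; the myapp name and the
state-file check are invented for illustration): reload must exit 0
just like start, so the exit code gives Pacemaker no way to tell
whether the instance is still master, and it falls back to the
post-start assumption of slave:

    #!/bin/sh
    # hypothetical m/s agent skeleton -- only the exit codes matter here
    OCF_SUCCESS=0
    OCF_RUNNING_MASTER=8

    is_master() { test -f /run/myapp.master; }  # illustrative state check

    case $1 in
        start)
            # start always leaves the instance in slave state
            exit $OCF_SUCCESS ;;
        reload)
            # must exit 0 even when running as master; exiting
            # $OCF_RUNNING_MASTER here is treated as a failure
            exit $OCF_SUCCESS ;;
        monitor)
            # monitor is the action that reports master state
            is_master && exit $OCF_RUNNING_MASTER
            exit $OCF_SUCCESS ;;
    esac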

On the other hand, you could argue that a reload for a master should
logically be an alternative to demote-stop-start-promote.

On the third hand ;) you could argue that reload is ambiguous for master
resources and thus shouldn't be supported at all.

Feel free to open a feature request at http://bugs.clusterlabs.org/ to
say how you think it should work.

As an aside, I think the current implementation of reload in pacemaker
is unsatisfactory for two reasons:

* Using the "unique" attribute to determine whether a parameter is
reloadable was a bad idea. For example, the location of a daemon binary
is generally set to unique=0, which is sensible in that multiple RA
instances can use the same binary, but a reload could not handle a
change in it. In practice this is harmless only because no one ever
changes that parameter. (See the metadata excerpt after this list.)

* There is a fundamental misunderstanding between pacemaker and most RA
developers as to what reload means. Pacemaker uses the reload action to
make parameter changes in the resource's *pacemaker* configuration take
effect, but RA developers tend to use it to reload the service's own
configuration files (a more natural interpretation, but completely
different from how pacemaker uses it). Both interpretations are
sketched below.
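
As an illustration of the first point, here is a metadata excerpt for a
hypothetical agent (the parameter names are invented for this sketch).
Because the binary path must be unique=0 so that multiple instances can
share one daemon, pacemaker would consider a change to it reloadable,
even though no reload can swap out a running binary:

    <!-- hypothetical RA metadata fragment -->
    <parameters>
      <!-- unique="0": instances may share the binary, so a change
           would be handled by reload, which cannot apply it -->
      <parameter name="binary" unique="0" required="1">
        <content type="string" default="/usr/sbin/myappd"/>
      </parameter>
      <!-- unique="1": instance-specific, so a change triggers a
           full restart instead -->
      <parameter name="state_dir" unique="1" required="0">
        <content type="string" default="/var/run/myapp"/>
      </parameter>
    </parameters>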
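
And a sketch of the second point, again for a hypothetical agent (the
helper name, pid-file path, and OCF_RESKEY_port parameter are all
invented here; SIGHUP is just the common daemon convention):

    # hypothetical reload handler showing the two interpretations
    myapp_reload() {
        # what pacemaker means: re-apply changed *resource parameters*,
        # which arrive in the environment as OCF_RESKEY_* variables
        echo "port=${OCF_RESKEY_port}" >/run/myapp/effective.conf

        # what most RA authors implement: make the daemon re-read its
        # *own* configuration file
        kill -HUP "$(cat /run/myapp/myappd.pid)"

        return 0    # OCF_SUCCESS
    }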

> trace   May 20 12:58:31 cib_create_op(609):0: Sending call options: 00100000, 
> 1048576
> trace   May 20 12:58:31 cib_native_perform_op_delegate(384):0: Sending 
> cib_modify message to CIB service (timeout=120s)
> trace   May 20 12:58:31 crm_ipc_send(1175):0: Sending from client: cib_shm 
> request id: 745 bytes: 1070 timeout:120000 msg...
> trace   May 20 12:58:31 crm_ipc_send(1188):0: Message sent, not waiting for 
> reply to 745 from cib_shm to 1070 bytes...
> trace   May 20 12:58:31 cib_native_perform_op_delegate(395):0: Reply: No data 
> to dump as XML
> trace   May 20 12:58:31 cib_native_perform_op_delegate(398):0: Async call, 
> returning 268
> trace   May 20 12:58:31 do_update_resource(2274):0: Sent resource state 
> update message: 268 for reload=0 on scst_dg_ssd
> trace   May 20 12:58:31 cib_client_register_callback_full(606):0: Adding 
> callback cib_rsc_callback for call 268
> trace   May 20 12:58:31 process_lrm_event(2374):0: Op scst_dg_ssd_reload_0 
> (call=449, stop-id=scst_dg_ssd:449, remaining=3): Confirmed
> notice  May 20 12:58:31 process_lrm_event(2392):0: Operation 
> scst_dg_ssd_reload_0: ok (node=alpha, call=449, rc=0, cib-update=268, 
> confirmed=true)
> debug   May 20 12:58:31 update_history_cache(196):0: Updating history for 
> 'scst_dg_ssd' with reload op
> trace   May 20 12:58:31 crm_ipc_read(992):0: No message from lrmd received: 
> Resource temporarily unavailable
> trace   May 20 12:58:31 mainloop_gio_callback(654):0: Message acquisition 
> from lrmd[0x22b0ec0] failed: No message of desired type (-42)
> trace   May 20 12:58:31 crm_fsa_trigger(293):0: Invoked (queue len: 0)
> trace   May 20 12:58:31 s_crmd_fsa(159):0: FSA invoked with Cause: 
> C_FSA_INTERNAL       State: S_NOT_DC
> trace   May 20 12:58:31 s_crmd_fsa(246):0: Exiting the FSA
> trace   May 20 12:58:31 crm_fsa_trigger(295):0: Exited  (queue len: 0)
> trace   May 20 12:58:31 crm_ipc_read(989):0: Received cib_shm event 2108, 
> size=183, rc=183, text: <cib-reply t="cib" cib_op="cib_modify" 
> cib_callid="268" cib_clientid="60010689-7350-4916-a7bd-bd85ff
> trace   May 20 12:58:31 mainloop_gio_callback(659):0: New message from 
> cib_shm[0x23b7ab0] = 143
> trace   May 20 12:58:31 cib_native_dispatch_internal(100):0: dispatching 
> 0x22b2370
> trace   May 20 12:58:31 cib_native_dispatch_internal(116):0: Activating cib 
> callbacks...
> trace   May 20 12:58:31 cib_native_callback(649):0: Invoking callback 
> cib_rsc_callback for call 268
> trace   May 20 12:58:31 cib_rsc_callback(2113):0: Resource update 268 
> complete: rc=0
> trace   May 20 12:58:31 cib_rsc_callback(2121):0: Triggering FSA: 
> cib_rsc_callback
> trace   May 20 12:58:31 cib_native_callback(666):0: OP callback activated for 
> 268
> trace   May 20 12:58:31 crm_ipc_read(992):0: No message from cib_shm 
> received: Resource temporarily unavailable
> trace   May 20 12:58:31 mainloop_gio_callback(654):0: Message acquisition 
> from cib_shm[0x23b7ab0] failed: No message of desired type (-42)
> trace   May 20 12:58:31 crm_fsa_trigger(293):0: Invoked (queue len: 0)
> trace   May 20 12:58:31 s_crmd_fsa(159):0: FSA invoked with Cause: 
> C_FSA_INTERNAL       State: S_NOT_DC
> trace   May 20 12:58:31 s_crmd_fsa(246):0: Exiting the FSA
> trace   May 20 12:58:31 crm_fsa_trigger(295):0: Exited  (queue len: 0)
> notice  May 20 12:58:43 crm_signal_dispatch(272):0: Invoking handler for 
> signal 5: Trace/breakpoint trap
> notice  May 20 12:58:43 crm_write_blackbox(431):0: Blackbox dump requested, 
> please see /var/lib/pacemaker/blackbox/crmd-2877.2 for contents
> 
> --
> Kind regards,
> Dipl. Inf. (FH) Felix Zachlod
> 
> Onesty Tech GmbH
> Lieberoser Str. 7
> 03046 Cottbus
> 
> Tel.: +49 (355) 289430
> Fax.: +49 (355) 28943100
> f...@onesty-tech.de
> 
> Register court: Amtsgericht Cottbus, HRB 7885. Managing directors: Romy 
> Schötz, Thomas Menzel

_______________________________________________
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org
