On 05/20/2016 06:20 AM, Felix Zachlod (Lists) wrote:
> version 1.1.13-10.el7_2.2-44eb2dd
>
> Hello!
>
> I am currently developing a master/slave resource agent. So far it is
> working just fine, but this resource agent implements reload() and this
> does not work as expected when running as Master:
> The reload action is invoked and it succeeds, returning 0. The resource
> is still Master and monitor will return $OCF_RUNNING_MASTER.
>
> But Pacemaker considers the instance to be a slave afterwards. Actually
> only reload is invoked, no monitor, no demote, etc.
>
> I first thought that reload should possibly return $OCF_RUNNING_MASTER
> too, but that leads to the resource failing on reload. It seems 0 is the
> only valid return code.
>
> I can recover the cluster state by running "resource $resourcename
> promote", which will call
>
> notify
> promote
> notify
>
> Afterwards my resource is considered Master again. Once the PEngine
> Recheck Timer (I_PE_CALC) pops (900000ms), the cluster manager will also
> promote the resource itself.
> But this can lead to unexpected results: it could promote the resource
> on the wrong node, so that both sides are actually running as master, and
> the cluster will not even notice, since it does not call monitor either.
>
> Is this a bug?
>
> regards, Felix
I think it depends on your point of view :)

Reload is implemented as an alternative to stop-then-start. For m/s clones, start leaves the resource in slave state. So on the one hand, it makes sense that Pacemaker would expect an m/s reload to end up in slave state, regardless of the initial state, since it should be equivalent to stop-then-start. On the other hand, you could argue that a reload for a master should logically be an alternative to demote-stop-start-promote. On the third hand ;) you could argue that reload is ambiguous for master resources and thus shouldn't be supported at all.

Feel free to open a feature request at http://bugs.clusterlabs.org/ to say how you think it should work.

As an aside, I think the current implementation of reload in pacemaker is unsatisfactory for two reasons:

* Using the "unique" attribute to determine whether a parameter is reloadable was a bad idea. For example, the location of a daemon binary is generally set to unique=0, which is sensible in that multiple RA instances can use the same binary, but a reload could not handle that change (see the metadata sketch below). It is only not a problem because no one ever changes that parameter.

* There is a fundamental misunderstanding between pacemaker and most RA developers as to what reload means. Pacemaker uses the reload action to make parameter changes in the resource's *pacemaker* configuration take effect, but RA developers tend to use it to reload the service's own configuration files (a more natural interpretation, but completely different from how pacemaker uses it).
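To make the first of those two points concrete, this is the kind of metadata declaration involved. It is only a generic sketch of a meta_data() fragment, not taken from any particular agent; the parameter name "binary" and the default path are placeholders:

meta_data() {
    cat <<EOF
<!-- unique="0" is the natural declaration for a binary path, since several
     RA instances can share the same binary.  But because reloadability is
     currently inferred from "unique", pacemaker also treats a change of
     this parameter as something a reload can apply, which it cannot. -->
<parameter name="binary" unique="0" required="0">
  <longdesc lang="en">Path to the daemon binary.</longdesc>
  <shortdesc lang="en">Daemon binary</shortdesc>
  <content type="string" default="/usr/sbin/mydaemon"/>
</parameter>
EOF
}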
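And for completeness, here is roughly what the RA side of the ambiguity looks like today. Again only a sketch, assuming the usual ocf-shellfuncs boilerplate; ra_reload and mydaemon_apply_settings are made-up placeholders, not part of Felix's agent or of any real daemon:

#!/bin/sh
# Sketch of a reload handler for a hypothetical stateful (master/slave) RA.
: ${OCF_FUNCTIONS_DIR=${OCF_ROOT}/lib/heartbeat}
. ${OCF_FUNCTIONS_DIR}/ocf-shellfuncs

ra_reload() {
    # Re-apply whatever changed parameters the running daemon can pick up.
    mydaemon_apply_settings || return $OCF_ERR_GENERIC

    # The instance may still be running as master at this point, but since
    # pacemaker models reload as equivalent to stop-then-start, it expects
    # the resource to end up in the slave role.  Returning
    # $OCF_RUNNING_MASTER here makes the reload operation fail, so plain
    # success is the only accepted result.
    return $OCF_SUCCESS
}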
> trace May 20 12:58:31 cib_create_op(609):0: Sending call options: 00100000, 1048576
> trace May 20 12:58:31 cib_native_perform_op_delegate(384):0: Sending cib_modify message to CIB service (timeout=120s)
> trace May 20 12:58:31 crm_ipc_send(1175):0: Sending from client: cib_shm request id: 745 bytes: 1070 timeout:120000 msg...
> trace May 20 12:58:31 crm_ipc_send(1188):0: Message sent, not waiting for reply to 745 from cib_shm to 1070 bytes...
> trace May 20 12:58:31 cib_native_perform_op_delegate(395):0: Reply: No data to dump as XML
> trace May 20 12:58:31 cib_native_perform_op_delegate(398):0: Async call, returning 268
> trace May 20 12:58:31 do_update_resource(2274):0: Sent resource state update message: 268 for reload=0 on scst_dg_ssd
> trace May 20 12:58:31 cib_client_register_callback_full(606):0: Adding callback cib_rsc_callback for call 268
> trace May 20 12:58:31 process_lrm_event(2374):0: Op scst_dg_ssd_reload_0 (call=449, stop-id=scst_dg_ssd:449, remaining=3): Confirmed
> notice May 20 12:58:31 process_lrm_event(2392):0: Operation scst_dg_ssd_reload_0: ok (node=alpha, call=449, rc=0, cib-update=268, confirmed=true)
> debug May 20 12:58:31 update_history_cache(196):0: Updating history for 'scst_dg_ssd' with reload op
> trace May 20 12:58:31 crm_ipc_read(992):0: No message from lrmd received: Resource temporarily unavailable
> trace May 20 12:58:31 mainloop_gio_callback(654):0: Message acquisition from lrmd[0x22b0ec0] failed: No message of desired type (-42)
> trace May 20 12:58:31 crm_fsa_trigger(293):0: Invoked (queue len: 0)
> trace May 20 12:58:31 s_crmd_fsa(159):0: FSA invoked with Cause: C_FSA_INTERNAL State: S_NOT_DC
> trace May 20 12:58:31 s_crmd_fsa(246):0: Exiting the FSA
> trace May 20 12:58:31 crm_fsa_trigger(295):0: Exited (queue len: 0)
> trace May 20 12:58:31 crm_ipc_read(989):0: Received cib_shm event 2108, size=183, rc=183, text: <cib-reply t="cib" cib_op="cib_modify" cib_callid="268" cib_clientid="60010689-7350-4916-a7bd-bd85ff
> trace May 20 12:58:31 mainloop_gio_callback(659):0: New message from cib_shm[0x23b7ab0] = 143
> trace May 20 12:58:31 cib_native_dispatch_internal(100):0: dispatching 0x22b2370
> trace May 20 12:58:31 cib_native_dispatch_internal(116):0: Activating cib callbacks...
> trace May 20 12:58:31 cib_native_callback(649):0: Invoking callback cib_rsc_callback for call 268
> trace May 20 12:58:31 cib_rsc_callback(2113):0: Resource update 268 complete: rc=0
> trace May 20 12:58:31 cib_rsc_callback(2121):0: Triggering FSA: cib_rsc_callback
> trace May 20 12:58:31 cib_native_callback(666):0: OP callback activated for 268
> trace May 20 12:58:31 crm_ipc_read(992):0: No message from cib_shm received: Resource temporarily unavailable
> trace May 20 12:58:31 mainloop_gio_callback(654):0: Message acquisition from cib_shm[0x23b7ab0] failed: No message of desired type (-42)
> trace May 20 12:58:31 crm_fsa_trigger(293):0: Invoked (queue len: 0)
> trace May 20 12:58:31 s_crmd_fsa(159):0: FSA invoked with Cause: C_FSA_INTERNAL State: S_NOT_DC
> trace May 20 12:58:31 s_crmd_fsa(246):0: Exiting the FSA
> trace May 20 12:58:31 crm_fsa_trigger(295):0: Exited (queue len: 0)
> notice May 20 12:58:43 crm_signal_dispatch(272):0: Invoking handler for signal 5: Trace/breakpoint trap
> notice May 20 12:58:43 crm_write_blackbox(431):0: Blackbox dump requested, please see /var/lib/pacemaker/blackbox/crmd-2877.2 for contents
>
> --
> Kind regards
> Dipl. Inf. (FH) Felix Zachlod
>
> Onesty Tech GmbH
> Lieberoser Str. 7
> 03046 Cottbus
>
> Tel.: +49 (355) 289430
> Fax: +49 (355) 28943100
> f...@onesty-tech.de
>
> Registered at Amtsgericht Cottbus, HRB 7885; managing directors: Romy Schötz, Thomas Menzel