Hello all,

I'm running a two-node cluster: heartbeat-2.1.3-3 CentOS RPMs on RH AS 4.6.
While testing a "maintenance scenario" for the cluster, I set all resources to is_managed=false:

Feb 20 21:09:41 sierpinski pengine: [15725]: notice: native_print: R_BB10PRD_DB (heartbeat::ocf:oracle): Started sierpinski.uvt.nl (unmanaged)

and then shut down Oracle by hand (Oracle being one of the resources):

Feb 20 21:12:03 sierpinski oracle[23120]: [23145]: INFO: Oracle instance BB10PRD is down

Within minutes, the node was stonithed. The log shows that this happened right after the monitor operation for the oracle resource returned code 7 (OCF_NOT_RUNNING):

Feb 20 21:12:03 sierpinski crmd: [4584]: info: process_lrm_event: LRM operation R_BB10PRD_DB_monitor_120000 (call=31, rc=7) complete
Feb 20 21:12:03 mandelbrot stonithd: [4580]: info: stonith_operate_locally::2375: sending fencing op (RESET) for sierpinski.uvt.nl to device external (rsc_id=R_ilo_sierpinski:0, pid=5414)
Feb 20 21:12:03 mandelbrot stonithd: [4580]: info: Node mandelbrot.uvt.nl try to help node sierpinski.uvt.nl to fence node sierpinski.uvt.nl.

Conclusion: the monitor operation kept running even though the resource was unmanaged, and its failed result triggered a fencing action.

I then wrote a script that, in addition to setting the resources to is_managed=false, also sets the monitor operations to disabled=true. This worked: I can now shut down Oracle by hand without a fencing action being started.

Questions:
- Is this expected behavior? Should monitor operations keep running even though the resources are set to is_managed=false?
- Is explicitly setting the monitor operations to disabled=true the "right way" to prevent unwanted fencing during cluster maintenance?

tia,
Johan

(happy to post hb_reports if requested)
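For illustration, here is a minimal Python sketch of what a maintenance script like the one described above might do to the CIB. The original script was not posted, so this is a hypothetical reconstruction. It assumes a simplified heartbeat 2.1-style CIB in which each primitive carries an is_managed nvpair under meta_attributes/attributes and each op element accepts a disabled attribute; the function name set_maintenance, the sample op id and the generated nvpair ids are made up. Check the element layout against your own CIB (for example as dumped by cibadmin) before loading anything back into a live cluster.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: a simplified CIB fragment modelled on the resource in
# the logs above. The op id and the meta_attributes layout are assumptions.
SAMPLE_CIB = """<cib><configuration><resources>
  <primitive id="R_BB10PRD_DB" class="ocf" provider="heartbeat" type="oracle">
    <operations>
      <op id="R_BB10PRD_DB_mon" name="monitor" interval="120s"/>
    </operations>
  </primitive>
</resources></configuration></cib>"""


def set_maintenance(cib_xml: str, maintenance: bool) -> str:
    """Return a modified copy of cib_xml (hypothetical helper).

    maintenance=True:  every primitive gets is_managed=false and every
                       monitor op gets disabled="true".
    maintenance=False: reverts both (is_managed=true, disabled="false").
    """
    root = ET.fromstring(cib_xml)
    managed = "false" if maintenance else "true"
    disabled = "true" if maintenance else "false"

    # iter() also finds primitives nested inside groups or clones.
    for prim in root.iter("primitive"):
        rsc_id = prim.get("id")

        # Find or create <meta_attributes><attributes> (assumed layout).
        meta = prim.find("meta_attributes")
        if meta is None:
            meta = ET.SubElement(prim, "meta_attributes", id=f"{rsc_id}-meta")
        attrs = meta.find("attributes")
        if attrs is None:
            attrs = ET.SubElement(meta, "attributes")

        # Update the existing is_managed nvpair, or add one if missing.
        pair = attrs.find("nvpair[@name='is_managed']")
        if pair is None:
            pair = ET.SubElement(attrs, "nvpair",
                                 id=f"{rsc_id}-is_managed", name="is_managed")
        pair.set("value", managed)

        # Disable (or re-enable) every monitor op of this primitive.
        for op in prim.iter("op"):
            if op.get("name") == "monitor":
                op.set("disabled", disabled)

    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(set_maintenance(SAMPLE_CIB, maintenance=True))
```

Reverting with maintenance=False instead of deleting the attributes keeps the script idempotent and symmetric: running it twice never duplicates the is_managed nvpair. Whether disabling monitors is the recommended approach is exactly what the questions above ask; the sketch only mirrors what the script reportedly does.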
_______________________________________________ Linux-HA mailing list Linux-HA@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha See also: http://linux-ha.org/ReportingProblems