On 12 Sep 2013, at 3:44 am, Lindsay Todd <rltodd....@gmail.com> wrote:
> What I am seeing in the syslog are messages like:
>
> Sep 11 13:19:52 db02 pacemaker_remoted[1736]: notice: operation_finished: p-mysql_monitor_20000:19398:stderr [ 2013/09/11_13:19:52 INFO: MySQL monitor succeeded ]
> Sep 11 13:20:08 cvmh03 pengine[4832]: warning: unpack_rsc_op_failure: Processing failed op monitor for p-mysql-slurm on cvmh02: not installed (5)
> Sep 11 13:20:08 cvmh03 pengine[4832]: notice: unpack_rsc_op: Preventing p-mysql-slurm from re-starting on cvmh02: operation monitor failed 'not installed' (5)
> Sep 11 13:20:08 cvmh03 pengine[4832]: warning: unpack_rsc_op_failure: Processing failed op monitor for p-mysql-slurm on cvmh03: not installed (5)
> Sep 11 13:20:08 cvmh03 pengine[4832]: notice: unpack_rsc_op: Preventing p-mysql-slurm from re-starting on cvmh03: operation monitor failed 'not installed' (5)
> Sep 11 13:20:08 cvmh03 pengine[4832]: warning: unpack_rsc_op_failure: Processing failed op monitor for p-mysql-slurm on cvmh01: not installed (5)
> Sep 11 13:20:08 cvmh03 pengine[4832]: notice: unpack_rsc_op: Preventing p-mysql-slurm from re-starting on cvmh01: operation monitor failed 'not installed' (5)
> Sep 11 13:20:08 cvmh03 pengine[4832]: warning: unpack_rsc_op_failure: Processing failed op monitor for p-mysql-slurm on cvmh02: not installed (5)
> Sep 11 13:20:08 cvmh03 pengine[4832]: notice: unpack_rsc_op: Preventing p-mysql-slurm from re-starting on cvmh02: operation monitor failed 'not installed' (5)
> Sep 11 13:20:08 cvmh03 pengine[4832]: warning: unpack_rsc_op_failure: Processing failed op monitor for p-mysql-slurm on cvmh03: not installed (5)
> Sep 11 13:20:08 cvmh03 pengine[4832]: notice: unpack_rsc_op: Preventing p-mysql-slurm from re-starting on cvmh03: operation monitor failed 'not installed' (5)
> Sep 11 13:20:08 cvmh03 pengine[4832]: warning: unpack_rsc_op_failure: Processing failed op monitor for p-mysql-slurm on cvmh01: not installed (5)
> Sep 11 13:20:08 cvmh03 pengine[4832]: notice: unpack_rsc_op: Preventing p-mysql-slurm from re-starting on cvmh01: operation monitor failed 'not installed' (5)
> Sep 11 13:20:08 cvmh03 pengine[4832]: notice: LogActions: Start p-mysql#011(db02)
> Sep 11 13:20:08 cvmh03 crmd[4833]: notice: te_rsc_command: Initiating action 48: monitor p-mysql_monitor_0 on cvmh03 (local)
> Sep 11 13:20:08 cvmh03 crmd[4833]: notice: te_rsc_command: Initiating action 46: monitor p-mysql_monitor_0 on cvmh02
> Sep 11 13:20:08 cvmh03 crmd[4833]: notice: te_rsc_command: Initiating action 44: monitor p-mysql_monitor_0 on cvmh01
> Sep 11 13:20:08 cvmh03 mysql(p-mysql)[12476]: ERROR: Setup problem: couldn't find command: /usr/bin/mysqld_safe
> Sep 11 13:20:08 cvmh03 crmd[4833]: notice: process_lrm_event: LRM operation p-mysql_monitor_0 (call=907, rc=5, cib-update=701, confirmed=true) not installed
> Sep 11 13:20:08 cvmh02 mysql(p-mysql)[17158]: ERROR: Setup problem: couldn't find command: /usr/bin/mysqld_safe
> Sep 11 13:20:08 cvmh01 mysql(p-mysql)[5968]: ERROR: Setup problem: couldn't find command: /usr/bin/mysqld_safe
> Sep 11 13:20:08 cvmh02 crmd[5081]: notice: process_lrm_event: LRM operation p-mysql_monitor_0 (call=332, rc=5, cib-update=164, confirmed=true) not installed
> Sep 11 13:20:08 cvmh01 crmd[5169]: notice: process_lrm_event: LRM operation p-mysql_monitor_0 (call=319, rc=5, cib-update=188, confirmed=true) not installed
> Sep 11 13:20:08 cvmh03 crmd[4833]: warning: status_from_rc: Action 48 (p-mysql_monitor_0) on cvmh03 failed (target: 7 vs. rc: 5): Error
> Sep 11 13:20:08 cvmh03 crmd[4833]: warning: status_from_rc: Action 46 (p-mysql_monitor_0) on cvmh02 failed (target: 7 vs. rc: 5): Error
> Sep 11 13:20:08 cvmh03 crmd[4833]: warning: status_from_rc: Action 44 (p-mysql_monitor_0) on cvmh01 failed (target: 7 vs. rc: 5): Error
> Sep 11 13:20:08 cvmh03 pengine[4832]: warning: unpack_rsc_op_failure: Processing failed op monitor for p-mysql-slurm on cvmh02: not installed (5)
> Sep 11 13:20:08 cvmh03 pengine[4832]: notice: unpack_rsc_op: Preventing p-mysql-slurm from re-starting on cvmh02: operation monitor failed 'not installed' (5)
> Sep 11 13:20:08 cvmh03 pengine[4832]: warning: unpack_rsc_op_failure: Processing failed op monitor for p-mysql on cvmh02: not installed (5)
> ...
> Sep 11 13:20:08 cvmh03 crmd[4833]: notice: te_rsc_command: Initiating action 150: start p-mysql_start_0 on db02
> Sep 11 13:20:08 db02 pacemaker_remoted[1736]: notice: operation_finished: p-mysql_start_0:19427:stderr [ 2013/09/11_13:20:08 INFO: MySQL already running ]
> Sep 11 13:20:08 cvmh02 crmd[5081]: notice: process_lrm_event: LRM operation p-mysql_start_0 (call=2600, rc=0, cib-update=165, confirmed=true) ok
> Sep 11 13:20:08 cvmh03 crmd[4833]: notice: te_rsc_command: Initiating action 151: monitor p-mysql_monitor_20000 on db02
> Sep 11 13:20:09 db02 pacemaker_remoted[1736]: notice: operation_finished: p-mysql_monitor_20000:19454:stderr [ 2013/09/11_13:20:09 INFO: MySQL monitor succeeded ]
>
> So I guess they aren't "error", but rather warnings, which is what we see in unpack_rsc_op_failure, and I do see that it makes OCF_NOT_INSTALLED a special case when asymmetric -- after logging the warning. Should the test move earlier in this function, and maybe return in that case?

I've moved that message further down into a block that is conditional on OCF_NOT_INSTALLED and pe_flag_symmetric_cluster:
https://github.com/beekhof/pacemaker/commit/4b6def9

> Also crm_mon reports errors:

The latest in git appears to have resolved this.
I'm reasonably sure it was this commit:
https://github.com/beekhof/pacemaker/commit/a32474b

> Failed actions:
>     p-mysql-slurm_monitor_0 on cvmh02 'not installed' (5): call=69, status=complete, last-rc-change='Tue Sep 10 15:52:57 2013', queued=31ms, exec=0ms
>     s-ldap_monitor_0 on cvmh02 'not installed' (5): call=289, status=Not installed, last-rc-change='Tue Sep 10 16:15:19 2013', queued=0ms, exec=0ms
>     p-mysql_monitor_0 on cvmh02 'not installed' (5): call=332, status=complete, last-rc-change='Wed Sep 11 13:20:08 2013', queued=40ms, exec=0ms
>     p-mysql-slurm_monitor_0 on cvmh03 'not installed' (5): call=325, status=complete, last-rc-change='Wed Sep 4 13:44:15 2013', queued=35ms, exec=0ms
>     s-ldap_monitor_0 on cvmh03 'not installed' (5): call=869, status=Not installed, last-rc-change='Tue Sep 10 16:15:19 2013', queued=0ms, exec=0ms
>     p-mysql_monitor_0 on cvmh03 'not installed' (5): call=907, status=complete, last-rc-change='Wed Sep 11 13:20:08 2013', queued=36ms, exec=0ms
>     p-mysql-slurm_monitor_0 on cvmh01 'not installed' (5): call=95, status=complete, last-rc-change='Tue Sep 10 15:48:15 2013', queued=95ms, exec=0ms
>     fence-cvmh02_start_0 on (null) 'unknown error' (1): call=-1, status=Timed Out, last-rc-change='Tue Sep 10 15:49:38 2013', queued=0ms, exec=0ms
>     fence-cvmh02_start_0 on cvmh01 'unknown error' (1): call=-1, status=Timed Out, last-rc-change='Tue Sep 10 15:49:38 2013', queued=0ms, exec=0ms
>     s-ldap_monitor_0 on cvmh01 'not installed' (5): call=279, status=Not installed, last-rc-change='Tue Sep 10 16:15:19 2013', queued=0ms, exec=0ms
>     p-mysql_monitor_0 on cvmh01 'not installed' (5): call=319, status=complete, last-rc-change='Wed Sep 11 13:20:08 2013', queued=42ms, exec=0ms
>
> Almost all of these are instances of resources being probed on nodes that they shouldn't be running on, aren't installed on, and aren't really errors.
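For readers decoding these entries: the "(5)" is the agent's OCF exit code, and a probe expects OCF_NOT_RUNNING (7), which is what the earlier "target: 7 vs. rc: 5" warnings are comparing. A minimal lookup table of the standard OCF codes, sketched in shell (the `ocf_rc_name` helper is illustrative, not part of any shipped tool):

```shell
# Map a standard OCF exit code to its symbolic name. Codes per the OCF
# resource-agent API; a monitor_0 probe on a node where the resource is
# absent is expected to return 7, which is why rc=5 gets flagged.
ocf_rc_name() {
    case "$1" in
        0) echo "OCF_SUCCESS" ;;
        1) echo "OCF_ERR_GENERIC" ;;
        2) echo "OCF_ERR_ARGS" ;;
        3) echo "OCF_ERR_UNIMPLEMENTED" ;;
        4) echo "OCF_ERR_PERM" ;;
        5) echo "OCF_ERR_INSTALLED" ;;
        6) echo "OCF_ERR_CONFIGURED" ;;
        7) echo "OCF_NOT_RUNNING" ;;
        *) echo "unknown" ;;
    esac
}

ocf_rc_name 5   # prints OCF_ERR_INSTALLED
ocf_rc_name 7   # prints OCF_NOT_RUNNING
```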
> (I assume the crm_report has captured the location rules, as well as confirmed that the symmetric-cluster property is false.) The resources do also start up on the nodes they should run on.
>
> Previously I'd noticed that LSB resources probed on nodes that don't have the associated init script would fail; it looks like that is also getting reported as OCF_NOT_INSTALLED, so perhaps it is the same problem.
>
>
> On Wed, Sep 4, 2013 at 12:49 AM, Andrew Beekhof <and...@beekhof.net> wrote:
>
> > On 04/09/2013, at 6:18 AM, Lindsay Todd <rltodd....@gmail.com> wrote:
> >
> > > We've been attempting to set up an asymmetric pacemaker cluster using remote cluster nodes, with pacemaker 1.1.10 (actually, building from git lately, currently at a4eb44f). We use location constraints to enable resources to start on the nodes they should start on, and rely on asymmetry to otherwise keep resources from starting.
> >
> > You set symmetric-cluster=false, or assumed that was the default?
> >
> > > But we get many monitor operation failures.
> > >
> > > Resource monitor operations run on the real physical hosts, and frequently fail because not all the components are present on those hosts. For instance, the mysql resource agent's monitor operation fails as "not installed", since, well, mysql isn't installed on those systems, so the validate operation, which most or every path through that agent runs, always fails. I don't see failures on the remote nodes, even ones without mysql installed.
> > >
> > > Previously I'd noticed LSB resources had failed monitor operations on systems that didn't have the LSB init script installed.
> > >
> > > Presumably these monitor operations are happening to ensure the resource is NOT running where it should not be???
> >
> > Correct. Although with symmetric-cluster=false it shouldn't show up as an error.
> > Logs? crm_mon output?
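For anyone reproducing the setup being described, an opt-in (asymmetric) cluster looks roughly like this in crm shell syntax. The resource and node names (p-mysql, db02) come from the thread; the constraint ID and score are illustrative:

```shell
# Make the cluster opt-in: nothing runs anywhere unless a constraint allows it.
crm configure property symmetric-cluster=false

# Allow p-mysql on db02. With symmetric-cluster=false this acts as the
# only permission to run; with the default (true) it would merely be a
# preference.
crm configure location loc-p-mysql-db02 p-mysql 100: db02
```

Note that one-off probes (the `*_monitor_0` operations in the logs) still run on every node regardless of such constraints, which is exactly the behaviour under discussion.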
> > > There doesn't seem to be a way to set up location constraints to prevent this from happening, at least none that I've found. I wrote an OCF wrapper RA to help me with LSB init scripts, but I'm not sure what to do about other RAs like mysql, short of maintaining my own version, unless there is a way to tune where "monitor" runs. Or more likely I'm missing something ...
> > >
> > > It would seem to me that a "not installed" failure, OCF_ERR_INSTALLED, would not really be an error on a node that shouldn't be running that resource agent anyway, and is probably a pretty good indication that it isn't running.
> > >
> > > /Lindsay

_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org
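Lindsay's closing point -- that "not installed" during a probe is really just strong evidence of "not running" -- is something a wrapper agent can encode itself. A minimal sketch in shell; the `BINARY` path and `probe_monitor` name are illustrative, not taken from any shipped resource agent:

```shell
#!/bin/sh
# Sketch of a probe-friendly monitor: if the daemon binary is absent,
# report OCF_NOT_RUNNING (7) instead of OCF_ERR_INSTALLED (5), so a probe
# on a node that will never run the resource is not treated as a failure.

OCF_SUCCESS=0
OCF_ERR_INSTALLED=5
OCF_NOT_RUNNING=7

BINARY="/usr/bin/mysqld_safe"   # illustrative; a real RA takes this as a parameter

probe_monitor() {
    if [ ! -x "$BINARY" ]; then
        # Nothing installed here, so the resource certainly isn't running.
        return $OCF_NOT_RUNNING
    fi
    # A real agent would now check the pid file / socket; the sketch
    # simply reports success when the binary exists.
    return $OCF_SUCCESS
}

probe_monitor
echo "monitor rc=$?"   # rc=7 on a node without mysqld_safe installed
```

A full wrapper would only apply this mapping when `OCF_RESKEY_CRM_meta_interval` is 0 (i.e. during a probe), and keep the hard OCF_ERR_INSTALLED error for a real start or recurring monitor.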