On 12 Sep 2013, at 3:44 am, Lindsay Todd <rltodd....@gmail.com> wrote:

> What I am seeing in the syslog are messages like:
> 
> Sep 11 13:19:52 db02 pacemaker_remoted[1736]:   notice: operation_finished: 
> p-mysql_monitor_20000:19398:stderr [ 2013/09/11_13:19:52 INFO: MySQL monitor 
> succeeded ]
> Sep 11 13:20:08 cvmh03 pengine[4832]:  warning: unpack_rsc_op_failure: 
> Processing failed op monitor for p-mysql-slurm on cvmh02: not installed (5)
> Sep 11 13:20:08 cvmh03 pengine[4832]:   notice: unpack_rsc_op: Preventing 
> p-mysql-slurm from re-starting on cvmh02: operation monitor failed 'not 
> installed' (5)
> Sep 11 13:20:08 cvmh03 pengine[4832]:  warning: unpack_rsc_op_failure: 
> Processing failed op monitor for p-mysql-slurm on cvmh03: not installed (5)
> Sep 11 13:20:08 cvmh03 pengine[4832]:   notice: unpack_rsc_op: Preventing 
> p-mysql-slurm from re-starting on cvmh03: operation monitor failed 'not 
> installed' (5)
> Sep 11 13:20:08 cvmh03 pengine[4832]:  warning: unpack_rsc_op_failure: 
> Processing failed op monitor for p-mysql-slurm on cvmh01: not installed (5)
> Sep 11 13:20:08 cvmh03 pengine[4832]:   notice: unpack_rsc_op: Preventing 
> p-mysql-slurm from re-starting on cvmh01: operation monitor failed 'not 
> installed' (5)
> Sep 11 13:20:08 cvmh03 pengine[4832]:  warning: unpack_rsc_op_failure: 
> Processing failed op monitor for p-mysql-slurm on cvmh02: not installed (5)
> Sep 11 13:20:08 cvmh03 pengine[4832]:   notice: unpack_rsc_op: Preventing 
> p-mysql-slurm from re-starting on cvmh02: operation monitor failed 'not 
> installed' (5)
> Sep 11 13:20:08 cvmh03 pengine[4832]:  warning: unpack_rsc_op_failure: 
> Processing failed op monitor for p-mysql-slurm on cvmh03: not installed (5)
> Sep 11 13:20:08 cvmh03 pengine[4832]:   notice: unpack_rsc_op: Preventing 
> p-mysql-slurm from re-starting on cvmh03: operation monitor failed 'not 
> installed' (5)
> Sep 11 13:20:08 cvmh03 pengine[4832]:  warning: unpack_rsc_op_failure: 
> Processing failed op monitor for p-mysql-slurm on cvmh01: not installed (5)
> Sep 11 13:20:08 cvmh03 pengine[4832]:   notice: unpack_rsc_op: Preventing 
> p-mysql-slurm from re-starting on cvmh01: operation monitor failed 'not 
> installed' (5)
> Sep 11 13:20:08 cvmh03 pengine[4832]:   notice: LogActions: Start   
> p-mysql#011(db02)
> Sep 11 13:20:08 cvmh03 crmd[4833]:   notice: te_rsc_command: Initiating 
> action 48: monitor p-mysql_monitor_0 on cvmh03 (local)
> Sep 11 13:20:08 cvmh03 crmd[4833]:   notice: te_rsc_command: Initiating 
> action 46: monitor p-mysql_monitor_0 on cvmh02
> Sep 11 13:20:08 cvmh03 crmd[4833]:   notice: te_rsc_command: Initiating 
> action 44: monitor p-mysql_monitor_0 on cvmh01
> Sep 11 13:20:08 cvmh03 mysql(p-mysql)[12476]: ERROR: Setup problem: couldn't 
> find command: /usr/bin/mysqld_safe
> Sep 11 13:20:08 cvmh03 crmd[4833]:   notice: process_lrm_event: LRM operation 
> p-mysql_monitor_0 (call=907, rc=5, cib-update=701, confirmed=true) not 
> installed
> Sep 11 13:20:08 cvmh02 mysql(p-mysql)[17158]: ERROR: Setup problem: couldn't 
> find command: /usr/bin/mysqld_safe
> Sep 11 13:20:08 cvmh01 mysql(p-mysql)[5968]: ERROR: Setup problem: couldn't 
> find command: /usr/bin/mysqld_safe
> Sep 11 13:20:08 cvmh02 crmd[5081]:   notice: process_lrm_event: LRM operation 
> p-mysql_monitor_0 (call=332, rc=5, cib-update=164, confirmed=true) not 
> installed
> Sep 11 13:20:08 cvmh01 crmd[5169]:   notice: process_lrm_event: LRM operation 
> p-mysql_monitor_0 (call=319, rc=5, cib-update=188, confirmed=true) not 
> installed
> Sep 11 13:20:08 cvmh03 crmd[4833]:  warning: status_from_rc: Action 48 
> (p-mysql_monitor_0) on cvmh03 failed (target: 7 vs. rc: 5): Error
> Sep 11 13:20:08 cvmh03 crmd[4833]:  warning: status_from_rc: Action 46 
> (p-mysql_monitor_0) on cvmh02 failed (target: 7 vs. rc: 5): Error
> Sep 11 13:20:08 cvmh03 crmd[4833]:  warning: status_from_rc: Action 44 
> (p-mysql_monitor_0) on cvmh01 failed (target: 7 vs. rc: 5): Error
> Sep 11 13:20:08 cvmh03 pengine[4832]:  warning: unpack_rsc_op_failure: 
> Processing failed op monitor for p-mysql-slurm on cvmh02: not installed (5)
> Sep 11 13:20:08 cvmh03 pengine[4832]:   notice: unpack_rsc_op: Preventing 
> p-mysql-slurm from re-starting on cvmh02: operation monitor failed 'not 
> installed' (5)
> Sep 11 13:20:08 cvmh03 pengine[4832]:  warning: unpack_rsc_op_failure: 
> Processing failed op monitor for p-mysql on cvmh02: not installed (5)
> ...
> Sep 11 13:20:08 cvmh03 crmd[4833]:   notice: te_rsc_command: Initiating 
> action 150: start p-mysql_start_0 on db02
> Sep 11 13:20:08 db02 pacemaker_remoted[1736]:   notice: operation_finished: 
> p-mysql_start_0:19427:stderr [ 2013/09/11_13:20:08 INFO: MySQL already 
> running ]
> Sep 11 13:20:08 cvmh02 crmd[5081]:   notice: process_lrm_event: LRM operation 
> p-mysql_start_0 (call=2600, rc=0, cib-update=165, confirmed=true) ok
> Sep 11 13:20:08 cvmh03 crmd[4833]:   notice: te_rsc_command: Initiating 
> action 151: monitor p-mysql_monitor_20000 on db02
> Sep 11 13:20:09 db02 pacemaker_remoted[1736]:   notice: operation_finished: 
> p-mysql_monitor_20000:19454:stderr [ 2013/09/11_13:20:09 INFO: MySQL monitor 
> succeeded ]
> 
> So I guess they aren't "errors" but rather warnings, which is what we see in 
> unpack_rsc_op_failure, and I do see that it makes OCF_NOT_INSTALLED a special 
> case when the cluster is asymmetric, but only after logging the warning.  
> Should the test move earlier in this function, and maybe return in that case?

I've moved that message further down into a block that is conditional on 
OCF_NOT_INSTALLED and pe_flag_symmetric_cluster:

   https://github.com/beekhof/pacemaker/commit/4b6def9
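
In shell pseudocode, the shape of that change is roughly the following (a
paraphrase only; the real code is C inside unpack_rsc_op_failure, and the
function and message names below are made up for illustration):

```shell
#!/bin/sh
# Paraphrase of the commit's idea, not the actual C code: a probe that
# fails with OCF_NOT_INSTALLED on an asymmetric cluster is expected, so
# the message is demoted; on a symmetric cluster it stays a warning.
OCF_NOT_INSTALLED=5

log_probe_failure() {
    rc=$1
    symmetric=$2
    if [ "$rc" -eq "$OCF_NOT_INSTALLED" ] && [ "$symmetric" = "false" ]; then
        echo "info: agent not installed here; expected on an asymmetric cluster"
    else
        echo "warning: processing failed op monitor: rc=$rc"
    fi
}

asym=$(log_probe_failure 5 false)
sym=$(log_probe_failure 5 true)
echo "$asym"
echo "$sym"
```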

>  Also crm_mon reports errors:

The latest in git appears to have resolved this.
I'm reasonably sure it was this commit:

   https://github.com/beekhof/pacemaker/commit/a32474b

> 
> Failed actions:
>     p-mysql-slurm_monitor_0 on cvmh02 'not installed' (5): call=69, 
> status=complete, last-rc-change='Tue Sep 10 15:52:57 2013', queued=31ms, 
> exec=0ms
>     s-ldap_monitor_0 on cvmh02 'not installed' (5): call=289, status=Not 
> installed, last-rc-change='Tue Sep 10 16:15:19 2013', queued=0ms, exec=0ms
>     p-mysql_monitor_0 on cvmh02 'not installed' (5): call=332, 
> status=complete, last-rc-change='Wed Sep 11 13:20:08 2013', queued=40ms, 
> exec=0ms
>     p-mysql-slurm_monitor_0 on cvmh03 'not installed' (5): call=325, 
> status=complete, last-rc-change='Wed Sep  4 13:44:15 2013', queued=35ms, 
> exec=0ms
>     s-ldap_monitor_0 on cvmh03 'not installed' (5): call=869, status=Not 
> installed, last-rc-change='Tue Sep 10 16:15:19 2013', queued=0ms, exec=0ms
>     p-mysql_monitor_0 on cvmh03 'not installed' (5): call=907, 
> status=complete, last-rc-change='Wed Sep 11 13:20:08 2013', queued=36ms, 
> exec=0ms
>     p-mysql-slurm_monitor_0 on cvmh01 'not installed' (5): call=95, 
> status=complete, last-rc-change='Tue Sep 10 15:48:15 2013', queued=95ms, 
> exec=0ms
>     fence-cvmh02_start_0 on (null) 'unknown error' (1): call=-1, status=Timed 
> Out, last-rc-change='Tue Sep 10 15:49:38 2013', queued=0ms, exec=0ms
>     fence-cvmh02_start_0 on cvmh01 'unknown error' (1): call=-1, status=Timed 
> Out, last-rc-change='Tue Sep 10 15:49:38 2013', queued=0ms, exec=0ms
>     s-ldap_monitor_0 on cvmh01 'not installed' (5): call=279, status=Not 
> installed, last-rc-change='Tue Sep 10 16:15:19 2013', queued=0ms, exec=0ms
>     p-mysql_monitor_0 on cvmh01 'not installed' (5): call=319, 
> status=complete, last-rc-change='Wed Sep 11 13:20:08 2013', queued=42ms, 
> exec=0ms
> 
> Almost all of these are instances of resources being probed on nodes that 
> they shouldn't be running on and aren't installed on; they aren't really 
> errors.  
> (I assume the crm_report has captured the location rules, as well as 
> confirmed that the symmetric-cluster property is false.)  The resources do 
> also start up on the nodes they should run on.
> 
> Previously I'd noticed that LSB resources probed on nodes lacking the 
> associated init script would fail; it looks like that is also getting 
> reported as OCF_NOT_INSTALLED, so perhaps it is the same problem.
> 
> 
> On Wed, Sep 4, 2013 at 12:49 AM, Andrew Beekhof <and...@beekhof.net> wrote:
> 
> On 04/09/2013, at 6:18 AM, Lindsay Todd <rltodd....@gmail.com> wrote:
> 
> > We've been attempting to set up an asymmetric pacemaker cluster using 
> > remote cluster nodes, with pacemaker 1.1.10 (actually, building from git 
> > lately, currently at a4eb44f).  We use location constraints to enable 
> > resources to start on nodes they should start on, and rely on asymmetry to 
> > otherwise keep resources from starting.
> 
> You set symmetric-cluster=false, or assumed that was the default?
> 
> >
> > But we get many monitor operation failures.
> >
> > Resource monitor operations run on the physical real hosts, and frequently 
> > fail because not all the components are present on those hosts.  For 
> > instance, the mysql resource agent's monitor operation fails as "not 
> > installed", since, well, mysql isn't installed on those systems, so the 
> > validate operation, which most or every path through that agent runs, 
> > always fails.  I don't see failures on the remote nodes, even ones without 
> > mysql installed.
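
That failure mode follows from the usual OCF agent structure: every action,
including the one-shot probe, runs validate first, and validate bails out
with "not installed" when a required binary is missing.  A minimal sketch
of that pattern (illustrative only, not the actual mysql agent; the path
is deliberately nonexistent to mimic a probe on a node without mysql):

```shell
#!/bin/sh
# Sketch of the common OCF agent pattern (not the real resource-agents
# mysql script): validate runs before every action, so a probe on a node
# without the software exits OCF_ERR_INSTALLED instead of "not running".
OCF_SUCCESS=0
OCF_NOT_RUNNING=7
OCF_ERR_INSTALLED=5

MYSQLD_SAFE=/no/such/path/mysqld_safe

mysql_validate() {
    if [ ! -x "$MYSQLD_SAFE" ]; then
        echo "ERROR: Setup problem: couldn't find command: $MYSQLD_SAFE" >&2
        return $OCF_ERR_INSTALLED
    fi
    return $OCF_SUCCESS
}

mysql_monitor() {
    # Probes die here on nodes where mysql isn't installed.
    mysql_validate || return $?
    # A real agent would now check the pid file / query the daemon.
    return $OCF_NOT_RUNNING
}

mysql_monitor
rc=$?
echo "monitor rc=$rc"
```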
> >
> > Previously I'd noticed LSB resources had failed monitor operations on 
> > systems that didn't have the LSB init script installed.
> >
> > Presumably these monitor operations are happening to ensure the resource is 
> > NOT running where it should not be?
> 
> Correct. Although with symmetric-cluster=false it shouldn't show up as an 
> error.
> Logs? crm_mon output?
> 
> >  There doesn't seem to be a way to set up location constraints to prevent 
> > this from happening, at least that I've found.  I wrote an OCF wrapper RA 
> > to help me with LSB init scripts, but not sure what to do about other RA's 
> > like mysql short of maintaining my own version, unless there is a way to 
> > tune where "monitor" runs.  Or more likely I'm missing something ...
> >
> > It would seem to me that a "not installed" failure, OCF_ERR_INSTALLED, 
> > would not really be an error on a node that shouldn't be running that 
> > resource agent anyway, and is probably a pretty good indication that it 
> > isn't running.
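
One way to express that as a wrapper (a hypothetical sketch, not the
wrapper RA mentioned above; the function name and paths are invented):
during a probe, map a missing init script to "not running" rather than a
hard failure.

```shell
#!/bin/sh
# Hypothetical wrapper logic, illustrative only: if the LSB init script
# isn't present, the service cannot be running on this node, so report
# OCF_NOT_RUNNING instead of an error.
OCF_NOT_RUNNING=7

lsb_probe() {
    script=$1
    if [ ! -x "$script" ]; then
        return $OCF_NOT_RUNNING
    fi
    "$script" status
}

lsb_probe /etc/init.d/no-such-service
rc=$?
echo "probe rc=$rc"
```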
> >
> > /Lindsay
> >
> > _______________________________________________
> > Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
> > http://oss.clusterlabs.org/mailman/listinfo/pacemaker
> >
> > Project Home: http://www.clusterlabs.org
> > Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> > Bugs: http://bugs.clusterlabs.org
> 
