
Re: [ClusterLabs] ocf-tester always claims failure, even with built-in resource agents?

2021-03-26 Thread Reid Wahl

On Fri, Mar 26, 2021 at 2:44 PM Antony Stone wrote:

> On Friday 26 March 2021 at 18:31:51, Ken Gaillot wrote:
>
> > On Fri, 2021-03-26 at 19:59 +0300, Andrei Borzenkov wrote:
> > > On 26.03.2021 17:28, Antony Stone wrote:
> > > >
> > > > So far all is well and good, my cluster synchronises, starts the
> > > > resources, and everything's working as expected.  It'll move the
> > > > resources from one cluster member to another (either if I ask it to,
> > > > or if there's a problem), and it seems to work just as the older
> > > > version did.
> >
> > I'm glad this far was easy :)
>
> Well, I've been using corosync & pacemaker for some years now; I've got
> used to some of their quirks and foibles :)
>
> Now I just need to learn about the new ones for the newer versions...
>
> > It's worth noting that pacemaker itself doesn't try to validate the
> > agent meta-data, it just checks for the pieces that are interesting to
> > it and ignores the rest.
>
> I guess that's good, so long as what it does pay attention to is what it
> wants to see?
>
> > It's also worth noting that the OCF 1.0 standard is horribly outdated
> > compared to actual use, and the OCF 1.1 standard is being adopted today
> > (!) after many years of trying to come up with something more up-to-
> > date.
>
> So, is ocf-tester no longer the right tool for checking this sort of
> thing?  What should I be doing instead to make sure my configuration is
> valid / acceptable to pacemaker?
>
> > Bottom line, it's worth installing xmllint to see if that helps, but I
> > wouldn't worry about meta-data schema issues.
>
> Well, as stated in my other reply to Andrei, I now get:
>
> /usr/lib/ocf/resource.d/heartbeat/asterisk passed all tests
>
> /usr/lib/ocf/resource.d/heartbeat/anything passed all tests
>
> so I guess it means my configuration file is okay, and I need to look
> somewhere else to find out why pacemaker 2.0.1 is throwing wobblies with
> exactly the same resources that pacemaker 1.1.16 can manage quite happily
> and stably...
>
> > > Either agent does not run as root or something blocks chown. Usual
> > > suspects are apparmor or SELinux.
> >
> > Pacemaker itself can also return this error in certain cases, such as
> > not having permissions to execute the agent. Check the pacemaker detail
> > log (usually /var/log/pacemaker/pacemaker.log) and the system log
> > around these times to see if there is more detail.
>
> I've turned on debug logging, but I'm still not sure I'm seeing *exactly*
> what the resource agent checker is doing when it gets this failure.
>
> > It is definitely weird that a privileges error would be sporadic.
> > Hopefully the logs can shed some more light.
>
> I've captured a bunch of them this afternoon and will go through them on
> Monday - it's pretty verbose!
>
> > Another possibility would be to set trace_ra=1 on the actions that are
> > failing to get line-by-line info from the agents.
>
> So, that would be an extra parameter to the resource definition in
> cluster.cib?
>
> Change:
>
> primitive Asterisk asterisk meta migration-threshold=3 op monitor interval=5
> timeout=30 on-fail=restart failure-timeout=10s
>
> to:
>
> primitive Asterisk asterisk meta migration-threshold=3 op monitor interval=5
> timeout=30 on-fail=restart failure-timeout=10s trace_ra=1
>
> ?
>

It's an instance attribute, not a meta attribute. I'm not familiar with
crmsh syntax but trace_ra=1 would go wherever you would configure a
"normal" option, like `ip=x.x.x.x` for an IPaddr2 resource. It will save a
shell trace of each operation to a file in
/var/lib/heartbeat/trace_ra/asterisk. You would then wait for an operation
to fail, find the file containing that operation's trace, and see what it
tells you about the error.

You might already have some more detail about the error in
/var/log/messages and/or /var/log/pacemaker/pacemaker.log. Look in
/var/log/messages around Fri Mar 26 13:37:08 2021 on the node where the
failure occurred. See if there are any additional messages from the
resource agent, or any stdout or stderr logged by lrmd/pacemaker-execd for
the Asterisk resource.
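
A minimal sketch of that workflow, assuming the trace files are written directly under the directory named above and that the most recently modified file belongs to the operation of interest. `newest_trace` is a hypothetical helper, not something shipped with resource-agents. In crmsh, instance attributes are normally given after the `params` keyword; the commented primitive line shows that placement as an untested assumption for this configuration.

```shell
#!/bin/sh
# Sketch only: directory layout and log timestamp format are assumptions.
# Possible crmsh placement of the instance attribute (untested here):
#   primitive Asterisk asterisk params trace_ra=1 meta migration-threshold=3 ...

# Print the path of the most recently modified entry in a trace directory.
# Returns non-zero if the directory is missing or empty.
newest_trace() {
    dir=$1
    [ -d "$dir" ] || return 1
    # ls -t sorts by modification time, newest first.
    newest=$(ls -t "$dir" 2>/dev/null | head -n 1)
    [ -n "$newest" ] || return 1
    printf '%s/%s\n' "$dir" "$newest"
}

# Example usage (paths and time taken from the message above):
#   newest_trace /var/lib/heartbeat/trace_ra/asterisk
#   grep 'Mar 26 13:37' /var/log/messages
```

The helper only locates the file; reading the trace itself is still a manual step, since the interesting line is whichever command returned non-zero just before the agent exited.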


>
> Antony.
>
> --
> "It is easy to be blinded to the essential uselessness of them by the
> sense of achievement you get from getting them to work at all. In other
> words - and this is the rock solid principle on which the whole of the
> Corporation's Galaxy-wide success is founded - their fundamental design
> flaws are completely hidden by their superficial design flaws."
>
>  - Douglas Noel Adams
>
> Please reply to the list;
> please *don't* CC me.
> ___
> Manage your subscription:
> https://lists.clusterlabs.org/mailman/listinfo/users
>
> ClusterLabs home: https://www.clusterlabs.org/
>
>

-- 
Regards,

Reid Wahl, RHCA
Senior Software Maintenance Engineer, Red Hat

Re: [ClusterLabs] ocf-tester always claims failure, even with built-in resource agents?

2021-03-26 Thread Antony Stone

On Friday 26 March 2021 at 18:31:51, Ken Gaillot wrote:

> On Fri, 2021-03-26 at 19:59 +0300, Andrei Borzenkov wrote:
> > On 26.03.2021 17:28, Antony Stone wrote:
> > > 
> > > So far all is well and good, my cluster synchronises, starts the
> > > resources, and everything's working as expected.  It'll move the
> > > resources from one cluster member to another (either if I ask it to, or
> > > if there's a problem), and it seems to work just as the older version
> > > did.
> 
> I'm glad this far was easy :)

Well, I've been using corosync & pacemaker for some years now; I've got used 
to some of their quirks and foibles :)

Now I just need to learn about the new ones for the newer versions...

> It's worth noting that pacemaker itself doesn't try to validate the
> agent meta-data, it just checks for the pieces that are interesting to
> it and ignores the rest.

I guess that's good, so long as what it does pay attention to is what it wants 
to see?

> It's also worth noting that the OCF 1.0 standard is horribly outdated
> compared to actual use, and the OCF 1.1 standard is being adopted today
> (!) after many years of trying to come up with something more up-to-
> date.

So, is ocf-tester no longer the right tool for checking this sort of thing?
What should I be doing instead to make sure my configuration is valid /
acceptable to pacemaker?

> Bottom line, it's worth installing xmllint to see if that helps, but I
> wouldn't worry about meta-data schema issues.

Well, as stated in my other reply to Andrei, I now get:

/usr/lib/ocf/resource.d/heartbeat/asterisk passed all tests

/usr/lib/ocf/resource.d/heartbeat/anything passed all tests

so I guess it means my configuration file is okay, and I need to look somewhere
else to find out why pacemaker 2.0.1 is throwing wobblies with exactly the
same resources that pacemaker 1.1.16 can manage quite happily and stably...

> > Either agent does not run as root or something blocks chown. Usual
> > suspects are apparmor or SELinux.
> 
> Pacemaker itself can also return this error in certain cases, such as
> not having permissions to execute the agent. Check the pacemaker detail
> log (usually /var/log/pacemaker/pacemaker.log) and the system log
> around these times to see if there is more detail.

I've turned on debug logging, but I'm still not sure I'm seeing *exactly* what 
the resource agent checker is doing when it gets this failure.

> It is definitely weird that a privileges error would be sporadic.
> Hopefully the logs can shed some more light.

I've captured a bunch of them this afternoon and will go through them on 
Monday - it's pretty verbose!

> Another possibility would be to set trace_ra=1 on the actions that are
> failing to get line-by-line info from the agents.

So, that would be an extra parameter to the resource definition in cluster.cib?

Change:

primitive Asterisk asterisk meta migration-threshold=3 op monitor interval=5 
timeout=30 on-fail=restart failure-timeout=10s

to:

primitive Asterisk asterisk meta migration-threshold=3 op monitor interval=5 
timeout=30 on-fail=restart failure-timeout=10s trace_ra=1

?

Antony.

-- 
"It is easy to be blinded to the essential uselessness of them by the sense of 
achievement you get from getting them to work at all. In other words - and 
this is the rock solid principle on which the whole of the Corporation's 
Galaxy-wide success is founded - their fundamental design flaws are completely 
hidden by their superficial design flaws."

 - Douglas Noel Adams

   Please reply to the list;
 please *don't* CC me.

Re: [ClusterLabs] ocf-tester always claims failure, even with built-in resource agents?

2021-03-26 Thread Antony Stone

On Friday 26 March 2021 at 17:59:07, Andrei Borzenkov wrote:

> On 26.03.2021 17:28, Antony Stone wrote:

> > # ocf-tester -n Asterisk /usr/lib/ocf/resource.d/heartbeat/asterisk
> > Beginning tests for /usr/lib/ocf/resource.d/heartbeat/asterisk...
> > /usr/sbin/ocf-tester: 226: /usr/sbin/ocf-tester: xmllint: not found
> > * rc=127: Your agent produces meta-data which does not conform to ra-api-1.dtd
> > * Your agent does not support the notify action (optional)
> > * Your agent does not support the demote action (optional)
> > * Your agent does not support the promote action (optional)
> > * Your agent does not support master/slave (optional)
> > * Your agent does not support the reload action (optional)
> > Tests failed: /usr/lib/ocf/resource.d/heartbeat/asterisk failed 1 tests

> As is pretty clear from error messages, ocf-tester calls xmllint which
> is missing.

Ah, I had not realised that this meant the rest of the output would be 
invalid.

I thought it just meant "you don't have xmllint installed, so there's some 
stuff we might otherwise be able to tell you, but can't".

If xmllint being installed is a requirement for the remainder of the output to
be meaningful, I'd have expected ocf-tester to simply give up at that point
and tell me that until I install xmllint, ocf-tester can't do its job.

That seems like a bit of a bug to me.
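
A minimal sketch of the kind of up-front check described here, written as a wrapper around ocf-tester instead of a change to it. `require_cmds` is a hypothetical helper name; the only dependency it checks in the usage example is xmllint, because that is the one this thread identified.

```shell
#!/bin/sh
# Hypothetical pre-flight wrapper; not part of ocf-tester itself.

# Return 0 only if every named command can be found on PATH.
# Each missing command is reported on stderr.
require_cmds() {
    missing=0
    for cmd in "$@"; do
        if ! command -v "$cmd" >/dev/null 2>&1; then
            echo "missing required command: $cmd" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Example usage: only run the tester when xmllint is available.
#   require_cmds xmllint && \
#       ocf-tester -n Asterisk /usr/lib/ocf/resource.d/heartbeat/asterisk
```

Failing early like this avoids a report in which a missing tool is counted as a failure of the agent being tested.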

After installing xmllint I now get:

/usr/lib/ocf/resource.d/heartbeat/asterisk passed all tests

/usr/lib/ocf/resource.d/heartbeat/anything passed all tests

So I'm now back to working out how to debug the failures I do see in "normal"
operation, which were not occurring with the older versions of corosync &
pacemaker...

> > My second question is: how can I debug what caused pacemaker to decide
> > that it couldn't run Asterisk due to "insufficient privileges"

> Agent returns this error if it fails to chown directory specified in its
> configuration file:
> 
> # Regardless of whether we just created the directory or it
> # already existed, check whether it is writable by the configured
> # user
> if ! su -s /bin/sh - $OCF_RESKEY_user -c "test -w $dir"; then
>     ocf_log warn "Directory $dir is not writable by $OCF_RESKEY_user, attempting chown"
>     ocf_run chown $OCF_RESKEY_user:$OCF_RESKEY_group $dir \
>         || exit $OCF_ERR_PERM
> fi
> 
> Either agent does not run as root or something blocks chown. Usual
> suspects are apparmor or SELinux.

Well, I'm not running either of those, but your comments point me in what I 
think is a helpful direction - thanks.

Regards,

Antony.

-- 
It may not seem obvious, but (6 x 5 + 5) x 5 - 55 equals 5!

   Please reply to the list;
 please *don't* CC me.

Re: [ClusterLabs] ocf-tester always claims failure, even with built-in resource agents?

2021-03-26 Thread Ken Gaillot

On Fri, 2021-03-26 at 19:59 +0300, Andrei Borzenkov wrote:
> On 26.03.2021 17:28, Antony Stone wrote:
> > Hi.
> > 
> > I've just signed up to the list.  I've been using corosync and
> > pacemaker for several years, mostly under Debian 9, which means:
> > 
> > corosync 2.4.2
> > pacemaker 1.1.16
> > 
> > I've recently upgraded a test cluster to Debian 10, which gives me:
> > 
> > corosync 3.0.1
> > pacemaker 2.0.1
> > 
> > I've made a few adjustments to my /etc/corosync/corosync.conf
> > configuration so that corosync seems happy, and also some minor changes
> > (mostly to the cluster defaults) in /etc/corosync/cluster.cib so that
> > pacemaker is happy.
> > 
> > So far all is well and good, my cluster synchronises, starts the
> > resources, and everything's working as expected.  It'll move the
> > resources from one cluster member to another (either if I ask it to, or
> > if there's a problem), and it seems to work just as the older version
> > did.

I'm glad this far was easy :)

> > Then, several times a day, I get resource failures such as:
> > 
> > * Asterisk_start_0 on castor 'insufficient privileges' (4):
> >  call=58,
> >  status=complete,
> >  exitreason='',
> >  last-rc-change='Fri Mar 26 13:37:08 2021',
> >  queued=0ms,
> >  exec=55ms
> > 
> > I have no idea why the machine might tell me it cannot start Asterisk
> > due to insufficient privilege when it's already been able to run it
> > before the cluster resources moved back to this machine.  Asterisk *can*
> > and *does* run on this machine.
> > 
> > Another error I get is:
> > 
> > * Kann-Bear_monitor_5000 on helen 'unknown error' (1):
> >  call=62,
> >  status=complete,
> >  exitreason='',
> >  last-rc-change='Fri Mar 26 14:23:05 2021',
> >  queued=0ms,
> >  exec=0ms
> > 
> > Now, that second resource is one which doesn't have a standard resource
> > agent available for it under /usr/lib/ocf/resource.d, so I'm using the
> > general-purpose agent /usr/lib/ocf/resource.d/heartbeat/anything to
> > manage it.
> > 
> > I thought, "perhaps there's something dodgy about using this 'anything'
> > agent, because it can't really know about the resource it's managing",
> > so I tested it with ocf-tester:
> > 
> > # ocf-tester -n Kann-Bear -o binfile="/usr/sbin/bearerbox" -o 
> > cmdline_options="/etc/kannel/kannel.conf" -o 
> > pidfile="/var/run/kannel/kannel_bearerbox.pid" 
> > /usr/lib/ocf/resource.d/heartbeat/anything
> > Beginning tests for /usr/lib/ocf/resource.d/heartbeat/anything...
> > /usr/sbin/ocf-tester: 226: /usr/sbin/ocf-tester: xmllint: not found
> > * rc=127: Your agent produces meta-data which does not conform to ra-api-1.dtd
> > * Your agent does not support the notify action (optional)
> > * Your agent does not support the demote action (optional)
> > * Your agent does not support the promote action (optional)
> > * Your agent does not support master/slave (optional)
> > * Your agent does not support the reload action (optional)
> > Tests failed: /usr/lib/ocf/resource.d/heartbeat/anything failed 1 tests
> > 
> > Okay, something's not right.
> > 
> > BUT, it doesn't matter *which* resource agent I test, it tells me the
> > same thing every time, including for the built-in standard agents:
> > 
> > * rc=127: Your agent produces meta-data which does not conform to ra-api-1.dtd
> > 
> > For example:
> > 
> > # ocf-tester -n Asterisk /usr/lib/ocf/resource.d/heartbeat/asterisk
> > Beginning tests for /usr/lib/ocf/resource.d/heartbeat/asterisk...
> > /usr/sbin/ocf-tester: 226: /usr/sbin/ocf-tester: xmllint: not found
> > * rc=127: Your agent produces meta-data which does not conform to ra-api-1.dtd
> > * Your agent does not support the notify action (optional)
> > * Your agent does not support the demote action (optional)
> > * Your agent does not support the promote action (optional)
> > * Your agent does not support master/slave (optional)
> > * Your agent does not support the reload action (optional)
> > Tests failed: /usr/lib/ocf/resource.d/heartbeat/asterisk failed 1 tests
> > 
> > 
> > # ocf-tester -n IP-Float4 -o ip=10.1.0.42 -o cidr_netmask=28 
> > /usr/lib/ocf/resource.d/heartbeat/IPaddr2
> > Beginning tests for /usr/lib/ocf/resource.d/heartbeat/IPaddr2...
> > /usr/sbin/ocf-tester: 226: /usr/sbin/ocf-tester: xmllint: not found
> > * rc=127: Your agent produces meta-data which does not conform to ra-api-1.dtd
> > * Your agent does not support the notify action (optional)
> > * Your agent does not support the demote action (optional)
> > * Your agent does not support the promote action (optional)
> > * Your agent does not support master/slave (optional)
> > * Your agent does not support the reload action (optional)
> > Tests failed: /usr/lib/ocf/resource.d/heartbeat/IPaddr2 failed 1 tests
> > 
> > 
> > So, it seems to be telling me that even the standard built-in
> > resource

Re: [ClusterLabs] ocf-tester always claims failure, even with built-in resource agents?

2021-03-26 Thread Andrei Borzenkov

On 26.03.2021 17:28, Antony Stone wrote:
> Hi.
> 
> I've just signed up to the list.  I've been using corosync and pacemaker for 
> several years, mostly under Debian 9, which means:
> 
>   corosync 2.4.2
>   pacemaker 1.1.16
> 
> I've recently upgraded a test cluster to Debian 10, which gives me:
> 
>   corosync 3.0.1
>   pacemaker 2.0.1
> 
> I've made a few adjustments to my /etc/corosync/corosync.conf configuration
> so that corosync seems happy, and also some minor changes (mostly to the
> cluster defaults) in /etc/corosync/cluster.cib so that pacemaker is happy.
> 
> So far all is well and good, my cluster synchronises, starts the resources, 
> and everything's working as expected.  It'll move the resources from one 
> cluster member to another (either if I ask it to, or if there's a problem), 
> and it seems to work just as the older version did.
> 
> Then, several times a day, I get resource failures such as:
> 
>   * Asterisk_start_0 on castor 'insufficient privileges' (4):
>    call=58,
>    status=complete,
>    exitreason='',
>    last-rc-change='Fri Mar 26 13:37:08 2021',
>    queued=0ms,
>    exec=55ms
> 
> I have no idea why the machine might tell me it cannot start Asterisk due to
> insufficient privilege when it's already been able to run it before the
> cluster resources moved back to this machine.  Asterisk *can* and *does* run
> on this machine.
> 
> Another error I get is:
> 
>   * Kann-Bear_monitor_5000 on helen 'unknown error' (1):
>    call=62,
>    status=complete,
>    exitreason='',
>    last-rc-change='Fri Mar 26 14:23:05 2021',
>    queued=0ms,
>    exec=0ms
> 
> Now, that second resource is one which doesn't have a standard resource agent
> available for it under /usr/lib/ocf/resource.d, so I'm using the
> general-purpose agent /usr/lib/ocf/resource.d/heartbeat/anything to manage it.
> 
> I thought, "perhaps there's something dodgy about using this 'anything'
> agent, because it can't really know about the resource it's managing", so I
> tested it with ocf-tester:
> 
> # ocf-tester -n Kann-Bear -o binfile="/usr/sbin/bearerbox" -o 
> cmdline_options="/etc/kannel/kannel.conf" -o 
> pidfile="/var/run/kannel/kannel_bearerbox.pid" 
> /usr/lib/ocf/resource.d/heartbeat/anything
> Beginning tests for /usr/lib/ocf/resource.d/heartbeat/anything...
> /usr/sbin/ocf-tester: 226: /usr/sbin/ocf-tester: xmllint: not found
> * rc=127: Your agent produces meta-data which does not conform to ra-api-1.dtd
> * Your agent does not support the notify action (optional)
> * Your agent does not support the demote action (optional)
> * Your agent does not support the promote action (optional)
> * Your agent does not support master/slave (optional)
> * Your agent does not support the reload action (optional)
> Tests failed: /usr/lib/ocf/resource.d/heartbeat/anything failed 1 tests
> 
> Okay, something's not right.
> 
> BUT, it doesn't matter *which* resource agent I test, it tells me the same 
> thing every time, including for the built-in standard agents:
> 
> * rc=127: Your agent produces meta-data which does not conform to ra-api-1.dtd
> 
> For example:
> 
> # ocf-tester -n Asterisk /usr/lib/ocf/resource.d/heartbeat/asterisk
> Beginning tests for /usr/lib/ocf/resource.d/heartbeat/asterisk...
> /usr/sbin/ocf-tester: 226: /usr/sbin/ocf-tester: xmllint: not found
> * rc=127: Your agent produces meta-data which does not conform to ra-api-1.dtd
> * Your agent does not support the notify action (optional)
> * Your agent does not support the demote action (optional)
> * Your agent does not support the promote action (optional)
> * Your agent does not support master/slave (optional)
> * Your agent does not support the reload action (optional)
> Tests failed: /usr/lib/ocf/resource.d/heartbeat/asterisk failed 1 tests
> 
> 
> # ocf-tester -n IP-Float4 -o ip=10.1.0.42 -o cidr_netmask=28 
> /usr/lib/ocf/resource.d/heartbeat/IPaddr2
> Beginning tests for /usr/lib/ocf/resource.d/heartbeat/IPaddr2...
> /usr/sbin/ocf-tester: 226: /usr/sbin/ocf-tester: xmllint: not found
> * rc=127: Your agent produces meta-data which does not conform to ra-api-1.dtd
> * Your agent does not support the notify action (optional)
> * Your agent does not support the demote action (optional)
> * Your agent does not support the promote action (optional)
> * Your agent does not support master/slave (optional)
> * Your agent does not support the reload action (optional)
> Tests failed: /usr/lib/ocf/resource.d/heartbeat/IPaddr2 failed 1 tests
> 
> 
> So, it seems to be telling me that even the standard built-in resource agents 
> "produce meta-data which does not conform to ra-api-1.dtd"
> 
> 
> My first question is: what's going wrong here?  Am I using ocf-tester 
> incorrectly, or is it a bug?
> 

As is pretty clear from error messages, ocf-tester calls xmllint which
is missing.
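
The rc=127 in that output is the giveaway: 127 is the exit status a POSIX shell reports when a command cannot be found, so the "does not conform" message is reporting the failed xmllint lookup, not a verdict on the meta-data. A small sketch that reproduces the status, using a made-up command name (`no_such_command_xyz`) chosen so that the lookup fails:

```shell
#!/bin/sh
# Run a command that does not exist and capture the shell's exit status.
# "no_such_command_xyz" is a deliberately nonexistent, made-up name.
sh -c 'no_such_command_xyz' 2>/dev/null
rc=$?

# A POSIX shell reports 127 for "command not found".
echo "rc=$rc"
```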

> My second question is: how can I debug what caused pacemaker to decide

[ClusterLabs] ocf-tester always claims failure, even with built-in resource agents?

2021-03-26 Thread Antony Stone

Hi.

I've just signed up to the list.  I've been using corosync and pacemaker for 
several years, mostly under Debian 9, which means:

corosync 2.4.2
pacemaker 1.1.16

I've recently upgraded a test cluster to Debian 10, which gives me:

corosync 3.0.1
pacemaker 2.0.1

I've made a few adjustments to my /etc/corosync/corosync.conf configuration so 
that corosync seems happy, and also some minor changes (mostly to the cluster 
defaults) in /etc/corosync/cluster.cib so that pacemaker is happy.

So far all is well and good, my cluster synchronises, starts the resources, 
and everything's working as expected.  It'll move the resources from one 
cluster member to another (either if I ask it to, or if there's a problem), 
and it seems to work just as the older version did.

Then, several times a day, I get resource failures such as:

* Asterisk_start_0 on castor 'insufficient privileges' (4):
 call=58,
 status=complete,
 exitreason='',
 last-rc-change='Fri Mar 26 13:37:08 2021',
 queued=0ms,
 exec=55ms

I have no idea why the machine might tell me it cannot start Asterisk due to 
insufficient privilege when it's already been able to run it before the cluster 
resources moved back to this machine.  Asterisk *can* and *does* run on this 
machine.

Another error I get is:

* Kann-Bear_monitor_5000 on helen 'unknown error' (1):
 call=62,
 status=complete,
 exitreason='',
 last-rc-change='Fri Mar 26 14:23:05 2021',
 queued=0ms,
 exec=0ms
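
For reference, the number in parentheses in each failure above is the exit code returned for the operation, and the quoted text is pacemaker's description of it. A minimal sketch that maps the codes to their names in the OCF resource agent API; `ocf_rc_name` is a hypothetical helper, and codes 8 and 9 are shown under their older master/slave names.

```shell
#!/bin/sh
# Hypothetical helper: translate an OCF exit code into its symbolic name.
ocf_rc_name() {
    case "$1" in
        0) echo OCF_SUCCESS ;;
        1) echo OCF_ERR_GENERIC ;;        # shown as 'unknown error'
        2) echo OCF_ERR_ARGS ;;
        3) echo OCF_ERR_UNIMPLEMENTED ;;
        4) echo OCF_ERR_PERM ;;           # shown as 'insufficient privileges'
        5) echo OCF_ERR_INSTALLED ;;
        6) echo OCF_ERR_CONFIGURED ;;
        7) echo OCF_NOT_RUNNING ;;
        8) echo OCF_RUNNING_MASTER ;;
        9) echo OCF_FAILED_MASTER ;;
        *) echo UNKNOWN ;;
    esac
}

# Example usage with the two codes from the failures above:
#   ocf_rc_name 4
#   ocf_rc_name 1
```

Knowing the symbolic name makes it possible to search the agent script for the places where that particular code is returned.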

Now, that second resource is one which doesn't have a standard resource agent
available for it under /usr/lib/ocf/resource.d, so I'm using the
general-purpose agent /usr/lib/ocf/resource.d/heartbeat/anything to manage it.

I thought, "perhaps there's something dodgy about using this 'anything' agent, 
because it can't really know about the resource it's managing", so I tested it 
with ocf-tester:

# ocf-tester -n Kann-Bear -o binfile="/usr/sbin/bearerbox" \
    -o cmdline_options="/etc/kannel/kannel.conf" \
    -o pidfile="/var/run/kannel/kannel_bearerbox.pid" \
    /usr/lib/ocf/resource.d/heartbeat/anything
Beginning tests for /usr/lib/ocf/resource.d/heartbeat/anything...
/usr/sbin/ocf-tester: 226: /usr/sbin/ocf-tester: xmllint: not found
* rc=127: Your agent produces meta-data which does not conform to ra-api-1.dtd
* Your agent does not support the notify action (optional)
* Your agent does not support the demote action (optional)
* Your agent does not support the promote action (optional)
* Your agent does not support master/slave (optional)
* Your agent does not support the reload action (optional)
Tests failed: /usr/lib/ocf/resource.d/heartbeat/anything failed 1 tests

Okay, something's not right.

BUT, it doesn't matter *which* resource agent I test, it tells me the same 
thing every time, including for the built-in standard agents:

* rc=127: Your agent produces meta-data which does not conform to ra-api-1.dtd

For example:

# ocf-tester -n Asterisk /usr/lib/ocf/resource.d/heartbeat/asterisk
Beginning tests for /usr/lib/ocf/resource.d/heartbeat/asterisk...
/usr/sbin/ocf-tester: 226: /usr/sbin/ocf-tester: xmllint: not found
* rc=127: Your agent produces meta-data which does not conform to ra-api-1.dtd
* Your agent does not support the notify action (optional)
* Your agent does not support the demote action (optional)
* Your agent does not support the promote action (optional)
* Your agent does not support master/slave (optional)
* Your agent does not support the reload action (optional)
Tests failed: /usr/lib/ocf/resource.d/heartbeat/asterisk failed 1 tests


# ocf-tester -n IP-Float4 -o ip=10.1.0.42 -o cidr_netmask=28 \
    /usr/lib/ocf/resource.d/heartbeat/IPaddr2
Beginning tests for /usr/lib/ocf/resource.d/heartbeat/IPaddr2...
/usr/sbin/ocf-tester: 226: /usr/sbin/ocf-tester: xmllint: not found
* rc=127: Your agent produces meta-data which does not conform to ra-api-1.dtd
* Your agent does not support the notify action (optional)
* Your agent does not support the demote action (optional)
* Your agent does not support the promote action (optional)
* Your agent does not support master/slave (optional)
* Your agent does not support the reload action (optional)
Tests failed: /usr/lib/ocf/resource.d/heartbeat/IPaddr2 failed 1 tests


So, it seems to be telling me that even the standard built-in resource agents 
"produce meta-data which does not conform to ra-api-1.dtd"


My first question is: what's going wrong here?  Am I using ocf-tester 
incorrectly, or is it a bug?

My second question is: how can I debug what caused pacemaker to decide that it
couldn't run Asterisk due to "insufficient privileges" on a machine which is
perfectly well capable of running Asterisk, including when it gets started by
pacemaker (in fact, that's the only way Asterisk gets started on these
machines; it's a floating resource which pacemaker is in charge of)?


Please let
