Re: [Pacemaker] crm_resource -L not trustable right after restart

2014-02-17 Thread Andrew Beekhof

On 22 Jan 2014, at 10:54 am, Brian J. Murrell (brian) br...@interlinx.bc.ca 
wrote:

 On Thu, 2014-01-16 at 14:49 +1100, Andrew Beekhof wrote:
 
 What crm_mon are you looking at?
 I see stuff like:
 
 virt-fencing (stonith:fence_xvm): Started rhos4-node3
 Resource Group: mysql-group
     mysql-vip (ocf::heartbeat:IPaddr2):    Started rhos4-node3
     mysql-fs  (ocf::heartbeat:Filesystem): Started rhos4-node3
     mysql-db  (ocf::heartbeat:mysql):      Started rhos4-node3
 
 Yes, you are right.  I couldn't see the forest for the trees.
 
 I initially was optimistic about crm_mon being more truthful than
 crm_resource but it turns out it is not.

It can't be, they're both obtaining their data from the same place (the cib).

 
 Take for example these commands to set a constraint and start a resource
 (which has already been defined at this point):
 
 [21/Jan/2014:13:46:40] cibadmin -o constraints -C -X '<rsc_location id="res1-primary" node="node5" rsc="res1" score="20"/>'
 [21/Jan/2014:13:46:41] cibadmin -o constraints -C -X '<rsc_location id="res1-secondary" node="node6" rsc="res1" score="10"/>'
 [21/Jan/2014:13:46:42] crm_resource -r 'res1' -p target-role -m -v 'Started'
 
 and then these repeated calls to crm_mon -1 on node5:
 
 [21/Jan/2014:13:46:42] crm_mon -1
 Last updated: Tue Jan 21 13:46:42 2014
 Last change: Tue Jan 21 13:46:42 2014 via crm_resource on node5
 Stack: openais
 Current DC: node5 - partition with quorum
 Version: 1.1.10-14.el6_5.1-368c726
 2 Nodes configured
 2 Resources configured
 
 
 Online: [ node5 node6 ]
 
 st-fencing (stonith:fence_product): Started node5
 res1       (ocf::product:Target):   Started node6
 
 [21/Jan/2014:13:46:42] crm_mon -1
 Last updated: Tue Jan 21 13:46:42 2014
 Last change: Tue Jan 21 13:46:42 2014 via crm_resource on node5
 Stack: openais
 Current DC: node5 - partition with quorum
 Version: 1.1.10-14.el6_5.1-368c726
 2 Nodes configured
 2 Resources configured
 
 
 Online: [ node5 node6 ]
 
 st-fencing (stonith:fence_product): Started node5
 res1       (ocf::product:Target):   Started node6
 
 [21/Jan/2014:13:46:49] crm_mon -1 -r
 Last updated: Tue Jan 21 13:46:49 2014
 Last change: Tue Jan 21 13:46:42 2014 via crm_resource on node5
 Stack: openais
 Current DC: node5 - partition with quorum
 Version: 1.1.10-14.el6_5.1-368c726
 2 Nodes configured
 2 Resources configured
 
 
 Online: [ node5 node6 ]
 
 Full list of resources:
 
 st-fencing (stonith:fence_product): Started node5
 res1       (ocf::product:Target):   Started node5
 
 The first two are not correct, showing the resource started on node6
 when it was actually started on node5.

Was it running there to begin with?
Answering my own question... yes. It was:

 Jan 21 13:46:41 node5 crmd[8695]:  warning: status_from_rc: Action 6 
 (res1_monitor_0) on node6 failed (target: 7 vs. rc: 0): Error

and then we try to stop it:

 Jan 21 13:46:41 node5 crmd[8695]:   notice: te_rsc_command: Initiating action 
 7: stop res1_stop_0 on node6


So you are correct that something is wrong, but it isn't pacemaker.
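
For anyone reading this later: the first log line above is a probe (the _monitor_0 operation) that expected rc 7 (OCF_NOT_RUNNING) but got rc 0 (OCF_SUCCESS), i.e. res1 was found already active on node6. Below is a minimal sketch for pulling such events out of a log. It assumes each message sits on a single line in the real log file (it is only wrapped here by the mailer) and that the wording matches the 1.1.10 line quoted above. The function name is made up for illustration.

```shell
#!/bin/sh
# Sketch: list probes that found a resource active where the cluster
# expected it to be stopped (target rc 7, actual rc 0).
# Assumption: log lines look like the crmd warning quoted in this thread.

# Reads log text on stdin, prints "<resource> <node>" per matching line.
probes_found_active() {
    sed -n 's/.*(\([^ ]*\)_monitor_0) on \([^ ]*\) failed (target: 7 vs\. rc: 0).*/\1 \2/p'
}
```

Something like `probes_found_active < /var/log/messages` would then show which resources were running somewhere before the cluster started them (the log path is an assumption and depends on your syslog setup).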


  Finally, 7 seconds later, it is
 reporting correctly.  The logs on node{5,6} bear this out.  The resource
 was actually only ever started on node5 and never on node6.

Wrong.



___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] crm_resource -L not trustable right after restart

2014-01-15 Thread Brian J. Murrell (brian)
On Wed, 2014-01-15 at 17:11 +1100, Andrew Beekhof wrote:
 
 Consider any long running action, such as starting a database.
 We do not update the CIB until after actions have completed, so there can and 
 will be times when the status section is out of date to one degree or another.

But that is the opposite of what I am reporting and is acceptable.  It's
acceptable for a resource that is in the process of starting being
reported as stopped, because it's not yet started.

What I am seeing is resources being reported as stopped when they are in
fact started/running and have been for a long time.

 At node startup is another point at which the status could potentially be 
 behind.

Right.  Which is the case I am talking about.

 It sounds to me like you're trying to second guess the cluster, which is a 
 dangerous path.

No, not trying to second guess at all.  I'm just trying to ask the
cluster what the state is and not getting the truth.  I am willing to
believe whatever state the cluster says it's in as long as what I am
getting is the truth.

 What if it's the first node to start up?

I'd think a timeout comes into play here.

 There'd be no fresh copy to arrive in that case.

I can't say that I know how the CIB works internally/entirely, but I'd
imagine that when a cluster node starts up it tries to see if there is a
more fresh CIB out there in the cluster.  Maybe this is part of the
process of choosing/discovering a DC.  But ultimately if the node is the
first one up, it will eventually figure that out so that it can nominate
itself as the DC.  Or it finds out that there is a DC already (and gets
a fresh CIB from it?).  It's during that window that I propose that
crm_resource should not be asserting anything and should just admit that
it does not (yet) know.

 If it had enough information to know it was out of date, it wouldn't be out 
 of date.

But surely it understands if it is in the process of joining a cluster
or not, and therefore does know enough to know that it doesn't know if
it's out of date or not.  But that it could be.

 As above, there are situations when you'd never get an answer.

I should have added to my proposal "or has determined that there is
nothing to refresh its CIB from and that its local copy is
authoritative for the whole cluster".

b.






Re: [Pacemaker] crm_resource -L not trustable right after restart

2014-01-15 Thread Andrew Beekhof

On 16 Jan 2014, at 6:53 am, Brian J. Murrell (brian) br...@interlinx.bc.ca 
wrote:

 On Wed, 2014-01-15 at 17:11 +1100, Andrew Beekhof wrote:
 
 Consider any long running action, such as starting a database.
 We do not update the CIB until after actions have completed, so there can 
 and will be times when the status section is out of date to one degree or 
 another.
 
 But that is the opposite of what I am reporting

I know, I was giving you another example of when the cib is not completely 
up-to-date with reality.

 and is acceptable.  It's
 acceptable for a resource that is in the process of starting being
 reported as stopped, because it's not yet started.

It may very well be partially started.  It's almost certainly not stopped, which 
is what is being reported.

 
 What I am seeing is resources being reported as stopped when they are in
 fact started/running and have been for a long time.
 
 At node startup is another point at which the status could potentially be 
 behind.
 
 Right.  Which is the case I am talking about.
 
 It sounds to me like you're trying to second guess the cluster, which is a 
 dangerous path.
 
 No, not trying to second guess at all.

You're not using the output to decide whether to perform some logic?
Because crm_mon is the more usual command to run right after startup (which 
would give you enough context to know things are still syncing).

  I'm just trying to ask the
 cluster what the state is and not getting the truth.  I am willing to
 believe whatever state the cluster says it's in as long as what I am
 getting is the truth.
 
 What if it's the first node to start up?
 
 I'd think a timeout comes into play here.
 
 There'd be no fresh copy to arrive in that case.
 
 I can't say that I know how the CIB works internally/entirely, but I'd
 imagine that when a cluster node starts up it tries to see if there is a
 more fresh CIB out there in the cluster.

Nope.

  Maybe this is part of the
 process of choosing/discovering a DC.

DC election happens at the crmd.  The cib is a dumb repository of name/value 
pairs.
It doesn't even understand new vs. old - only different. 

  But ultimately if the node is the
 first one up, it will eventually figure that out so that it can nominate
 itself as the DC.  Or it finds out that there is a DC already (and gets
 a fresh CIB from it?).  It's during that window that I propose that
 crm_resource should not be asserting anything and should just admit that
 it does not (yet) know.
 
 If it had enough information to know it was out of date, it wouldn't be out 
 of date.
 
 But surely it understands if it is in the process of joining a cluster
 or not, and therefore does know enough to know that it doesn't know if
 it's out of date or not.

And if it has a newer config compared to the existing nodes?

  But that it could be.
 
 As above, there are situations when you'd never get an answer.
 
 I should have added to my proposal "or has determined that there is
 nothing to refresh its CIB from and that its local copy is
 authoritative for the whole cluster".
 
 b.
 
 
 
 


Re: [Pacemaker] crm_resource -L not trustable right after restart

2014-01-15 Thread Brian J. Murrell (brian)
On Thu, 2014-01-16 at 08:35 +1100, Andrew Beekhof wrote:
 
 I know, I was giving you another example of when the cib is not completely 
 up-to-date with reality.

Yeah, I understood that.  I was just countering with why that example is
actually more acceptable.

 It may very well be partially started.

Sure.

 It's almost certainly not stopped, which is what is being reported.

Right.  But until it is completely started (and ready to do whatever
it's supposed to do), it might as well be considered stopped.  If you
have to make a binary state out of stopped, starting, started, I think
most people will agree that the states are stopped and started, and
stopped is anything < started, since most things are not useful until
they are fully started.

 You're not using the output to decide whether to perform some logic?

Nope.  Just reporting the state.  But that's difficult when you have two
participants making positive assertions about state when one is not
really in a position to do so.

 Because crm_mon is the more usual command to run right after startup

The problem with crm_mon is that it doesn't tell you where a resource is
running.

  (which would give you enough context to know things are still syncing).

That's interesting.  Would polling crm_mon be more efficient than
polling the remote CIB with cibadmin -Q?

 DC election happens at the crmd.

So would it be fair to say then that I should not trust the local CIB
until DC election has finished or could there be latency between that
completing and the CIB being refreshed?

If DC election completion is accurate, what's the best way to determine
that has completed?

b.






Re: [Pacemaker] crm_resource -L not trustable right after restart

2014-01-15 Thread Andrew Beekhof

On 16 Jan 2014, at 1:13 pm, Brian J. Murrell (brian) br...@interlinx.bc.ca 
wrote:

 On Thu, 2014-01-16 at 08:35 +1100, Andrew Beekhof wrote:
 
 I know, I was giving you another example of when the cib is not completely 
 up-to-date with reality.
 
 Yeah, I understood that.  I was just countering with why that example is
 actually more acceptable.
 
 It may very well be partially started.
 
 Sure.
 
 It's almost certainly not stopped, which is what is being reported.
 
 Right.  But until it is completely started (and ready to do whatever
 it's supposed to do), it might as well be considered stopped.  If you
 have to make a binary state out of stopped, starting, started, I think
 most people will agree that the states are stopped and started, and
 stopped is anything < started, since most things are not useful until
 they are fully started.
 
 You're not using the output to decide whether to perform some logic?
 
 Nope.  Just reporting the state.  But that's difficult when you have two
 participants making positive assertions about state when one is not
 really in a position to do so.
 
 Because crm_mon is the more usual command to run right after startup
 
 The problem with crm_mon is that it doesn't tell you where a resource is
 running.

What crm_mon are you looking at?
I see stuff like:

 virt-fencing (stonith:fence_xvm): Started rhos4-node3
 Resource Group: mysql-group
     mysql-vip (ocf::heartbeat:IPaddr2):    Started rhos4-node3
     mysql-fs  (ocf::heartbeat:Filesystem): Started rhos4-node3
     mysql-db  (ocf::heartbeat:mysql):      Started rhos4-node3
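
If a script needs the location out of that output, here is a rough sketch. It assumes the plain-text layout shown above (resource name, agent in parentheses followed by a colon, state, node) as printed by crm_mon -1. The function name is hypothetical, and whatever it reports is only as current as the cib that crm_mon read from.

```shell
#!/bin/sh
# Sketch: extract "<resource> <node>" pairs for started resources from
# crm_mon -1 style text.
# Assumption: one resource per line, in the layout
#   name (class:provider:type): Started node

# Reads crm_mon -1 output on stdin.
started_resources() {
    awk '/\):/ && $(NF-1) == "Started" { print $1, $NF }'
}
```

Usage would be along the lines of `crm_mon -1 | started_resources`. Resources shown as Stopped, and header lines such as "Resource Group:", are skipped.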


 
 (which would give you enough context to know things are still syncing).
 
 That's interesting.  Would polling crm_mon be more efficient than
 polling the remote CIB with cibadmin -Q?

crm_mon in interactive mode subscribes to updates from the cib,
which would be more efficient than repeatedly calling cibadmin or crm_mon.

 
 DC election happens at the crmd.
 
 So would it be fair to say then that I should not trust the local CIB
 until DC election has finished or could there be latency between that
 completing and the CIB being refreshed?

After the join completes (which happens after the election or when a new node 
is found), then it is safe.
You can tell this by running crmadmin -S `uname -n` and looking for S_IDLE, 
S_POLICY_ENGINE or S_TRANSITION_ENGINE, iirc.
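
A minimal sketch of that check, for scripting. It assumes `crmadmin -S <node>` is the working invocation on your version and that it prints a line containing the FSM state name, something like "Status of crmd@node5: S_IDLE (ok)". The function names and the SETTLED_STATES variable are made up for illustration. The default state list is just the one given above; a node that is not the DC may well report something else (S_NOT_DC, for example), so adjust the list to what your cluster actually prints.

```shell
#!/bin/sh
# Sketch: decide from crmadmin output whether the join has completed.
# Assumption: the output contains the crmd FSM state name as plain text.

# Reads crmadmin output on stdin; succeeds if a "settled" state is present.
# SETTLED_STATES (hypothetical) is an extended regex of acceptable states.
crmd_joined() {
    grep -Eq "${SETTLED_STATES:-S_IDLE|S_POLICY_ENGINE|S_TRANSITION_ENGINE}"
}

# Poll once a second until the local node reports a settled state.
# $1 = maximum number of attempts (default 30).
wait_for_join() {
    attempts=${1:-30}
    while [ "$attempts" -gt 0 ]; do
        # 2>&1 because some versions print the status on stderr.
        if crmadmin -S "$(uname -n)" 2>&1 | crmd_joined; then
            return 0
        fi
        attempts=$((attempts - 1))
        sleep 1
    done
    return 1
}
```

A caller could then do `wait_for_join 10 && crm_resource -L`, and treat a failure of wait_for_join as "state not known yet" rather than trusting the local answer.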

 
 If DC election completion is accurate, what's the best way to determine
 that has completed?

Ideally it doesn't happen when a node joins an existing cluster.





Re: [Pacemaker] crm_resource -L not trustable right after restart

2014-01-14 Thread Brian J. Murrell (brian)
On Tue, 2014-01-14 at 16:01 +1100, Andrew Beekhof wrote:
 
  On Tue, 2014-01-14 at 08:09 +1100, Andrew Beekhof wrote:
  
  The local cib hasn't caught up yet by the looks of it.

I should have asked in my previous message: is this entirely an artifact
of having just restarted or are there any other times where the local
CIB can in fact be out of date (and thus crm_resource is inaccurate), if
even for a brief period of time?  I just want to completely understand
the nature of this situation.

 It doesn't know that it doesn't know.

But it (pacemaker at least) does know that it's just started up, and
should also know whether it's gotten a fresh copy of the CIB since
starting up, right?  I think I'd consider it required behaviour that
pacemaker not consider itself authoritative enough to provide answers
like location until it has gotten a fresh copy of the CIB.

 Does it show anything as running?  Any nodes as online?


 I'd not expect that it stays in that situation for more than a second or 
 two...

You are probably right about that.  But unfortunately that second or two
provides a large enough window to provide mis-information.

 We could add an option to force crm_resource to use the master instance 
 instead of the local one I guess.

Or, depending on the answers to above (like can this local-is-not-true
situation ever manifest itself at times other than just started)
perhaps just don't allow crm_resource (or any other tool) to provide
information from the local CIB until it's been refreshed at least once
since a startup.

I would much rather crm_resource experience some latency in being able
to provide answers than provide wrong ones.  Perhaps there needs to be a
switch to indicate if it should block waiting for the local CIB to be
up-to-date or should return immediately with an "unknown" type response
if the local CIB has not yet been updated since a start.

Cheers,
b.






Re: [Pacemaker] crm_resource -L not trustable right after restart

2014-01-14 Thread Andrew Beekhof

On 14 Jan 2014, at 11:50 pm, Brian J. Murrell (brian) br...@interlinx.bc.ca 
wrote:

 On Tue, 2014-01-14 at 16:01 +1100, Andrew Beekhof wrote:
 
 On Tue, 2014-01-14 at 08:09 +1100, Andrew Beekhof wrote:
 
 The local cib hasn't caught up yet by the looks of it.
 
 I should have asked in my previous message: is this entirely an artifact
 of having just restarted or are there any other times where the local
 CIB can in fact be out of date (and thus crm_resource is inaccurate), if
 even for a brief period of time?  I just want to completely understand
 the nature of this situation.

Consider any long running action, such as starting a database.
We do not update the CIB until after actions have completed, so there can and 
will be times when the status section is out of date to one degree or another.
At node startup is another point at which the status could potentially be 
behind.

It sounds to me like you're trying to second guess the cluster, which is a 
dangerous path.

 
 It doesn't know that it doesn't know.
 
 But it (pacemaker at least) does know that it's just started up, and
 should also know whether it's gotten a fresh copy of the CIB since
 starting up, right?  

What if it's the first node to start up?  There'd be no fresh copy to arrive in 
that case.
Many things are obvious to external observers that are not at all obvious to 
the cluster.

If it had enough information to know it was out of date, it wouldn't be out of 
date.

 I think I'd consider it required behaviour that
 pacemaker not consider itself authoritative enough to provide answers
 like location until it has gotten a fresh copy of the CIB.
 
 Does it show anything as running?  Any nodes as online?
 
 
 I'd not expect that it stays in that situation for more than a second or 
 two...
 
 You are probably right about that.  But unfortunately that second or two
 provides a large enough window to provide mis-information.
 
 We could add an option to force crm_resource to use the master instance 
 instead of the local one I guess.
 
 Or, depending on the answers to above (like can this local-is-not-true
 situation ever manifest itself at times other than just started)
 perhaps just don't allow crm_resource (or any other tool) to provide
 information from the local CIB until it's been refreshed at least once
 since a startup.

As above, there are situations when you'd never get an answer.

 
 I would much rather crm_resource experience some latency in being able
 to provide answers than provide wrong ones.  Perhaps there needs to be a
 switch to indicate if it should block waiting for the local CIB to be
 up-to-date or should return immediately with an "unknown" type response
 if the local CIB has not yet been updated since a start.
 
 Cheers,
 b.
 
 
 
 


[Pacemaker] crm_resource -L not trustable right after restart

2014-01-13 Thread Brian J. Murrell (brian)
Hi,

I found a situation using pacemaker 1.1.10 on RHEL6.5 where the output
of crm_resource -L is not trust-able, shortly after a node is booted.

Here is the output from crm_resource -L on one of the nodes in a two
node cluster (the one that was not rebooted):

 st-fencing (stonith:fence_foo): Started
 res1       (ocf::foo:Target):   Started
 res2       (ocf::foo:Target):   Started

Here is the output from the same command on the other node in the two
node cluster right after it was rebooted:

 st-fencing (stonith:fence_foo): Stopped
 res1       (ocf::foo:Target):   Stopped
 res2       (ocf::foo:Target):   Stopped

These were collected at the same time (within the same second) on the
two nodes.

Clearly the rebooted node is not telling the truth.  Perhaps the truth
for it is "I don't know", which would be fair enough but that's not what
pacemaker is asserting there.

So, how do I know (i.e. programmatically -- what command can I issue to
know) if and when crm_resource can be trusted to be truthful?

b.






Re: [Pacemaker] crm_resource -L not trustable right after restart

2014-01-13 Thread Andrew Beekhof

On 14 Jan 2014, at 5:13 am, Brian J. Murrell (brian) br...@interlinx.bc.ca 
wrote:

 Hi,
 
 I found a situation using pacemaker 1.1.10 on RHEL6.5 where the output
 of crm_resource -L is not trust-able, shortly after a node is booted.
 
 Here is the output from crm_resource -L on one of the nodes in a two
 node cluster (the one that was not rebooted):
 
 st-fencing (stonith:fence_foo): Started
 res1       (ocf::foo:Target):   Started
 res2       (ocf::foo:Target):   Started
 
 Here is the output from the same command on the other node in the two
 node cluster right after it was rebooted:
 
 st-fencing (stonith:fence_foo): Stopped
 res1       (ocf::foo:Target):   Stopped
 res2       (ocf::foo:Target):   Stopped
 
 These were collected at the same time (within the same second) on the
 two nodes.
 
 Clearly the rebooted node is not telling the truth.  Perhaps the truth
 for it is "I don't know", which would be fair enough but that's not what
 pacemaker is asserting there.
 
 So, how do I know (i.e. programmatically -- what command can I issue to
 know) if and when crm_resource can be trusted to be truthful?

The local cib hasn't caught up yet by the looks of it.
You could compare 'cibadmin -Ql' with 'cibadmin -Q'
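
One way to script that comparison without diffing the whole document is to compare only the version counters (admin_epoch, epoch, num_updates) on the root cib element. This is a sketch only: it assumes the opening cib tag arrives on a single line, the function names are made up, and on a busy cluster num_updates can change between the two queries, so a mismatch may just be a race rather than a stale local copy.

```shell
#!/bin/sh
# Sketch: compare the version of the local cib with the cluster's copy.
# Assumption: the opening <cib ...> tag is on one line of the XML output.

# Reads cib XML on stdin, prints "admin_epoch.epoch.num_updates".
# Prints nothing and returns 1 if no <cib ...> tag is found.
cib_version() {
    tag=$(sed -n 's/.*\(<cib [^>]*>\).*/\1/p' | head -n 1)
    [ -n "$tag" ] || return 1
    # The leading space keeps ' epoch=' from matching inside 'admin_epoch='.
    a=$(printf '%s\n' "$tag" | sed -n 's/.* admin_epoch="\([0-9]*\)".*/\1/p')
    e=$(printf '%s\n' "$tag" | sed -n 's/.* epoch="\([0-9]*\)".*/\1/p')
    n=$(printf '%s\n' "$tag" | sed -n 's/.* num_updates="\([0-9]*\)".*/\1/p')
    printf '%s.%s.%s\n' "$a" "$e" "$n"
}

# Succeeds only if both queries worked and report the same version.
local_cib_is_current() {
    local_v=$(cibadmin -Ql | cib_version) || return 1
    cluster_v=$(cibadmin -Q | cib_version) || return 1
    [ "$local_v" = "$cluster_v" ]
}
```

For example, `local_cib_is_current && crm_resource -L` would only list resources when the two copies agree. It still costs two cib queries per check.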

 
 b.
 
 
 
 


Re: [Pacemaker] crm_resource -L not trustable right after restart

2014-01-13 Thread Brian J. Murrell (brian)
On Tue, 2014-01-14 at 08:09 +1100, Andrew Beekhof wrote:
 
 The local cib hasn't caught up yet by the looks of it.

Should crm_resource actually be [mis-]reporting as if it were
knowledgeable when it's not though?  IOW is this expected behaviour or
should it be considered a bug?  Should I open a ticket?

 You could compare 'cibadmin -Ql' with 'cibadmin -Q'

Is there no other way to force crm_resource to be truthful/accurate or
silent if it cannot be truthful/accurate?  Having to run this kind of
pre-check before every crm_resource --locate seems like it's going to
drive overhead up quite a bit.

Maybe I am using the wrong tool for the job.  Is there a better tool
than crm_resource to ascertain, with full truthfulness (or silence if
truthfulness is not possible), where resources are running?

Cheers,
b.






Re: [Pacemaker] crm_resource -L not trustable right after restart

2014-01-13 Thread Andrew Beekhof

On 14 Jan 2014, at 3:41 pm, Brian J. Murrell (brian) br...@interlinx.bc.ca 
wrote:

 On Tue, 2014-01-14 at 08:09 +1100, Andrew Beekhof wrote:
 
 The local cib hasn't caught up yet by the looks of it.
 
 Should crm_resource actually be [mis-]reporting as if it were
 knowledgeable when it's not though?  IOW is this expected behaviour or
 should it be considered a bug?  Should I open a ticket?

It doesn't know that it doesn't know.
Does it show anything as running?  Any nodes as online?

I'd not expect that it stays in that situation for more than a second or two...

 
 You could compare 'cibadmin -Ql' with 'cibadmin -Q'
 
 Is there no other way to force crm_resource to be truthful/accurate or
 silent if it cannot be truthful/accurate?  Having to run this kind of
 pre-check before every crm_resource --locate seems like it's going to
 drive overhead up quite a bit.

True.

 
 Maybe I am using the wrong tool for the job.  Is there a better tool
 than crm_resource to ascertain, with full truthfulness (or silence if
 truthfulness is not possible), where resources are running?

We could add an option to force crm_resource to use the master instance instead 
of the local one I guess.

