Re: [Pacemaker] crmd restart due to internal error - pacemaker 1.1.8

2013-05-09 Thread pavan tc
On Fri, May 10, 2013 at 6:21 AM, Andrew Beekhof and...@beekhof.net wrote:


 On 08/05/2013, at 9:16 PM, pavan tc pavan...@gmail.com wrote:


Hi Andrew,

Thanks much for looking into this. I have some queries inline.


  Hi,
 
  I have a two-node cluster with STONITH disabled.

 That's not a good idea.


Ok. I'll try and configure stonith.

 I am still running with the pcmk plugin as opposed to the recommended
 CMAN plugin.

 On rhel6?


Yes.



 
  With 1.1.8, I see some messages (appended to this mail) once in a while.
 I do not understand some keywords here - There is a Leave action. I am
 not sure what that is.

 It means the cluster is not going to change the state of the resource.


Why did the cluster execute the Leave action at this point? Is there some
other error that triggers this? Or is it a benign message?


  And, there is a CIB update failure that leads to a RECOVER action. There
 is a message that says the RECOVER action is not supported. Finally this
 leads to a stop and start of my resource.

 Well, and also Pacemaker's crmd process.
 My guess... the node is overloaded which is causing the cib queries to
 time out.


Is there a cib query timeout value that I can set? I was earlier getting
the TOTEM timeout.
So, I set the token to a larger value (5 seconds) in corosync.conf and
things were much better.
But now, I have started hitting this problem.
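(For context, the token change mentioned above amounts to something like the
following in corosync.conf — values illustrative, with 5000 ms being the 5
seconds referred to:)

```
# /etc/corosync/corosync.conf (fragment)
totem {
        version: 2
        # Token-loss timeout in milliseconds. Raised from the 1000 ms
        # default to 5000 to ride out load spikes without triggering a
        # membership change.
        token: 5000
}
```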

Thanks,
Pavan

 I can copy the crm configure show output, but nothing special there.
 
  Thanks much.
  Pavan
 
  PS: The resource vha-bcd94724-3ec0-4a8d-8951-9d27be3a6acb is stale. The
 underlying device that represents this resource has been removed. However,
 the resource is still part of the CIB. All errors related to that resource
 can be ignored. But can this cause a node to be stopped/fenced?

 Not if fencing is disabled.


___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] crmd restart due to internal error - pacemaker 1.1.8

2013-05-09 Thread pavan tc

 Is there a cib query timeout value that I can set? I was earlier getting
 the TOTEM timeout.
 So, I set the token to a larger value (5 seconds) in corosync.conf and
 things were much better.
 But now, I have started hitting this problem.


I'll experiment with the cibadmin -t (--timeout) option to see if it helps.
From what I can see in the code, the default seems to be 30 ms.
Is there a widely used default for systems under high load, or is it found
out the hard way for each setup?

Pavan


Re: [Pacemaker] crmd restart due to internal error - pacemaker 1.1.8

2013-05-09 Thread pavan tc


 I'll experiment with the cibadmin -t (--timeout) option to see if it helps.
 From what I can see in the code, the default seems to be 30 ms.
 Is there a widely used default for systems under high load, or is it found
 out the hard way for each setup?


Easier said than done. Can someone help with how to use the --timeout
option in cibadmin?
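From the cibadmin help text, the usage would presumably be along these lines
(an untested sketch; the 30-second value is just an example):

```
# Query the whole CIB, waiting up to 30 seconds before the call is
# declared failed:
cibadmin --query --timeout=30

# Short form, limited to the resources section:
cibadmin -Q -o resources -t 30
```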

Pavan


Re: [Pacemaker] Pacemaker 1.1.8, Corosync, No CMAN, Promotion issues

2013-04-20 Thread pavan tc

 Another user hit the same issue and was able to reproduce.
 You can see the resolution at
 https://bugzilla.redhat.com/show_bug.cgi?id=951340


Thanks much for letting me know. I will watch the 'Fixed in version' field
and upgrade as necessary.

Pavan


Re: [Pacemaker] Pacemaker 1.1.8, Corosync, No CMAN, Promotion issues

2013-04-18 Thread pavan tc
Yes, but looking at the code it should be impossible.

 Would it be possible for you to add:

 export PCMK_trace_functions=peer_update_callback

 to /etc/sysconfig/pacemaker and re-test (and send me the new logs -
 probably in /var/log/pacemaker.log)?


Sorry about the delay.

I have put these in place and am running tests now. The next time I hit
this, I'll post the messages.


Re: [Pacemaker] Pacemaker 1.1.8, Corosync, No CMAN, Promotion issues

2013-04-16 Thread pavan tc
On Fri, Apr 12, 2013 at 9:27 AM, pavan tc pavan...@gmail.com wrote:

  Absolutely none in the syslog. Only the regular monitor logs from my
 resource agent which continued to report as secondary.


 This is very strange, because the thing that caused the I_PE_CALC is a
 timer that goes off every 15 minutes.
 Which would seem to imply that there was a transition of some kind about
 when the failure happened - but somehow it didn't go into the logs.

 Could you post the complete logs from 14:00 to 14:30?


 Sure. Here goes. Attached are two logs and corosync.conf -
 1. syslog (Edited, messages from other modules removed. I have not touched
 the pacemaker/corosync related messages)
 2. corosync.log (Unedited)
 3. corosync.conf

 Wanted to mention a couple of things:
 -- 14:06 is when the system was coming back up from a reboot. I have
 started from the earliest message during boot to the point the I_PE_CALC
 timer popped and a promote was called.
 -- I see the following during boot up. Does that mean pacemaker did not
 start?
 Apr 10 14:06:26 corosync [pcmk  ] info: process_ais_conf: Enabling MCP
 mode: Use the Pacemaker init script to complete Pacemaker startup

 Could that contribute to any of this behaviour?

 I'll be glad to provide any other information.


Did anybody get a chance to look at the information attached in the
previous email?

Thanks,
Pavan



 Pavan




Re: [Pacemaker] Pacemaker 1.1.8, Corosync, No CMAN, Promotion issues

2013-04-11 Thread pavan tc
Hi Andrew,

Thanks much for looking at this.


 Then (after about 15 minutes), I see the following:

 There were no logs at all in between?


Absolutely none in the syslog. Only the regular monitor logs from my
resource agent which continued to report as secondary.
I also checked /var/log/cluster/corosync.log. The only difference between
this and the ones in syslog are the messages below:

From /var/log/cluster/corosync.log:
---
Apr 10 14:12:38 [3391] vsanqa4   crmd:   notice:
ais_dispatch_message:  Membership 166060: quorum lost
Apr 10 14:12:38 [3386] vsanqa4cib:   notice:
crm_update_peer_state: crm_update_ais_node: Node vsanqa3[1950617772] -
state is now lost
Apr 10 14:12:38 [3391] vsanqa4   crmd:   notice:
crm_update_peer_state: crm_update_ais_node: Node vsanqa3[1950617772] -
state is now lost
Apr 10 14:12:38 [3391] vsanqa4   crmd: info:
peer_update_callback:  vsanqa3 is now lost (was member)
Apr 10 14:12:38 corosync [CPG   ] chosen downlist: sender r(0)
ip(172.16.68.117) ; members(old:2 left:1)
Apr 10 14:12:38 corosync [MAIN  ] Completed service synchronization, ready
to provide service.

Apr 10 14:12:38 [3386] vsanqa4cib: info:
cib_process_request:   Operation complete: op cib_modify for section
nodes (origin=local/crmd/62, version=0.668.12): OK (rc=0)
Apr 10 14:12:38 [3386] vsanqa4cib: info:
cib_process_request:   Operation complete: op cib_modify for section
cib (origin=local/crmd/64, version=0.668.14): OK (rc=0)
Apr 10 14:12:38 [3391] vsanqa4   crmd: info:
crmd_ais_dispatch: Setting expected votes to 2
Apr 10 14:12:38 [3386] vsanqa4cib: info:
cib_process_request:   Operation complete: op cib_modify for section
crm_config (origin=local/crmd/66, version=0.668.15): OK (rc=0)

The first six out of the 10 messages above were seen on syslog too. Adding
them here for context. The last four are the extra messages in
corosync.log

Pavan


 
  Apr 10 14:26:46 vsanqa4 crmd[3391]:   notice: do_state_transition: State
 transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_TIMER_POPPED
 origin=crm_timer_popped ]
  Apr 10 14:26:46 vsanqa4 pengine[3390]:   notice: unpack_config: On loss
 of CCM Quorum: Ignore
  Apr 10 14:26:46 vsanqa4 pengine[3390]:   notice: LogActions: Promote
 vha-6f92a1f6-969c-4c41-b9ca-7eb6f83ace2e:0#011(Slave -> Master vsanqa4)
  Apr 10 14:26:46 vsanqa4 pengine[3390]:   notice: process_pe_message:
 Calculated Transition 3: /var/lib/pacemaker/pengine/pe-input-392.bz2
 
  Thanks,
  Pavan





[Pacemaker] Pacemaker 1.1.8, Corosync, No CMAN, Promotion issues

2013-04-10 Thread pavan tc
Hi,

[I did go through the mail thread titled "RHEL6 and clones: CMAN needed
anyway?", but was not sure about some answers there]

I recently moved from pacemaker 1.1.7 to 1.1.8-7 on centos 6.2. I see the
following in syslog:

corosync[2966]:   [pcmk  ] ERROR: process_ais_conf: You have configured a
cluster using the Pacemaker plugin for Corosync. The plugin is not
supported in this environment and will be removed very soon.
corosync[2966]:   [pcmk  ] ERROR: process_ais_conf:  Please see Chapter 8
of 'Clusters from Scratch' (http://www.clusterlabs.org/doc) for details on
using Pacemaker with CMAN

Does this mean that my current configuration is incorrect and will not work
as it used to with pacemaker 1.1.7/Corosync?

I looked at the Clusters from Scratch instructions and it talks mostly
about GFS2. I don't have any filesystem requirements. In that case, can I
live with Pacemaker/Corosync?

I do understand that this config is not recommended, but the reason I ask
is because I am hitting a weird problem with this setup which I will
explain below. Just want to make sure that I don't start off with an
erroneous setup.

I have a two-node multi-state resource configured with the following config:

[root@vsanqa4 ~]# crm configure show
node vsanqa3
node vsanqa4
primitive vha-6f92a1f6-969c-4c41-b9ca-7eb6f83ace2e
ocf:heartbeat:vgc-cm-agent.ocf \
params cluster_uuid=6f92a1f6-969c-4c41-b9ca-7eb6f83ace2e \
op monitor interval=30s role=Master timeout=100s \
op monitor interval=31s role=Slave timeout=100s
ms ms-6f92a1f6-969c-4c41-b9ca-7eb6f83ace2e
vha-6f92a1f6-969c-4c41-b9ca-7eb6f83ace2e \
meta clone-max=2 globally-unique=false target-role=Started
location ms-6f92a1f6-969c-4c41-b9ca-7eb6f83ace2e-nodes
ms-6f92a1f6-969c-4c41-b9ca-7eb6f83ace2e \
rule $id=ms-6f92a1f6-969c-4c41-b9ca-7eb6f83ace2e-nodes-rule -inf:
#uname ne vsanqa4 and #uname ne vsanqa3
property $id=cib-bootstrap-options \
dc-version=1.1.8-7.el6-394e906 \
cluster-infrastructure=classic openais (with plugin) \
expected-quorum-votes=2 \
stonith-enabled=false \
no-quorum-policy=ignore
rsc_defaults $id=rsc-options \
resource-stickiness=100

With this config, if I simulate a crash on the master with 'echo c >
/proc/sysrq-trigger', the slave does not get promoted for about 15 minutes.
It does detect the peer going down, but does not seem to issue the promote
immediately:

Apr 10 14:12:32 vsanqa4 corosync[2966]:   [TOTEM ] A processor failed,
forming new configuration.
Apr 10 14:12:38 vsanqa4 corosync[2966]:   [pcmk  ] notice:
pcmk_peer_update: Transitional membership event on ring 166060: memb=1,
new=0, lost=1
Apr 10 14:12:38 vsanqa4 corosync[2966]:   [pcmk  ] info: pcmk_peer_update:
memb: vsanqa4 1967394988
Apr 10 14:12:38 vsanqa4 corosync[2966]:   [pcmk  ] info: pcmk_peer_update:
lost: vsanqa3 1950617772
Apr 10 14:12:38 vsanqa4 corosync[2966]:   [pcmk  ] notice:
pcmk_peer_update: Stable membership event on ring 166060: memb=1, new=0,
lost=0
Apr 10 14:12:38 vsanqa4 corosync[2966]:   [pcmk  ] info: pcmk_peer_update:
MEMB: vsanqa4 1967394988
Apr 10 14:12:38 vsanqa4 corosync[2966]:   [pcmk  ] info:
ais_mark_unseen_peer_dead: Node vsanqa3 was not seen in the previous
transition
Apr 10 14:12:38 vsanqa4 corosync[2966]:   [pcmk  ] info: update_member:
Node 1950617772/vsanqa3 is now: lost
Apr 10 14:12:38 vsanqa4 corosync[2966]:   [pcmk  ] info:
send_member_notification: Sending membership update 166060 to 2 children
Apr 10 14:12:38 vsanqa4 corosync[2966]:   [TOTEM ] A processor joined or
left the membership and a new membership was formed.
Apr 10 14:12:38 vsanqa4 cib[3386]:   notice: ais_dispatch_message:
Membership 166060: quorum lost
Apr 10 14:12:38 vsanqa4 crmd[3391]:   notice: ais_dispatch_message:
Membership 166060: quorum lost
Apr 10 14:12:38 vsanqa4 cib[3386]:   notice: crm_update_peer_state:
crm_update_ais_node: Node vsanqa3[1950617772] - state is now lost
Apr 10 14:12:38 vsanqa4 crmd[3391]:   notice: crm_update_peer_state:
crm_update_ais_node: Node vsanqa3[1950617772] - state is now lost
Apr 10 14:12:38 vsanqa4 corosync[2966]:   [CPG   ] chosen downlist: sender
r(0) ip(172.16.68.117) ; members(old:2 left:1)
Apr 10 14:12:38 vsanqa4 corosync[2966]:   [MAIN  ] Completed service
synchronization, ready to provide service.

Then (after about 15 minutes), I see the following:

Apr 10 14:26:46 vsanqa4 crmd[3391]:   notice: do_state_transition: State
transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_TIMER_POPPED
origin=crm_timer_popped ]
Apr 10 14:26:46 vsanqa4 pengine[3390]:   notice: unpack_config: On loss of
CCM Quorum: Ignore
Apr 10 14:26:46 vsanqa4 pengine[3390]:   notice: LogActions: Promote
vha-6f92a1f6-969c-4c41-b9ca-7eb6f83ace2e:0#011(Slave -> Master vsanqa4)
Apr 10 14:26:46 vsanqa4 pengine[3390]:   notice: process_pe_message:
Calculated Transition 3: /var/lib/pacemaker/pengine/pe-input-392.bz2

Thanks,
Pavan

[Pacemaker] CentOS 6.2 and pacemaker versions

2013-02-21 Thread pavan tc
Hi,

I have installed pacemaker/corosync from the standard yum repositories on
my CentOS 6.2 box.
What I get is the following:

pacemaker-cli-1.1.7-6.el6.x86_64
pacemaker-cluster-libs-1.1.7-6.el6.x86_64
pacemaker-libs-1.1.7-6.el6.x86_64
pacemaker-1.1.7-6.el6.x86_64
corosynclib-1.4.1-7.el6_3.1.x86_64
corosync-1.4.1-7.el6_3.1.x86_64

In one of my earlier queries to this list, I was advised against using
pacemaker version 1.1.7.
But if I try to move to pacemaker 1.1.8, it has a dependency on glibc-2.14,
whereas the default
glibc shipped with CentOS 6.2 is glibc-2.12, and I'd prefer to stick with
it.

Is it possible for me to move to later versions of pacemaker in some way?

Thanks,
Pavan


Re: [Pacemaker] Pacemaker stop behaviour when underlying resource is unavailable

2012-12-17 Thread pavan tc
[..]

  The idea is to make sure that stop does not fail when the underlying
  resource goes away.
  (Otherwise I see that the resource gets to an unmanaged state)
  Also, the expectation is that when the resource comes back, it joins the
  cluster without much fuss.

  What I see is that pacemaker calls stop twice

 That would not be expected. Bug?


Are you pointing at stop getting called 'twice'? If yes, I will confirm
the behaviour once more and raise a bug.



  and if it finds that stop returns success,
  it does not continue with monitor any more. I also do not see an attempt to
  start.

 Anywhere? Or just on the same node?


On the same node. The resource does get promoted on the other node.
My expectation was that if I kept returning OCF_NOT_RUNNING from monitor,
the cluster would attempt a start-stop-monitor cycle until the resource came
back. It seems this is not what the cluster manager does?


  Is there a way to keep the monitor going in such circumstances?

 Not really. You can define a recurring monitor for the Stopped role though.


I did not want to go there if I could achieve it via the usual mechanisms.
If that is not possible, I will explore this option in more detail.
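For reference, a recurring monitor for the Stopped role would presumably be
declared like any other monitor op in crm syntax — a sketch only, with
placeholder resource/agent names and arbitrary interval/timeout values:

```
primitive my-resource ocf:heartbeat:my-agent \
        op monitor interval=30s role=Master timeout=100s \
        op monitor interval=31s role=Slave timeout=100s \
        op monitor interval=32s role=Stopped timeout=100s
```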

But why would it come back?  You _really_ should not be starting
 services outside of the cluster - not least of all because we've
 probably started it somewhere else in the meantime.


Even if we started the resource elsewhere, we are running in degraded mode.
(My bad, I did not mention this is a _two-node_ multi-state resource.)
We would like to come back to the available mode as early as possible, with
the least amount of manual intervention.

Pavan


  Am I using incorrect resource agent return codes?
 
  Thanks,
  Pavan
 
 




[Pacemaker] Pacemaker stop behaviour when underlying resource is unavailable

2012-12-14 Thread pavan tc
Hi,

I have structured my multi-state resource agent as below when the
underlying resource becomes unavailable for some reason:

monitor()
{
    state=$(get_primitive_resource_state)
    ...
    if [ "$state" = "unavailable" ]; then
        return $OCF_NOT_RUNNING
    fi
    ...
}

stop()
{
    monitor
    ret=$?

    # If the primitive is already gone, report the stop as
    # successful so the stop itself never fails.
    if [ $ret -eq $OCF_NOT_RUNNING ]; then
        return $OCF_SUCCESS
    fi
}

start()
{
    start_primitive
    if [ $? -ne 0 ]; then
        return $OCF_ERR_GENERIC
    fi
}
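To see the intended behaviour concretely, here is a self-contained toy
version with the OCF exit codes filled in (real agents get these from
ocf-shellfuncs; the state function is a stub that pretends the underlying
device has gone away):

```shell
#!/bin/sh
# Standard OCF exit codes (normally provided by ocf-shellfuncs)
OCF_SUCCESS=0
OCF_ERR_GENERIC=1
OCF_NOT_RUNNING=7

# Stub: pretend the underlying device is unavailable
get_primitive_resource_state() { echo unavailable; }

monitor() {
    state=$(get_primitive_resource_state)
    if [ "$state" = "unavailable" ]; then
        return $OCF_NOT_RUNNING
    fi
    return $OCF_SUCCESS
}

stop() {
    monitor
    if [ $? -eq $OCF_NOT_RUNNING ]; then
        # Already gone: report the stop as successful
        return $OCF_SUCCESS
    fi
    return $OCF_ERR_GENERIC
}

stop
echo "stop returned: $?"
```

Run standalone, this prints "stop returned: 0": monitor reports
OCF_NOT_RUNNING, and stop maps that to OCF_SUCCESS.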

The idea is to make sure that stop does not fail when the underlying
resource goes away.
(Otherwise I see that the resource gets to an unmanaged state)
Also, the expectation is that when the resource comes back, it joins the
cluster without much fuss.

What I see is that pacemaker calls stop twice and if it finds that stop
returns success,
it does not continue with monitor any more. I also do not see an attempt to
start.

Is there a way to keep the monitor going in such circumstances?
Am I using incorrect resource agent return codes?

Thanks,
Pavan


[Pacemaker] Moving multi-state resources

2012-12-12 Thread pavan tc
Hi,

My requirement was to do some administration on one of the nodes where a
2-node multi-state resource was running.
To effect a resource instance stoppage on one of the nodes, I added a
resource constraint as below:

crm configure location ms_stop_res_on_node ms_resource rule -inf: \#uname
eq `hostname`

The resource cleanly moved over to the other node. Incidentally, the
resource was the master on this node
and was successfully moved to a master state on the other node too.
Now, I want to bring the resource back onto the original node.

But the above resource constraint seems to have a persistent behaviour.
crm resource unmigrate ms_resource does not seem to undo the effects of
the constraint addition.

I think the location constraint is preventing the resource from starting on
the original node.
How do I delete this location constraint now?

Is there a more standard way of doing such administrative tasks? The
requirement is that I do not want to offline the
entire node while doing the administration but rather would want to stop
only the resource instance, do the admin work
and restart the resource instance on the node.

Thanks,
Pavan


Re: [Pacemaker] Moving multi-state resources

2012-12-12 Thread pavan tc
On Wed, Dec 12, 2012 at 6:46 PM, Dejan Muhamedagic deja...@fastmail.fm wrote:

 Hi,

 On Wed, Dec 12, 2012 at 03:50:01PM +0530, pavan tc wrote:
  Hi,
 
  My requirement was to do some administration on one of the nodes where a
  2-node multi-state resource was running.
  To effect a resource instance stoppage on one of the nodes, I added a
  resource constraint as below:
 
  crm configure location ms_stop_res_on_node ms_resource rule -inf:
 \#uname
  eq `hostname`
 
  The resource cleanly moved over to the other node. Incidentally, the
  resource was the master on this node
  and was successfully moved to a master state on the other node too.
  Now, I want to bring the resource back onto the original node.
 
  But the above resource constraint seems to have a persistent behaviour.
  crm resource unmigrate ms_resource does not seem to undo the effects of
  the constraint addition.

 You can try to remove your constraint:

 crm configure delete ms_stop_res_on_node


That did the job. Thanks a ton!

Pavan
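
For anyone hitting this later, the full maintenance cycle from this thread,
put together (constraint and resource names as used above):

```
# 1. Pin the ms resource off the local node (stops the local instance):
crm configure location ms_stop_res_on_node ms_resource \
        rule -inf: \#uname eq `hostname`

# 2. ... perform the maintenance work on this node ...

# 3. Remove the constraint so the instance can start here again:
crm configure delete ms_stop_res_on_node
```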


 migrate/unmigrate generate/remove special constraints.

 Thanks,

 Dejan

 
  I think the location constraint is preventing the resource from starting
 on
  the original node.
  How do I delete this location constraint now?
 
  Is there a more standard way of doing such administrative tasks? The
  requirement is that I do not want to offline the
  entire node while doing the administration but rather would want to stop
  only the resource instance, do the admin work
  and restart the resource instance on the node.
 
  Thanks,
  Pavan






[Pacemaker] Listing resources by attributes

2012-12-12 Thread pavan tc
Hi,

Is there a way in which resources can be listed based on some attributes?
For example, listing resource running on a certain node, or listing ms
resources.

The crm_resource manpage talks about the -N and -t options that seem to
address the requirements above.
But they do not provide the expected result.
crm_resource --list and crm_resource --list-raw give the same output
regardless of whether -N or -t is provided.

I had to do the following to pull out 'ms' resources, for example:
crm configure show | grep -w ^ms | awk '{print $2}'

Is there a cleaner way to list resources?
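
For what it's worth, querying the CIB directly might get closer — an
untested sketch, assuming the usual CIB XML layout where ms resources
appear as master elements, and using a node name from earlier mails as an
example:

```
# Pull the ids of all ms (master/slave) resources out of the CIB:
cibadmin -Q --xpath '//master' | grep -o 'id="[^"]*"'

# Resources active on a particular node, via a one-shot crm_mon:
crm_mon -1 | grep -i vsanqa4
```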

Thanks,
Pavan


Re: [Pacemaker] Nodes OFFLINE with not in our membership messages

2012-12-06 Thread pavan tc
On Thu, Dec 6, 2012 at 5:21 PM, Nikita Michalko
michalko.sys...@a-i-p.com wrote:

 Hi,

 did you already try to google for "not in our membership"?


Not sure which part you were addressing.
I mean, I did not pluck the github link out of thin air ;)

And if it is the lack of information in my email that you are talking about:
I pointed at the fix for the issue, so I presume the conditions under which
it happens are known, and sending the same details from my setup seemed a
little superfluous.

What I wanted to know was if there is a bug ID that describes the problem
and the fix, and if I could address
this issue by staying on Pacemaker 1.1.7.

Thanks,
Pavan

 E.g.:
 http://lists.linux-ha.org/pipermail/linux-ha/2007-February/023469.html

 Nikita Michalko




[Pacemaker] Difference between crm resource and crm_resource

2012-12-05 Thread pavan tc
Hi,

Can someone please explain how the commands -

crm resource stop <resource name>

and

crm_resource --resource <resource name> --set-parameter target-role --meta
--parameter-value Stopped

are different?

Also, I see that crm has a -w option (which gives synchronous behaviour
to the command)
Is there something similar for crm_resource?

Thanks,
Pavan


Re: [Pacemaker] Difference between crm resource and crm_resource

2012-12-05 Thread pavan tc

 They are not. crm shell just provides a more coherent wrapper around
 the various commands.

  Also, I see that crm has a -w option (which gives synchronous behaviour
  to the command)
  Is there something similar for crm_resource?

 No. crm shell then watches the DC until the transition triggered by the
 change has completed. crm_resource just modifies the configuration.


Thanks much.

Pavan




 Regards,
 Lars

 --
 Architect Storage/HA
 SUSE LINUX Products GmbH, GF: Jeff Hawn, Jennifer Guild, Felix
 Imendörffer, HRB 21284 (AG Nürnberg)
 Experience is the name everyone gives to their mistakes. -- Oscar Wilde





[Pacemaker] Nodes OFFLINE with not in our membership messages

2012-12-05 Thread pavan tc
Hi,

I have now hit this issue twice in my setup.
I see the following github commit addressing this issue:
https://github.com/ClusterLabs/pacemaker/commit/03f6105592281901cc10550b8ad19af4beb5f72f

From the patch, it appears there is an incorrect conclusion about the
status of the membership of nodes.
Is there a root cause analysis of this issue that I can read through?
I am currently using 1.1.7. Would the suggestion be to move to 1.1.8, or is
there a workaround?
(I have already done a good deal of testing with 1.1.7, and would like to
live with it if possible)

Thanks,
Pavan