[Pacemaker] How to serialize/control resource startup on Standby node

2011-12-28 Thread neha chatrath
Hello,

I have a cluster with 2 nodes and multiple master/slave resources.
The ordering of resources on the master node is achieved using the order option
of crm. When the standby node is started, the processes are started one after
another.
Following is the configuration info:
primitive ClusterIP ocf:mcg:MCG_VIPaddr_RA \
params ip="192.168.113.67" cidr_netmask="255.255.255.0" nic="eth0:1" \
op monitor interval="40" timeout="20"
primitive Rmgr ocf:mcg:RM_RA \
op monitor interval="60" role="Master" timeout="30" on-fail="restart" \
op monitor interval="40" role="Slave" timeout="40" on-fail="restart"
primitive Tmgr ocf:mcg:TM_RA \
op monitor interval="60" role="Master" timeout="30" on-fail="restart" \
op monitor interval="40" role="Slave" timeout="40" on-fail="restart"
primitive pimd ocf:mcg:PIMD_RA \
op monitor interval="60" role="Master" timeout="30" on-fail="restart" \
op monitor interval="40" role="Slave" timeout="40" on-fail="restart"
ms ms_Rmgr Rmgr \
meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="true"
ms ms_Tmgr Tmgr \
meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="true"
ms ms_pimd pimd \
meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="true"
colocation ip_with_Rmgr inf: ClusterIP ms_Rmgr:Master
colocation ip_with_Tmgr inf: ClusterIP ms_Tmgr:Master
colocation ip_with_pimd inf: ClusterIP ms_pimd:Master
order TM-after-RM inf: ms_Rmgr:promote ms_Tmgr:start
order ip-after-pimd inf: ms_pimd:promote ClusterIP:start
order pimd-after-TM inf: ms_Tmgr:promote ms_pimd:start
property $id="cib-bootstrap-options" \
dc-version="1.0.11-db98485d06ed3fe0fe236509f023e1bd4a5566f1" \
cluster-infrastructure="Heartbeat" \
no-quorum-policy="ignore" \
stonith-enabled="false"
rsc_defaults $id="rsc-options" \
migration-threshold="3" \
resource-stickiness="100"

I have a system requirement in which the start of one resource (e.g. pimd)
depends on the successful start of another resource (e.g. Tmgr).
Everything runs smoothly on the master node, thanks to the ordering constraints
and the few seconds of delay until a resource is promoted to Master.
But on the standby node the resources are started one after another without any
delay, so the standby node in the cluster behaves erratically.

Is there a way through which I can serialize/control resource startup on
the standby node?
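
(One possible direction, sketched but untested: the constraints above only
order the promote actions, so nothing sequences the plain start actions on
the standby node. Adding order constraints on the start actions as well
should serialize startup there, e.g.:

order TM-start-after-RM inf: ms_Rmgr:start ms_Tmgr:start
order pimd-start-after-TM inf: ms_Tmgr:start ms_pimd:start)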

Thanks and regards
Neha Chatrath
___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] Remote CRM shell from LCMC

2011-12-28 Thread Lars Ellenberg
On Wed, Dec 28, 2011 at 12:57:33AM +0100, Rasto Levrinc wrote:
> Hi,
> 
> this being a slow news day, there is this great new feature in LCMC, but
> probably completely useless. :) LCMC used to show the CRM shell configuration
> for testing purposes, but people started to use it, so I left it
> there, made it editable and added a commit button that commits the
> changes. You can see it as a hole in the bottom of the car: if you are stuck,
> you can still power the car with your feet.
> 
> There are also some unexpected advantages over "crm configure edit", see
> the video.
> 
> http://youtu.be/X75wzUTRmjU?hd=1

Nice.

Sound is missing for me from 3:00 onwards.
Just in case that was not intentional...

Lars

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] How should I configure the STONITH?

2011-12-28 Thread Andreas Kurz
Hello,

On 12/21/2011 03:34 AM, Qiu Zhigang wrote:
> Hi
> 
> 
>> -Original Message-
>> From: Andreas Kurz [mailto:andr...@hastexo.com]
>> Sent: Wednesday, December 21, 2011 6:53 AM
>> To: pacemaker@oss.clusterlabs.org
>> Subject: Re: [Pacemaker] How should I configure the STONITH?
>>
>> Hello,
>>
>> On 12/20/2011 05:01 AM, Qiu Zhigang wrote:
>>> Hi,
>>>
 -Original Message-
 From: Andreas Kurz [mailto:andr...@hastexo.com]
 Sent: Monday, December 19, 2011 6:50 PM
 To: pacemaker@oss.clusterlabs.org
 Subject: Re: [Pacemaker] How should I configure the STONITH?

 Hello,

 On 12/19/2011 08:31 AM, Qiu Zhigang wrote:
> Hi all,
>
> I want to configure the STONITH, but I couldn't find the following
> CLI described in the reference material (like
> Pacemaker-1.1-Pacemaker_Explained-en-US.pdf).
>
> stonith -L
> stonith -t ibmhmc -n
> stonith -t apcmaster -h
>
> I only found the stonith_admin command, but when I executed it, the
> following problem occurred.
>
> [root@h10_151 ~]# stonith_admin  -L
> stonith_admin[14402]: 2011/12/19_15:29:20 info: crm_log_init_worker:
> Changed active directory to /var/lib/heartbeat/cores/root
> stonith_admin[14402]: 2011/12/19_15:29:20 notice: log_data_element:
> st_callback: st_notify_disconnect <... subt="st_notify_disconnect" />
>
> What's the reason for this problem?
>
> Moreover, I didn't find the stonith plugins; which rpm package
> contains them? I am using RHEL 6, and the pacemaker version
> is pacemaker-1.1.2-7.el6.x86_64.rpm.

 RHEL6 and derivatives ship the "fence-agents" package known from
 redhat-cluster, not the stonith agents ... the agents are all
 named "fence_*" and come with nice man pages.

 You can use them (nearly) like the stonith agents that used to come
 with Pacemaker as a prerequisite.
>>>
>>> Thank you, but I'm not sure how to configure fence-agents in Pacemaker.
>>> Could I configure them like the following? If this isn't right, please point
>>> me in the right direction, thank you.
>>>
>>> configure
>>> primitive st-apc stonith: fence_apc \
>>> params ip="xx.xx.xx.xx" username="admin" password="password"
>>> clone fencing st-apc
>>> commit
>>
>> try something like this for the primitive:
>>
>> primitive st-apc stonith:fence_apc \
>>  params ipaddr="xx.xx.xx.xx" login="admin" passwd="password" \
>>  action="reboot" port="dummy" pcmk_host_list="node1 node2" \
>>  pcmk_host_map="node1:1,node2:2" pcmk_host_check="static-list"
>>
>> "man stonithd" and "man fence_apc" should also be helpful
>>
> Thank you, I'll try.
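
A quick way to sanity-check the agent outside Pacemaker first (a sketch using
the generic fence-agents options; check "man fence_apc" for the exact flags
your version supports):

# query the power status of plug 1 on the APC switch
fence_apc -a xx.xx.xx.xx -l admin -p password -n 1 -o status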
> 
>>>
>>> Another question is about Quorum device, which component is the quorum
>>> device in pacemaker, like qdisk in CMAN, is SBD?
>>
>> corosync or heartbeat CCM do this job for Pacemaker ... they deliver node
>> availability information so Pacemaker can calculate quorum based on node
>> count.
>>
> But how is split-brain handled with 2 nodes? In CMAN we could use qdisk
> to arbitrate the quorum node; what about corosync+pacemaker?

You have to configure stonith and ignore quorum ... if you really run
into a split-brain, one node will be faster in fencing the other node
... see http://ourobengr.com/ha for some hints on avoiding a reboot
cycle.
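
In crm shell terms that boils down to something like this (a minimal sketch;
the stonith primitive itself depends on your fencing hardware, see the
fence_apc example earlier in this thread):

crm configure property stonith-enabled="true"
crm configure property no-quorum-policy="ignore"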

You could use redundant rings for corosync communication to lower the risk
of split-brain and/or add e.g. an extra "quorum" node.
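
A second ring is just another interface block in corosync.conf, for example
(a sketch, assuming a second NIC on a separate 10.0.0.0/24 network):

totem {
rrp_mode: passive
interface {
ringnumber: 0
bindnetaddr: 192.168.10.0
mcastaddr: 226.94.1.1
mcastport: 5405
}
interface {
ringnumber: 1
bindnetaddr: 10.0.0.0
mcastaddr: 226.94.1.2
mcastport: 5407
}
}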

Regards,
Andreas

-- 
Need help with Pacemaker?
http://www.hastexo.com/now

> 
>> Regards,
>> Andreas
>>
>> --
>> Need help with Pacemaker?
>> http://www.hastexo.com/now
>>
>>>
>>>
>>> Best Regards,
>>> Qiu Zhigang

 Regards,
 Andreas

 --
 Need help with Pacemaker?
 http://www.hastexo.com/now

>
>
> Best Regards,
>
>
> ___
> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>
> Project Home: http://www.clusterlabs.org Getting started:
> http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org


>>>
>>>
>>>
>>> ___
>>> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
>>> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>>>
>>> Project Home: http://www.clusterlabs.org Getting started:
>>> http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>>> Bugs: http://bugs.clusterlabs.org
>>
> 
> 
> 
> ___
> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
> 
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org




___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org

Re: [Pacemaker] CMAN and Pacemaker

2011-12-28 Thread Andreas Kurz
Hello,

On 12/24/2011 09:13 AM, Fil wrote:
> Hi everyone,
> 
> Happy holidays!
> 
> I need some help with adding CMAN to my current cluster config.
> Currently I have a two node Corosync/Pacemaker (Active/Passive) cluster.
> It works as expected. Now I need to add a distributed filesystem to my
> setup. I would like to test GFS2. As far as I understand, I need to
> set up CMAN to manage dlm/gfs_controld, am I correct? I have followed the
> Clusters_from_Scratch document but I am having issues starting
> pacemakerd once cman is up and running. Is it possible to use
> dlm/gfs_controld without cman, directly from pacemaker? How do I start
> pacemaker when CMAN is running, do I even need to, and if not, how do
> I manage my resources? Currently I am using:
> 
> Fedora 16
> corosync-1.4.2-1.fc16.x86_64
> pacemaker-1.1.6-4.fc16.x86_64
> cman-3.1.7-1.fc16.x86_64

Only start the cman service -- not corosync -- and then start the pacemaker
service; that should be enough. What is the error you get when starting
pacemaker via its init script?
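
In other words, something like this (a sketch assuming the stock init scripts;
cman starts corosync internally, so the standalone corosync service must stay
disabled):

chkconfig corosync off   # cman brings up corosync itself
service cman start
service pacemaker start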

Regards,
Andreas

-- 
Need help with Pacemaker?
http://www.hastexo.com/now

> 
> Thanks
> filip
> 
> cluster.conf
> ------------
> [cluster.conf XML omitted; the tags were stripped by the list archiver]
> 
> 
> corosync.conf
> -------------
> compatibility: whitetank
> 
> totem {
> version: 2
> secauth: off
> threads: 0
> rrp_mode: passive
> 
> interface {
> ringnumber: 0
> bindnetaddr: 192.168.10.0
> mcastaddr: 226.94.1.1
> mcastport: 5405
> }
> }
> 
> logging {
> fileline: off
> to_stderr: no
> to_logfile: yes
> to_syslog: yes
> logfile: /var/log/cluster/corosync.log
> debug: off
> timestamp: on
> }
> 
> amf {
> mode: disabled
> }
> 
> pacemaker conf
> --------------
> node server01 \
>   attributes standby="off"
> node server02 \
>   attributes standby="off"
> primitive scsi_reservation ocf:adriatic:sg_persist \
>   params sg_persist_resource="scsi_reservation0"
> devs="/dev/disk/by-path/ip-192.168.10.5:3260-iscsi-iqn.2004-04.com.qnap:ts-459proii:iscsi.test.cb4d16-lun-0"
> required_devs_nof="1" reservation_type="1" \
>   op start interval="0" timeout="30s" \
>   op stop interval="0" timeout="30s"
> primitive vm_test ocf:adriatic:VirtualDomain \
>   params config="/etc/libvirt/qemu/test.xml" hypervisor="qemu:///system"
> migration_transport="tcp" \
>   meta allow-migrate="true" is-managed="true" target-role="Stopped" \
>   op start interval="0" timeout="120s" \
>   op stop interval="0" timeout="120s" \
>   op migrate_from interval="0" timeout="120s" \
>   op migrate_to interval="0" timeout="120s" \
>   op monitor interval="10" timeout="30" depth="0" \
>   utilization cpu="1" hv_memory="1024"
> ms ms_scsi_reservation scsi_reservation \
>   meta master-max="1" master-node-max="1" clone-max="2"
> clone-node-max="1" notify="true" migration-threshold="1"
> allow-migrate="true" globally-unique="false" target-role="Stopped"
> location cli-prefer-vm_test vm_test \
>   rule $id="cli-prefer-rule-vm_test" inf: #uname eq server02
> colocation service_on_scsi_reservation inf: vm_test
> ms_scsi_reservation:Master
> order service_after_scsi_reservation inf: ms_scsi_reservation:promote
> vm_test:start
> property $id="cib-bootstrap-options" \
>   dc-version="1.1.6-4.fc16-89678d4947c5bd466e2f31acd58ea4e1edb854d5" \
>   cluster-infrastructure="openais" \
>   expected-quorum-votes="2" \
>   stonith-enabled="false" \
>   no-quorum-policy="ignore" \
>   default-resource-stickiness="0" \
>   last-lrm-refresh="1324069959"
> 
> ___
> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
> 
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org




___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


[Pacemaker] Re: [Pacemaker] How can a node join another cluster safely?

2011-12-28 Thread Andreas Kurz
On 12/26/2011 01:17 PM, Mars gu wrote:
> Hi all,
> I have two clusters: cluster_A and cluster_B.
> A node named node_b was in cluster_B before.
>
> Now I want to move node_b to cluster_A, so I changed the config file
> (corosync.conf) of node_b to match that of cluster_A's nodes.
> When I restarted corosync, node_b joined cluster_A successfully, but I
> noticed that the cib.xml in cluster_A changed to look just like cluster_B's,
> and the resources I had defined in cluster_A were gone.
> How can node_b join cluster_A safely?
> I found that when I execute this cli before the service is started,
> cluster_A's config file is not changed:
> 
> 
> cibadmin --modify  --crm_xml ''
> 
>   
> Is this the right way to solve the problem?
> thanks.

Completely clean the "/var/run/heartbeat/crm" directory on that node
before you put it into another cluster.
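
For example (a sketch: stop the cluster stack on node_b first, and double-check
the path, which can differ between builds):

service corosync stop                # make sure the stack is down on node_b
rm -rf /var/run/heartbeat/crm/*      # wipe the locally cached CIB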

Regards,
Andreas

-- 
Need help with Pacemaker?
http://www.hastexo.com/now

> 
> 
> 
> 
> ___
> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
> 
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org





___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] Patch: use NFSv4 with RA nfsserver

2011-12-28 Thread Vogt Josef
On Tue, Dec 27, 2011 at 3:30 PM, Vogt Josef  wrote:
>> Just a question here: I couldn't get it to work without setting the gracetime
>> -- which isn't set in the exportfs RA. Are you sure this works as expected?

> Thanks, good input. I'd be happy to add that (as in, 
> wait_for_gracetime_on_start or similar). However, can you do me a favor 
> please? Take a look at the discussion archived at 
> http://www.spinics.net/lists/linux-nfs/msg22670.html and let me know if 
> nlm_grace_period (as mentioned in
> http://www.spinics.net/lists/linux-nfs/msg22737.html) made any difference?

Yes, that did the trick. /proc/sys/fs/nfs/nlm_grace_period is not needed for
NFSv4, so I just set it to a very small value.
The problem is this: when using both NFSv2/NFSv3 and NFSv4, in case of a
failover a v4 client could acquire a new lock on a file locked by a v2/v3
client before the v2/v3 client has a chance to reclaim its lock... That's why
we have to wait the whole 90 seconds (nlm_grace_period).

The other two values (leasetime and gracetime) need to be small (which means
higher load but faster failover). As far as I understand it, you only have to
wait until the larger of /proc/fs/nfsd/nfsv4leasetime and
/proc/fs/nfsd/nfsv4gracetime is reached.
Note: when you set these values manually, they are gone after you reboot the
machine.
That's why I made sure that in case of a failover these values are set
properly.

I guess it would not be so easy to use NFSv4 and NFSv2/NFSv3 at the same time.
On the other hand, setting all 3 values mentioned above to 10 should be safe.
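
Concretely, that amounts to something like this (a sketch; the writes have to
happen before nfsd starts, and shrinking nlm_grace_period is only safe in a
v4-only setup, as discussed above):

echo 10 > /proc/fs/nfsd/nfsv4leasetime
echo 10 > /proc/fs/nfsd/nfsv4gracetime
echo 10 > /proc/sys/fs/nfs/nlm_grace_period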

I found this thread very helpful:
http://marc.info/?l=linux-nfs&m=131590701830261&w=2

Kind regards
Josef

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org