Re: [Pacemaker] rhel6/cman+pacemaker - how to use clvm?

2013-04-08 Thread Vadym Chepkov

On Apr 8, 2013, at 6:52 AM, David Coulson wrote:

> 
> On 4/8/13 6:42 AM, Yuriy Demchenko wrote:
>> 
>> The purpose of my cluster is to provide HA VM and routing/gateway (thus RHCS 
>> isn't an option for me - no IPaddr2 and Route resources).
>> But I cannot find any documentation on how to use cLVM in a cman+pacemaker 
>> cluster; everything I found requires use of the "ocf:lvm2:clvmd" resource, but 
>> there's no such resource in rhel6.4/centos6.4.
>> I've tried just starting the clvmd service and enabling clustering in lvm - that 
>> works, but clvmd wouldn't start without a running cluster - so apparently I 
>> need to start clvmd after pacemaker startup. But in that case the resource 
>> "ocf:heartbeat:lvm" would fail to start, as it requires a running clvmd, and 
>> clvmd starts only after pacemaker...
>> So I'm confused - how to use cLVM in my case?
> clvmd is dependent on cman, not pacemaker. Just make sure it starts up in 
> this order: cman, clvmd, pacemaker.

What if a clustered volume group appears only when pacemaker establishes iSCSI 
connection?

Cheers,
Vadym


___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] pcs equivalent of crm configure erase

2013-04-17 Thread Vadym Chepkov

On Apr 17, 2013, at 11:57 AM, T. wrote:

> Hi,
> 
>> b) If I can't do it with pcs, is there a reliable
>> and secure way to do it with pacemaker low-level tools?
> why not just install crmsh from a different repository?
> 
> This is what I have done on CentOS 6.4.

My sentiments exactly. And "erase" is not the most important missing 
functionality.
crm configure save and crm configure load (update | replace) are what made 
configurations easily manageable
and trackable with version control software.

Cheers,
Vadym




___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] pcs equivalent of crm configure erase

2013-04-17 Thread Vadym Chepkov

On Apr 17, 2013, at 8:04 PM, Chris Feist wrote:

> On 04/17/13 11:13, Vadym Chepkov wrote:
>> 
>> On Apr 17, 2013, at 11:57 AM, T. wrote:
>> 
>>> Hi,
>>> 
>>>> b) If I can't do it with pcs, is there a reliable
>>>> and secure way to do it with pacemaker low-level tools?
>>> why not just install crmsh from a different repository?
>>> 
>>> This is what I have done on CentOS 6.4.
>> 
>> My sentiments exactly. And "erase" is not the most important missing 
>> functionality.
>> crm configure save and crm configure load (update | replace) are what made 
>> configurations easily manageable
>> and trackable with version control software.
> 
> There is currently a command in pcs ('pcs cluster cib' & 'pcs cluster push 
> cib') to save and replace the current cib; however, it saves the actual 
> xml from the cib, so reading/editing the file might be a little more 
> complicated than the output from 'crm configure save'.

I might be missing something, but how is that different from the old dark cibadmin 
days ;) ?
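For comparison, the two round-trips look roughly like this (a sketch built only from the commands named above; option spellings may differ between pcs versions):

# crmsh round-trip: human-readable CLI syntax, easy to diff and keep in git
crm configure save /tmp/cluster.crm
vi /tmp/cluster.crm
crm configure load replace /tmp/cluster.crm

# pcs round-trip: raw CIB XML, much like cibadmin -Q / cibadmin --replace
pcs cluster cib > /tmp/cib.xml
vi /tmp/cib.xml
pcs cluster push cib /tmp/cib.xml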

Thanks,
Vadym




___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] failcount,start/stop-failure in crm_mon

2013-06-06 Thread Vadym Chepkov

On Jun 6, 2013, at 10:29 AM, Wolfgang Routschka wrote:

> Hi,
> 
> one question today about deleting start/stop errors in crm_mon.
> 
> How can I delete failure/errors in crm_mon without having to restart/refresh 
> resources?

crm resource cleanup some-resource
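To limit the cleanup to a single node, or to check the failcounts afterwards (resource and node names here are placeholders):

crm resource cleanup some-resource node1   # clean up on one node only
crm_mon -1 -f                              # one-shot status; -f adds failcounts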




___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] Pacemaker in RHEL6.

2011-08-10 Thread Vadym Chepkov

On Aug 10, 2011, at 11:43 AM, Marco van Putten wrote:

> On 08/10/2011 04:31 PM, Andreas Kurz wrote:
>> On 2011-08-10 14:13, Marco van Putten wrote:
>>> Hi,
>>> 
>>> Is it possible to get the pacemaker rpm's available for RHEL6 on the
>>> Clusterlabs repository (like for RHEL5)?
>>> 
>>> I know they are available through Redhat's "High Availability" channel.
>>> But since we have academic licences we don't have this channel available.
>> scientific linux 6.1 should provide all packages
>> 
>> Regards,
>> Andreas
>> 
> 
> 
> Thanks Andreas. But our managers persist on using Redhat.
> 


I assume you can safely install SL's rpms on the RHEL node.

Regards,
Vadym

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker


Re: [Pacemaker] group depending on clones restarting unnescessary

2011-08-26 Thread Vadym Chepkov

On Aug 26, 2011, at 2:24 PM, Michael Schwartzkopff wrote:

> Hi,
> 
> I set up HA NFS Server according to the HOWTO from linbit. Basically it is a 
> clone of the NFS server and a clone of the root filesystem. A group of the 
> Filesystem, the exportfs and the ip address depends on a DRBD and the root-
> exportfs clone.
> 
> See below for the configuration.
> 
> Let's say the group runs on node A and I put node B into standby; everything 
> looks good. But when I set node B online again the NFS group restarts, 
> although it runs on node A, which is not touched by the restart of the 
> second half of the clone.
> 
> Any explanation? I tried to set the clone interleave and non-globally-unique, 
> but nothing helps.
> 
> Thanks for any hints.

How about:

order ord_Root_NFS 0: cloneExportRoot groupNFS
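That is, relax the mandatory ordering constraint from the configuration quoted below into an advisory one (score 0), so a clone instance starting on the returning node no longer forces a restart of the group that is already running. A sketch of the change:

# before (mandatory): any (re)start of cloneExportRoot also restarts groupNFS
order ord_Root_NFS inf: cloneExportRoot groupNFS
# after (advisory): the order is only honoured when both resources are
# starting or stopping in the same transition anyway
order ord_Root_NFS 0: cloneExportRoot groupNFS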

> 
> --- config
> 
> primitive resDRBD ocf:pacemaker:Stateful
> primitive resExportHome ocf:pacemaker:Dummy
> primitive resExportRoot ocf:pacemaker:Dummy
> primitive resFilesystem ocf:pacemaker:Dummy
> primitive resIP ocf:pacemaker:Dummy
> primitive resNFSServer ocf:pacemaker:Dummy
> group groupNFS resFilesystem resExportHome resIP
> ms msDRBD resDRBD
> clone cloneExportRoot resExportRoot \
>   meta interleave="true" globally-uniq="false"
> clone cloneNFSServer resNFSServer \
>   meta interleave="true" globally-uniq="false"
> colocation col_NFS_DRBD inf: groupNFS:Started msDRBD:Master
> colocation col_NFS_Root inf: groupNFS cloneExportRoot
> order ord_DRBD_NFS inf: msDRBD:promote groupNFS:start
> order ord_Root_NFS inf: cloneExportRoot groupNFS
> 
> ---
> 
> system: fedora 15 with pacemaker 1.1.5
> 
> -- 
> Dr. Michael Schwartzkopff
> Guardinistr. 63
> 81375 München
> 
> Tel: (0163) 172 50 98
> ___
> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
> 
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: 
> http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker


___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker


Re: [Pacemaker] resource moving unnecessarily due to ping race condition

2011-09-10 Thread Vadym Chepkov

On Sep 8, 2011, at 3:40 PM, Florian Haas wrote:

>>> On 09/08/11 20:59, Brad Johnson wrote:
 We have a 2 node cluster with a single resource. The resource must run
 on only a single node at one time. Using the pacemaker:ocf:ping RA we
 are pinging a WAN gateway and a LAN host on each node so the resource
 runs on the node with the greatest connectivity. The problem is when a
 ping host goes down (so both nodes lose connectivity to it), the
 resource moves to the other node due to timing differences in how fast
 they update the score attribute. The dampening value has no effect,
 since it delays both nodes by the same amount. These unnecessary
 fail-overs aren't acceptable since they are disruptive to the network
 for no reason.
 Is there a way to dampen the ping update by different amounts on the
 active and passive nodes? Or some other way to configure the cluster to
 try to keep the resource where it is during these tie score scenarios?
> 
> location pingd-constraint group_1 \
>  rule $id="pingd-constraint-rule" pingd: defined pingd
> 
> May I suggest that you simply change this constraint to
> 
> location pingd-constraint group_1 \
>  rule $id="pingd-constraint-rule" \
>-inf: not_defined pingd or pingd lte 0
> 
> That way, only a host that definitely has _no_ connectivity carries a
> -INF score for that resource group. And I believe that is what you
> really want, rather than take the actual ping score as a placement
> weight (your "best connectivity" approach).
> 
> Just my 2 cents, though.
> 

Even though this approach has been recommended many times, there is a problem with 
it.
What if all nodes, for some reason, are not able to ping?
This rule would cause the resource to be brought down completely, whereas with the 
"best connectivity" approach it stays up where it was before the network 
failed.
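Side by side, the two styles discussed here (both taken from the constraints quoted above):

# Florian's suggestion: only a node with no connectivity at all is excluded
location pingd-constraint group_1 \
  rule $id="pingd-constraint-rule" -inf: not_defined pingd or pingd lte 0

# "best connectivity": the ping score itself is the placement weight, so when
# every node loses the ping target the scores drop equally and the resource
# stays where it is
location pingd-constraint group_1 \
  rule $id="pingd-constraint-rule" pingd: defined pingd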

Vadym




___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker


Re: [Pacemaker] Pacemaker and DRBD - Will Not Promote

2011-10-28 Thread Vadym Chepkov
From DRBD's point of view it's running, just not being used.  You need a
promotion constraint and a resource depending on the Master role, IMHO.
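A minimal sketch of that in crm syntax (resource names here are placeholders):

# ms_drbd is the DRBD master/slave resource, fs_data mounts the device
colocation fs_on_drbd_master inf: fs_data ms_drbd:Master
order fs_after_drbd_promote inf: ms_drbd:promote fs_data:start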

Cheers,
Vadym
___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker


Re: [Pacemaker] stonith and pacemaker with Centos 6

2012-07-12 Thread Vadym Chepkov

On Sep 20, 2011, at 1:59 PM, Charles Richard wrote:

> Hi, on my quest to understand STONITH better and to get it working 
> with CentOS 6, I installed the fencing agents on the OS and also installed 
> OpenIPMI.  If I understood correctly, the fence_ipmilan fencing agent is the 
> one I'd use in my pacemaker config and this fencing agent would call 
> "ipmitool" or some other similar command.
> 
> So i set out to make sure i could run the following on both my nodes: 
> 
> ipmitool -I lan -U root -H xxx.xxx.xxx.xxx -a chassis power status
> 
> The problem is that on one node this works and on the other it doesn't.  On the 
> one where it doesn't, I get:
> 
> Activate Session command failed
> Error: Unable to establish LAN session
> Unable to get Chassis Power Status
> 
> which seems to be a standard error.
> 
> I'm hoping somebody might have some suggestions on why this isn't working or 
> if I'm way out in left field and this isn't relevant to my DRBD, Pacemaker 
> config on CentOS 6 to give me a nudge in another direction.  I'm kind of 
> getting stuck and google searches galore are not giving me any more 
> inspiration.
> 
> Thanks,
> Charles


You didn't say what device you are trying to use. On Dell servers, for example, 
you can't reach your own iDRAC controller if its network interface is shared 
with the LAN interface.
Also, newer models use IPMI 2.0, which requires the lanplus interface.
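For an IPMI 2.0 BMC the test command would use lanplus instead (a sketch, keeping the address and credentials from your example):

ipmitool -I lanplus -U root -H xxx.xxx.xxx.xxx -a chassis power status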

Cheers,
Vadym




___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] Service automatically restart after IP moving

2012-07-12 Thread Vadym Chepkov

On Jul 12, 2012, at 3:18 AM, CHAMLEY Stephane wrote:

> Ah, Thanx you  ! :)
> 
> But it is not working ... : (

I think it's a bug. I saw the same behavior in 1.0.12 as well. I ended up 
defining independent resources, not a clone.
What's interesting, though: if you stop the ipv_test resource, monclone will 
properly stop too, as expected.

Cheers,
Vadym



> I did this test configuration (see below) and then I moved the 
> resource IP (crm resource migrate).
> Yet the "ntp" process still shows the old start date..
> 
> CLI conf:
> 
> node $id="19838a79-4459-4e2d-864e-53b8c103f011" v-testweb02
> node $id="e7fe4f86-081a-4092-9d50-bcfbcfc02ae4" v-testweb01
> primitive ipv_test ocf:heartbeat:IPaddr \
>params ip="10.10.10.10" cidr_netmask="255.255.255.0" \
>meta migration-threshold="2" \
>op monitor interval="10s"
> primitive ntp lsb:ntp \
>op monitor interval="60s"
> clone monclone ntp \
>meta target-role="Started"
> order testorder inf: ipv_test monclone
> property $id="cib-bootstrap-options" \
>dc-version="1.0.9-74392a28b7f31d7ddc86689598bd23114f58978b" \
>cluster-infrastructure="Heartbeat" \
>stonith-enabled="false" \
>no-quorum-policy="ignore" \
>default-resource-stickiness="1" \
>last-lrm-refresh="1331635194"
> 
> 
> XML conf:
> [XML dump of the same configuration from the CIB; the element markup was 
> stripped by the mail archiver]
> 
> -Original Message-
> From: David Vossel [mailto:dvos...@redhat.com] 
> Sent: mercredi 11 juillet 2012 21:01
> To: The Pacemaker cluster resource manager
> Subject: Re: [Pacemaker] Service automatically restart after IP moving
> 
> 
> 
> - Original Message -
>> From: "CHAMLEY Stephane" 
>> To: "pacemaker@oss.clusterlabs.org" 
>> Sent: Wednesday, July 11, 2012 1:59:58 AM
>> Subject: Re: [Pacemaker] Service automatically restart after IP moving
>> 
>> 
>> 
>> 
>> 
>> Nobody ? :S
> 
> use an order constraint.
> 
> http://www.clusterlabs.org/doc/en-US/Pacemaker/1.1/html/Pacemaker_Explained/s-resource-ordering.html
> 
> "If the first resource is (re)started while the then resource is running, the 
> then resource will be stopped and restarted."
> 
> So if an ip is stopped and started somewhere else, the 'then' resource should 
> restart.  is that what you were looking for?
> 
> ___
> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
> 
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org
> 
> ___
> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
> 
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org



___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


[Pacemaker] Pacemaker 1.1.7 order constraint syntax

2012-07-19 Thread Vadym Chepkov
Hi,

When Pacemaker 1.1.7 was announced, a new feature was mentioned:

The ability to specify that A starts after ( B or C or D )

I wasn't able to find an example of how to express it in the crm shell, either in man crm 
or in Pacemaker Explained.
In fact, 
http://www.clusterlabs.org/doc/en-US/Pacemaker/1.1/html/Pacemaker_Explained/s-resource-ordering.html
 doesn't have the new attribute listed either.
Is it supported in crm?

Thanks,
Vadym



___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] Pacemaker 1.1.7 order constraint syntax

2012-07-19 Thread Vadym Chepkov

On Jul 19, 2012, at 6:55 AM, Phillip Frost wrote:

> On Jul 19, 2012, at 5:47 AM, Vadym Chepkov wrote:
> 
>> Hi,
>> 
>> When Pacemaker 1.1.7 was announced, a new feature was mentioned:
>> 
>> The ability to specify that A starts after ( B or C or D )
>> 
>> I wasn't able to find an example how to express it crm shell in neither man 
>> crm nor in Pacemaker Explained.
>> In fact, 
>> http://www.clusterlabs.org/doc/en-US/Pacemaker/1.1/html/Pacemaker_Explained/s-resource-ordering.html
>>  doesn't have new attribute listed either.
>> Is it supported in crm ?
> 
> crm configure help order

Not there either.

pacemaker-1.1.7-6.el6.x86_64

This constraint expresses the order of actions on two resources
or more resources. If there are more than two resources, then the
constraint is called a resource set. Ordered resource sets have an
extra attribute to allow for sets of resources whose actions may run
in parallel. The shell syntax for such sets is to put resources in
parentheses.

Usage:
...
order <id> score-type: <rsc>[:<action>] <rsc>[:<action>] ...
  [symmetrical=<bool>]

score-type :: advisory | mandatory | <score>
...
Example:
...
order c_apache_1 mandatory: apache:start ip_1
order o1 inf: A ( B C )
...


Vadym



___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] Pacemaker 1.1.7 order constraint syntax

2012-07-19 Thread Vadym Chepkov

On Jul 19, 2012, at 8:57 AM, Rasto Levrinc wrote:

> On Thu, Jul 19, 2012 at 2:38 PM, Andreas Kurz  wrote:
>> On 07/19/2012 11:47 AM, Vadym Chepkov wrote:
>>> Hi,
>>> 
>>> When Pacemaker 1.1.7 was announced, a new feature was mentioned:
>>> 
>>> The ability to specify that A starts after ( B or C or D )
>>> 
>>> I wasn't able to find an example how to express it crm shell in neither man 
>>> crm nor in Pacemaker Explained.
>>> In fact, 
>>> http://www.clusterlabs.org/doc/en-US/Pacemaker/1.1/html/Pacemaker_Explained/s-resource-ordering.html
>>>  doesn't have new attribute listed either.
>>> Is it supported in crm ?
>> 
>> I don't think it is supported in crm or any other configuration tool.
>> The syntax for the above example in xml looks like:
> 
> Well, LCMC supports this, btw last time I checked this feature
> is still not enabled in the constraints rng in 1.1.7 by default, so you
> have to wait at least for 1.1.8, or enable it yourself.
> It also doesn't work if combined with colocation.
> 


Oh, so in other words it's not supported in 1.1.7? Why was it in the release notes 
then?




> Rasto
> 
>> [XML example; the markup was stripped by the mail archiver]
>> ... can be found in the pengine regression tests directory in Pacemaker
>> source ...
> 
> -- 
> Dipl.-Ing. Rastislav Levrinc
> rasto.levr...@gmail.com
> Linux Cluster Management Console
> http://lcmc.sf.net/
> 
> ___
> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
> 
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org


___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] Pacemaker 1.1.7 order constraint syntax

2012-07-19 Thread Vadym Chepkov

On Jul 19, 2012, at 8:16 AM, Phillip Frost wrote:

> On Jul 19, 2012, at 7:44 AM, Vadym Chepkov wrote:
> 
>> Not there either.
> 
> Maybe I'm not understanding your question. Isn't this what you are seeking?
> 
>> Ordered resource sets have an
>> extra attribute to allow for sets of resources whose actions may run
>> in parallel. The shell syntax for such sets is to put resources in
>> parentheses.
>> 
>> Example:
>>   order o1 inf: A ( B C )
> 
> Other examples are:
> 
> # A, then B, C, D, then E
> order o2 inf: A B C D E
> 
> # A, then B and C and D in parallel, then E
> order o2 inf: A ( B C D ) E
> 


which translates to

A (B and C and D) E

compared to

A (B or C or D) E



> If none of that is it, but you can find documentation of what you want in 
> XML, you could try inserting the desired XML into your configuration, then 
> running "crm configure show" to find the corresponding CRM syntax.
> 

Andreas beat me to it.
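For the archive, since the XML in Andreas's message was stripped by the list software: the "or" form is expressed with an ordered resource set carrying require-all="false", roughly like this (a sketch, not a verbatim copy of his example):

<rsc_order id="order-A-after-any" score="INFINITY">
  <resource_set id="set-bcd" sequential="false" require-all="false">
    <resource_ref id="B"/>
    <resource_ref id="C"/>
    <resource_ref id="D"/>
  </resource_set>
  <resource_set id="set-a">
    <resource_ref id="A"/>
  </resource_set>
</rsc_order>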





> ___
> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
> 
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org


___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] Pacemaker 1.1.7 order constraint syntax

2012-07-19 Thread Vadym Chepkov

On Jul 19, 2012, at 8:57 AM, Rasto Levrinc wrote:

> On Thu, Jul 19, 2012 at 2:38 PM, Andreas Kurz  wrote:
>> On 07/19/2012 11:47 AM, Vadym Chepkov wrote:
>>> Hi,
>>> 
>>> When Pacemaker 1.1.7 was announced, a new feature was mentioned:
>>> 
>>> The ability to specify that A starts after ( B or C or D )
>>> 
>>> I wasn't able to find an example how to express it crm shell in neither man 
>>> crm nor in Pacemaker Explained.
>>> In fact, 
>>> http://www.clusterlabs.org/doc/en-US/Pacemaker/1.1/html/Pacemaker_Explained/s-resource-ordering.html
>>>  doesn't have new attribute listed either.
>>> Is it supported in crm ?
>> 
>> I don't think it is supported in crm or any other configuration tool
>>  syntax for above example in xml looks like:
> 
> Well, LCMC supports this, btw last time I checked this feature
> is still not enabled in constrains rng in 1.1.7 by default, so you
> have to wait at least for 1.1.8, or enable it yourself.
> It also doesn't work if combined with colocation

I presume it doesn't work if the score is INFINITY? Otherwise it would be strange.


> Rasto
> 
>> [XML example; the markup was stripped by the mail archiver]
>> ... can be found in the pengine regression tests directory in Pacemaker
>> source ...
> 
> -- 
> Dipl.-Ing. Rastislav Levrinc
> rasto.levr...@gmail.com
> Linux Cluster Management Console
> http://lcmc.sf.net/
> 
> ___
> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
> 
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org


___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] None of the standard agents in ocf:heartbeat are working in centos 6

2012-07-24 Thread Vadym Chepkov

On Jul 24, 2012, at 12:25 AM, Vladislav Bogdanov wrote:

> 24.07.2012 04:50, Andrew Beekhof wrote:
>> On Tue, Jul 24, 2012 at 5:38 AM, David Barchas  wrote:
>>> 
>>> On Monday, July 23, 2012 at 7:48 AM, David Barchas wrote:
>>> 
>>> 
>>> Date: Mon, 23 Jul 2012 14:15:27 +0300
>>> From: Vladislav Bogdanov
>>> 
>>> 23.07.2012 08:06, David Barchas wrote:
>>> 
>>> Hello.
>>> 
>>> I have been working on this for 3 days now, and must be so stressed out
>>> that I am being blinded to what is probably an obvious cause of this. In
>>> a word, HELP.
>>> 
>>> 
>>> setenforce 0 ?
>>> 
>>> I am familiar with it but have never had to disable it. I would be surprised
>>> if that were necessary for packages in standard repos.
>> 
>> No-one has written an selinux policy for pacemaker yet.
>> I would imagine that will come in the next month or so.
>> 
> 
> Highly appreciated. However, the lrmd part may not be as easy to implement
> properly as it seems at first glance.


You can add runcon -t unconfined_t into /etc/init.d/pacemaker for now if you 
don't want to turn selinux off entirely.
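A rough sketch of what that edit looks like (the exact daemon invocation in the stock init script may differ; the path here is illustrative):

# in /etc/init.d/pacemaker, prefix the daemon start so lrmd and the resource
# agents it spawns run unconfined while SELinux stays enforcing
runcon -t unconfined_t /usr/sbin/pacemakerd &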

Cheers,
Vadym



___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] about iTCO_wdt watchdog

2012-09-09 Thread Vadym Chepkov

On Aug 2, 2012, at 4:06 AM, Mia Lueng wrote:

> you misunderstand me. I just simulate a system crash to test if the
> watchdog can reboot the system .
> 

All kernel wdt modules still rely on a functioning kernel.
But you crashed the kernel at this point, so nothing will reboot your system.
What I think you want is to have the watchdog enabled in the BIOS (if you have one) 
and then use some vendor program to drive that real "hardware" watchdog. I 
think most of them are accessible via IPMI.
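If the BMC exposes one, ipmitool can drive it directly; whether these work depends entirely on the hardware (a sketch):

ipmitool mc watchdog get     # show the BMC watchdog state
ipmitool mc watchdog reset   # arm/kick the BMC watchdog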

P.S. I think it's off-topic for this list though - you will probably have a 
better chance of getting an answer on the linux-ha mailing list.

Vadym


___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


[Pacemaker] duality and equality

2010-04-10 Thread Vadym Chepkov
Hi,

I noticed there are quite a few configuration parameters in pacemaker that can 
be set two different ways: via cluster properties or rsc/op_defaults.
For example,
property default-resource-stickiness and rsc_defaults resource-stickiness,
property is-managed-default and rsc_defaults is-managed, property 
stop-all-resources and rsc_defaults target-role, property 
default-action-timeout and op_defaults timeout. I assume this duality exists 
for historical reasons, and in the computing world it is not unusual to achieve the 
same results in different ways. But in this case curious minds want to know: 
which parameter takes precedence if both equivalent parameters are set and 
contradict each other? 
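To make one of the pairs concrete, the stickiness default can be set either way (a sketch):

crm configure property default-resource-stickiness="100"   # cluster property
crm configure rsc_defaults resource-stickiness="100"       # rsc_defaults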

I also noticed some differences in how these settings are assessed.

# crm configure show
node c20.chepkov.lan
node c21.chepkov.lan
primitive ip_rg0 ocf:heartbeat:IPaddr2 \
params nic="eth0" ip="10.10.10.22" cidr_netmask="32"
primitive ping ocf:pacemaker:ping \
params name="ping" dampen="5s" multiplier="200" host_list="10.10.10.250"
clone connected ping \
meta globally-unique="false"
property $id="cib-bootstrap-options" \
dc-version="1.0.8-9881a7350d6182bae9e8e557cf20a3cc5dac3ee7" \
cluster-infrastructure="openais" \
expected-quorum-votes="2" \
no-quorum-policy="ignore" \
stonith-enabled="false"

# crm configure verify
WARNING: ping: default-action-timeout 20s for start is smaller than the advised 
60
WARNING: ip_rg0: default-action-timeout 20s for start is smaller than the 
advised 90
WARNING: ip_rg0: default-action-timeout 20s for stop is smaller than the 
advised 100

# crm configure op_defaults timeout=120
WARNING: ping: default-action-timeout 20s for start is smaller than the advised 
60
WARNING: ip_rg0: default-action-timeout 20s for start is smaller than the 
advised 90
WARNING: ip_rg0: default-action-timeout 20s for stop is smaller than the 
advised 100

But,

# crm configure property default-action-timeout=120

makes it happy.

And this makes me wonder: are these parameters really the same, or do they have 
different meanings? Thank you.

Sincerely yours,
  Vadym Chepkov

___
Pacemaker mailing list
Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker


Re: [Pacemaker] OpenAIS priorities

2010-04-29 Thread Vadym Chepkov
http://www.clusterlabs.org/doc/en-US/Pacemaker/1.1/html/Pacemaker_Explained/node-score-equal.html

On Apr 29, 2010, at 10:20 AM, Dan Frincu wrote:

> Greetings all,
> 
> In the case of two servers in a cluster with OpenAIS, take the following 
> example:
> 
> location Failover_Alert_1 Failover_Alert 100: abc.localdomain
> location Failover_Alert_2 Failover_Alert 200: def.localdomain
> 
> This will setup the preference of a resource to def.localdomain because it 
> has the higher priority assigned to it, but what happens when the priorities 
> match, is there a tiebreaker, some sort of election process to choose which 
> node will be the one handling the resource?
> 
> Thank you in advance,
> Best regards.
> 
> -- 
> Dan FRINCU
> Internal Support Engineer
> 
> 
> 
> ___
> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
> 
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf


___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf


[Pacemaker] pacemaker and gnbd

2010-05-01 Thread Vadym Chepkov
Hi,

I found out I can't use gnbd if I use the pacemaker rpm from the clusterlabs 
repository, because gnbd depends on cman, which requires openais, which conflicts 
with the corosync that pacemaker depends on.
Is it just a matter of recompiling the cman rpm using corosync libraries instead of 
openais? Or does something else need to be done?

Thank you,
Vadym Chepkov
___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf


Re: [Pacemaker] pacemaker and gnbd

2010-05-03 Thread Vadym Chepkov

On May 3, 2010, at 2:23 AM, Andrew Beekhof wrote:
> 
> 
> I doubt openais conflicts with corosync, unless you have a very old
> version of cman.
> The repos include openais 1.0.x which is built against corosync.
> 

Unless I am doing something terribly wrong, this is not the case.

Redhat 5.5 (the latest at the moment) comes with cman-2.0.115-34.el5.x86_64.rpm

# rpm -q --requires -p cman-2.0.115-34.el5.x86_64.rpm 
warning: cman-2.0.115-34.el5.x86_64.rpm: Header V3 DSA signature: NOKEY, key ID 
37017186
kernel >= 2.6.18-36.el5
/sbin/chkconfig  
/sbin/chkconfig  
openais  
pexpect  
/bin/sh  
/bin/sh  
rpmlib(PayloadFilesHavePrefix) <= 4.0-1
rpmlib(CompressedFileNames) <= 3.0.4-1
/bin/bash  
/usr/bin/perl  
/usr/bin/python  
libcpg.so.2()(64bit)  
libcpg.so.2(OPENAIS_CPG_1.0)(64bit)  
libc.so.6()(64bit)  
libc.so.6(GLIBC_2.2.5)(64bit)  
libc.so.6(GLIBC_2.3.2)(64bit)  
libc.so.6(GLIBC_2.3.3)(64bit)  
libc.so.6(GLIBC_2.3)(64bit)  
libdlm.so.2()(64bit)  
libdl.so.2()(64bit)  
libm.so.6()(64bit)  
libnss3.so()(64bit)  
libnss3.so(NSS_3.2)(64bit)  
libnss3.so(NSS_3.4)(64bit)  
libpthread.so.0()(64bit)  
libpthread.so.0(GLIBC_2.2.5)(64bit)  
libpthread.so.0(GLIBC_2.3.2)(64bit)  
librt.so.1()(64bit)  
librt.so.1(GLIBC_2.2.5)(64bit)  
libSaCkpt.so.2()(64bit)  
libSaCkpt.so.2(OPENAIS_CKPT_B.01.01)(64bit)  
libxml2.so.2()(64bit)  
libz.so.1()(64bit)  
perl(Getopt::Std)  
perl(IPC::Open3)  
perl(Net::Telnet)  
perl(POSIX)  
perl(strict)  
perl(warnings)  
perl(XML::LibXML)  

So, it depends on openais 0.8 (libcpg.so.2) 

And here is yum output:

# yum install gnbd
Setting up Install Process
Resolving Dependencies
--> Running transaction check
---> Package gnbd.x86_64 0:1.1.7-1.el5 set to be updated
--> Processing Dependency: libcman.so.2()(64bit) for package: gnbd
--> Running transaction check
---> Package cman.x86_64 0:2.0.115-34.el5 set to be updated
--> Processing Dependency: libSaCkpt.so.2(OPENAIS_CKPT_B.01.01)(64bit) for 
package: cman
--> Processing Dependency: perl(Net::Telnet) for package: cman
--> Processing Dependency: perl(XML::LibXML) for package: cman
--> Processing Dependency: pexpect for package: cman
--> Processing Dependency: openais for package: cman
--> Processing Dependency: libcpg.so.2(OPENAIS_CPG_1.0)(64bit) for package: cman
--> Processing Dependency: libSaCkpt.so.2()(64bit) for package: cman
--> Processing Dependency: libcpg.so.2()(64bit) for package: cman
--> Running transaction check
---> Package openais.x86_64 0:0.80.6-16.el5 set to be updated
---> Package perl-Net-Telnet.noarch 0:3.03-5 set to be updated
---> Package perl-XML-LibXML.x86_64 0:1.58-6 set to be updated
--> Processing Dependency: perl-XML-NamespaceSupport for package: 
perl-XML-LibXML
--> Processing Dependency: perl-XML-LibXML-Common for package: perl-XML-LibXML
--> Processing Dependency: perl(XML::SAX::Exception) for package: 
perl-XML-LibXML
--> Processing Dependency: perl(XML::LibXML::Common) for package: 
perl-XML-LibXML
--> Processing Dependency: perl-XML-SAX for package: perl-XML-LibXML
--> Processing Dependency: perl(XML::SAX::DocumentLocator) for package: 
perl-XML-LibXML
--> Processing Dependency: perl(XML::SAX::Base) for package: perl-XML-LibXML
--> Processing Dependency: perl(XML::NamespaceSupport) for package: 
perl-XML-LibXML
---> Package pexpect.noarch 0:2.3-3.el5 set to be updated
--> Running transaction check
---> Package perl-XML-LibXML-Common.x86_64 0:0.13-8.2.2 set to be updated
---> Package perl-XML-NamespaceSupport.noarch 0:1.09-1.2.1 set to be updated
---> Package perl-XML-SAX.noarch 0:0.14-8 set to be updated
--> Processing Conflict: corosync conflicts openais <= 0.89
--> Finished Dependency Resolution
corosync-1.2.1-1.el5.x86_64 from installed has depsolving problems
  --> corosync conflicts with openais
Error: corosync conflicts with openais


Vadym


___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf


Re: [Pacemaker] pacemaker and gnbd

2010-05-03 Thread Vadym Chepkov

On May 3, 2010, at 10:27 AM, Andrew Beekhof wrote:

> It is the case, the conflict is slightly different than you think.
> Corosync doesn't conflict with all versions of openais, just the one
> cman wants to use.
> 
> You need to rebuild cman to use the newer version of openais.

Hmm, this is what I asked at the very beginning:

On Sat, May 1, 2010 at 3:30 PM, Vadym Chepkov  wrote:
> Hi,
> 
> I found out I can't use gnbd if I use pacemaker rpm from clusterlabs 
> repository, because gnbd depends on cman which requires openais which 
> conflicts with corosync pacemaker depends on .
> Is it just a matter of recompiling cman rpm using corosync libraries instead 
> of openais? Or something else needs to be done?


Unfortunately, cman doesn't get compiled "right away":

DEBUG: make[1]: Entering directory 
`/builddir/build/BUILD/cman-2.0.115/cman/daemon'
DEBUG: gcc -Wall  -fPIC -I//builddir/build/BUILD/cman-2.0.115/ccs/lib 
-I//usr/include -I../config -DCMAN_RELEASE_NAME=\"2.0.115\" 
-DOPENAIS_EXTERNAL_SERVICE -O2 -c -o daemon.o daemon.c
DEBUG: daemon.c:32:35: error: openais/totem/aispoll.h: No such file or directory
DEBUG: daemon.c:33:35: error: openais/totem/totemip.h: No such file or directory
DEBUG: In file included from daemon.c:37:
DEBUG: cnxman-private.h:17:33: error: openais/totem/totem.h: No such file or 
directory
DEBUG: In file included from daemon.c:42:
DEBUG: ais.h:25: error: array type has incomplete element type
DEBUG: ais.h:26: error: array type has incomplete element type
DEBUG: daemon.c:59: error: expected '=', ',', ';', 'asm' or '__attribute__' 
before 'ais_poll_handle'
DEBUG: daemon.c:62: error: expected ')' before 'handle'
DEBUG: daemon.c:63: error: expected ')' before 'handle'
DEBUG: daemon.c: In function 'send_reply_message':
DEBUG: daemon.c:89: warning: implicit declaration of function 'remove_client'
DEBUG: daemon.c:89: error: 'ais_poll_handle' undeclared (first use in this 
function)
DEBUG: daemon.c:89: error: (Each undeclared identifier is reported only once
DEBUG: daemon.c:89: error: for each function it appears in.)
DEBUG: daemon.c:108: warning: implicit declaration of function 
'poll_dispatch_modify'
DEBUG: daemon.c:108: error: 'process_client' undeclared (first use in this 
function)
DEBUG: daemon.c: At top level:
DEBUG: daemon.c:113: error: expected ')' before 'handle'
DEBUG: daemon.c: In function 'send_queued_reply':
DEBUG: daemon.c:168: error: 'ais_poll_handle' undeclared (first use in this 
function)
DEBUG: daemon.c:168: error: 'process_client' undeclared (first use in this 
function)
DEBUG: daemon.c: At top level:
DEBUG: daemon.c:173: error: expected ')' before 'handle'
DEBUG: daemon.c:323: error: expected ')' before 'handle'
DEBUG: daemon.c:354: error: expected declaration specifiers or '...' before 
'poll_handle'
DEBUG: daemon.c: In function 'open_local_sock':
DEBUG: daemon.c:402: warning: implicit declaration of function 
'poll_dispatch_add'
DEBUG: daemon.c:402: error: 'handle' undeclared (first use in this function)
DEBUG: daemon.c:402: error: 'process_rendezvous' undeclared (first use in this 
function)
DEBUG: daemon.c: At top level:
DEBUG: daemon.c:500: error: expected '=', ',', ';', 'asm' or '__attribute__' 
before 'aisexec_poll_handle'
DEBUG: daemon.c: In function 'cman_init':
DEBUG: daemon.c:506: error: 'ais_poll_handle' undeclared (first use in this 
function)
DEBUG: daemon.c:506: error: 'aisexec_poll_handle' undeclared (first use in this 
function)
DEBUG: daemon.c:512: error: too many arguments to function 'open_local_sock'
DEBUG: daemon.c:516: error: too many arguments to function 'open_local_sock'
DEBUG: make[1]: Leaving directory 
`/builddir/build/BUILD/cman-2.0.115/cman/daemon'
DEBUG: RPM build errors:
DEBUG: make[1]: *** [daemon.o] Error 1



___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf


Re: [Pacemaker] pacemaker and gnbd

2010-05-03 Thread Vadym Chepkov

On May 3, 2010, at 5:39 PM, Andrew Beekhof wrote:

> 
> perhaps try the srpm from F-12

Would be nice, but the last one was in F-9, it seems:

http://koji.fedoraproject.org/koji/packageinfo?packageID=182

Vadym

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf


Re: [Pacemaker] pacemaker and gnbd

2010-05-03 Thread Vadym Chepkov

On May 3, 2010, at 6:03 PM, Vadym Chepkov wrote:

> 
> On May 3, 2010, at 5:39 PM, Andrew Beekhof wrote:
> 
>> 
>> perhaps try the srpm from F-12
> 
> Would be nice, but the last one was in F-9, it seems:
> 
> http://koji.fedoraproject.org/koji/packageinfo?packageID=182

Oh, I found out it's part of the cluster package now.
But it also doesn't compile :(

DEBUG: /builddir/build/BUILD/cluster-3.0.7/dlm/libdlm/libdlm.c: In function 
'create_lockspace_v5':
DEBUG: /builddir/build/BUILD/cluster-3.0.7/dlm/libdlm/libdlm.c:1231: error: 
'DLM_LOCKSPACE_LEN' undeclared (first use in this function)
DEBUG: /builddir/build/BUILD/cluster-3.0.7/dlm/libdlm/libdlm.c:1231: error: 
(Each undeclared identifier is reported only once
DEBUG: /builddir/build/BUILD/cluster-3.0.7/dlm/libdlm/libdlm.c:1231: error: for 
each function it appears in.)
DEBUG: /builddir/build/BUILD/cluster-3.0.7/dlm/libdlm/libdlm.c:1236: warning: 
left-hand operand of comma expression has no effect
DEBUG: /builddir/build/BUILD/cluster-3.0.7/dlm/libdlm/libdlm.c:1231: warning: 
unused variable 'reqbuf'
DEBUG: /builddir/build/BUILD/cluster-3.0.7/dlm/libdlm/libdlm.c: In function 
'create_lockspace_v6':
DEBUG: /builddir/build/BUILD/cluster-3.0.7/dlm/libdlm/libdlm.c:1255: error: 
'DLM_LOCKSPACE_LEN' undeclared (first use in this function)
DEBUG: /builddir/build/BUILD/cluster-3.0.7/dlm/libdlm/libdlm.c:1260: warning: 
left-hand operand of comma expression has no effect
DEBUG: /builddir/build/BUILD/cluster-3.0.7/dlm/libdlm/libdlm.c:1255: warning: 
unused variable 'reqbuf'
DEBUG: make[2]: make[2]: Leaving directory 
`/builddir/build/BUILD/cluster-3.0.7/dlm/libdlm'

Vadym
___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf


Re: [Pacemaker] pacemaker and gnbd

2010-05-04 Thread Vadym Chepkov
On Tue, May 4, 2010 at 3:41 AM, Andrew Beekhof  wrote:

>
> Hmmm... I wonder if the RHEL5.5 kernel is new enough to run the dlm.
> I suspect not.
>
> Why not try the RHEL6 beta?  It comes with compatible versions of
> everything (including pacemaker).
>
>
http://ftp.redhat.com/redhat/rhel/beta/6/x86_64/os/Packages/

I don't see gnbd.

And EPEL is not supporting RHEL6 yet.

Vadym
___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf

[Pacemaker] corosync rpm in clusterlabs repo

2010-05-04 Thread Vadym Chepkov
Hi,

I noticed my mock build fails on the corosync libraries, and I found out the corosync
rpms were replaced with packages carrying the same version number.
This is what I have installed:

$ rpm -qi corosync
Name: corosync Relocations: (not relocatable)
Version : 1.2.1 Vendor: (none)
Release : 1.el5 Build Date: Thu 08 Apr 2010
04:17:05 PM UTC
Install Date: Fri 09 Apr 2010 11:01:09 AM UTC  Build Host: localhost
Group   : System Environment/Base   Source RPM:
corosync-1.2.1-1.el5.src.rpm
Size: 336476   License: BSD
Signature   : (none)
URL : http://www.openais.org
Summary : The Corosync Cluster Engine and Application Programming
Interfaces
Description :
This package contains the Corosync Cluster Engine Executive, several default
APIs and libraries, default configuration files, and an init script.

This is what in repository now:

$ rpm -qip corosync-1.2.1-1.el5.x86_64.rpm
Name: corosync Relocations: (not relocatable)
Version : 1.2.1 Vendor: (none)
Release : 1.el5 Build Date: Mon 03 May 2010
09:54:51 AM UTC
Install Date: (not installed)   Build Host: localhost
Group   : System Environment/Base   Source RPM:
corosync-1.2.1-1.el5.src.rpm
Size: 336476   License: BSD
Signature   : (none)
URL : http://www.openais.org
Summary : The Corosync Cluster Engine and Application Programming
Interfaces
Description :
This package contains the Corosync Cluster Engine Executive, several default
APIs and libraries, default configuration files, and an init script.

A different rpm, same release version.
Was it an oversight?
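For what it's worth, the difference shows up in the build metadata; something like this exposes it without installing (a sketch):

rpm -qp --qf '%{BUILDTIME:date}  %{BUILDHOST}\n' corosync-1.2.1-1.el5.x86_64.rpm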

 Vadym
___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf

Re: [Pacemaker] clone ip definition and location stops my resources...

2010-05-10 Thread Vadym Chepkov
# crm ra meta ping

name (string, [undef]): Attribute name
The name of the attributes to set.  This is the name to be used in the
constraints.

By default it is "pingd", but you are checking against pinggw.

I suggest you do not change the name though, but adjust your location constraint
to use pingd instead.
crm_mon only notices "pingd" at the moment when you pass the -f argument: it's
hardcoded.
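In other words, keep the primitive as it is and only change the rule (a sketch against the constraint quoted below):

location nfs-group-with-pinggw nfs-group \
    rule $id="nfs-group-with-pinggw-rule" -inf: not_defined pingd or pingd lte 0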


On Mon, May 10, 2010 at 9:34 AM, Gianluca Cecchi
wrote:

> Hello,
> using pacemaker 1.0.8 on rh el 5 I have some problems understanding the way
> ping clone works to setup monitoring of gw... even after reading docs...
>
> As soon as I run:
> crm configure location nfs-group-with-pinggw nfs-group rule -inf:
> not_defined pinggw or pinggw lte 0
>
> the resources go stopped and don't re-start
>
> Then, as soon as I run
> crm configure delete nfs-group-with-pinggw
>
> the resources of the group start again...
>
> config (part of it, actually) I try to apply is this:
> group nfs-group ClusterIP lv_drbd0 NfsFS nfssrv \
> meta target-role="Started"
> ms NfsData nfsdrbd \
> meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1"
> notify="true"
> primitive pinggw ocf:pacemaker:ping \
> params host_list="192.168.101.1" multiplier="100" \
> op start interval="0" timeout="90" \
>  op stop interval="0" timeout="100"
> clone cl-pinggw pinggw \
> meta globally-unique="false"
> location nfs-group-with-pinggw nfs-group \
> rule $id="nfs-group-with-pinggw-rule" -inf: not_defined pinggw or pinggw
> lte 0
>
> Is the location constraint to be done with ping resource or with its clone?
> Is it a cause of the problem that I have also defined an nfs client on the
> other node with:
>
> primitive nfsclient ocf:heartbeat:Filesystem \
> params device="nfsha:/nfsdata/web" directory="/nfsdata/web" fstype="nfs" \
>  op start interval="0" timeout="60" \
> op stop interval="0" timeout="60"
> colocation nfsclient_not_on_nfs-group -inf: nfs-group nfsclient
> order nfsclient_after_nfs-group inf: nfs-group nfsclient
>
> Thansk in advance,
> Gianluca
>
> From messages of the server running the nfs-group at that moment:
> May 10 15:18:27 ha1 cibadmin: [29478]: info: Invoked: cibadmin -Ql
> May 10 15:18:27 ha1 cibadmin: [29479]: info: Invoked: cibadmin -Ql
> May 10 15:18:28 ha1 crm_shadow: [29536]: info: Invoked: crm_shadow -c
> __crmshell.29455
> May 10 15:18:28 ha1 cibadmin: [29537]: info: Invoked: cibadmin -p -U
> May 10 15:18:28 ha1 crm_shadow: [29539]: info: Invoked: crm_shadow -C
> __crmshell.29455 --force
> May 10 15:18:28 ha1 cib: [8470]: info: cib_replace_notify: Replaced:
> 0.267.14 -> 0.269.1 from 
> May 10 15:18:28 ha1 cib: [8470]: info: log_data_element: cib:diff: -  epoch="267" num_updates="14" admin_epoch="0" />
> May 10 15:18:28 ha1 cib: [8470]: info: log_data_element: cib:diff: +  epoch="269" num_updates="1" admin_epoch="0" >
> May 10 15:18:28 ha1 cib: [8470]: info: log_data_element: cib:diff: +
> 
> May 10 15:18:28 ha1 cib: [8470]: info: log_data_element: cib:diff: +
> 
> May 10 15:18:28 ha1 cib: [8470]: info: log_data_element: cib:diff: +
>  __crm_diff_marker__="added:top" >
> May 10 15:18:28 ha1 cib: [8470]: info: log_data_element: cib:diff: +
>   
> May 10 15:18:28 ha1 cib: [8470]: info: log_data_element: cib:diff: +
>  operation="not_defined" />
> May 10 15:18:28 ha1 cib: [8470]: info: log_data_element: cib:diff: +
>  operation="lte" value="0" />
> May 10 15:18:28 ha1 cib: [8470]: info: log_data_element: cib:diff: +
>   
> May 10 15:18:28 ha1 cib: [8470]: info: log_data_element: cib:diff: +
> 
> May 10 15:18:28 ha1 cib: [8470]: info: log_data_element: cib:diff: +
> 
> May 10 15:18:28 ha1 crmd: [8474]: info: abort_transition_graph:
> need_abort:59 - Triggered transition abort (complete=1) : Non-status change
> May 10 15:18:28 ha1 attrd: [8472]: info: do_cib_replaced: Sending full
> refresh
> May 10 15:18:28 ha1 cib: [8470]: info: log_data_element: cib:diff: +
> 
> May 10 15:18:28 ha1 crmd: [8474]: info: need_abort: Aborting on change to
> epoch
> May 10 15:18:28 ha1 attrd: [8472]: info: attrd_trigger_update: Sending
> flush op to all hosts for: master-nfsdrbd:0 (1)
> May 10 15:18:28 ha1 cib: [8470]: info: log_data_element: cib:diff: + 
> May 10 15:18:28 ha1 crmd: [8474]: info: do_state_transition: State
> transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_FSA_INTERNAL
> origin=abort_transition_graph ]
> May 10 15:18:28 ha1 cib: [8470]: info: cib_process_request: Operation
> complete: op cib_replace for section 'all' (origin=local/crm_shadow/2,
> version=0.269.1): ok (rc=0)
> May 10 15:18:28 ha1 crmd: [8474]: info: do_state_transition: All 2 cluster
> nodes are eligible to run resources.
> May 10 15:18:28 ha1 cib: [8470]: info: cib_process_request: Operation
> complete: op cib_modify for section nodes (origin=local/crmd/203,
> version=0.269.1): ok (rc=0)
> May 10 15:18:28 ha1 crmd: [8474]: info: do_pe_invoke: Query 205: Requesting
> the current CIB: S_POLICY_ENGINE
> May 10 15:18:28 ha1 attr

Re: [Pacemaker] clone ip definition and location stops my resources...

2010-05-11 Thread Vadym Chepkov
You forgot to turn on the monitor operation for ping (which does the actual job).


On May 11, 2010, at 5:15 AM, Gianluca Cecchi wrote:

> On Mon, May 10, 2010 at 4:39 PM, Vadym Chepkov  wrote:
> # crm ra meta ping
> 
> name (string, [undef]): Attribute name
> The name of the attributes to set.  This is the name to be used in the 
> constraints.
> 
> By default is "pingd", but you are checking against pinggw
> 
> I suggest you do not change name though, but adjust your location constraint 
> to use pingd instead.
> crm_mon only notices "pingd" at the moment whenn you pass -f argument: it's 
> hardcoded
> 
> 
> On Mon, May 10, 2010 at 9:34 AM, Gianluca Cecchi  
> wrote:
> Hello,
> using pacemaker 1.0.8 on rh el 5 I have some problems understanding the way 
> ping clone works to setup monitoring of gw... even after reading docs...
> 
> As soon as I run:
> crm configure location nfs-group-with-pinggw nfs-group rule -inf: not_defined 
> pinggw or pinggw lte 0
> 
> the resources go stopped and don't re-start
> 
> [snip]
> 
> hem...
> I changed the location line so that now I have:
> primitive pinggw ocf:pacemaker:ping \
>   params host_list="192.168.101.1" multiplier="100" \
>   op start interval="0" timeout="90" \
>   op stop interval="0" timeout="100"
> 
> clone cl-pinggw pinggw \
>   meta globally-unique="false"
> 
> location nfs-group-with-pinggw nfs-group \
>   rule $id="nfs-group-with-pinggw-rule" -inf: not_defined pingd or pingd 
> lte 0
> 
> But now nothing happens  if I run for example
>  iptables -A OUTPUT -p icmp -d 192.168.101.1 -j REJECT (or DROP)
> in the node where nfs-group is running.
> 
> Do I have to name the primitive itself to pingd
> It seems that the binary /bin/ping is not accessed at all (with ls -lu ...)
> 
> Or do I have to change the general property I previously define to avoide 
> failback:
> rsc_defaults $id="rsc-options" \
>   resource-stickiness="100"
> 
> crm_mon -f -r gives:
> Online: [ ha1 ha2 ]
> 
> Full list of resources:
> 
> SitoWeb (ocf::heartbeat:apache):Started ha1
>  Master/Slave Set: NfsData
>  Masters: [ ha1 ]
>  Slaves: [ ha2 ]
>  Resource Group: nfs-group
>  ClusterIP  (ocf::heartbeat:IPaddr2): Started ha1
>  lv_drbd0   (ocf::heartbeat:LVM):   Started ha1
>  NfsFS(ocf::heartbeat:Filesystem):Started ha1
>  nfssrv (ocf::heartbeat:nfsserver): Started ha1
> nfsclient (ocf::heartbeat:Filesystem):Started ha2
>  Clone Set: cl-pinggw
>  Started: [ ha2 ha1 ]
> 
> Migration summary:
> * Node ha1:  pingd=100
> * Node ha2:  pingd=100
> 
> Probably I didn't understand correctly what described at the link:
> http://www.clusterlabs.org/wiki/Pingd_with_resources_on_different_networks
> or it is outdated now... and instead of defining two clones it is better (aka 
> works) to populate the host_list parameter as described here in case of more 
> networks connected:
> 
> http://www.clusterlabs.org/doc/en-US/Pacemaker/1.1/html/Pacemaker_Explained/ch09s03s03.html
> 
> Probably I'm missing something very simple but I don't get a clue to it...
> Gianluca
> ___
> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
> 
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf

Re: [Pacemaker] clone ip definition and location stops my resources...

2010-05-11 Thread Vadym Chepkov
First of all, no monitor operation is on by default in pacemaker; this 
is something that you have to turn on.
For the ping RA the start and stop op parameters don't do much, so you can safely 
drop them.

Here are my settings; they do work for me:

primitive ping ocf:pacemaker:ping \
params name="pingd" host_list="10.10.10.250" multiplier="200" 
timeout="3" \
op monitor interval="10"
clone connected ping \
meta globally-unique="false"
location rg0-connected rg0 \
rule -inf: not_defined pingd or pingd lte 0


On May 11, 2010, at 7:06 AM, Gianluca Cecchi wrote:

> On Tue, May 11, 2010 at 12:50 PM, Vadym Chepkov  wrote:
> You forgot to turn on monitor operation for ping (actual job)
> 
> 
> 
> I saw from the 
> [r...@ha1 ~]# crm ra meta ping 
> command
> 
> Operations' defaults (advisory minimum):
> 
> start timeout=60
> stop  timeout=20
> reloadtimeout=100
> monitor_0 interval=10 timeout=60
> 
> So I presumed it was by default in place for the ping resource.
> Do you mean that I should define the resource this way:
> crm configure primitive pinggw ocf:pacemaker:ping \
> > params host_list="192.168.101.1" multiplier="100" \
> > op start interval="0" timeout="90" \
> > op stop interval="0" timeout="100" \
> > op monitor interval=10 timeout=60
> 
> Ok, I did it and I now get the same behavior as with pingd. Thanks ;-)
> 
> Migration summary:
> * Node ha1:  pingd=0
> * Node ha2:  pingd=100
> 
> And if I remove the iptables rule  I get:
> Migration summary:
> * Node ha1:  pingd=100
> * Node ha2:  pingd=100
> 
> Now I will check the "all resources stopped" problem...
> ___
> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
> 
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf

Re: [Pacemaker] clone ip definition and location stops my resources...

2010-05-11 Thread Vadym Chepkov
By the way, there is another issue with your config.

Since you set the multiplier to 100, it will negate your resource-stickiness, which 
is also set to 100.
Either reduce the multiplier or increase the default resource-stickiness (I have mine 
at 1000).
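With a single ping target, multiplier="100" gives the pingd attribute a 100-point swing, which exactly offsets resource-stickiness="100"; the simplest fix is along these lines (a sketch):

crm configure rsc_defaults resource-stickiness="1000"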


Vadym
On May 11, 2010, at 7:06 AM, Gianluca Cecchi wrote:

> On Tue, May 11, 2010 at 12:50 PM, Vadym Chepkov  wrote:
> You forgot to turn on monitor operation for ping (actual job)
> 
> 
> 
> I saw from the 
> [r...@ha1 ~]# crm ra meta ping 
> command
> 
> Operations' defaults (advisory minimum):
> 
> start timeout=60
> stop  timeout=20
> reloadtimeout=100
> monitor_0 interval=10 timeout=60
> 
> So I presumed it was by default in place for the ping resource.
> Do you mean that I should define the resource this way:
> crm configure primitive pinggw ocf:pacemaker:ping \
> > params host_list="192.168.101.1" multiplier="100" \
> > op start interval="0" timeout="90" \
> > op stop interval="0" timeout="100" \
> > op monitor interval=10 timeout=60
> 
> Ok, I did it and I now get the same behavior as with pingd. Thanks ;-)
> 
> Migration summary:
> * Node ha1:  pingd=0
> * Node ha2:  pingd=100
> 
> And if I remove the iptables rule  I get:
> Migration summary:
> * Node ha1:  pingd=100
> * Node ha2:  pingd=100
> 
> Now I will check the "all resources stopped" problem...
> ___
> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
> 
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf

Re: [Pacemaker] clone ip definition and location stops my resources...

2010-05-11 Thread Vadym Chepkov
pingd is a daemon which is running all the time and does its job.
You still need to define a monitor operation though - what if the daemon dies?
op monitor just has a different meaning for ping and pingd:
with pingd - monitor the daemon
with ping - monitor connectivity

as for warnings:

crm configure property default-action-timeout="120s"
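
Alternatively, if you'd rather not raise the global default, explicit per-operation timeouts keep the warnings away; roughly (a sketch based on your own resource):

crm configure primitive pinggw ocf:pacemaker:ping \
        params host_list="192.168.101.1" multiplier="100" \
        op start timeout="60s" \
        op monitor interval="10s" timeout="60s"   # connectivity check for ping, daemon check for pingd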

On Tue, May 11, 2010 at 11:00 AM, Gianluca Cecchi  wrote:

> On Tue, May 11, 2010 at 1:13 PM, Vadym Chepkov  wrote:
>
>> First of all, none of the monitor operation is on by default in pacemaker,
>> this is something that you have to turn on
>> For the ping RA  start and stop op parameters don't do much, so you can
>> safely drop them.
>>
>>
>>
> Yes, but for the pacemaker:pingd RA I didn't need to pass the "op monitor"
> parameter to have it working
>
> Also, in general I added the start/stop op parameters, because without them
> I get, for example with the command you suggested:
>
> [r...@ha1 ~]# crm configure primitive pinggw ocf:pacemaker:ping \
> > params host_list="192.168.101.1" multiplier="200" timeout="3" \
> > op monitor interval="10"
> WARNING: pinggw: default-action-timeout 20s for start is smaller than the
> advised 60
> WARNING: pinggw: default-action-timeout 20s for monitor_0 is smaller than
> the advised 60
>
> Do I have to ignore the warnings?
> Or do I have to adapt the resource creation with:
> [r...@ha1 ~]# crm configure primitive pinggw ocf:pacemaker:ping \
> > params host_list="192.168.101.1" multiplier="200" timeout="3" \
> > op start timeout="60"
>
> That gives no warnings (even if I would have expected the warning about the
> monitor_0 timeout as I didn't set it...???)
>
>
>
>
> ___
> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>
___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf

Re: [Pacemaker] clone ip definition and location stops my resources...

2010-05-11 Thread Vadym Chepkov
There is no "default" unless it's set; that's why crm complains.



On Tue, May 11, 2010 at 12:41 PM, Gianluca Cecchi  wrote:

> On Tue, May 11, 2010 at 5:47 PM, Vadym Chepkov  wrote:
>
>> pingd is a daemon with is running all the time and does it job
>> you still need to define monitor operation though, what if the daemon
>> dies?
>> op monitor  just have a different meaning for ping and pingd.
>> with pingd - monitor daemon
>> with ping - monitor connectivity
>>
>> as for warnings:
>>
>> crm configure property default-action-timeout="120s"
>>
>>
> Thanks again!
> Now it is more clear.
>
> Only doubt: why pacemaker doesn't set directly as a default 120s for
> timeout?
> Any drawbacks in setting it to 120?
> Also, with
> crm configure show
> I can see
>  property $id="cib-bootstrap-options" \
> dc-version="1.0.8-9881a7350d6182bae9e8e557cf20a3cc5dac3ee7" \
> cluster-infrastructure="openais" \
>  expected-quorum-votes="2" \
> stonith-enabled="false" \
>  no-quorum-policy="ignore" \
> last-lrm-refresh="1273484758"
> rsc_defaults $id="rsc-options" \
>  resource-stickiness="1000"
>
> Any way to see what is the default value for "default-action-timeout"
> parameter that I'm going to change (I presume it is 20s from the warnings I
> received) and for other ones for example that are not shown with the show
> command?
>
>
> ___
> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>
___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf

Re: [Pacemaker] Pacemaker installation on CentOs 5.3

2010-05-11 Thread Vadym Chepkov
You didn't have to do 'yum makecache'

Some time ago Andrew accidentally replaced some rpms without bumping up the
revision number.
This made yum complain.
'yum clean all' should have cured all that.



On Tue, May 11, 2010 at 2:09 PM, Simon Lavigne-Giroux wrote:

> I found the solution to my problem, I had to do a 'yum clean all' and 'yum
> makecache' before doing the 'yum update'
>
> I'm just getting used to yum.
>
> Simon
>
> On Mon, May 10, 2010 at 12:55 PM, Simon Lavigne-Giroux  > wrote:
>
>> Hi,
>>
>> I'm trying to install pacemaker from your epel-5 repository from your
>> guide for a CentOs installation and it doesn't work.
>>
>> There is a checksum failure when using 'yum update' :
>>
>> http://www.clusterlabs.org/rpm/epel-5/repodata/filelists.xml.gz: [Errno
>> -1] Metadata file does not match checksum
>> Trying other mirror.
>> Error: failure: repodata/filelists.xml.gz from clusterlabs: [Errno 256] No
>> more mirrors to try.
>>
>> When I call 'yum install pacemaker', I have missing dependency errors for
>> these elements
>>
>> libnetsnmpagent.so.15
>> libcrypto.so.8
>> libtinfo.so.5
>> libxml2.so.2
>> ... and more.
>>
>> Can you repair the checksum problem? Is there an alternative way to get
>> pacemaker from a repository on CentOs 5.3.
>>
>> Thanks
>>
>> Simon
>>
>
>
> ___
> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>
___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf

Re: [Pacemaker] two nodes fenced when drbd link fails

2010-05-13 Thread Vadym Chepkov

On May 13, 2010, at 1:37 PM, Ivan Coronado wrote:

> Hello to everybody,
>  
> I have a problem with the corosync.conf setup. I have a drbd service runing 
> on eth3, and a general network and the stonith device (idrac6) in the eth0. 
> If I unplug the eth3 to simulate a network failure two nodes are fenced 
> (first the slave followed by the master). If I only leave ringnumber 0 in the 
> coroync.conf file I don't have this problem. Is this normal operation?
>  
> Here you have the section of corosync.conf where I have the problem, and 
> thanks for the help.
>  
> rrp_mode: active
> interface {
> # eth0
> ringnumber: 0
> bindnetaddr: 200.200.201.0
> mcastaddr: 226.94.1.1
> mcastport: 5405
> }
> interface {
> #eth3
> ringnumber: 1
> bindnetaddr: 192.168.2.0
> mcastaddr: 226.94.1.2
> mcastport: 5406
> }
> -
> Ivan


I read on the open...@lists.osdl.org list that setting the mcast ports at least two apart 
helps (5405, 5407)
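
Applied to the config above it would look roughly like this (only the second mcastport changes):

rrp_mode: active
interface {
        # eth0
        ringnumber: 0
        bindnetaddr: 200.200.201.0
        mcastaddr: 226.94.1.1
        mcastport: 5405
}
interface {
        # eth3
        ringnumber: 1
        bindnetaddr: 192.168.2.0
        mcastaddr: 226.94.1.2
        mcastport: 5407
}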

Vadym

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf

[Pacemaker] pengine self-maintenance

2010-05-15 Thread Vadym Chepkov
Hi

I noticed pengine (pacemaker-1.0.8-6.el5) creates quite a lot of files in 
/var/lib/pengine,
especially when cluster-recheck-interval is set to enable failure-timeout 
checks.
/var/lib/heartbeat/crm/ also seems to grow unattended.
Does pacemaker do any self-maintenance, or will it eventually cause the system to crash 
by using up all inodes?

Also, why "cluster-recheck-interval" not in "pengine metadata" output? Is it 
deprecated?

Thanks,
Vadym
___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf

Re: [Pacemaker] pengine self-maintenance

2010-05-17 Thread Vadym Chepkov

On May 17, 2010, at 2:52 AM, Andrew Beekhof wrote:

> On Sun, May 16, 2010 at 1:09 AM, Vadym Chepkov  wrote:
>> Hi
>> 
>> I noticed pengine (pacemaker-1.0.8-6.el5) creates quite a lot of files in
>> /var/lib/pengine,
>> especially when cluster-recheck-interval is set to enable failure-timeout
>> checks.
> 
> pengine metadata | grep series-max

Great, thanks. After I set it, I take it I need to clean the excess files manually?

# crm configure show |grep series-max
pe-error-series-max="10" \
pe-warn-series-max="10" \
pe-input-series-max="10"

# ls /var/lib/pengine/|wc -l
123500
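
In case it helps others, the manual cleanup I have in mind is just an age-based prune, something like this (path and age are assumptions; keep whatever history you still want to audit):

# remove policy-engine inputs older than 30 days
find /var/lib/pengine -name 'pe-*.bz2' -mtime +30 -print -delete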

> 
>> /var/lib/heartbeat/crm/ seems also growing unattended.
> 
> Unless there is a bug somewhere, it should be storing only the last
> 100 configurations.

you are right, they are being "reused"

> 
>> Does pacemaker do any self-maintenance or it will cause system to crash
>> eventually by utilizing all inodes?
>> 
>> Also, why "cluster-recheck-interval" not in "pengine metadata" output? Is it
>> deprecated?
> 
> Its controlled by the crmd, so its in the "crmd metadata" output.

Ah, then crm cli has a bug? 

When you hit <tab>, the crmd metadata is not shown:

crm(live)configure# property 
batch-limit=                  cluster-delay=                default-action-timeout=
default-resource-stickiness=  is-managed-default=           maintenance-mode=
no-quorum-policy=             node-health-green=            node-health-red=
node-health-strategy=         node-health-yellow=           pe-error-series-max=
pe-input-series-max=          pe-warn-series-max=           remove-after-stop=
start-failure-is-fatal=       startup-fencing=              stonith-action=
stonith-enabled=              stonith-timeout=              stop-all-resources=
stop-orphan-actions=          stop-orphan-resources=        symmetric-cluster=

Thanks,
Vadym


___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf


Re: [Pacemaker] Detecting a lost network connection

2010-05-17 Thread Vadym Chepkov

On May 17, 2010, at 11:56 AM, Simon Lavigne-Giroux wrote:

> Hi,
> 
> I have 2 servers running Pacemaker. When the router fails, both nodes become 
> primary. Is it possible for Pacemaker on the secondary server to detect that 
> the network connection is not available and not become primary.
> 

http://www.clusterlabs.org/doc/en-US/Pacemaker/1.0/html/Pacemaker_Explained/ch09s03s03s02.html
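
The usual pattern from that chapter, as a rough sketch (names, the gateway address and the protected resource are assumptions; here the floating resource is called ClusterIP):

crm configure primitive ping-gw ocf:pacemaker:ping \
        params host_list="192.168.1.1" multiplier="100" \
        op monitor interval="10s" timeout="60s"
crm configure clone ping-gw-clone ping-gw meta globally-unique="false"
# keep ClusterIP off any node that cannot reach the gateway
crm configure location ClusterIP-needs-connectivity ClusterIP \
        rule -inf: not_defined pingd or pingd lte 0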



___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf


Re: [Pacemaker] IP address does not failover on a new test cluster

2010-05-17 Thread Vadym Chepkov

On May 17, 2010, at 5:40 PM, Ruiyuan Jiang wrote:

> Hi, Gianluca
>  
> I modified my configuration and deleted “crm configure property 
> no-quorum-policy=ignore” as you suggested but I have the same problem that 
> the IP address does not fail. Thanks.
>  
> [r...@usnbrl52 log]# crm configure show
> node usnbrl52
> node usnbrl53
> primitive ClusterIP ocf:heartbeat:IPaddr2 \
> params ip="156.146.22.48" cidr_netmask="32" \
> op monitor interval="30s"
> property $id="cib-bootstrap-options" \
> dc-version="1.0.8-fab8db4bbd271ba0a630578ec23776dfbaa4e2cf" \
> cluster-infrastructure="openais" \
> expected-quorum-votes="2" \
> stonith-enabled="false"
> rsc_defaults $id="rsc-options" \
> resource-stickiness="default"

did you run 'crm configure show' after you set the property?
Because the option is not shown in your output.

Also, resource-stickiness="default" seems suspicious -
what default? I thought it should be a numeric value.
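
For example, this is roughly what I'd expect the commands and the check to look like:

crm configure property no-quorum-policy="ignore"
crm configure rsc_defaults resource-stickiness="100"
# both should now show up in the configuration
crm configure show | grep -E 'no-quorum-policy|resource-stickiness'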



___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf

Re: [Pacemaker] IP address does not failover on a new test cluster

2010-05-18 Thread Vadym Chepkov
On Tue, May 18, 2010 at 2:22 PM, Ruiyuan Jiang wrote:

>  Hi, Vadym
>
>
>
> I modified the configuration per your suggestion. Here is the current
> configuration of the cluster:
>
>
>
> [r...@usnbrl52 ~]# crm configure show
>
> node usnbrl52
>
> node usnbrl53
>
> primitive ClusterIP ocf:heartbeat:IPaddr2 \
>
> params ip="156.146.22.48" cidr_netmask="32" \
>
> op monitor interval="30s"
>
> property $id="cib-bootstrap-options" \
>
> dc-version="1.0.8-fab8db4bbd271ba0a630578ec23776dfbaa4e2cf" \
>
> cluster-infrastructure="openais" \
>
> expected-quorum-votes="2" \
>
> stonith-enabled="false"
>
> rsc_defaults $id="rsc-options" \
>
> resource-stickiness="100"
>
> [r...@usnbrl52 ~]#
>
>
>
> After the change, the IP address still does not fail to the other node
> usnbrl53 after I shutdown openais on node usnbrl52. The cluster IP has no
> problem to bound on usnbrl52 when the “openais” gets stopped and started on
> the node.
>
>
That's because no-quorum-policy=ignore is still not there, it is not listed
in crm configure show output
Run the command again

crm configure property no-quorum-policy=ignore

and make sure 'crm configure show' has changed accordingly

Vadym
___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf

Re: [Pacemaker] IP address does not failover on a new test cluster

2010-05-18 Thread Vadym Chepkov
On Tue, May 18, 2010 at 3:58 PM, Ruiyuan Jiang wrote:

>  Thanks, Vadym
>
>
>
> This time it failed over to another node. For two nodes cluster, does the
> cluster have to set to “no-quorum-policy=ignore” to failover or work
> correctly?
>
>
I can't say it better myself:

http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf

two-node cluster only has quorum when both nodes are running, which is no
longer the case for our
cluster. This would normally make the creation of a two-node cluster
pointless, however it is possible to
control how Pacemaker behaves when quorum is lost. In particular, we can
tell the cluster to simply ignore
quorum altogether.
crm configure property no-quorum-policy=ignore
___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf

[Pacemaker] crm complains about resource id

2010-05-19 Thread Vadym Chepkov
Hi,

It seems I found a bug in crm

# rpm -q pacemaker
pacemaker-1.0.8-6.el5

# crm configure rsc_defaults failure-timeout="10min"

# crm configure show | tail -2
rsc_defaults $id="rsc-options" \
failure-timeout="10min"

# crm configure edit rsc-options
:%s/10/20/
ZZ

ERROR: element meta_attributes (rsc-options) not recognized
Do you want to edit again? 

Or you can just do 'crm configure edit' and try to change rsc_defaults - same 
result, it complains about the id.
If I try to remove $id="rsc-options" altogether, it accepts it, but removes 
rsc_defaults completely - D'oh

Vadym



___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf


Re: [Pacemaker] pengine self-maintenance

2010-05-19 Thread Vadym Chepkov

On May 17, 2010, at 11:38 AM, Dejan Muhamedagic wrote:
> 
> You don't want to set it that low. PE input files are part of
> your cluster history. Set it to a few thousand.
> 

What could be the drawbacks of having it too low?
How are these files being used?

And shouldn't some reasonable default be in place? I just happened to notice 
90% inode utilization on my /var; some could be not so lucky.


>> # ls /var/lib/pengine/|wc -l
>> 123500
>> 
>>> 
 /var/lib/heartbeat/crm/ seems also growing unattended.
>>> 
>>> Unless there is a bug somewhere, it should be storing only the last
>>> 100 configurations.
>> 
>> you are right, they are being "reused"


I found another bug/feature :)
When it's time to reuse the cib-/pe-xxx files the numbering restarts at 1, but the initial 
start creates files with a 0 suffix.
So you have your pe-warn-0.bz2 frozen in time, for example :)



>> 
>>> 
 Does pacemaker do any self-maintenance or it will cause system to crash
 eventually by utilizing all inodes?
 
 Also, why "cluster-recheck-interval" not in "pengine metadata" output? Is 
 it
 deprecated?
>>> 
>>> Its controlled by the crmd, so its in the "crmd metadata" output.
>> 
>> Ah, then crm cli has a bug? 
>> 
>> When you click  metadata of crmd is not shown:
>> 
>> crm(live)configure# property 
>> batch-limit=  no-quorum-policy= 
>> pe-input-series-max=  stonith-enabled=
>> cluster-delay=node-health-green=
>> pe-warn-series-max=   stonith-timeout=
>> default-action-timeout=   node-health-red=  
>> remove-after-stop=stop-all-resources=
>> default-resource-stickiness=  node-health-strategy= 
>> start-failure-is-fatal=   stop-orphan-actions=
>> is-managed-default=   node-health-yellow=   startup-fencing= 
>>  stop-orphan-resources=
>> maintenance-mode= pe-error-series-max=  stonith-action=  
>>  symmetric-cluster=
> 
> Yes, you can file a bugzilla for that. Note that the property
> will still be set if you type it.
> 

Done, Bug 2419

Thanks,
Vadym



___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf


Re: [Pacemaker] pengine self-maintenance

2010-05-19 Thread Vadym Chepkov
On Wed, May 19, 2010 at 1:26 PM, Dejan Muhamedagic wrote:

> > And shouldn't be some reasonable default be in place? I just
> > happened to notice 90% inode utilization on my /var, some could
> > be not so lucky.
>
>
> Yes, that could be a problem. Perhaps that default could be
> changed to, say, 10000, which would be close enough to unlimited for
> clusters in normal use :)
>

Even if your cluster is absolutely solid and none of the applications ever goes up or
down, that limit will be reached in 104 days :)


> > I found another bug/feature :)
> > When it's time to reutilize cib-/pe-xxx the process starts with 1, but
> initial start creates files with 0 suffix
> > So you have your pe-warn-0.bz2 frozen in time, for example :)
>


Vadym
___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf

[Pacemaker] manage/unmanage and stop/start logic

2010-05-19 Thread Vadym Chepkov
There is some flaw in the start/stop and manage/unmanage logic in the crm, in my 
opinion.

For example,

I unmanaged some resource to do maintenance, then I issued crm resource manage 
again.
At this point crm will add  meta is-managed="true" to the resource.

Later on I need to upgrade the pacemaker software,
so I issue 'crm configure property is-managed-default=false'
and all resources will be unmanaged except those that I was "managing" before.
So I have to go into each one individually and remove the is-managed meta attribute.

And basically the same problem with target-role.

Maybe the logic should be changed to not add a redundant meta attribute? If the 
result can be achieved without adding the meta, don't add it,
and add it only when a "force" option is given, for example?

Just a thought.

Vadym



___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf


Re: [Pacemaker] manage/unmanage and stop/start logic

2010-05-20 Thread Vadym Chepkov
On Thu, May 20, 2010 at 5:05 AM, Dejan Muhamedagic wrote:

> Too late for that, we shouldn't change semantics. I did think
> about it at the time and say "resource manage rsc" seemed
> unequivocal. BTW, there's a way to remove a meta attribute:
>
> crm resource meta  delete 
>
> Thanks,
>
> Dejan
>
>
I know, that's how I clean it up later. But it becomes somewhat of a tedious task:
we already have to remember to "unmove" resources, and in addition we have to
"unstop" and "demanage" :)
It's not a show stopper, it's an inconvenience. The crm shell was introduced to
make pacemaker more convenient, right?
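
For the archives, the per-resource cleanup this boils down to looks roughly like this (the resource name is hypothetical; repeat for every resource that picked up the extra attributes):

crm resource meta dummy1 delete is-managed     # drop the leftover is-managed="true"
crm resource meta dummy1 delete target-role    # drop the leftover target-role
crm resource unmove dummy1                     # and don't forget the "unmove"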

Vadym
___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf

Re: [Pacemaker] SNMP/SMTP alerts on move or STONITH?

2010-05-24 Thread Vadym Chepkov

On May 24, 2010, at 10:05 AM, Simpson, John R wrote:

> Greetings all,
>  
>  First, my compliments to the Pacemaker and Corosync developers.  I’ve 
> been trying out Pacemaker for the past few months, and (especially from the 
> command line) I’ve found building and managing Pacemaker-based clusters more 
> intuitive and flexible than RHCS.
>  
>  Is there any way to generate SNMP traps and/or email notifications when 
> a resource is moved or a node is STONITH’d?  
>  
>  Using the Pacemaker resource agent ClusterMon to run crm_mon I receive 
> the start, stop, and monitor notifications I expect, but there are no 
> specific notifications when a resource is moved or a node is killed.  I’d 
> like to send up a giant red flag when one of these major events occurs, 
> rather than having to derive it from start/stop/monitor alerts (i.e. all the 
> resources usually hosted on node01 suddenly started and were monitored on 
> node02 – node01 must have been stonith’d).  I’m using the external/ssh 
> stonith agent for lab tests, if that is a factor.
>  
>  I’m using the following ClusterMon configuration and Pacemaker / 
> Corosync / SNMP versions:
>  
> primitive Monitor-Cluster ocf:pacemaker:ClusterMon \
> params htmlfile="/var/www/html/rlb-cluster-monitor.html" \
> params pidfile="/var/run/rlb-cluster-monitor.pid" \
> params extra_options="--mail-host=outbound.msg.reyrey.net:25 
> --mail-from=john_simp...@reyrey.com --mail-to=john_simp...@reyrey.com 
> --snmp-traps=10.205.1.18" \
> op start interval="0" timeout="90s" \
> op stop interval="0" timeout="100s"
>  
> pacemaker-libs-devel-1.0.8-3.el5
> pacemaker-libs-1.0.8-3.el5 is
> pacemaker-1.0.8-3.el5
> corosynclib-1.2.0-1.el5
> corosync-1.2.0-1.el5
> corosynclib-devel-1.2.0-1.el5
> net-snmp-libs-5.3.2.2-7.el5_4.2
> net-snmp-5.3.2.2-7.el5_4.2
>  
> Best regards,
>  
> John
> John Simpson 
> Senior Software Engineer, I. T. Engineering and Operations
> 

Ironically, pacemaker does not provide self-monitoring facilities, at least I 
wasn't able to find anything usable.
I strongly suggest you disable the --mail-to feature of crm_mon, because you 
will induce a DoS attack on your mail server.
You will be flooded with pretty much useless e-mails 
(http://developerbugs.linux-foundation.org/show_bug.cgi?id=2313).
Also, crm_mon is not usable with nagios at the moment 
(http://developerbugs.linux-foundation.org/show_bug.cgi?id=2344).
Your best bet is either to write a cron script or a "Dummy"-derived resource agent 
that parses 'crm resource status' output,
or to create a MailTo resource for each leaf node with the accompanying 
colocation/ordering constraints.
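
A minimal sketch of such a cron script (paths, schedule and the recipient are assumptions; run it e.g. once a minute from cron):

#!/bin/bash
# mail a diff whenever the one-shot cluster status output changes
STATE=/var/tmp/crm_mon.state
NEW=$(mktemp)
crm_mon -1 -n > "$NEW" 2>&1
if [ -f "$STATE" ] && ! diff -q "$STATE" "$NEW" >/dev/null 2>&1; then
    diff "$STATE" "$NEW" | mail -s "cluster status changed on $(hostname)" admin@example.com
fi
mv "$NEW" "$STATE"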

If there is another "native" way to get nagios-like notifications to my pager, 
I would be happy to know them as well.

Vadym


___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf

[Pacemaker] power failure handling

2010-05-26 Thread Vadym Chepkov
Hi,

What would be the proper way to shut down the members of a two-node cluster in case of 
a power outage?
I assume that as soon as I issue 'crm node standby node-1 reboot' resources will start 
to fail over to the second node and,
first of all, there is no reason for that, and, second of all,
a consecutive 'crm node standby node-2 reboot' might run into some race 
condition.

In pseudo-property terms:

crm confgure property 
stop-all-resources-even-if-target-role-is-started-until-reboot=true

:)


Thanks,
Vadym


___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf


Re: [Pacemaker] power failure handling

2010-05-27 Thread Vadym Chepkov

On May 27, 2010, at 7:21 AM, Andrew Beekhof wrote:

> On Wed, May 26, 2010 at 9:07 PM, Vadym Chepkov  wrote:
>> Hi,
>> 
>> What would be the proper way to shutdown members of two-node cluster in case 
>> of a power outage?
>> I assume as soon I issue 'crm node standby node-1 reboot' resources will 
>> start to fail-over to the second node and,
>> first of all, there is no reason for that, and, second of all,
>> consecutive 'crm node standby node-2 reboot' might get into some race 
>> condition.
> 
> Why?

Just a gut feeling, and I would prefer to have it in one transaction - call me 
a purist :)
I would use 'crm configure load update standby.cfg', but I can't figure out how to set 
the reboot-lifetime standby attribute properly.
crm is definitely using a hack on this one, because when I issue this command 
the node goes standby, but 'crm configure show' and 'crm node show' 
indicate that the standby attribute is off - weird.
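
For what it's worth, the closest thing I can see to doing it directly is setting the transient attribute by hand, assuming crm_attribute accepts a reboot lifetime on this version:

# put both nodes into standby only until their next reboot
crm_attribute -N node-1 -n standby -v on -l reboot
crm_attribute -N node-2 -n standby -v on -l reboot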

> 
>> 
>> In pseudo-property terms:
>> 
>> crm confgure property 
>> stop-all-resources-even-if-target-role-is-started-until-reboot=true
> 
> crm confgure property stop-all-resources=true
> 
> followed by:
>  cibadmin --delete-all --xpath '//nvpair[@name="target-role"]'
> 
> should work

It would also alter those that were stopped for a reason - that can certainly be 
tweaked -
but it won't take care of the "until reboot" part.

Thanks,
Vadym
___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf


Re: [Pacemaker] corosync/openais fails to start

2010-05-27 Thread Vadym Chepkov

On May 27, 2010, at 11:40 AM, Diego Remolina wrote:

> Is there any workaround for this? Perhaps a slightly older version of the 
> rpms? If so where do I find those?

chkconfig corosync off
chkconfig heartbeat on

Unfortunately, that's what I had to do on PPC64 RHEL5


> 
> I cannot get the opensuse-ha rpms any more so I am stuck with a 
> non-functioning cluster.
> 
> Diego
> 
> Steven Dake wrote:
>> This is a known issue on some platforms, although the exact cause is 
>> unknown.  I have tried RHEL 5.5 as well as CentOS 5.5 with clusterrepo rpms 
>> and been unable to reproduce.  I'll keep looking.
>> Regards
>> -steve
>> On 05/27/2010 06:07 AM, Diego Remolina wrote:
>>> Hi,
>>> 
>>> I was running the old rpms from the opensuse repo and wanted to change
>>> over to the latest packages from the clusterlabs repo in my RHEL 5.5
>>> machines.
>>> 
>>> Steps I took
>>> 1. Disabled the old repo
>>> 2. Set the nodes to standby (two node drbd cluster) and turned of openais
>>> 3. Enabled the new repo.
>>> 4. Performed an update with yum -y update which replaced all packages.
>>> 5. The configuration file for ais was renamed openais.conf.rpmsave
>>> 6. I ran corosync-keygen and copied the key to the second machine
>>> 7. I copied the file openais.conf.rpmsave to /etc/corosync/corosync.conf
>>> and modified it by removing the service section and moving that to
>>> /etc/corosync/service.d/pcmk
>>> 8. I copied the configurations to the other machine.
>>> 9. When I try to start either openais or corosync with the init scripts
>>> I get a failure and nothing that can really point me to an error in the
>>> logs.
>>> 
>>> Updated packages:
>>> May 26 14:29:32 Updated: cluster-glue-libs-1.0.5-1.el5.x86_64
>>> May 26 14:29:32 Updated: resource-agents-1.0.3-2.el5.x86_64
>>> May 26 14:29:34 Updated: cluster-glue-1.0.5-1.el5.x86_64
>>> May 26 14:29:34 Installed: libibverbs-1.1.3-2.el5.x86_64
>>> May 26 14:29:34 Installed: corosync-1.2.2-1.1.el5.x86_64
>>> May 26 14:29:34 Installed: librdmacm-1.0.10-1.el5.x86_64
>>> May 26 14:29:34 Installed: corosynclib-1.2.2-1.1.el5.x86_64
>>> May 26 14:29:34 Installed: openaislib-1.1.0-2.el5.x86_64
>>> May 26 14:29:34 Updated: openais-1.1.0-2.el5.x86_64
>>> May 26 14:29:34 Installed: libnes-0.9.0-2.el5.x86_64
>>> May 26 14:29:35 Installed: heartbeat-libs-3.0.3-2.el5.x86_64
>>> May 26 14:29:35 Updated: pacemaker-libs-1.0.8-6.1.el5.x86_64
>>> May 26 14:29:36 Updated: heartbeat-3.0.3-2.el5.x86_64
>>> May 26 14:29:36 Updated: pacemaker-1.0.8-6.1.el5.x86_64
>>> 
>>> Apparently corosync is sec faulting when run from the command line:
>>> 
>>> # /usr/sbin/corosync -f
>>> Segmentation fault
>>> 
>>> Any help would be greatly appreciated.
>>> 
>>> Diego
>>> 
>>> 
>>> 
>>> ___
>>> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
>>> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>>> 
>>> Project Home: http://www.clusterlabs.org
>>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> 
> ___
> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
> 
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf


___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf


[Pacemaker] master/slave or unique clones

2010-05-28 Thread Vadym Chepkov
Hi,

I want to convert our home-made application to be managed by a pacemaker cluster.
The way it works now: the application starts, discovers all IPs configured on the 
system, and if it sees the preconfigured IP it becomes a "master" and will serve 
configuration requests;
if not, it becomes a "node" and will try to connect to the "master" node to get configuration 
data. Other than that, the application instances are absolutely the same on all 
"cluster" servers, and
they start collecting their own unique data.

I started to write a resource agent in a "stateful" manner, but the problem is that 
I can't promote or demote, and the documentation says the application has to start 
in "slave" mode,
which I can't supply. So I thought of unique clones instead, but then I need to 
create a colocation/ordering constraint with the IP for clone:0, and again the 
documentation says I shouldn't do that.

As a workaround I could probably create a separate "master" instance plus clones 
with clone-max set to the number of nodes minus one, and create a -INFINITY colocation 
constraint between them, but it seems more of a hack.
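
A rough sketch of that workaround, with a hypothetical ocf:local:myapp agent and a three-node cluster, would be something like:

primitive app-master ocf:local:myapp \
        op monitor interval="60s"
primitive app-node ocf:local:myapp \
        op monitor interval="60s"
# one "node" instance per remaining server (clone-max = nodes - 1)
clone app-node-clone app-node \
        meta clone-max="2" globally-unique="false"
# never put a "node" instance where the master runs
colocation app-apart -inf: app-node-clone app-master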

What would Andrew do ? :)

Thank you,
Vadym Chepkov


___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf


Re: [Pacemaker] master/slave or unique clones

2010-05-28 Thread Vadym Chepkov

On May 28, 2010, at 8:12 AM, Florian Haas wrote:

> On 2010-05-28 14:01, Vadym Chepkov wrote:
>> Hi,
>> 
>> I want to convert our home-made application to be managed by pacemaker 
>> cluster. 
>> The way it works now: application starts, discovers all IPs configured on 
>> the system and if it sees preconfigured IP it becomes "master" and will 
>> serve configuration requests, 
>> if not - "node" and will try to connect to "master" node to get 
>> configuration data. Other then that, application instances are absolutely 
>> the same on all "cluster" servers and 
>> they start collecting their own unique data.
>> 
>> I started to write a resource agent in "stateful" manner, but the problem is 
>> - I can't promote or demote and the documentation says : application has to 
>> start in "slave" mode, 
>> which I can't supply. So I thought of unique clones instead, but I need to 
>> create collocation/ordering constraint with IP for the clone:0 and again 
>> documentation says I shouldn't do it.
>> 
>> As a workaround I could probably create a separate "master" instance and 
>> clones with clone-max=node-1 and create -INFINITY collocation constraint 
>> between them, but it's more of a hack it seems.
> 
> Show us your RA and I'm sure quite a few people will come up with
> helpful suggestions.
> 

The operative word was "started". Do you think I should still go with a multi-state RA for 
this application?
this application?

Thanks,
Vadym Chepkov
___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf


Re: [Pacemaker] master/slave or unique clones

2010-05-28 Thread Vadym Chepkov

On May 28, 2010, at 8:27 AM, Florian Haas wrote:
>> 
>> Imperative word was "started". You think I still should go multi-state RA 
>> for this application?
> 
> If the application which that RA applies to distinguishes between roles
> equivalent to a Master and a Slave, and you want the RA to manage those,
> then probably yes.
> 
> What makes you think you can't start the RA in Slave mode?

Because when the application starts and is able to bind to the "master" IP it becomes 
master right away; there is no way to promote or demote.
Should I fake it in the RA?

Thanks,
Vadym




___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf


Re: [Pacemaker] master/slave or unique clones

2010-05-28 Thread Vadym Chepkov

On May 28, 2010, at 11:17 AM, Florian Haas wrote:

> On 05/28/2010 02:37 PM, Vadym Chepkov wrote:
>> 
>> On May 28, 2010, at 8:27 AM, Florian Haas wrote:
>>>> 
>>>> Imperative word was "started". You think I still should go multi-state RA 
>>>> for this application?
>>> 
>>> If the application which that RA applies to distinguishes between roles
>>> equivalent to a Master and a Slave, and you want the RA to manage those,
>>> then probably yes.
>>> 
>>> What makes you think you can't start the RA in Slave mode?
>> 
>> Because when application starts and able do bind to "master" ip it becomes 
>> master right away, there is no way to promote or demote.
>> Should I fake it in RA?
> 
> Perhaps you could share just what you're trying to achieve, for what
> application, and how from the application's point of view the master is
> different from the slave?  I understand that you have this all laid out
> in your head, but the information you've given here is very sparse. This
> makes it hard to follow, and difficult to give advice. It's a bit like
> discussing the view of a beautiful landscape with blindfolds on. :)

I am sorry I wasn't informative before, I will try again

>> I want to convert our home-made application to be managed by pacemaker 
>> cluster. 
>> The way it works now: application starts, discovers all IPs configured on 
>> the system and if it sees preconfigured IP it becomes "master" and will 
>> serve configuration requests, 
>> if not - "node" and will try to connect to "master" node to get 
>> configuration data. Other then that, application instances are absolutely 
>> the same on all "cluster" servers and 
>> they start collecting their own unique data.
> 

So, the application acts as a "master" if it was able to bind to the pre-configured 
IP and as a "node" if it wasn't. If it's a master it listens on an additional 
port and receives updates from the nodes. Each application instance pulls the video feed out of 
the attached video cameras and stores it on the local drives; that's what makes 
them unique.

What I want to achieve:

1. Have "master" IP running on one of the 2 servers (attached to shared EMC 
storage)
2. Have "master" app started on "master" IP and insure it's running.
3. Start application on each node and make sure it's running.

Hopefully this clears the landscape a bit :)
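
In crm terms I imagine something along these lines, with a hypothetical ocf:local:videoapp agent for our application (addresses and names are made up):

primitive master-ip ocf:heartbeat:IPaddr2 \
        params ip="10.0.0.100" cidr_netmask="24" \
        op monitor interval="30s"
primitive videoapp ocf:local:videoapp \
        op monitor interval="60s"
# one application instance per node
clone videoapp-clone videoapp \
        meta globally-unique="false"
# the app picks master/node at start time, so bring the IP up first
order ip-before-app inf: master-ip videoapp-clone
# plus a location constraint to keep master-ip on the two servers attached to the shared storage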

Appreciate your help.

Thanks,
Vadym Chepkov




___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf


[Pacemaker] Stateful RA

2010-06-01 Thread Vadym Chepkov
Hi,

I was looking into Stateful resource agent  (branch:  stable-1.0)

stateful_start() {
stateful_check_state master
if [ $? = 0 ]; then
# CRM Error - Should never happen
return $OCF_RUNNING_MASTER
fi
...

Why does it return $OCF_RUNNING_MASTER when master is not running? I am 
confused.

Thanks,
Vadym


___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker


Re: [Pacemaker] Stateful RA

2010-06-02 Thread Vadym Chepkov

On Jun 2, 2010, at 3:08 AM, Andrew Beekhof wrote:

> On Wed, Jun 2, 2010 at 8:55 AM, Vadym Chepkov  wrote:
>> Hi,
>> 
>> I was looking into Stateful resource agent  (branch:  stable-1.0)
>> 
>> stateful_start() {
>>stateful_check_state master
>>if [ $? = 0 ]; then
>># CRM Error - Should never happen
>>return $OCF_RUNNING_MASTER
>>fi
>> ...
>> 
>> Why does it return $OCF_RUNNING_MASTER when master is not running?

> But it is running, as a master. Thats what the code says.
> 

D'oh, I read it wrong, sorry (too much C code before)
But it's not an error to have "master" start in "master" mode right away, 
without waiting for "promote" it seems, right?


Thanks,
Vadym


___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker


[Pacemaker] Shouldn't colocation -inf: be mandatory?

2010-06-03 Thread Vadym Chepkov
Hi,

Not sure what I am doing wrong

primitive dummy1 ocf:pacemaker:Dummy
primitive dummy2 ocf:pacemaker:Dummy
primitive dummy3 ocf:pacemaker:Dummy

only two nodes alive in 3 node cluster, so I can see two dummy resources 
started on one node:

 dummy2 (ocf::pacemaker:Dummy): Started c20
 dummy1 (ocf::pacemaker:Dummy): Started c20
 dummy3 (ocf::pacemaker:Dummy): Started c22

Now I want to have only one resource running on one node at any given time, so 
I created a constraint:

# crm configure show one-dummy
colocation one-dummy -inf: ( dummy1 dummy2 dummy3 )

# cibadmin -Q -o constraints

  

  
  
  

  


I would expect one of the dummies to be down at this point. But it is not.
Maybe pacemaker can't decide which one, I thought, so I set a priority:

# crm resource dummy1 meta priority="10"

Still no dice.

pacemaker-1.0.8-6.1.el5

What am I missing?

Thank you,
Vadym Chepkov





___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker


Re: [Pacemaker] Shouldn't colocation -inf: be mandatory?

2010-06-05 Thread Vadym Chepkov

On Jun 3, 2010, at 9:16 AM, Vadym Chepkov wrote:

> Hi,
> 
> Not sure what I am doing wrong
> 
> primitive dummy1 ocf:pacemaker:Dummy
> primitive dummy2 ocf:pacemaker:Dummy
> primitive dummy3 ocf:pacemaker:Dummy
> 
> only two nodes alive in 3 node cluster, so I can see two dummy resources 
> started on one node:
> 
> dummy2(ocf::pacemaker:Dummy): Started c20
> dummy1(ocf::pacemaker:Dummy): Started c20
> dummy3(ocf::pacemaker:Dummy): Started c22
> 
> Now I want to have only one resource running on one node at any given time, 
> so I created a constraint:
> 
> # crm configure show one-dummy
> colocation one-dummy -inf: ( dummy1 dummy2 dummy3 )
> 
> # cibadmin -Q -o constraints
> 
>  
>
>  
>  
>  
>
>  
> 
> 
> I would expect one of the dummies to be down at this point. But it is not.
> Maybe pacemaker can't decide which one, I thought, so I set a priority:
> 
> # crm resource dummy1 meta priority="10"
> 
> Still no dice.
> 
> pacemaker-1.0.8-6.1.el5
> 
> What am I missing?


I am sorry for being impatient, but it's a really critical bug (or lack of 
knowledge) for me :(
I can't prevent two resources from running on the same host for some reason. I 
thought, ok,
I recall "groups" had some issues, so I completely stopped using that syntax.

But I do have proper constraints, I believe, and the resources still get 
started on the same host :(

# crm configure show|grep only   
colocation only-one-sql -inf: _rsc_set_ ( fs_node00_sql fs_node01_sql )

# cibadmin -Q -o constraints
[the rsc_colocation XML did not survive the mailing-list archive]
# crm_mon -1rf|grep sql
 fs_node00_sql  (ocf::heartbeat:Filesystem):Started c21
 fs_node01_sql  (ocf::heartbeat:Filesystem):Started c21
 pgsql_node00   (ocf::heartbeat:pgsql): Stopped 
 pgsql_node01   (ocf::heartbeat:pgsql): Started c21
   pgsql_node00: migration-threshold=100 fail-count=14 last-failure='Sat 
Jun  5 14:19:24 2010'

[r...@c21 ~]# df -k
Filesystem              1K-blocks   Used  Available Use% Mounted on
/dev/mapper/node01-sql    2031952  69740    1858996   4% /node01/sql
/dev/mapper/node00-sql    2031952  87016    1841720   5% /node00/sql


Please, help

Thank you,
Vadym Chepkov
___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker


Re: [Pacemaker] i stop mysql service but the crm status is still runing

2010-06-06 Thread Vadym Chepkov

On Jun 6, 2010, at 9:15 PM, ch huang wrote:

> mysql is running ,and crm status output is
> 
> 
> Last updated: Sat Jun  5 09:48:58 2010
> Stack: openais
> Current DC: PRIM - partition with quorum
> Version: 1.0.8-9881a7350d6182bae9e8e557cf20a3cc5dac3ee7
> 2 Nodes configured, 2 expected votes
> 2 Resources configured.
> 
> 
> Online: [ PRIM SEC ]
> 
>  Resource Group: mysql
>  fs_mysql   (ocf::heartbeat:Filesystem):Started PRIM
>  ip_mysql   (ocf::heartbeat:IPaddr2):   Started PRIM
>  mysqld (lsb:mysqld):   Started PRIM
>  Master/Slave Set: ms_drbd_mysql
>  Masters: [ PRIM ]
>  Slaves: [ SEC ]
> 
> and i finished the mysql by 
> 
> #service mysqld stop
> Stopping MySQL:[  OK  ]
> # service mysqld status
> mysqld is stopped
> 
> but in the crm status output , mysql still in running ,i do not understand 
> why?
> 
> 
> Last updated: Sat Jun  5 09:48:58 2010
> Stack: openais
> Current DC: PRIM - partition with quorum
> Version: 1.0.8-9881a7350d6182bae9e8e557cf20a3cc5dac3ee7
> 2 Nodes configured, 2 expected votes
> 2 Resources configured.
> 
> 
> Online: [ PRIM SEC ]
> 
>  Resource Group: mysql
>  fs_mysql   (ocf::heartbeat:Filesystem):Started PRIM
>  ip_mysql   (ocf::heartbeat:IPaddr2):   Started PRIM
>  mysqld (lsb:mysqld):   Started PRIM
>  Master/Slave Set: ms_drbd_mysql
>  Masters: [ PRIM ]
>  Slaves: [ SEC ]
> 
> and here is my configure
> 
> # crm
> crm(live)# configure
> crm(live)configure# show
> node PRIM
> node SEC
> primitive drbd_mysql ocf:linbit:drbd \
> params drbd_resource="r1" \
> op monitor interval="15s"
> primitive fs_mysql ocf:heartbeat:Filesystem \
> params device="/dev/drbd/by-res/r1" directory="/drbddata/" 
> fstype="ext3"
> primitive ip_mysql ocf:heartbeat:IPaddr2 \
> params ip="192.168.76.227" nic="eth0"
> primitive mysqld lsb:mysqld
> group mysql fs_mysql ip_mysql mysqld
> ms ms_drbd_mysql drbd_mysql \
> meta master-max="1" master-node-max="1" clone-max="2" 
> clone-node-max="1" notify="true"
> colocation mysql_on_drbd inf: mysql ms_drbd_mysql:Master
> order mysql_after_drbd inf: ms_drbd_mysql:promote mysql:start
> property $id="cib-bootstrap-options" \
> no-quorum-policy="ignore" \
> stonith-enabled="false" \
> expected-quorum-votes="2" \
> dc-version="1.0.8-9881a7350d6182bae9e8e557cf20a3cc5dac3ee7" \
> cluster-infrastructure="openais" \
> default-action-timeout="240s"
> 


First of all, nothing is monitored by default; you need to enable a monitor 
operation.
Second, you were advised not to use lsb; use the ocf resource agent instead:

In sql create a simple monitoring database:

create database cluster;
use cluster;
create table monitor (i int);
insert into monitor values(1);
grant select on cluster.monitor to monitor@localhost identified by 'monitor';

it's just so the monitor script can do a select.

Then somewhat long primitive definition:

primitive mysqld ocf:heartbeat:mysql \
        params binary="/usr/bin/mysqld_safe" config="/drbddata/mysql/etc/my.cnf" \
               enable_creation="0" datadir="/drbddata/mysql/data" user="mysql" \
               test_user="monitor" test_passwd="monitor" test_table="cluster.monitor" \
        op monitor start-delay="60s" interval="300s"

Enjoy

Vadym
___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker


Re: [Pacemaker] i stop mysql service but the crm status is still runing

2010-06-06 Thread Vadym Chepkov

On Jun 6, 2010, at 9:23 PM, ch huang wrote:

> i notice my drbd status is abnormal,it seems they can not find each other,and 
> i try to restart drbd ,but still can not find each other


Well, this has nothing to do with Pacemaker. You would need to issue

crm resource stop drbd_mysql

and then fix it by hand.

By the way, in your configuration you don't need to start drbd yourself; the linbit RA will 
do it for you.


> 
> before restart drbd
> [r...@prim init.d]# cat /proc/drbd
> version: 8.3.2 (api:88/proto:86-90)
> GIT-hash: dd7985327f146f33b86d4bff5ca8c94234ce840e build by 
> mockbu...@v20z-x86-64.home.local, 2009-08-29 14:02:24
> 
>  1: cs:StandAlone ro:Primary/Unknown ds:UpToDate/DUnknown   r
> ns:0 nr:0 dw:100 dr:2863 al:3 bm:3 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:16384
> 
> [r...@sec ~]# cat /proc/drbd
> version: 8.3.2 (api:88/proto:86-90)
> GIT-hash: dd7985327f146f33b86d4bff5ca8c94234ce840e build by 
> mockbu...@v20z-x86-64.home.local, 2009-08-29 14:02:24
> 
>  1: cs:StandAlone ro:Secondary/Unknown ds:Outdated/DUnknown   r
> ns:0 nr:0 dw:0 dr:0 al:0 bm:18 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:270284
> 
> after restart drbd
> 
> [r...@prim init.d]# cat /proc/drbd
> version: 8.3.2 (api:88/proto:86-90)
> GIT-hash: dd7985327f146f33b86d4bff5ca8c94234ce840e build by 
> mockbu...@v20z-x86-64.home.local, 2009-08-29 14:02:24
> 
>  1: cs:StandAlone ro:Secondary/Unknown ds:UpToDate/DUnknown   r
> ns:0 nr:0 dw:0 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:16384
> 
> [r...@sec ~]# cat /proc/drbd
> version: 8.3.2 (api:88/proto:86-90)
> GIT-hash: dd7985327f146f33b86d4bff5ca8c94234ce840e build by 
> mockbu...@v20z-x86-64.home.local, 2009-08-29 14:02:24
> 
>  1: cs:StandAlone ro:Secondary/Unknown ds:Outdated/DUnknown   r
> ns:0 nr:0 dw:0 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:270284
> 
> 
> 
> 
> On Mon, Jun 7, 2010 at 9:15 AM, ch huang  wrote:
> mysql is running ,and crm status output is
> 
> 
> Last updated: Sat Jun  5 09:48:58 2010
> Stack: openais
> Current DC: PRIM - partition with quorum
> Version: 1.0.8-9881a7350d6182bae9e8e557cf20a3cc5dac3ee7
> 2 Nodes configured, 2 expected votes
> 2 Resources configured.
> 
> 
> Online: [ PRIM SEC ]
> 
>  Resource Group: mysql
>  fs_mysql   (ocf::heartbeat:Filesystem):Started PRIM
>  ip_mysql   (ocf::heartbeat:IPaddr2):   Started PRIM
>  mysqld (lsb:mysqld):   Started PRIM
>  Master/Slave Set: ms_drbd_mysql
>  Masters: [ PRIM ]
>  Slaves: [ SEC ]
> 
> and i finished the mysql by 
> 
> #service mysqld stop
> Stopping MySQL:[  OK  ]
> # service mysqld status
> mysqld is stopped
> 
> but in the crm status output , mysql still in running ,i do not understand 
> why?
> 
> 
> Last updated: Sat Jun  5 09:48:58 2010
> Stack: openais
> Current DC: PRIM - partition with quorum
> Version: 1.0.8-9881a7350d6182bae9e8e557cf20a3cc5dac3ee7
> 2 Nodes configured, 2 expected votes
> 2 Resources configured.
> 
> 
> Online: [ PRIM SEC ]
> 
>  Resource Group: mysql
>  fs_mysql   (ocf::heartbeat:Filesystem):Started PRIM
>  ip_mysql   (ocf::heartbeat:IPaddr2):   Started PRIM
>  mysqld (lsb:mysqld):   Started PRIM
>  Master/Slave Set: ms_drbd_mysql
>  Masters: [ PRIM ]
>  Slaves: [ SEC ]
> 
> and here is my configure
> 
> # crm
> crm(live)# configure
> crm(live)configure# show
> node PRIM
> node SEC
> primitive drbd_mysql ocf:linbit:drbd \
> params drbd_resource="r1" \
> op monitor interval="15s"

I am pretty sure you need to modify this to something like this:
 op monitor interval="59s" role="Master" timeout="30s" \
 op monitor interval="60s" role="Slave" timeout="30s"




> primitive fs_mysql ocf:heartbeat:Filesystem \
> params device="/dev/drbd/by-res/r1" directory="/drbddata/" 
> fstype="ext3"
> primitive ip_mysql ocf:heartbeat:IPaddr2 \
> params ip="192.168.76.227" nic="eth0"
> primitive mysqld lsb:mysqld
> group mysql fs_mysql ip_mysql mysqld
> ms ms_drbd_mysql drbd_mysql \
> meta master-max="1" master-node-max="1" clone-max="2" 
> clone-node-max="1" notify="true"
> colocation mysql_on_drbd inf: mysql ms_drbd_mysql:Master
> order mysql_after_drbd inf: ms_drbd_mysql:promote mysql:start
> property $id="cib-bootstrap-options" \
> no-quorum-policy="ignore" \
> stonith-enabled="false" \
> expected-quorum-votes="2" \
> dc-version="1.0.8-9881a7350d6182bae9e8e557cf20a3cc5dac3ee7" \
> cluster-infrastructure="openais" \
> default-action-timeout="240s"
> 
> 
> ___
> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
> 
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: 
> http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker


Re: [Pacemaker] i stop mysql service but the crm status is still runing

2010-06-06 Thread Vadym Chepkov

On Jun 6, 2010, at 9:44 PM, ch huang wrote:

> thanks for quick reply ,i do have monit on mysql ,mysql datadir is on drbd 
> ,as u see ,the drbd is defined as resource,but as your way,it sames just 
> monit  one host's service ,if the service is unavariable,pacemaker will try 
> to restart it on this host,but i want  mysql start on another backup host


No, you are not, this is your config, right?

> > primitive mysqld lsb:mysqld

no monitor operation is defined




> 
> On Mon, Jun 7, 2010 at 9:30 AM, Vadym Chepkov  wrote:
> 
> On Jun 6, 2010, at 9:15 PM, ch huang wrote:
> 
> > mysql is running ,and crm status output is
> >
> > 
> > Last updated: Sat Jun  5 09:48:58 2010
> > Stack: openais
> > Current DC: PRIM - partition with quorum
> > Version: 1.0.8-9881a7350d6182bae9e8e557cf20a3cc5dac3ee7
> > 2 Nodes configured, 2 expected votes
> > 2 Resources configured.
> > 
> >
> > Online: [ PRIM SEC ]
> >
> >  Resource Group: mysql
> >  fs_mysql   (ocf::heartbeat:Filesystem):Started PRIM
> >  ip_mysql   (ocf::heartbeat:IPaddr2):   Started PRIM
> >  mysqld (lsb:mysqld):   Started PRIM
> >  Master/Slave Set: ms_drbd_mysql
> >  Masters: [ PRIM ]
> >  Slaves: [ SEC ]
> >
> > and i finished the mysql by
> >
> > #service mysqld stop
> > Stopping MySQL:[  OK  ]
> > # service mysqld status
> > mysqld is stopped
> >
> > but in the crm status output , mysql still in running ,i do not understand 
> > why?
> >
> > 
> > Last updated: Sat Jun  5 09:48:58 2010
> > Stack: openais
> > Current DC: PRIM - partition with quorum
> > Version: 1.0.8-9881a7350d6182bae9e8e557cf20a3cc5dac3ee7
> > 2 Nodes configured, 2 expected votes
> > 2 Resources configured.
> > 
> >
> > Online: [ PRIM SEC ]
> >
> >  Resource Group: mysql
> >  fs_mysql   (ocf::heartbeat:Filesystem):Started PRIM
> >  ip_mysql   (ocf::heartbeat:IPaddr2):   Started PRIM
> >  mysqld (lsb:mysqld):   Started PRIM
> >  Master/Slave Set: ms_drbd_mysql
> >  Masters: [ PRIM ]
> >  Slaves: [ SEC ]
> >
> > and here is my configure
> >
> > # crm
> > crm(live)# configure
> > crm(live)configure# show
> > node PRIM
> > node SEC
> > primitive drbd_mysql ocf:linbit:drbd \
> > params drbd_resource="r1" \
> > op monitor interval="15s"
> > primitive fs_mysql ocf:heartbeat:Filesystem \
> > params device="/dev/drbd/by-res/r1" directory="/drbddata/" 
> > fstype="ext3"
> > primitive ip_mysql ocf:heartbeat:IPaddr2 \
> > params ip="192.168.76.227" nic="eth0"
> > primitive mysqld lsb:mysqld
> > group mysql fs_mysql ip_mysql mysqld
> > ms ms_drbd_mysql drbd_mysql \
> > meta master-max="1" master-node-max="1" clone-max="2" 
> > clone-node-max="1" notify="true"
> > colocation mysql_on_drbd inf: mysql ms_drbd_mysql:Master
> > order mysql_after_drbd inf: ms_drbd_mysql:promote mysql:start
> > property $id="cib-bootstrap-options" \
> > no-quorum-policy="ignore" \
> > stonith-enabled="false" \
> > expected-quorum-votes="2" \
> > dc-version="1.0.8-9881a7350d6182bae9e8e557cf20a3cc5dac3ee7" \
> > cluster-infrastructure="openais" \
> > default-action-timeout="240s"
> >
> 
> 
> First of all, nothing is monitored by default, you need to enabled monitor 
> operation.
> Second, you were advised do not use lsb, use ocf resource agent instead:
> 
> in sql create a simple monitoring database
> 
> create database cluster;
> use cluster;
> create table monitor (int i);
> insert into monitor values(1);
> grant select on cluster.monitor to moni...@localhost identified by 'monitor';
> 
> it's just so the monitor script can do a select.
> 
> Then somewhat long primitive definition:
> 
> primitive mysqld ocf:heartbeat:mysql \
>params binary="/usr/bin/mysqld_safe" 
> config="/drbddata/mysql/etc/my.cnf" enable_creation="0" 
> datadir="/drbddata/mysql/data" \
>user="mysql" test_user="monitor" test_passwd="monitor" 
> test_table="cluster.monitor" \
> op monitor start-delay="60s" interval="300s"

Re: [Pacemaker] Shouldn't colocation -inf: be mandatory?

2010-06-07 Thread Vadym Chepkov

On Jun 7, 2010, at 5:05 AM, Dejan Muhamedagic wrote:

> Hi,
> 
> On Sat, Jun 05, 2010 at 10:38:17AM -0400, Vadym Chepkov wrote:
>> 
>> On Jun 3, 2010, at 9:16 AM, Vadym Chepkov wrote:
>> 
>>> Hi,
>>> 
>>> Not sure what I am doing wrong
>>> 
>>> primitive dummy1 ocf:pacemaker:Dummy
>>> primitive dummy2 ocf:pacemaker:Dummy
>>> primitive dummy3 ocf:pacemaker:Dummy
>>> 
>>> only two nodes alive in 3 node cluster, so I can see two dummy resources 
>>> started on one node:
>>> 
>>> dummy2  (ocf::pacemaker:Dummy): Started c20
>>> dummy1  (ocf::pacemaker:Dummy): Started c20
>>> dummy3  (ocf::pacemaker:Dummy): Started c22
>>> 
>>> Now I want to have only one resource running on one node at any given time, 
>>> so I created a constraint:
>>> 
>>> # crm configure show one-dummy
>>> colocation one-dummy -inf: ( dummy1 dummy2 dummy3 )
>>> 
>>> # cibadmin -Q -o constraints
>>> 
>>> 
>>>   
>>> 
>>> 
>>> 
>>>   
>>> 
>>> 
>>> 
>>> I would expect one of the dummies to be down at this point. But it is not.
>>> Maybe pacemaker can't decide which one, I thought, so I set a priority:
>>> 
>>> # crm resource dummy1 meta priority="10"
>>> 
>>> Still no dice.
>>> 
>>> pacemaker-1.0.8-6.1.el5
>>> 
>>> What am I missing?
>> 
>> 
>> I am sorry for being impatient, but it's a really critical bug (or lack of 
>> knowledge) for me :(
>> I can't prevent two resources to run on the same host for some reasons. I 
>> thought, ok,
>> I recall "groups" had some issues, so I completely stopped using that syntax
>> 
>> But I do have proper constraints, I believe, and resources are still got 
>> started on the same host :(
>> 
>> # crm configure show|grep only   
>> colocation only-one-sql -inf: _rsc_set_ ( fs_node00_sql fs_node01_sql )
> 
> If you convert this to a normal two-resource constrain does that
> work? If so, then this seems to be a problem with resource sets
> and you should file a bugzilla. If there are not too many
> resources, you can use a chain of two-resource constraints.

Chains can't be used: if fs_node01_sql isn't available then fs_node00_sql 
won't be available either :(

I filed bug 2435, glad to hear "it's not me"

By the way, inf: is also broken in resource sets, and something as trivial as this 
doesn't work either:

node c19.chepkov.lan
node c20.chepkov.lan
node c21.chepkov.lan
primitive dummy1 ocf:pacemaker:Dummy
primitive dummy2 ocf:pacemaker:Dummy
primitive dummy3 ocf:pacemaker:Dummy
primitive dummy4 ocf:pacemaker:Dummy
colocation just-do-it inf: dummy4 ( dummy1 dummy2 dummy3 )

 dummy1 (ocf::pacemaker:Dummy): Started c19.chepkov.lan
 dummy2 (ocf::pacemaker:Dummy): Started c19.chepkov.lan
 dummy3 (ocf::pacemaker:Dummy): Started c21.chepkov.lan
 dummy4 (ocf::pacemaker:Dummy): Stopped 

dummy4 never had a chance :(

Vadym


> 
> Thanks,
> 
> Dejan
> 
> 
>> # cibadmin -Q -o constraints
>>  
>>
>>  
>>  
>>
>>  
>> 
>> # crm_mon -1rf|grep sql
>> fs_node00_sql(ocf::heartbeat:Filesystem):Started c21
>> fs_node01_sql(ocf::heartbeat:Filesystem):Started c21
>> pgsql_node00 (ocf::heartbeat:pgsql): Stopped 
>> pgsql_node01 (ocf::heartbeat:pgsql): Started c21
>>   pgsql_node00: migration-threshold=100 fail-count=14 last-failure='Sat 
>> Jun  5 14:19:24 2010'
>> 
>> [r...@c21 ~]# df -k
>> Filesystem   1K-blocks  Used Available Use% Mounted on
>> 
>> /dev/mapper/node01-sql
>>   2031952 69740   1858996   4% /node01/sql
>> /dev/mapper/node00-sql
>>   2031952 87016   1841720   5% /node00/sql
>> 


___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker


[Pacemaker] How to replace an agent

2010-06-10 Thread Vadym Chepkov
Hi,

I stumbled upon an interesting feature, or a bug, not sure how to classify it.

I needed to add a resource to a cluster, and since it didn't have a native RA, I 
used the 'anything' RA while I was working on a new script. When the new script was 
ready I stopped the running resource and edited its definition to use the 
new RA.
But when I tried to start it, I could see in the log that it was still trying to run 
the 'anything' script instead of my new one.
How should I have handled this? Deleting and re-adding the definition would remove 
all constraints as well.
What would be the proper way to handle such a modification?

Thank you,
Vadym Chepkov
___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker


Re: [Pacemaker] How to replace an agent

2010-06-11 Thread Vadym Chepkov

On Jun 10, 2010, at 9:03 AM, Dejan Muhamedagic wrote:

> Hi,
> 
> On Thu, Jun 10, 2010 at 08:46:22AM -0400, Vadym Chepkov wrote:
>> Hi,
>> 
>> I stumbled upon interesting feature or a bug, not sure how to classify it.
>> 
>> I needed to add a resource to a cluster and since it didn't have native RA, 
>> I used 'anything' RA while I was working on a new script. When new script 
>> was ready I stopped the running resource and edited it's definition to 
>> utilize a new RA.
>> But when I tried to start it, I could see in the log it was still trying to 
>> run 'anything' script instead my new one.
>> How should I have handled this? Delete/add a new definition would remove all 
>> constraints as well.
>> What would be the proper way to handle such modification?
> 
> I think that it should work like that. Can you open a bugzilla
> and attach a hb_report.
> 


I found what I did wrong. When I loaded the new resource definition I also removed 
meta target-role="Stopped".
Pacemaker decided it needed to restart the resource because the definition had changed, 
but it was also using the params definitions of the new agent, which are not valid anymore.
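
For the record, the sequence that avoids the problem looks roughly like this 
(the resource name is made up):

# crm resource stop myres        <-- wait until it is really stopped
# crm configure edit myres       <-- swap ocf:heartbeat:anything for the new RA
# crm resource start myres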

If you still think it's a bug, I can file the bugzilla; I have the 
hb_report.

Thanks,
Vadym


___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker


Re: [Pacemaker] crm node delete

2010-06-11 Thread Vadym Chepkov

On Jun 11, 2010, at 10:45 AM, Maros Timko wrote:

> Hi all,
> 
> using heartbeat stack. I have a system with one node offline:
> 
> Last updated: Fri Jun 11 13:52:40 2010
> Stack: Heartbeat
> Current DC: vsp7.example.com (ba6d6332-71dd-465b-a030-227bcd31a25f) -
> partition with quorum
> Version: 1.0.7-d3fa20fc76c7947d6de66db7e52526dc6bd7d782
> 2 Nodes configured, 2 expected votes
> 3 Resources configured.
> 
> 
> Online: [ vsp7.example.com ]
> OFFLINE: [ vsp8.example.com ]
> 
> If I try to remove this offline node, I get:
> [r...@vsp7 ~]# crm node delete vsp8.example.com
> WARNING: crm_node bad format:
> ERROR: node vsp8.example.com/state "lost" not found in the id list
> INFO: check output of crm_node -l
> [r...@vsp7 ~]# crm_node -l
> [r...@vsp7 ~]# echo $?
> 0
> [r...@vsp7 ~]# crm_node --list
> [r...@vsp7 ~]# echo $?
> 0
> [r...@vsp7 ~]# crm configure show
> node $id="ba6d6332-71dd-465b-a030-227bcd31a25f" vsp7.example.com
> node $id="edc0ba6f-017f-424e-9dbf-302021a2cbce" vsp8.example.com
> 
> Pacemaker explained suggests to use lower level commands for both HA and AIS:
> cibadmin --delete --obj_type nodes --crm_xml ''
> cibadmin --delete --obj_type status --crm_xml ''
> 
> [r...@vsp7 ~]# crm_node --help | grep list
>  -l, --list   (AIS-Only) Display all known members (past and present)
> of this cluster
> 
> So what is the truth of "crm node delete", is it supported for heartbeat or 
> not?

It didn't work like this for me either.
What I did instead: I ran crm configure edit and deleted the node definition there.
After that the node was gone.
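
Spelled out, using the node from the example above:

# crm configure edit
  (and remove the line: node $id="edc0ba6f-017f-424e-9dbf-302021a2cbce" vsp8.example.com)

If I remember Pacemaker Explained right, the low-level cibadmin equivalent should be 
something like the following two commands, but double-check them against the document:

# cibadmin --delete --obj_type nodes --crm_xml '<node uname="vsp8.example.com"/>'
# cibadmin --delete --obj_type status --crm_xml '<node_state uname="vsp8.example.com"/>'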

Vadym


___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker


Re: [Pacemaker] How to replace an agent

2010-06-11 Thread Vadym Chepkov

On Jun 11, 2010, at 10:09 AM, Dejan Muhamedagic wrote:
>> 
>> I found what I did wrong. When I loaded new resource definition
>> I also removed meta target-role="Stopped".
>> pacemaker decided it needs to restart the resource because the
>> has changed. But it also using params definitions of the new
>> agent, which are not valid anymore.
> 
> The stop operation should be run _before_ changing the resource
> definition. I think that this was fixed quite some time ago.
> 
>> If you still think it's still a bug, I can file the bugzilla, I
>> have the hb_report.
> 
> Yes, please.
> 
> Thanks,
> 
> Dejan


I compiled pacemaker stable-1.0 tip and it works as expected, thanks

Vadym

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker


Re: [Pacemaker] Shouldn't colocation -inf: be mandatory?

2010-06-14 Thread Vadym Chepkov

On Jun 7, 2010, at 8:04 AM, Vadym Chepkov wrote:
> 
> I filed bug 2435, glad to hear "it's not me"
> 


Andrew closed this bug 
(http://developerbugs.linux-foundation.org/show_bug.cgi?id=2435) as resolved, 
but I respectfully disagree.

I will try to explain a problem again in this list.

Let's assume you want to have several resources running on the same node.
They are independent, so if one goes down, the others shouldn't be stopped.
You would do this by using a resource set, like this:

primitive dummy1 ocf:pacemaker:Dummy
primitive dummy2 ocf:pacemaker:Dummy
primitive dummy3 ocf:pacemaker:Dummy
colocation together inf: ( dummy1 dummy2 dummy3 )

and I expect them to run on the same host, but they do not, and I attached 
an hb_report to the case to prove it.

Andrew closed it with the comment "Thats because you have sequential="false" 
for the colocation set." 
But sequential="false" means it doesn't matter in what order they start; 
the colocation still has to be honored.
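
For reference, this is roughly the xml the shell produces for that constraint 
(ids may differ), which is where the sequential="false" Andrew refers to comes from:

<rsc_colocation id="together" score="INFINITY">
  <resource_set id="together-0" sequential="false">
    <resource_ref id="dummy1"/>
    <resource_ref id="dummy2"/>
    <resource_ref id="dummy3"/>
  </resource_set>
</rsc_colocation>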

If I am wrong, what syntax should I use to achieve the described configuration?

Thank you,
Vadym Chepkov



___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker


Re: [Pacemaker] VirtualDomain/DRBD live migration with pacemaker...

2010-06-14 Thread Vadym Chepkov
On Mon, Jun 14, 2010 at 4:37 PM, Erich Weiler  wrote:
> Hi All,
>
> We have this interesting problem I was hoping someone could shed some light
> on.  Basically, we have 2 servers acting as a pacemaker cluster for DRBD and
> VirtualDomain (KVM) resources under CentOS 5.5.
>
> As it is set up, if one node dies, the other node promotes the DRBD devices
> to "Master", then starts up the VMs there (there is one DRBD device for each
> VM).  This works great.  I set the 'resource-stickiness="100"', and the vm
> resource score is 50, such that if a VM migrates to the other server, it
> will stay there until I specifically move it back manually.
>
> Now...  In the event of a failure of one server, all the VMs go to the other
> server.  When I fix the broken server and bring it back online, the VMs do
> not migrate back automatically because of the scoring I mentioned above.  I
> wanted this because when the VM goes back, it essentially has to shut down,
> then reboot on the other node.  I'm trying to avoid the 'shut down' part of
> it and do a live migration back to the first server.  But, I cannot figure
> out the exact sequence of events to do this in such that pacemaker will not
> reboot the VM somewhere in the process.  This is my configuration, with one
> VM called 'caweb':
>
> node vmserver1
> node vmserver2
> primitive caweb-vd ocf:heartbeat:VirtualDomain \
>        params config="/etc/libvirt/qemu/caweb.xml"
> hypervisor="qemu:///system" \
>        meta allow-migrate="false" target-role="Started" \
>        op start interval="0" timeout="120s" \
>        op stop interval="0" timeout="120s" \
>        op monitor interval="10" timeout="30" depth="0"
> primitive drbd-caweb ocf:linbit:drbd \
>        params drbd_resource="caweb" \
>        op monitor interval="15s" \
>        op start interval="0" timeout="240s" \
>        op stop interval="0" timeout="100s"
> ms ms-drbd-caweb drbd-caweb \
>        meta master-max="1" master-node-max="1" clone-max="2"
> clone-node-max="1" notify="true" target-role="Started"
> location caweb-prefers-vmserver1 caweb-vd 50: vmserver1
> colocation caweb-vd-on-drbd inf: caweb-vd ms-drbd-caweb:Master
> order caweb-after-drbd inf: ms-drbd-caweb:promote caweb-vd:start
> property $id="cib-bootstrap-options" \
>        dc-version="1.0.8-9881a7350d6182bae9e8e557cf20a3cc5dac3ee7" \
>        cluster-infrastructure="openais" \
>        expected-quorum-votes="2" \
>        stonith-enabled="false" \
>        no-quorum-policy="ignore" \
>        last-lrm-refresh="1276538859"
> rsc_defaults $id="rsc-options" \
>        resource-stickiness="100"
>
> One thing I tried, in an effort to do a live migration from vmserver2 to
> vmserver1 and afterward tell pacemaker to 're-acquire' the current state of
> things without a VM reboot, was:
>
> vmserver1# crm resource unmanage caweb-vd
> vmserver1# crm resource unmanage ms-drbd-caweb
> vmserver1# drbdadm primary caweb   <--make dual primary
>
> (then back on vmserver2...)
>
> vmserver2# virsh migrate --live caweb qemu+ssh://hgvmserver1.local/system
> vmserver2# drbdadm secondary caweb  <--disable dual primary
> vmserver2# crm resource manage ms-drbd-caweb
> vmserver2# crm resource manage caweb-vd
> vmserver2# crm resource cleanup ms-drbd-caweb
> vmserver2# crm resource cleanup caweb-vd
> vmserver2# crm resource refresh
> vmserver2# crm resource reprobe
> vmserver2# crm resource start caweb-vd
>
> at this point the VM has live migrated and is still online.
>
> [wait 120 seconds for caweb-vd start timeouts to expire]
>
> For a moment I thought it had worked, but then pacemaker put the device in
> an error mode and it was shut down...  After bringing a resource(s) back
> into 'managed' mode, is there any way to tell pacemaker to 'figure things
> out' without restarting the resources?  Or is this impossible because the VM
> resources is dependent on the DRBD resource, and it has trouble figuring out
> stacked resources without restarting them?
>
> Or - does anyone know another way to manually live migrate a
> pacemaker/VirtualDomain managed VM (with DRBD) without having to reboot the
> VM after the live migrate?
>
> Thanks in advance for any clues!!  BTW, I am using pacemaker 1.0.8 and DRBD
> 83.


I know what the problem is; how to solve it is another issue :)
In order to be able to do live migration you have to be able to access
the same storage from two different nodes at the time of migration.
So, you have to add allow-two-primaries to your DRBD definition, and also add
options drbd disable_sendpage=1
to /etc/modprobe.conf

You don't have much of a choice here (at least I don't know of one) but
to run drbd as primary/primary (master-max="2" master-node-max="1")
all the time
and to hope the cluster will prevent running two KVMs at the same time.
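
Roughly, the pieces involved would look like this (a sketch, untested against 
your configuration):

# drbd.conf, in the caweb resource:
net {
    allow-two-primaries;
}

# /etc/modprobe.conf:
options drbd disable_sendpage=1

# cib: run the master/slave set dual-primary
ms ms-drbd-caweb drbd-caweb \
        meta master-max="2" master-node-max="1" clone-max="2" clone-node-max="1" notify="true"

plus meta allow-migrate="true" on the caweb-vd primitive (you currently have it 
set to false).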

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker

Re: [Pacemaker] Shouldn't colocation -inf: be mandatory?

2010-06-15 Thread Vadym Chepkov

On Jun 15, 2010, at 4:57 AM, Andrew Beekhof wrote:

> On Tue, Jun 15, 2010 at 10:23 AM, Andreas Kurz  
> wrote:
>> On Tuesday 15 June 2010 08:40:58 Andrew Beekhof wrote:
>>> On Mon, Jun 14, 2010 at 4:22 PM, Vadym Chepkov  wrote:
>>>> On Jun 7, 2010, at 8:04 AM, Vadym Chepkov wrote:
>>>>> I filed bug 2435, glad to hear "it's not me"
>>>> 
>>>> Andrew closed this bug
>>>> (http://developerbugs.linux-foundation.org/show_bug.cgi?id=2435) as
>>>> resolved, but I respectfully disagree.
>>>> 
>>>> I will try to explain a problem again in this list.
>>>> 
>>>> lets assume you want to have several resources running on the same node.
>>>> They are independent, so if one is going down, others shouldn't be
>>>> stopped. You would do this by using a resource set, like this:
>>>> 
>>>> primitive dummy1 ocf:pacemaker:Dummy
>>>> primitive dummy2 ocf:pacemaker:Dummy
>>>> primitive dummy3 ocf:pacemaker:Dummy
>>>> colocation together inf: ( dummy1 dummy2 dummy3 )
>>>> 
>>>> and I expect them to run on the same host, but they are not and I
>>>> attached hb_report to the case to prove it.
>>>> 
>>>> Andrew closed it with the comment "Thats because you have
>>>> sequential="false" for the colocation set." But sequential="false" means
>>>> doesn't matter what order do they start.
>>> 
>>> No.  Thats not what it means.
>>> And I believe I should know.
>>> 
>>> It means that the members of the set are NOT collocated with each
>>> other, only with any preceding set.
>> 
>> Just for clarification:
>> 
>> colocation together inf: ( dummy1 dummy2 dummy3 ) dummy4
>> 
>>  is a shortcut for:
>> 
>> colocation together1 inf: dummy4 dummy1
>> colocation together1 inf: dummy4 dummy2
>> colocation together1 inf: dummy4 dummy3
>> 
>> ... is that correct?
> 
> Only if sequential != false.
> For some reason the shell appears to be setting that by default.
> 
>> 
>> To pick up Vadym's Question:
>> 
>> *  what would be the correct syntax to say 
>> "run-together-but-dont-care-if-one-
>> dies-or-is-not-runable"?
> 
> Choose a score < inf, just like regular colocation constraints.

Ah, ok, thanks, I guess in my mind anything less than inf was "advisory".
As long as I keep it above any resource-stickiness it should be in fact 
mandatory, right? 
Or does something else need to be taken into consideration?

On a side note, I was trying to figure out how to make a set from two 
resources, so I just added the proper xml and checked what the crm shell says about 
it. And it shows it like this: 

colocation together 5000: _rsc_set_ dummy1 dummy2

Who knew? I didn't see it anywhere in the documentation.
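
The xml behind it is roughly this (ids from memory, they may differ):

<rsc_colocation id="together" score="5000">
  <resource_set id="together-0">
    <resource_ref id="dummy1"/>
    <resource_ref id="dummy2"/>
  </resource_set>
</rsc_colocation>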

Anyway, just so I get it right, what would be the opposite constraint (which is 
what this thread started from)?
If I want to have the same dummy1, dummy2, dummy3 resources, but I don't want 
any of them ever to run simultaneously on the same host, what would be the 
proper anti-colocation constraint for this configuration? 

Thanks,
Vadym
___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker


Re: [Pacemaker] Shouldn't colocation -inf: be mandatory?

2010-06-15 Thread Vadym Chepkov

On Jun 15, 2010, at 6:14 AM, Dejan Muhamedagic wrote:

> Hi,
> 
> On Tue, Jun 15, 2010 at 10:57:47AM +0200, Andrew Beekhof wrote:
>> On Tue, Jun 15, 2010 at 10:23 AM, Andreas Kurz  
>> wrote:
>>> On Tuesday 15 June 2010 08:40:58 Andrew Beekhof wrote:
>>>> On Mon, Jun 14, 2010 at 4:22 PM, Vadym Chepkov  wrote:
>>>>> On Jun 7, 2010, at 8:04 AM, Vadym Chepkov wrote:
>>>>>> I filed bug 2435, glad to hear "it's not me"
>>>>> 
>>>>> Andrew closed this bug
>>>>> (http://developerbugs.linux-foundation.org/show_bug.cgi?id=2435) as
>>>>> resolved, but I respectfully disagree.
>>>>> 
>>>>> I will try to explain a problem again in this list.
>>>>> 
>>>>> lets assume you want to have several resources running on the same node.
>>>>> They are independent, so if one is going down, others shouldn't be
>>>>> stopped. You would do this by using a resource set, like this:
>>>>> 
>>>>> primitive dummy1 ocf:pacemaker:Dummy
>>>>> primitive dummy2 ocf:pacemaker:Dummy
>>>>> primitive dummy3 ocf:pacemaker:Dummy
>>>>> colocation together inf: ( dummy1 dummy2 dummy3 )
>>>>> 
>>>>> and I expect them to run on the same host, but they are not and I
>>>>> attached hb_report to the case to prove it.
>>>>> 
>>>>> Andrew closed it with the comment "Thats because you have
>>>>> sequential="false" for the colocation set." But sequential="false" means
>>>>> doesn't matter what order do they start.
>>>> 
>>>> No.  Thats not what it means.
>>>> And I believe I should know.
>>>> 
>>>> It means that the members of the set are NOT collocated with each
>>>> other, only with any preceding set.
>>> 
>>> Just for clarification:
>>> 
>>> colocation together inf: ( dummy1 dummy2 dummy3 ) dummy4
>>> 
>>>  is a shortcut for:
>>> 
>>> colocation together1 inf: dummy4 dummy1
>>> colocation together1 inf: dummy4 dummy2
>>> colocation together1 inf: dummy4 dummy3
>>> 
>>> ... is that correct?
>> 
>> Only if sequential != false.
> 
> You wanted to say "sequential == false"?
> 
>> For some reason the shell appears to be setting that by default.
> 
> This is sequential == false:
> 
> colocation together inf: ( dummy1 dummy2 dummy3 ) dummy4
> 
> This is sequential == true:
> 
> colocation together inf: dummy1 dummy2 dummy3 dummy4
> 
> Thanks,
> 
> Dejan


I guess colocation syntax needs to be expanded to allow something like this

colocation only-one -inf: (dummy1 dummy2 sequential="true")

colocation together 5000: (dummy1 dummy2 sequential="true")

Vadym


___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker


Re: [Pacemaker] Shouldn't colocation -inf: be mandatory?

2010-06-15 Thread Vadym Chepkov

On Jun 15, 2010, at 8:11 AM, Gianluca Cecchi wrote:

> On Tue, Jun 15, 2010 at 1:50 PM, Andrew Beekhof  wrote:
> [snip]
> 
> Score = -inf, plus the patch, plus sequential = true (or unset).
> Not sure how that looks in shell syntax though.
> 
> 
> Which patch?

http://hg.clusterlabs.org/pacemaker/1.1/rev/033609d20c62

Vadym

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker


Re: [Pacemaker] Shouldn't colocation -inf: be mandatory?

2010-06-15 Thread Vadym Chepkov

On Jun 15, 2010, at 7:50 AM, Andrew Beekhof wrote:

> On Tue, Jun 15, 2010 at 1:38 PM, Vadym Chepkov  wrote:
>> 
>> On Jun 15, 2010, at 4:57 AM, Andrew Beekhof wrote:
>> 
>>> On Tue, Jun 15, 2010 at 10:23 AM, Andreas Kurz  
>>> wrote:
>>>> On Tuesday 15 June 2010 08:40:58 Andrew Beekhof wrote:
>>>>> On Mon, Jun 14, 2010 at 4:22 PM, Vadym Chepkov  wrote:
>>>>>> On Jun 7, 2010, at 8:04 AM, Vadym Chepkov wrote:
>>>>>>> I filed bug 2435, glad to hear "it's not me"
>>>>>> 
>>>>>> Andrew closed this bug
>>>>>> (http://developerbugs.linux-foundation.org/show_bug.cgi?id=2435) as
>>>>>> resolved, but I respectfully disagree.
>>>>>> 
>>>>>> I will try to explain a problem again in this list.
>>>>>> 
>>>>>> lets assume you want to have several resources running on the same node.
>>>>>> They are independent, so if one is going down, others shouldn't be
>>>>>> stopped. You would do this by using a resource set, like this:
>>>>>> 
>>>>>> primitive dummy1 ocf:pacemaker:Dummy
>>>>>> primitive dummy2 ocf:pacemaker:Dummy
>>>>>> primitive dummy3 ocf:pacemaker:Dummy
>>>>>> colocation together inf: ( dummy1 dummy2 dummy3 )
>>>>>> 
>>>>>> and I expect them to run on the same host, but they are not and I
>>>>>> attached hb_report to the case to prove it.
>>>>>> 
>>>>>> Andrew closed it with the comment "Thats because you have
>>>>>> sequential="false" for the colocation set." But sequential="false" means
>>>>>> doesn't matter what order do they start.
>>>>> 
>>>>> No.  Thats not what it means.
>>>>> And I believe I should know.
>>>>> 
>>>>> It means that the members of the set are NOT collocated with each
>>>>> other, only with any preceding set.
>>>> 
>>>> Just for clarification:
>>>> 
>>>> colocation together inf: ( dummy1 dummy2 dummy3 ) dummy4
>>>> 
>>>>  is a shortcut for:
>>>> 
>>>> colocation together1 inf: dummy4 dummy1
>>>> colocation together1 inf: dummy4 dummy2
>>>> colocation together1 inf: dummy4 dummy3
>>>> 
>>>> ... is that correct?
>>> 
>>> Only if sequential != false.
>>> For some reason the shell appears to be setting that by default.
>>> 
>>>> 
>>>> To pick up Vadym's Question:
>>>> 
>>>> *  what would be the correct syntax to say 
>>>> "run-together-but-dont-care-if-one-
>>>> dies-or-is-not-runable"?
>>> 
>>> Choose a score < inf, just like regular colocation constraints.
>> 
>> Ah, ok, thanks, I guess in my mind anything less then inf was "advisory".
> 
> They are.
> 
> Advisory is the only way to deal with the
> "but-dont-care-if-one-dies-or-is-not-runable" part.
> 
>> As long as I keep it above any resource-stickiness it should be in fact 
>> mandatory, right?
>> Or something else needs to be taken to consideration?

What about this part? What do I need to do to prevent them from running on 
different nodes for sure?


>> 
>> On a side note, I was trying to figure out how to make a set from two 
>> resources, so I just added a proper xml and checked what crm shell say about 
>> it. And it shows it like this:
>> 
>> colocation together 5000: _rsc_set_ dummy1 dummy2
> 
> Strange.  Dejan?
> 
>> 
>> Who knew? I didn't see it anywhere in documentation.
>> 
>> Anyway, just so I get it right, what would be the opposite constraint (which 
>> is what this thread started from)
>> If I want to have same dummy1, dummy2, dummy3 resources, but I don't want 
>> any of them ever to run simultaneously on the same host. What wold be the 
>> proper anti-colocation constraint for this configuration?
> 
> Score = -inf, plus the patch, plus sequential = true (or unset).
> Not sure how that looks in shell syntax though.

My guess is for two resources it's
colocation onlyone -inf: _rsc_set_ dummy1 dummy2

and a patch. Would you include it in 1.0.9, by any chance? 

Thank you,
Vadym


___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker


Re: [Pacemaker] Shouldn't colocation -inf: be mandatory?

2010-06-15 Thread Vadym Chepkov
On Tue, Jun 15, 2010 at 9:14 AM, Andrew Beekhof  wrote:
> On Tue, Jun 15, 2010 at 2:57 PM, Vadym Chepkov  wrote:
>>
>> On Jun 15, 2010, at 7:50 AM, Andrew Beekhof wrote:
>>
>>> On Tue, Jun 15, 2010 at 1:38 PM, Vadym Chepkov  wrote:
>>>>
>>>> On Jun 15, 2010, at 4:57 AM, Andrew Beekhof wrote:
>>>>
>>>>> On Tue, Jun 15, 2010 at 10:23 AM, Andreas Kurz  
>>>>> wrote:
>>>>>> On Tuesday 15 June 2010 08:40:58 Andrew Beekhof wrote:
>>>>>>> On Mon, Jun 14, 2010 at 4:22 PM, Vadym Chepkov  
>>>>>>> wrote:
>>>>>>>> On Jun 7, 2010, at 8:04 AM, Vadym Chepkov wrote:
>>>>>>>>> I filed bug 2435, glad to hear "it's not me"
>>>>>>>>
>>>>>>>> Andrew closed this bug
>>>>>>>> (http://developerbugs.linux-foundation.org/show_bug.cgi?id=2435) as
>>>>>>>> resolved, but I respectfully disagree.
>>>>>>>>
>>>>>>>> I will try to explain a problem again in this list.
>>>>>>>>
>>>>>>>> lets assume you want to have several resources running on the same 
>>>>>>>> node.
>>>>>>>> They are independent, so if one is going down, others shouldn't be
>>>>>>>> stopped. You would do this by using a resource set, like this:
>>>>>>>>
>>>>>>>> primitive dummy1 ocf:pacemaker:Dummy
>>>>>>>> primitive dummy2 ocf:pacemaker:Dummy
>>>>>>>> primitive dummy3 ocf:pacemaker:Dummy
>>>>>>>> colocation together inf: ( dummy1 dummy2 dummy3 )
>>>>>>>>
>>>>>>>> and I expect them to run on the same host, but they are not and I
>>>>>>>> attached hb_report to the case to prove it.
>>>>>>>>
>>>>>>>> Andrew closed it with the comment "Thats because you have
>>>>>>>> sequential="false" for the colocation set." But sequential="false" 
>>>>>>>> means
>>>>>>>> doesn't matter what order do they start.
>>>>>>>
>>>>>>> No.  Thats not what it means.
>>>>>>> And I believe I should know.
>>>>>>>
>>>>>>> It means that the members of the set are NOT collocated with each
>>>>>>> other, only with any preceding set.
>>>>>>
>>>>>> Just for clarification:
>>>>>>
>>>>>> colocation together inf: ( dummy1 dummy2 dummy3 ) dummy4
>>>>>>
>>>>>>  is a shortcut for:
>>>>>>
>>>>>> colocation together1 inf: dummy4 dummy1
>>>>>> colocation together1 inf: dummy4 dummy2
>>>>>> colocation together1 inf: dummy4 dummy3
>>>>>>
>>>>>> ... is that correct?
>>>>>
>>>>> Only if sequential != false.
>>>>> For some reason the shell appears to be setting that by default.
>>>>>
>>>>>>
>>>>>> To pick up Vadym's Question:
>>>>>>
>>>>>> *  what would be the correct syntax to say 
>>>>>> "run-together-but-dont-care-if-one-
>>>>>> dies-or-is-not-runable"?
>>>>>
>>>>> Choose a score < inf, just like regular colocation constraints.
>>>>
>>>> Ah, ok, thanks, I guess in my mind anything less then inf was "advisory".
>>>
>>> They are.
>>>
>>> Advisory is the only way to deal with the
>>> "but-dont-care-if-one-dies-or-is-not-runable" part.
>>>
>>>> As long as I keep it above any resource-stickiness it should be in fact 
>>>> mandatory, right?
>>>> Or something else needs to be taken to consideration?
>>
>> what about this part? what do I need to do to prevent them from running on 
>> different nodes for sure?
>
> You can't have it both ways.
> Either they have to run on the same node or they can remain active
> when one or more die.
>
> Although you could do:
>
> d1 ( d2 d3 d4 )
>
> That would almost get what you want, unless d1 dies.

I guess I would have to keep the most significant one as an anchor, I can
live with it.
Unfortunately, as far as I understand, there is no way to define this in the
shell config now, because the shell adds sequential=false when it sees ().

Re: [Pacemaker] Shouldn't colocation -inf: be mandatory?

2010-06-15 Thread Vadym Chepkov

On Jun 15, 2010, at 9:26 AM, Vadym Chepkov wrote:
>>> 
>>> what about this part? what do I need to do to prevent them from running on 
>>> different nodes for sure?
>> 
>> You can't have it both ways.
>> Either they have to run on the same node or they can remain active
>> when one or more die.
>> 
>> Although you could do:
>> 
>> d1 ( d2 d3 d4 )
>> 
>> That would almost get what you want, unless d1 dies.
> 
> I guess I would have to keep the most significant as an anchor, I can
> leave with it.
> Unfortunately, as far as I understand, there is no way do define this
> in shell config now, because shell adds sequential=false when it sees
> ().
> 

Actually, it can be done :)

dummy1 (d1 d2 d3 d4)

dummy1 will serve as an anchor which will never fail.
Not an elegant solution, but a working one, and the only one for a two-resource set.
It just needs support from the shell to describe it properly.
Goes to my HOWTO :)

Thanks,
Vadym


___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker


Re: [Pacemaker] Shouldn't colocation -inf: be mandatory?

2010-06-15 Thread Vadym Chepkov

On Jun 15, 2010, at 3:36 PM, Dejan Muhamedagic wrote:

> Hi,
> 
> On Tue, Jun 15, 2010 at 08:45:37AM -0400, Vadym Chepkov wrote:
>> 
>> On Jun 15, 2010, at 6:14 AM, Dejan Muhamedagic wrote:
>> 
>>> Hi,
>>> 
>>> On Tue, Jun 15, 2010 at 10:57:47AM +0200, Andrew Beekhof wrote:
>>>> On Tue, Jun 15, 2010 at 10:23 AM, Andreas Kurz  
>>>> wrote:
>>>>> On Tuesday 15 June 2010 08:40:58 Andrew Beekhof wrote:
>>>>>> On Mon, Jun 14, 2010 at 4:22 PM, Vadym Chepkov  
>>>>>> wrote:
>>>>>>> On Jun 7, 2010, at 8:04 AM, Vadym Chepkov wrote:
>>>>>>>> I filed bug 2435, glad to hear "it's not me"
>>>>>>> 
>>>>>>> Andrew closed this bug
>>>>>>> (http://developerbugs.linux-foundation.org/show_bug.cgi?id=2435) as
>>>>>>> resolved, but I respectfully disagree.
>>>>>>> 
>>>>>>> I will try to explain a problem again in this list.
>>>>>>> 
>>>>>>> lets assume you want to have several resources running on the same node.
>>>>>>> They are independent, so if one is going down, others shouldn't be
>>>>>>> stopped. You would do this by using a resource set, like this:
>>>>>>> 
>>>>>>> primitive dummy1 ocf:pacemaker:Dummy
>>>>>>> primitive dummy2 ocf:pacemaker:Dummy
>>>>>>> primitive dummy3 ocf:pacemaker:Dummy
>>>>>>> colocation together inf: ( dummy1 dummy2 dummy3 )
>>>>>>> 
>>>>>>> and I expect them to run on the same host, but they are not and I
>>>>>>> attached hb_report to the case to prove it.
>>>>>>> 
>>>>>>> Andrew closed it with the comment "Thats because you have
>>>>>>> sequential="false" for the colocation set." But sequential="false" means
>>>>>>> doesn't matter what order do they start.
>>>>>> 
>>>>>> No.  Thats not what it means.
>>>>>> And I believe I should know.
>>>>>> 
>>>>>> It means that the members of the set are NOT collocated with each
>>>>>> other, only with any preceding set.
>>>>> 
>>>>> Just for clarification:
>>>>> 
>>>>> colocation together inf: ( dummy1 dummy2 dummy3 ) dummy4
>>>>> 
>>>>>  is a shortcut for:
>>>>> 
>>>>> colocation together1 inf: dummy4 dummy1
>>>>> colocation together1 inf: dummy4 dummy2
>>>>> colocation together1 inf: dummy4 dummy3
>>>>> 
>>>>> ... is that correct?
>>>> 
>>>> Only if sequential != false.
>>> 
>>> You wanted to say "sequential == false"?
>>> 
>>>> For some reason the shell appears to be setting that by default.
>>> 
>>> This is sequential == false:
>>> 
>>> colocation together inf: ( dummy1 dummy2 dummy3 ) dummy4
>>> 
>>> This is sequential == true:
>>> 
>>> colocation together inf: dummy1 dummy2 dummy3 dummy4
>>> 
>>> Thanks,
>>> 
>>> Dejan
>> 
>> 
>> I guess colocation syntax needs to be expanded to allow something like this
>> 
>> colocation only-one -inf: (dummy1 dummy2 sequential="true")
>> 
>> colocation together 5000: (dummy1 dummy2 sequential="true")
> 
> How's this different from a regular constraint?
> 


Because it does not create a resource set with two resources,
and if you put it in parentheses, it creates a set with sequential="false".

Vadym
 


___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker


Re: [Pacemaker] Shouldn't colocation -inf: be mandatory?

2010-06-15 Thread Vadym Chepkov

On Jun 15, 2010, at 3:55 PM, Dejan Muhamedagic wrote:

> On Tue, Jun 15, 2010 at 03:41:17PM -0400, Vadym Chepkov wrote:
>> 
>> On Jun 15, 2010, at 3:36 PM, Dejan Muhamedagic wrote:
>> 
>>> Hi,
>>> 
>>> On Tue, Jun 15, 2010 at 08:45:37AM -0400, Vadym Chepkov wrote:
>>>> 
>>>> On Jun 15, 2010, at 6:14 AM, Dejan Muhamedagic wrote:
>>>> 
>>>>> Hi,
>>>>> 
>>>>> On Tue, Jun 15, 2010 at 10:57:47AM +0200, Andrew Beekhof wrote:
>>>>>> On Tue, Jun 15, 2010 at 10:23 AM, Andreas Kurz  
>>>>>> wrote:
>>>>>>> On Tuesday 15 June 2010 08:40:58 Andrew Beekhof wrote:
>>>>>>>> On Mon, Jun 14, 2010 at 4:22 PM, Vadym Chepkov  
>>>>>>>> wrote:
>>>>>>>>> On Jun 7, 2010, at 8:04 AM, Vadym Chepkov wrote:
>>>>>>>>>> I filed bug 2435, glad to hear "it's not me"
>>>>>>>>> 
>>>>>>>>> Andrew closed this bug
>>>>>>>>> (http://developerbugs.linux-foundation.org/show_bug.cgi?id=2435) as
>>>>>>>>> resolved, but I respectfully disagree.
>>>>>>>>> 
>>>>>>>>> I will try to explain a problem again in this list.
>>>>>>>>> 
>>>>>>>>> lets assume you want to have several resources running on the same 
>>>>>>>>> node.
>>>>>>>>> They are independent, so if one is going down, others shouldn't be
>>>>>>>>> stopped. You would do this by using a resource set, like this:
>>>>>>>>> 
>>>>>>>>> primitive dummy1 ocf:pacemaker:Dummy
>>>>>>>>> primitive dummy2 ocf:pacemaker:Dummy
>>>>>>>>> primitive dummy3 ocf:pacemaker:Dummy
>>>>>>>>> colocation together inf: ( dummy1 dummy2 dummy3 )
>>>>>>>>> 
>>>>>>>>> and I expect them to run on the same host, but they are not and I
>>>>>>>>> attached hb_report to the case to prove it.
>>>>>>>>> 
>>>>>>>>> Andrew closed it with the comment "Thats because you have
>>>>>>>>> sequential="false" for the colocation set." But sequential="false" 
>>>>>>>>> means
>>>>>>>>> doesn't matter what order do they start.
>>>>>>>> 
>>>>>>>> No.  Thats not what it means.
>>>>>>>> And I believe I should know.
>>>>>>>> 
>>>>>>>> It means that the members of the set are NOT collocated with each
>>>>>>>> other, only with any preceding set.
>>>>>>> 
>>>>>>> Just for clarification:
>>>>>>> 
>>>>>>> colocation together inf: ( dummy1 dummy2 dummy3 ) dummy4
>>>>>>> 
>>>>>>>  is a shortcut for:
>>>>>>> 
>>>>>>> colocation together1 inf: dummy4 dummy1
>>>>>>> colocation together1 inf: dummy4 dummy2
>>>>>>> colocation together1 inf: dummy4 dummy3
>>>>>>> 
>>>>>>> ... is that correct?
>>>>>> 
>>>>>> Only if sequential != false.
>>>>> 
>>>>> You wanted to say "sequential == false"?
>>>>> 
>>>>>> For some reason the shell appears to be setting that by default.
>>>>> 
>>>>> This is sequential == false:
>>>>> 
>>>>> colocation together inf: ( dummy1 dummy2 dummy3 ) dummy4
>>>>> 
>>>>> This is sequential == true:
>>>>> 
>>>>> colocation together inf: dummy1 dummy2 dummy3 dummy4
>>>>> 
>>>>> Thanks,
>>>>> 
>>>>> Dejan
>>>> 
>>>> 
>>>> I guess colocation syntax needs to be expanded to allow something like this
>>>> 
>>>> colocation only-one -inf: (dummy1 dummy2 sequential="true")
>>>> 
>>>> colocation together 5000: (dummy1 dummy2 sequential="true")
>>> 
>>> How's this different from a regular constraint?
>>> 
>> 
>> 
>> Because it does not create a resource set with two resources
>> and if you put it in parentheses, it creates set with sequential="false"
> 
> What I meant was what is the difference between these two:
> 
> 
> 
> 
> 
>  
>  
> 
> 
> 

I take it there is no difference for a positive score; 
it just looks like the former implies a dependency of p1 on p2.

But there is a definite difference with a negative score.

If you have something like this

colocation only-one -inf: dummy1 dummy2

so that you never want them to run on the same host, and the host with dummy2 
goes down, then instead of just not starting dummy2 anywhere it will kill dummy1 
too and start dummy2 on the surviving host.
That's an outage and not what I wanted to achieve.

Vadym




___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker


Re: [Pacemaker] Shouldn't colocation -inf: be mandatory?

2010-06-15 Thread Vadym Chepkov

On Jun 15, 2010, at 5:26 PM, Dejan Muhamedagic wrote:

> On Tue, Jun 15, 2010 at 04:44:31PM -0400, Vadym Chepkov wrote:
>> 
>> On Jun 15, 2010, at 3:55 PM, Dejan Muhamedagic wrote:
>> 
>>> On Tue, Jun 15, 2010 at 03:41:17PM -0400, Vadym Chepkov wrote:
>>>> 
>>>> On Jun 15, 2010, at 3:36 PM, Dejan Muhamedagic wrote:
>>>> 
>>>>> Hi,
>>>>> 
>>>>> On Tue, Jun 15, 2010 at 08:45:37AM -0400, Vadym Chepkov wrote:
>>>>>> 
>>>>>> On Jun 15, 2010, at 6:14 AM, Dejan Muhamedagic wrote:
>>>>>> 
>>>>>>> Hi,
>>>>>>> 
>>>>>>> On Tue, Jun 15, 2010 at 10:57:47AM +0200, Andrew Beekhof wrote:
>>>>>>>> On Tue, Jun 15, 2010 at 10:23 AM, Andreas Kurz 
>>>>>>>>  wrote:
>>>>>>>>> On Tuesday 15 June 2010 08:40:58 Andrew Beekhof wrote:
>>>>>>>>>> On Mon, Jun 14, 2010 at 4:22 PM, Vadym Chepkov  
>>>>>>>>>> wrote:
>>>>>>>>>>> On Jun 7, 2010, at 8:04 AM, Vadym Chepkov wrote:
>>>>>>>>>>>> I filed bug 2435, glad to hear "it's not me"
>>>>>>>>>>> 
>>>>>>>>>>> Andrew closed this bug
>>>>>>>>>>> (http://developerbugs.linux-foundation.org/show_bug.cgi?id=2435) as
>>>>>>>>>>> resolved, but I respectfully disagree.
>>>>>>>>>>> 
>>>>>>>>>>> I will try to explain a problem again in this list.
>>>>>>>>>>> 
>>>>>>>>>>> lets assume you want to have several resources running on the same 
>>>>>>>>>>> node.
>>>>>>>>>>> They are independent, so if one is going down, others shouldn't be
>>>>>>>>>>> stopped. You would do this by using a resource set, like this:
>>>>>>>>>>> 
>>>>>>>>>>> primitive dummy1 ocf:pacemaker:Dummy
>>>>>>>>>>> primitive dummy2 ocf:pacemaker:Dummy
>>>>>>>>>>> primitive dummy3 ocf:pacemaker:Dummy
>>>>>>>>>>> colocation together inf: ( dummy1 dummy2 dummy3 )
>>>>>>>>>>> 
>>>>>>>>>>> and I expect them to run on the same host, but they are not and I
>>>>>>>>>>> attached hb_report to the case to prove it.
>>>>>>>>>>> 
>>>>>>>>>>> Andrew closed it with the comment "Thats because you have
>>>>>>>>>>> sequential="false" for the colocation set." But sequential="false" 
>>>>>>>>>>> means
>>>>>>>>>>> doesn't matter what order do they start.
>>>>>>>>>> 
>>>>>>>>>> No.  Thats not what it means.
>>>>>>>>>> And I believe I should know.
>>>>>>>>>> 
>>>>>>>>>> It means that the members of the set are NOT collocated with each
>>>>>>>>>> other, only with any preceding set.
>>>>>>>>> 
>>>>>>>>> Just for clarification:
>>>>>>>>> 
>>>>>>>>> colocation together inf: ( dummy1 dummy2 dummy3 ) dummy4
>>>>>>>>> 
>>>>>>>>>  is a shortcut for:
>>>>>>>>> 
>>>>>>>>> colocation together1 inf: dummy4 dummy1
>>>>>>>>> colocation together1 inf: dummy4 dummy2
>>>>>>>>> colocation together1 inf: dummy4 dummy3
>>>>>>>>> 
>>>>>>>>> ... is that correct?
>>>>>>>> 
>>>>>>>> Only if sequential != false.
>>>>>>> 
>>>>>>> You wanted to say "sequential == false"?
>>>>>>> 
>>>>>>>> For some reason the shell appears to be setting that by default.
>>>>>>> 
>>>>>>> This is sequential == false:
>>>>>>> 
>>>>>>> colocation together inf: ( dummy1 dummy2 dummy3 ) dummy4
>>>>>>> 
>>>>>>> This is sequential == true:
>>>>>>> 
>>>>>>> colocation together inf: dummy1 dummy2 dummy3 dummy4
>>>>>>> 
>>>>>>> Thanks,
>>>>>>> 
>>>>>>> Dejan
>>>>>> 
>>>>>> 
>>>>>> I guess colocation syntax needs to be expanded to allow something like 
>>>>>> this
>>>>>> 
>>>>>> colocation only-one -inf: (dummy1 dummy2 sequential="true")
>>>>>> 
>>>>>> colocation together 5000: (dummy1 dummy2 sequential="true")
>>>>> 
>>>>> How's this different from a regular constraint?
>>>>> 
>>>> 
>>>> 
>>>> Because it does not create a resource set with two resources
>>>> and if you put it in parentheses, it creates set with sequential="false"
>>> 
>>> What I meant was what is the difference between these two:
>>> 
>>> 
>>> 
>>> 
>>> 
>>> 
>>> 
>>> 
>>> 
>>> 
>> 
>> I take it there is no difference for the positive score, 
>> it just looks like former looks like there is a dependency of p1 on p2.
>> 
>> But there is a definite difference with a negative score, 
>> 
>> if you have something like this
>> 
>> colocation only-one -inf: dummy1 dummy2
>> 
>> so you don't want to have them to run on the same host ever.
> 
>> and the host with dummy2 goes down.
>> Instead of just not starting dummy2 anywhere it will kill dummy1 too and 
>> start dummy2 on the surviving host.
>> That's an outage and not what I wanted to achieve.
> 
> Obviously that's not "ever", it's only if there's more than one
> node on which they can run. If you put
> 
>   colocation only-one -10: dummy1 dummy2
> 
> instead, you should get what you wanted.
> 


That's the issue: I need "ever", and that's what Andrew's patch is going to fix.

Vadym


___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker


Re: [Pacemaker] Shouldn't colocation -inf: be mandatory?

2010-06-16 Thread Vadym Chepkov

On Jun 15, 2010, at 3:52 PM, Dejan Muhamedagic wrote:

> On Tue, Jun 15, 2010 at 12:53:07PM -0400, Vadym Chepkov wrote:
>> 
>> On Jun 15, 2010, at 9:26 AM, Vadym Chepkov wrote:
>>>>> 
>>>>> what about this part? what do I need to do to prevent them from running 
>>>>> on different nodes for sure?
>>>> 
>>>> You can't have it both ways.
>>>> Either they have to run on the same node or they can remain active
>>>> when one or more die.
>>>> 
>>>> Although you could do:
>>>> 
>>>> d1 ( d2 d3 d4 )
>>>> 
>>>> That would almost get what you want, unless d1 dies.
>>> 
>>> I guess I would have to keep the most significant as an anchor, I can
>>> leave with it.
>>> Unfortunately, as far as I understand, there is no way do define this
>>> in shell config now, because shell adds sequential=false when it sees
>>> ().
> 
> Yes it does. So, you want to have two adjacent sequential sets in
> one constraint? Not very elegant, but I guess that this would do
> until we figure out how to represent it:
> 
> colocation c1 inf: p1:Started p2 p3 p4
> 
> In xml:
> 
>  
>
>  
>
>
>  
>  
>  
>
>  
> 
> Thanks,

But I think it should be at the end, like this:

colocation together 500: d1 d2 anchor:Started

to use the "anchor" approach? Right?

Thanks,
Vadym
 


___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker


Re: [Pacemaker] Shouldn't colocation -inf: be mandatory?

2010-06-16 Thread Vadym Chepkov

On Jun 16, 2010, at 2:55 AM, Andrew Beekhof wrote:

> On Tue, Jun 15, 2010 at 9:41 PM, Dejan Muhamedagic  
> wrote:
> 
>> colocation not-together -inf: d1 d2 d3
> 
> I think there is a problem with this syntax, particularly for +inf.
> 
> Consider:
>  colocation together1 inf: d1 d2
> 
> This means d1 must run where d2 is.
> 
> But if I add d3:
>  colocation together1 inf: d1 d2 d3
> 
> Now the original constraint is reversed and d2 must run where d1 is
> (think of how groups work).
> (Unless you're modifying the order).
> 
> I think we need:
>   no brackets: exactly 2 resources must be specified
>   () brackets: a non-sequential set
>   [] brackets: a sequential set
> 
> 

Would something like this be a legitimate syntax then?

colocation together-but-do-not-die 500: [ d1 d2 d3 ] anchor

To combine different set types in one constraint, basically?

The reason I am asking is that I don't think it's possible to do something like this 
now, is it?

colocation together1 500: d1 d2 d3
colocation together2 500: d4 d5 d6
colocation together0 inf: together2 together1

Thanks,
Vadym


___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker


Re: [Pacemaker] Shouldn't colocation -inf: be mandatory?

2010-06-17 Thread Vadym Chepkov

On Jun 17, 2010, at 7:15 AM, Dejan Muhamedagic wrote:

> On Wed, Jun 16, 2010 at 08:54:37AM -0400, Vadym Chepkov wrote:
>> 
>> On Jun 15, 2010, at 3:52 PM, Dejan Muhamedagic wrote:
>> 
>>> On Tue, Jun 15, 2010 at 12:53:07PM -0400, Vadym Chepkov wrote:
>>>> 
>>>> On Jun 15, 2010, at 9:26 AM, Vadym Chepkov wrote:
>>>>>>> 
>>>>>>> what about this part? what do I need to do to prevent them from running 
>>>>>>> on different nodes for sure?
>>>>>> 
>>>>>> You can't have it both ways.
>>>>>> Either they have to run on the same node or they can remain active
>>>>>> when one or more die.
>>>>>> 
>>>>>> Although you could do:
>>>>>> 
>>>>>> d1 ( d2 d3 d4 )
>>>>>> 
>>>>>> That would almost get what you want, unless d1 dies.
>>>>> 
>>>>> I guess I would have to keep the most significant as an anchor, I can
>>>>> leave with it.
>>>>> Unfortunately, as far as I understand, there is no way do define this
>>>>> in shell config now, because shell adds sequential=false when it sees
>>>>> ().
>>> 
>>> Yes it does. So, you want to have two adjacent sequential sets in
>>> one constraint? Not very elegant, but I guess that this would do
>>> until we figure out how to represent it:
>>> 
>>> colocation c1 inf: p1:Started p2 p3 p4
>>> 
>>> In xml:
>>> 
>>> 
>>>   
>>> 
>>>   
>>>   
>>> 
>>> 
>>> 
>>>   
>>> 
>>> 
>>> Thanks,
>> 
>> But I think it should be at the end, like this:
>> 
>> colocation together 500: d1 d2 anchor:Started
>> 
>> to use the "anchor" approach? Right?
> 
> I don't know what do you want. This was just about how to produce
> two adjacent sequential resource sets.
> 


I was trying to find a configuration that would allow several resources to always 
run on the same node but remain independent, i.e., not stop if another one dies. 
To achieve that, Andrew suggested making one of the resources the most significant 
one, see above. This idea can be turned into a working solution by just adding a 
pseudo-resource which will never ever die (an anchor), and the Dummy agent fits 
the bill.

So, provided we need two resources, d1 and d2, locked together but 
independent, one would create a constraint like this:

primitive anchor ocf:pacemaker:Dummy
colocation together inf: (d1 d2) anchor

and this seems to be doing the trick nicely.
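
In xml that comes out as roughly this (ids may differ):

<rsc_colocation id="together" score="INFINITY">
  <resource_set id="together-0" sequential="false">
    <resource_ref id="d1"/>
    <resource_ref id="d2"/>
  </resource_set>
  <resource_set id="together-1">
    <resource_ref id="anchor"/>
  </resource_set>
</rsc_colocation>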

Vadym






___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker


Re: [Pacemaker] Shouldn't colocation -inf: be mandatory?

2010-06-17 Thread Vadym Chepkov

Andrew, 

I took the latest sources from the repository and got myself pacemaker Version: 
1.0.9-6bf91e9195fe7649e174af0ba2c67dbd902d4a2b
Just to remind you what this whole story began from: I want to be able to define 
resources that should never run on the same node.

Here is the config

primitive d1 ocf:pacemaker:Dummy
primitive d2 ocf:pacemaker:Dummy
primitive d3 ocf:pacemaker:Dummy
colocation only-one -inf: d1 d2 d3

It seems to be working almost fine. And by the way, sorry if it seems like I am 
never happy, I really do appreciate your help; I started this crusade to cluster 
everything in sight and I face lots of challenges on the way.
Anyway, what is not quite right is what I described in an earlier post.

Online: [ c20 c19 c21 ]

Full list of resources:

d1  (ocf::pacemaker:Dummy): Started c19
d2  (ocf::pacemaker:Dummy): Started c21
d3  (ocf::pacemaker:Dummy): Started c20

If I bring node c19 down:

Node c19: standby
Online: [ c20 c21 ]

Full list of resources:

d1  (ocf::pacemaker:Dummy): Started c20
d2  (ocf::pacemaker:Dummy): Started c21
d3  (ocf::pacemaker:Dummy): Stopped

I never wanted d3 to be brought down; I just didn't want d1 started anywhere 
else.
What configuration should I have used?

Thank you,
Vadym


___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker


Re: [Pacemaker] Drbd/Nfs MS don't failover on slave node

2010-07-07 Thread Vadym Chepkov

On Jul 7, 2010, at 11:41 AM, Guillaume Chanaud wrote:

> I already tried the
> 
> no_quorum-policy="ignore"
> which has no effect
> 
> I just retried when reading your mail and resources doesn't migrate.
> In fact even if i boot only one server, this one won't promote the resources 
> until the second is up...which is far from HA ;)
> 
> 
> Guillaume


The crm shell is very helpful if you can't spell properties right:
just press the TAB key and it will autocomplete them for you.

crm(live)configure# property 
batch-limit=  pe-input-series-max=
cluster-delay=pe-warn-series-max=
default-action-timeout=   remove-after-stop=
default-resource-stickiness=  start-failure-is-fatal=
is-managed-default=   startup-fencing=
maintenance-mode= stonith-action=
no-quorum-policy= stonith-enabled=
node-health-green=stonith-timeout=
node-health-red=  stop-all-resources=
node-health-strategy= stop-orphan-actions=
node-health-yellow=   stop-orphan-resources=
pe-error-series-max=  symmetric-cluster=


As you can see, the parameter you used was just ignored.
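
For the record, the correct spelling would be:

crm configure property no-quorum-policy="ignore"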

Vadym


> 
>> On Mon, Jul 05, 2010 at 12:41:05PM +0200, Guillaume Chanaud wrote:
>>>  Hello,
>>> 
>>> i searched the list, tried lots of things but nothing works, so i
>>> try to post here.
>>> I'd like to say my configuration worked on heartbeat2/crm, but since
>>> i migrated to corosync/pacemaker i have a problem.
>>> Here is my cib :
>>> 
>>> node filer1 \
>>> attributes standby="off"
>>> node filer2 \
>>> attributes standby="off"
>>> property $id="cib-bootstrap-options" \
>>> symmetric-cluster="true" \
>>> no_quorum-policy="stop" \
>> 
>> 
>>> Now, i stop the filer1
>>> #/etc/init.d/corosync stop
>>> 
>>> It stops correctly
>>> 
>>> but in crm_mon:
>>> 
>>> 
>>> Last updated: Mon Jul  5 11:28:59 2010
>>> Stack: openais
>>> Current DC: filer1.connecting-nature.com - partition WITHOUT quorum
>> 
>>> And nothing happens (ressources doesn't migrate to filer2 which is
>>> online, in fact, like pasted above, they doesn't appears)
>> maybe you get what you asked for?
>> 
>> Hint: no_quorum-policy
>> 
>> 
> 
> 
> ___
> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
> 
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: 
> http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker


___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker


[Pacemaker] crm configure update and properties

2010-07-09 Thread Vadym Chepkov
Hi,

I am not sure whether it's a bug or not, but it is certainly not a pleasant feature.
At first I configured the cluster with the property stonith-enabled="false", because I 
didn't have a stonith device handy.
Then, after I got an APC PDU, I created a separate configuration:

# cat pdu.crm

property stonith-enabled="true" stonith-timeout="30s"
primitive pdu stonith:external/rackpdu \
params pduip="10.6.6.6" community="666" hostlist="AUTO" \
op start timeout="60s" \
meta target-role="Started"

and loaded it (or so I thought) with

# crm configure load update pdu.crm

To my surprise this removed all the other properties my cluster had :(
Is this expected behavior? 

pacemaker-1.0.9.1
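
For now I suppose the safer way is to keep only resources in the snippet files 
and set the properties one at a time, something like:

# crm configure property stonith-enabled="true"
# crm configure property stonith-timeout="30s"
# crm configure load update pdu.crm      <-- pdu.crm now contains only the primitive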

Thank you,
Vadym Chepkov
___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker


Re: [Pacemaker] error installing CentOS clvm after using clusterlabs repository

2010-08-04 Thread Vadym Chepkov
On Wed, Aug 4, 2010 at 10:49 AM, Michael Fung  wrote:
>
> On 2010/8/4 09:06 PM, Andrew Beekhof wrote:
>
>> You can either use cluster.conf for configuring corosync/cman or I can
>> send you the corosync.conf snippet.
>>
>
> Yes, please send me the corosync.conf snippet.
>

If it's not much trouble, could you post it to the mailing list, please?
I personally was waiting for the RHEL6 release, and finding out that I won't be
able to use clvmd with pacemaker even there is kind of unsettling.

Thank you,
Vadym

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker


[Pacemaker] pacemaker in rhel6

2010-08-07 Thread Vadym Chepkov
Hi,

It seems pacemaker is broken in rhel6-beta2

# rpm -q pacemaker
pacemaker-1.1.2-5.el6.x86_64

# crm configure load replace crm.cfg 
crm_standby not available, check your installation

And it's true, crm_standby is not part of the RPM

Thanks,
Vadym


___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker


Re: [Pacemaker] pacemaker in rhel6

2010-08-09 Thread Vadym Chepkov

On Aug 9, 2010, at 3:59 AM, Andrew Beekhof wrote:

> yes, its fixed in -6
> 

I wasn't able to find an "updates" repository for rhel6. Is it available?


> On Sat, Aug 7, 2010 at 8:03 PM, Vadym Chepkov  wrote:
>> Hi,
>> 
>> It seems pacemaker is broken in rhel6-beta2
>> 
>> # rpm -q pacemaker
>> pacemaker-1.1.2-5.el6.x86_64
>> 
>> # crm configure load replace crm.cfg
>> crm_standby not available, check your installation
>> 
>> And it's true, crm_standby is not part of the RPM
>> 
>> Thanks,
>> Vadym
>> 
>> 


___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker


Re: [Pacemaker] migration-threshold and failure-timeout

2010-09-21 Thread Vadym Chepkov
On Tue, Sep 21, 2010 at 9:14 AM, Dan Frincu  wrote:
> Hi,
>
> This =>
> http://www.clusterlabs.org/doc/en-US/Pacemaker/1.0/html/Pacemaker_Explained/s-failure-migration.html
> explains it pretty well. Notice the INFINITY score and what sets it.
>
> However I don't know of any automatic method to clear the failcount.
>
> Regards,
> Dan


In pacemaker 1.0 nothing will clean the failcount automatically; this is a
feature of pacemaker 1.1, imho.

But,

crm configure rsc_defaults failure-timeout="10min"

will make the cluster "forget" about a previous failure in 10 minutes.
If you want to decrease this parameter further, you might also need to decrease

crm configure property cluster-recheck-interval="10min"
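
Putting it together for what Pavlos asked (the numbers are only examples):

crm configure rsc_defaults migration-threshold="3" failure-timeout="10min"
crm configure property cluster-recheck-interval="5min"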

Cheers,
Vadym




>
> Pavlos Parissis wrote:
>
> Hi,
>
> I am trying to figure a way to do the following
> if the monitor of x resource fails N time in a period of Z then fail over to
> the other node and clear fail-count.
>
> Regards,
> Pavlos
>
> 
> ___
> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs:
> http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker
>
>
> --
> Dan FRINCU
> Systems Engineer
> CCNA, RHCE
> Streamwide Romania
>
> ___
> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs:
> http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker
>
>



Re: [Pacemaker] starting a xen-domU depending on available hardware-resources using SysInfo-RA

2010-09-30 Thread Vadym Chepkov

On Sep 30, 2010, at 2:35 AM, Sascha Reimann wrote:

> Hi Dejan,
> 
> it's working fine with the amount of free ram as the score and a bigger 
> default-resource-stickiness:
> 
> primitive v01 ocf:heartbeat:Xen \
>   params xmfile="/etc/xen/conf.d/v01.cfg" \
>   op monitor interval="30s" timeout="30s" \
>   op start interval="0" timeout="60s" \
>   op stop interval="0" timeout="40s" allow-migrate="true" \
>   meta target-role="Started"
> primitive v02 ocf:heartbeat:Xen \
>   params xmfile="/etc/xen/conf.d/v02.cfg" \
>   op monitor interval="30s" timeout="30s" \
>   op start interval="0" timeout="60s" \
>   op stop interval="0" timeout="40s" allow-migrate="true" \
>   meta target-role="Started"
> primitive v03 ocf:heartbeat:Xen \
>   params xmfile="/etc/xen/conf.d/v03.cfg" \
>   op monitor interval="30s" timeout="30s" \
>   op start interval="0" timeout="60s" \
>   op stop interval="0" timeout="40s" allow-migrate="true" \
>   meta target-role="Started"
> location RAM01-v01 v01 \
>   rule $id="loc-resv01-rule" ram_free: ram_free gt 6000
> location RAM01-v02 v02 \
>   rule $id="loc-resv02-rule" ram_free: ram_free gt 3000
> location RAM01-v03 v03 \
>   rule $id="RAM01-v03-rule" ram_free: ram_free gt 1000
> property $id="cib-bootstrap-options" \
>   dc-version="1.0.9-74392a28b7f31d7ddc86689598bd23114f58978b" \
>   cluster-infrastructure="openais" \
>   expected-quorum-votes="4" \
>   stonith-enabled="false" \
>   default-resource-stickiness="16000" \
>   last-lrm-refresh="1285761587"
> 
> thanks!

Hmm, correct me if I am wrong, but these rules won't migrate a VM anywhere 
unless some dom0 has more than 16000 of RAM free, no? 
I don't think this is what you wanted.

And on a related note, why are there two different SysInfo agents? Which one is 
supported?

/usr/lib/ocf/resource.d/heartbeat/SysInfo
/usr/lib/ocf/resource.d/pacemaker/SysInfo

Thanks,
Vadym

> 
> On 09/28/2010 12:18 PM, Dejan Muhamedagic wrote:
>> Hi,
>> 
>> On Tue, Sep 28, 2010 at 11:00:18AM +0200, Sascha Reimann wrote:
>>> howdy!
>>> 
>>> I'm trying to configure a resource (xen-domU) that could start on 2
>>> nodes (preferred on node server01):
>>> 
>>> primitive v01 ocf:heartbeat:Xen \
>>> params xmfile="/etc/xen/conf.d/v01.cfg" allow-migrate="true"
>>> location loc-v01p v01 200: server01
>>> location loc-v01s v01 100: server02
>>> 
>>> That's working fine so far, but I want to ensure that there's enough
>>> hardwareresources available on server01, so I've set up a modified
>>> SysInfo-RA to put the ram_total and ram_free values of xen (xm
>>> info|awk '/free_memory/ {print $3}') to the statusinformation of the
>>> CIB:
>>> 
>>> server01:~$ cibadmin -Q -o status|grep status-server01-ram
>>> 
>>> 
>>> 
>>> This is working fine, too. BUT:
>>> 
>>> When I create a rule like the one below, the xen-domU keeps
>>> restarting (or moving to server02 where the same happens), which is
>>> correct since the SysInfo-RA updates the statusinformation to
>>> value="0" after a start and back to value="2000" after a stop in
>>> this example.
>>> 
>>> location loc-resv01 v01 \
>>> rule $id="loc-resv01-rule" -inf: ram_free lt 2000
>> 
>> An interesting issue :-)
>> 
>> Well, you can introduce resource stickiness and use that to
>> outweigh the negative score coming from the lack of memory (use
>> something less than inf). You may also consider using the amount
>> of free memory as a score.
>> 
>> HTH,
>> 
>> Dejan
>> 
>>> Can anybody help?
>>> 
>>> ___
>>> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
>>> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>>> 
>>> Project Home: http://www.clusterlabs.org
>>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>>> Bugs: 
>>> http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker
> 
> 

Re: [Pacemaker] pacemaker version

2010-10-06 Thread Vadym Chepkov

On Oct 6, 2010, at 2:48 AM, Andrew Beekhof wrote:

> On Tue, Oct 5, 2010 at 7:53 PM, Shravan Mishra  
> wrote:
>> Hi,
>> 
>> I was interested in knowing that if I have to choose between pacemaker
>> 1.0 vs 1.1 which one should I use.
> 
> Have a read of:
>   
> http://theclusterguy.clusterlabs.org/post/441442543/new-pacemaker-release-series
> 
> I would recommend 1.1.  Its what ships in SLES11 and RHEL6
> 

But http://www.clusterlabs.org/wiki/Releases says it's the "experimental" version, 
not a stable one.
Is this page outdated, then?

Thanks,
Vadym




Re: [Pacemaker] how to test network access and fail over accordingly?

2010-10-06 Thread Vadym Chepkov

On Oct 6, 2010, at 3:43 AM, Jayakrishnan wrote:

> 
> Hello,
>  
> Guess the change:--  
> location loc_pingd g_cluster_services rule -inf: not_defined pingd or pingd 
> number:lte 0
> 
> should work
>  
> 
> 


ocf:pacemaker:ping is recommended as a replacement for pingd RA

Both RAs define the node attribute "pingd" by default. I think this question arises 
a lot, since crm ra meta for both agents is misleading:

#  crm ra meta ocf:pacemaker:ping

name (string, [undef]): Attribute name
The name of the attributes to set.  This is the name to be used in the 
constraints.

I think it should say "pingd" instead of "undef"

Obviously, you can redefine any name you like and use it instead, but, 
unfortunately, "pingd" is the only attribute name that crm_mon -f will 
display; the name is hardcoded in crm_mon.c:

if(safe_str_eq("pingd", g_hash_table_lookup(rsc->meta, "type"))) {

This is an inconvenience for multi-homed clusters, where you need to define 
a separate ping clone for each network, so maybe crm_mon should display 
all attributes starting with "ping". Just a thought.
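
For reference, a minimal sketch of a ping clone that keeps the default "pingd"
attribute name (the gateway address and group name are only placeholders):

primitive p_ping ocf:pacemaker:ping \
        params host_list="192.168.0.1" multiplier="100" \
        op monitor interval="15s" timeout="5s"
clone c_ping p_ping meta globally-unique="false"
location loc_ping g_services \
        rule -inf: not_defined pingd or pingd number:lte 0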


Vadym


> -- 
> Regards,
> 
> Jayakrishnan. L
> 
> Visit: 
> www.foralllinux.blogspot.com
> www.jayakrishnan.bravehost.com
>  
>  
> On Wed, Oct 6, 2010 at 11:56 AM, Claus Denk  wrote:
> I am having a similar problem, so let's wait for the experts, But in the 
> meanwhile, try changing
> 
> 
> location loc_pingd g_cluster_services rule -inf: not_defined p_pingd
> or p_pingd lte 0
> 
> to
> 
> location loc_pingd g_cluster_services rule -inf: not_defined pingd
> or pingd number:lte 0
> 
> and see what happens. As far as I have read, it is also more recommended to 
> use the "ping"
> resource instead of "pingd"...
> 
> kind regards, Claus
> 
> 
> 
> 
> 
> 
> On 10/06/2010 05:45 AM, Craig Hurley wrote:
> Hello,
> 
> I have a 2 node cluster, running DRBD, heartbeat and pacemaker in
> active/passive mode.  On both nodes, eth0 is connected to the main
> network, eth1 is used to connect the nodes directly to each other.
> The nodes share a virtual IP address on eth0.  Pacemaker is also
> controlling a custom service with an LSB compliant script in
> /etc/init.d/.  All of this is working fine and I'm happy with it.
> 
> I'd like to configure the nodes so that they fail over if eth0 goes
> down (or if they cannot access a particular gateway), so I tried
> adding the following (as per
> http://www.clusterlabs.org/wiki/Example_configurations#Set_up_pingd)
> 
> primitive p_pingd ocf:pacemaker:pingd params host_list=172.20.0.254 op
> monitor interval=15s timeout=5s
> clone c_pingd p_pingd meta globally-unique=false
> location loc_pingd g_cluster_services rule -inf: not_defined p_pingd
> or p_pingd lte 0
> 
> ... but when I do add that, all resource are stopped and they don't
> come back up on either node.  Am I making a basic mistake or do you
> need more info from me?
> 
> All help is appreciated,
> Craig.
> 
> 
> pacemaker
> Version: 1.0.8+hg15494-2ubuntu2
> 
> heartbeat
> Version: 1:3.0.3-1ubuntu1
> 
> drbd8-utils
> Version: 2:8.3.7-1ubuntu2.1
> 
> 
> r...@rpalpha:~$ sudo crm configure show
> node $id="32482293-7b0f-466e-b405-c64bcfa2747d" rpalpha
> node $id="3f2aac12-05aa-4ac7-b91f-c47fa28efb44" rpbravo
> primitive p_drbd_data ocf:linbit:drbd \
> params drbd_resource="data" \
> op monitor interval="30s"
> primitive p_fs_data ocf:heartbeat:Filesystem \
> params device="/dev/drbd/by-res/data" directory="/mnt/data"
> fstype="ext4"
> primitive p_ip ocf:heartbeat:IPaddr2 \
> params ip="172.20.50.3" cidr_netmask="255.255.0.0" nic="eth0" \
> op monitor interval="30s"
> primitive p_rp lsb:rp \
> op monitor interval="30s" \
> meta target-role="Started"
> group g_cluster_services p_ip p_fs_data p_rp
> ms ms_drbd p_drbd_data \
> meta master-max="1" master-node-max="1" clone-max="2"
> clone-node-max="1" notify="true"
> location loc_preferred_master g_cluster_services inf: rpalpha
> colocation colo_mnt_on_master inf: g_cluster_services ms_drbd:Master
> order ord_mount_after_drbd inf: ms_drbd:promote g_cluster_services:start
> property $id="cib-bootstrap-options" \
> dc-version="1.0.8-042548a451fce8400660f6031f4da6f0223dd5dd" \
> cluster-infrastructure="Heartbeat" \
> no-quorum-policy="ignore" \
> stonith-enabled="false" \
> expected-quorum-votes="2" \
> 
> 
> r...@rpalpha:~$ sudo cat /etc/ha.d/ha.cf
> node rpalpha
> node rpbravo
> 
> keepalive 2
> warntime 5
> deadtime 15
> initdead 60
> 
> mcast eth0 239.0.0.43 694 1 0
> bcast eth1
> 
> use_logd yes
> autojoin none
> crm respawn
> 
> 
> r...@rpalpha:~$ sudo cat /etc/drbd.conf
> global {
> usage-count no;
> }
> common {
> protocol C;
> 
> handlers {}
> 
> startup {}
> 
> disk {}
> 
> net {
> cram-hmac-alg sha1;
> shared-secret "foobar";
> }
> 
> syncer {
> verify-alg sha1;
> rate 100M

Re: [Pacemaker] how to test network access and fail over accordingly?

2010-10-06 Thread Vadym Chepkov

On Oct 6, 2010, at 4:21 PM, Craig Hurley wrote:

> I tried using ping instead of pingd and I added "number" to the
> evaluation, I get the same results :/
> 
> primitive p_ping ocf:pacemaker:ping params host_list=172.20.0.254
> clone c_ping p_ping meta globally-unique=false
> location loc_ping g_cluster_services rule -inf: not_defined p_ping or
> p_ping number:lte 0
> 


try this

primitive p_ping ocf:pacemaker:ping params name="p_ping" 
host_list="172.20.0.254" op monitor interval="10s"
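
And since the attribute is now explicitly named "p_ping", the location rule has
to reference that same name - roughly (untested):

location loc_ping g_cluster_services \
        rule -inf: not_defined p_ping or p_ping number:lte 0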

Vadym

> Regards,
> Craig.
> 
> 
> On 6 October 2010 20:43, Jayakrishnan  wrote:
>> 
>> Hello,
>> 
>> Guess the change:--
>> location loc_pingd g_cluster_services rule -inf: not_defined pingd or pingd
>> number:lte 0
>> 
>> should work
>> 
>> 
>> --
>> Regards,
>> 
>> Jayakrishnan. L
>> 
>> Visit:
>> www.foralllinux.blogspot.com
>> www.jayakrishnan.bravehost.com
>> 
>> 
>> On Wed, Oct 6, 2010 at 11:56 AM, Claus Denk  wrote:
>>> 
>>> I am having a similar problem, so let's wait for the experts, But in the
>>> meanwhile, try changing
>>> 
>>> 
>>> location loc_pingd g_cluster_services rule -inf: not_defined p_pingd
>>> or p_pingd lte 0
>>> 
>>> to
>>> 
>>> location loc_pingd g_cluster_services rule -inf: not_defined pingd
>>> or pingd number:lte 0
>>> 
>>> and see what happens. As far as I have read, it is also more recommended
>>> to use the "ping"
>>> resource instead of "pingd"...
>>> 
>>> kind regards, Claus
>>> 
>>> 
>>> 
>>> 
>>> 
>>> 
>>> On 10/06/2010 05:45 AM, Craig Hurley wrote:
 
 Hello,
 
 I have a 2 node cluster, running DRBD, heartbeat and pacemaker in
 active/passive mode.  On both nodes, eth0 is connected to the main
 network, eth1 is used to connect the nodes directly to each other.
 The nodes share a virtual IP address on eth0.  Pacemaker is also
 controlling a custom service with an LSB compliant script in
 /etc/init.d/.  All of this is working fine and I'm happy with it.
 
 I'd like to configure the nodes so that they fail over if eth0 goes
 down (or if they cannot access a particular gateway), so I tried
 adding the following (as per
 http://www.clusterlabs.org/wiki/Example_configurations#Set_up_pingd)
 
 primitive p_pingd ocf:pacemaker:pingd params host_list=172.20.0.254 op
 monitor interval=15s timeout=5s
 clone c_pingd p_pingd meta globally-unique=false
 location loc_pingd g_cluster_services rule -inf: not_defined p_pingd
 or p_pingd lte 0
 
 ... but when I do add that, all resource are stopped and they don't
 come back up on either node.  Am I making a basic mistake or do you
 need more info from me?
 
 All help is appreciated,
 Craig.
 
 
 pacemaker
 Version: 1.0.8+hg15494-2ubuntu2
 
 heartbeat
 Version: 1:3.0.3-1ubuntu1
 
 drbd8-utils
 Version: 2:8.3.7-1ubuntu2.1
 
 
 r...@rpalpha:~$ sudo crm configure show
 node $id="32482293-7b0f-466e-b405-c64bcfa2747d" rpalpha
 node $id="3f2aac12-05aa-4ac7-b91f-c47fa28efb44" rpbravo
 primitive p_drbd_data ocf:linbit:drbd \
 params drbd_resource="data" \
 op monitor interval="30s"
 primitive p_fs_data ocf:heartbeat:Filesystem \
 params device="/dev/drbd/by-res/data" directory="/mnt/data"
 fstype="ext4"
 primitive p_ip ocf:heartbeat:IPaddr2 \
 params ip="172.20.50.3" cidr_netmask="255.255.0.0" nic="eth0" \
 op monitor interval="30s"
 primitive p_rp lsb:rp \
 op monitor interval="30s" \
 meta target-role="Started"
 group g_cluster_services p_ip p_fs_data p_rp
 ms ms_drbd p_drbd_data \
 meta master-max="1" master-node-max="1" clone-max="2"
 clone-node-max="1" notify="true"
 location loc_preferred_master g_cluster_services inf: rpalpha
 colocation colo_mnt_on_master inf: g_cluster_services ms_drbd:Master
 order ord_mount_after_drbd inf: ms_drbd:promote g_cluster_services:start
 property $id="cib-bootstrap-options" \
 dc-version="1.0.8-042548a451fce8400660f6031f4da6f0223dd5dd" \
 cluster-infrastructure="Heartbeat" \
 no-quorum-policy="ignore" \
 stonith-enabled="false" \
 expected-quorum-votes="2" \
 
 
 r...@rpalpha:~$ sudo cat /etc/ha.d/ha.cf
 node rpalpha
 node rpbravo
 
 keepalive 2
 warntime 5
 deadtime 15
 initdead 60
 
 mcast eth0 239.0.0.43 694 1 0
 bcast eth1
 
 use_logd yes
 autojoin none
 crm respawn
 
 
 r...@rpalpha:~$ sudo cat /etc/drbd.conf
 global {
 usage-count no;
 }
 common {
 protocol C;
 
 handlers {}
 
 startup {}
 
 disk {}
 
 net {
 cram-hmac-alg sha1;
 shared-secret "foobar";
 }
 
 syncer {
 verify-alg sha1;
 

Re: [Pacemaker] stonith pacemaker problem

2010-10-11 Thread Vadym Chepkov

On Oct 11, 2010, at 2:14 AM, Andrew Beekhof wrote:

> On Sun, Oct 10, 2010 at 11:20 PM, Shravan Mishra
>  wrote:
>> Andrew,
>> 
>> We were able to solve our problem. Obviously if no one else is having
>> it then it has to be our environment. It's just that time pressure and
>> mgmt pressure was causing us to go really bonkers.
>> 
>> We had been struggling with this for past 4 days.
>> So here is the story:
>> 
>> We had following versions of HA libs existing on our appliance:
>> 
>> heartbeat=3.0.0
>> openais=1.0.0
>> pacemaker=1.0.9
>> 
>> When I started installing glue=1.0.3 on top of it I started getting
>> bunch of conflicts so I basically
>> uninstalled the heartbeat and openais and proceeded to install the
>> following in the given order:
>> 
>> 1.  glue=1.0.3
>> 2.  corosync=1.1.1
>> 3. pacemaker=1.0.9
>> 4. agents=1.0.3
>> 
>> 
>> 
>> And that's when we started seeing this problem.
>> So after 2 days of going nowhere with this we said let's leave the
>> packages as such try to install using --replace-files option.
>> 
>> We are using a build tool called conary which has this option and not
>> standard make/make install.
>> 
>> So we let the above heartbeat and openais remain as such and installed
>> glue,corosync and pacemaker on top of it with the --replace-files
>> options , this time with no conflicts and bingo it all works fine.
>> 
>> So that sort of confused me as to why do we still need heartbeat given
>> the above 4 packages.
> 
> strictly speaking you don't.
> but at least on fedora, the policy is that $x-libs always requires $x
> so just building against heartbeat-libs means that yum will suck in
> the main heartbeat package :-(

I don't think it's the case for properly designed rpms:

[r...@fedora ~]# cat /etc/fedora-release 
Fedora release 13 (Goddard)

[r...@fedora ~]# rpm -qa|grep postgres
postgresql-libs-8.4.4-1.fc13.i686

The heartbeat dependency is for some reason built into the spec file:

%package libs
Summary:  Heartbeat libraries
Group:System Environment/Daemons
Requires: heartbeat = %{version}-%{release}

And I don't think it should be.
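
Just to illustrate what I mean (this is not the actual spec, only a sketch):

%package libs
Summary:  Heartbeat libraries
Group:    System Environment/Daemons
# no "Requires: heartbeat = ..." here, so the libs can be installed standalone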

Vadym


> 
> glad you found a path forward though
> 
>>  understand that /usr/lib/ocf/resource.d/heartbeat has ocf scripts
>> provided by heartbeat but that can be part of the "Reusable cluster
>> agents" subsystem.
>> 
>> Frankly I thought the way I had installed the system by erasing and
>> installing the fresh packages it should have worked.
>> 
>> But all said and done I learned a lot of cluster code by gdbing it.
>> I'll be having a peaceful thanksgiving.
>> 
>> Thanks and happy thanks giving.
>> Shravan
>> 
>> 
>> 
>> 
>> 
>> 
>> 
>> 
>> 
>> On Sun, Oct 10, 2010 at 2:46 PM, Andrew Beekhof  wrote:
>>> Not enough information.
>>> We'd need more than just the lrmd's logs, they only show what happened not 
>>> why.
>>> 
>>> On Thu, Oct 7, 2010 at 11:02 PM, Shravan Mishra
>>>  wrote:
 Hi,
 
 Description of my environment:
   corosync=1.2.8
   pacemaker=1.1.3
   Linux= 2.6.29.6-0.6.smp.gcc4.1.x86_64 #1 SMP
 
 
 We are having a problem with our pacemaker which is continuously
 canceling the monitoring operation of our stonith devices.
 
 We ran:
 
 stonith -d -t external/safe/ipmi hostname=ha2.itactics.com
 ipaddr=192.168.2.7 userid=hellouser passwd=hello interface=lanplus -S
 
 it's output is attached as stonith.output.
 
 We have been trying to debug this issue for  a few days now with no 
 success.
 We are hoping that someone can help us as we are under immense
 pressure to move to RCS unless we can solve this issue in a day or two
 ,which I personally don't want to because we like the product.
 
 Any help will be greatly appreciated.
 
 
 Here is an excerpt from the /var/log/messages:
 =
 Oct  7 16:58:29 ha1 lrmd: [3581]: info:
 rsc:ha2.itactics.com-stonith:11155: start
 Oct  7 16:58:29 ha1 lrmd: [3581]: info:
 rsc:ha2.itactics.com-stonith:11156: monitor
 Oct  7 16:58:29 ha1 lrmd: [3581]: info: cancel_op: operation
 monitor[11156] on
 stonith::external/safe/ipmi::ha2.itactics.com-stonith for client 3584,
 its parameters: CRM_meta_interval=[2] target_role=[started]
 ipaddr=[192.168.2.7] interface=[lanplus] CRM_meta_timeout=[18]
 crm_feature_set=[3.0.2] CRM_meta_name=[monitor]
 hostname=[ha2.itactics.com] passwd=[ft01st0...@]
 userid=[safe_ipmi_admin]  cancelled
 Oct  7 16:58:29 ha1 lrmd: [3581]: info: 
 rsc:ha2.itactics.com-stonith:11157: stop
 Oct  7 16:58:29 ha1 lrmd: [3581]: info:
 rsc:ha2.itactics.com-stonith:11158: start
 Oct  7 16:58:29 ha1 lrmd: [3581]: info:
 rsc:ha2.itactics.com-stonith:11159: monitor
 Oct  7 16:58:29 ha1 lrmd: [3581]: info: cancel_op: operation
 monitor[11159] on
 stonith::external/safe/ipmi::ha2.itactics.com-stonith for client 3584,
 its parameters: CRM_meta_inte

[Pacemaker] Move DRBD master

2010-10-18 Thread Vadym Chepkov
Hi,

What is the crm shell command to move drbd master to a different node?

Thank you,
Vadym



Re: [Pacemaker] Move DRBD master

2010-10-19 Thread Vadym Chepkov

On Oct 19, 2010, at 3:13 AM, Dan Frincu wrote:

> Vadym Chepkov wrote:
>> Hi,
>> 
>> What is the crm shell command to move drbd master to a different node?
>>  
> # crm resource help migrate
> 
> Migrate a resource to a different node. If node is left out, the
> resource is migrated by creating a constraint which prevents it from
> running on the current node. Additionally, you may specify a
> lifetime for the constraint---once it expires, the location
> constraint will no longer be active.
> 
> Usage:
> ...
>   migrate  [] []
> 
> crm resource migrate ms_drbd_storage
> 
> WARNING: Creating rsc_location constraint 'cli-standby-ms_drbd_storage' with 
> a score of -INFINITY for resource ms_drbd_storage on cluster1.
>   This will prevent ms_drbd_storage from running on cluster1 until the 
> constraint is removed using the 'crm_resource -U' command or manually with 
> cibadmin
>   This will be the case even if cluster1 is the last node in the cluster
>   This message can be disabled with -Q
> 
> 
> This also made me to remind that I was wondering, is there a way to
> demote one instance of multi-master ms resource away from particular
> node (forcibly switch it to a slave state on that node). I didn't find
> the answer too. Is it possible with crm shell?
> 
> Same as above, just specify the node in the command.
> 

Have you actually tried it? This command doesn't work for this case.


> Regards,
> 
> Dan
>> Thank you,
>> Vadym
>> 
>> ___
>> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
>> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>> 
>> Project Home: http://www.clusterlabs.org
>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>> Bugs: 
>> http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker
>>  
> 
> -- 
> Dan FRINCU
> Systems Engineer
> CCNA, RHCE
> Streamwide Romania
> 
> 
> ___
> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
> 
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: 
> http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker




Re: [Pacemaker] Move DRBD master

2010-10-19 Thread Vadym Chepkov

On Oct 19, 2010, at 3:42 AM, Pavlos Parissis wrote:

> 
> 
> On 19 October 2010 01:18, Vadym Chepkov  wrote:
> Hi,
> 
> What is the crm shell command to move drbd master to a different node?
> 
> 
> take a look at this 
> http://www.mail-archive.com/pacemaker@oss.clusterlabs.org/msg06300.html
> ___

Wow, not the friendliest command, I would say. Maybe the "move" command could be 
enhanced to provide something similar?

Thanks,
Vadym



Re: [Pacemaker] Move DRBD master

2010-10-19 Thread Vadym Chepkov

On Oct 19, 2010, at 9:18 AM, Dan Frincu wrote:

> 
> 
> Vadym Chepkov wrote:
>> 
>> 
>> On Oct 19, 2010, at 3:42 AM, Pavlos Parissis wrote:
>> 
>>> 
>>> 
>>> On 19 October 2010 01:18, Vadym Chepkov  wrote:
>>> Hi,
>>> 
>>> What is the crm shell command to move drbd master to a different node?
>>> 
>>> 
>>> take a look at this 
>>> http://www.mail-archive.com/pacemaker@oss.clusterlabs.org/msg06300.html
>>> ___
>> 
>> Wow, not the friendliest command I would say. Maybe "move" command can be 
>> enhanced to provide something similar?
>> 
>> Thanks,
>> Vadym
> The crm resource move/migrate  provides/creates the location constraint 
> within the crm shell. 
> 
> In the example I gave:
> 
> crm resource migrate ms_drbd_storage
> 
> WARNING: Creating rsc_location constraint 'cli-standby-ms_drbd_storage' with 
> a score of -INFINITY for resource ms_drbd_storage on cluster1.
>   This will prevent ms_drbd_storage from running on cluster1 until the 
> constraint is removed using the 'crm_resource -U' command or manually with 
> cibadmin
>   This will be the case even if cluster1 is the last node in the cluster
>   This message can be disabled with -Q
> 
> The warning message appears in the console after executing the command in the 
> crm shell, running a crm configure show reveals the following:
> 
> location cli-standby-ms_drbd_storage ms_drbd_storage \
> rule $id="cli-standby-rule-ms_drbd_storage" -inf: #uname eq cluster1
> 
> The example in the UEL above has been done manually to (probably) provide an 
> advisory placement constraint, the one set by crm resource move  is a 
> mandatory placement constraint.
> 


It is missing $role="Master" and doesn't work because of that.
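
Reusing the names from the example above, the constraint that actually affects
the master role would look roughly like this:

location cli-standby-master-ms_drbd_storage ms_drbd_storage \
        rule $role="Master" -inf: #uname eq cluster1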

Vadym



Re: [Pacemaker] AP9606 fencing device

2010-10-27 Thread Vadym Chepkov

On Oct 27, 2010, at 3:47 AM, Pavlos Parissis wrote:
> 
> Does anyone know any other PDU which works out of box with the
> supplied stonith agents?
> 

I use APC AP7901, works like a charm:

primitive pdu stonith:external/rackpdu \
params pduip="10.6.6.6" community="pdu-6" hostlist="AUTO"
clone fencing pdu

Vadym






Re: [Pacemaker] AP9606 fencing device

2010-10-27 Thread Vadym Chepkov

On Oct 27, 2010, at 7:27 AM, Pavlos Parissis wrote:

> On 27 October 2010 13:12, Vadym Chepkov  wrote:
>> 
>> On Oct 27, 2010, at 3:47 AM, Pavlos Parissis wrote:
>>> 
>>> Does anyone know any other PDU which works out of box with the
>>> supplied stonith agents?
>>> 
>> 
>> I use APC AP7901, works like a charm:
>> 
>> primitive pdu stonith:external/rackpdu \
>>params pduip="10.6.6.6" community="pdu-6" hostlist="AUTO"
>> clone fencing pdu
>> 
>> Vadym
> 
> Then most likely the defaults OIDs of the rackpdu agents matches the
> OIDs of the AP7901.
> In my case I have to use OID for the device itself
> 1.3.6.1.4.1.318.1.1.4.4.2.1.3  and OID for retrieving (snmpwalk) the
> outlet list .1.3.6.1.4.1.318.1.1.4.4.2.1.4 .
> 
> Hold on a sec, are you using clone on AP7901? Does it support multiple
> connections? Mine it doesn't.

Then it's useless regardless of clone or not; you have to have multiple instances, 
because a server can't reliably fence itself, right?

Vadym





Re: [Pacemaker] AP9606 fencing device

2010-10-27 Thread Vadym Chepkov

On Oct 27, 2010, at 7:58 AM, Pavlos Parissis wrote:

> 
> On 27 October 2010 13:43, Vadym Chepkov  wrote:
> 
> On Oct 27, 2010, at 7:27 AM, Pavlos Parissis wrote:
> 
> > On 27 October 2010 13:12, Vadym Chepkov  wrote:
> >>
> >> On Oct 27, 2010, at 3:47 AM, Pavlos Parissis wrote:
> >>>
> >>> Does anyone know any other PDU which works out of box with the
> >>> supplied stonith agents?
> >>>
> >>
> >> I use APC AP7901, works like a charm:
> >>
> >> primitive pdu stonith:external/rackpdu \
> >>params pduip="10.6.6.6" community="pdu-6" hostlist="AUTO"
> >> clone fencing pdu
> >>
> >> Vadym
> >
> > Then most likely the defaults OIDs of the rackpdu agents matches the
> > OIDs of the AP7901.
> > In my case I have to use OID for the device itself
> > 1.3.6.1.4.1.318.1.1.4.4.2.1.3  and OID for retrieving (snmpwalk) the
> > outlet list .1.3.6.1.4.1.318.1.1.4.4.2.1.4 .
> >
> > Hold on a sec, are you using clone on AP7901? Does it support multiple
> > connections? Mine it doesn't.
> 
> Then it's useless regardless clone or not, you have to have multiple 
> instances, because server can't reliable fence itself, right?
> 
> 
> 
> My understanding is/was that I need to have one resource running on 1 of the 
> 3 nodes in the cluster and if a fence event has to be triggered then 
> pacemaker will send to it to the one stonith resource. I am planning to test 
> that the coming days.[1]
> Am I right? if not then I have to buy a different PDU! :-(
> 

My understanding is that you have to have a fencing device for each of your hosts. 
Are you sure the one-connection limitation applies to SNMP? Most likely it's only 
for TCP sessions - ssh/http?
If you look into the rackpdu log you will see this:

Oct 19 12:39:00 xen-11 stonithd: [8606]: debug: external_run_cmd: Calling 
'/usr/lib64/stonith/plugins/external/rackpdu gethosts'
Oct 19 12:39:01 xen-11 stonithd: [8606]: debug: external_run_cmd: 
'/usr/lib64/stonith/plugins/external/rackpdu gethosts' output: xen-11 xen-12 
Outlet_3 Outlet_4 Outlet_5 Outlet_6 Outlet_7 Outlet_8 
Oct 19 12:39:01 xen-11 stonithd: [8606]: debug: external_hostlist: running 
'rackpdu gethosts' returned 0
Oct 19 12:39:01 xen-11 stonithd: [8606]: debug: external_hostlist: rackpdu host 
xen-11
Oct 19 12:39:01 xen-11 stonithd: [8606]: debug: external_hostlist: rackpdu host 
xen-12
Oct 19 12:39:01 xen-11 stonithd: [8606]: debug: external_hostlist: rackpdu host 
Outlet_3
Oct 19 12:39:01 xen-11 stonithd: [8606]: debug: external_hostlist: rackpdu host 
Outlet_4
Oct 19 12:39:01 xen-11 stonithd: [8606]: debug: external_hostlist: rackpdu host 
Outlet_5
Oct 19 12:39:01 xen-11 stonithd: [8606]: debug: external_hostlist: rackpdu host 
Outlet_6
Oct 19 12:39:01 xen-11 stonithd: [8606]: debug: external_hostlist: rackpdu host 
Outlet_7
Oct 19 12:39:01 xen-11 stonithd: [8606]: debug: external_hostlist: rackpdu host 
Outlet_8
Oct 19 12:39:01 xen-11 stonithd: [8606]: debug: remove us (xen-11) from the 
host list for pdu:0

Check the last line - the agent is smart enough to know it can't fence itself.
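
A common way around the self-fencing problem is one stonith primitive per node,
each forbidden from running on the node it is meant to fence - a rough sketch
(names and the PDU address are placeholders):

primitive st-xen-11 stonith:external/rackpdu \
        params pduip="10.6.6.6" community="public" hostlist="xen-11"
primitive st-xen-12 stonith:external/rackpdu \
        params pduip="10.6.6.6" community="public" hostlist="xen-12"
location st-xen-11-placement st-xen-11 -inf: xen-11
location st-xen-12-placement st-xen-12 -inf: xen-12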

Vadym




Re: [Pacemaker] AP9606 fencing device

2010-10-27 Thread Vadym Chepkov

On Oct 27, 2010, at 8:11 AM, Dejan Muhamedagic wrote:

> Hi,
> 
> On Wed, Oct 27, 2010 at 01:58:20PM +0200, Pavlos Parissis wrote:
>> On 27 October 2010 13:43, Vadym Chepkov  wrote:
>> 
>>> 
>>> On Oct 27, 2010, at 7:27 AM, Pavlos Parissis wrote:
>>> 
>>>> On 27 October 2010 13:12, Vadym Chepkov  wrote:
>>>>> 
>>>>> On Oct 27, 2010, at 3:47 AM, Pavlos Parissis wrote:
>>>>>> 
>>>>>> Does anyone know any other PDU which works out of box with the
>>>>>> supplied stonith agents?
>>>>>> 
>>>>> 
>>>>> I use APC AP7901, works like a charm:
>>>>> 
>>>>> primitive pdu stonith:external/rackpdu \
>>>>>   params pduip="10.6.6.6" community="pdu-6" hostlist="AUTO"
>>>>> clone fencing pdu
>>>>> 
>>>>> Vadym
>>>> 
>>>> Then most likely the defaults OIDs of the rackpdu agents matches the
>>>> OIDs of the AP7901.
>>>> In my case I have to use OID for the device itself
>>>> 1.3.6.1.4.1.318.1.1.4.4.2.1.3  and OID for retrieving (snmpwalk) the
>>>> outlet list .1.3.6.1.4.1.318.1.1.4.4.2.1.4 .
>>>> 
>>>> Hold on a sec, are you using clone on AP7901? Does it support multiple
>>>> connections? Mine it doesn't.
>>> 
>>> Then it's useless regardless clone or not, you have to have multiple
>>> instances, because server can't reliable fence itself, right?
>>> 
>>> 
>>> 
>> My understanding is/was that I need to have one resource running on 1 of the
>> 3 nodes in the cluster and if a fence event has to be triggered then
>> pacemaker will send to it to the one stonith resource. I am planning to test
>> that the coming days.[1]
>> Am I right? if not then I have to buy a different PDU! :-(
> 
> Yes. In case a node which is currently running the stonith
> resource is to be fenced, then the stonith resource would move
> elsewhere first. But, yes, you should test this just like
> anything else. Make sure to test both the "node gone" event
> (failed links) and a critical action failing (such as stop).
> 
> Thanks,
> 
> Dejan

The rackpdu stonith agent seems to explicitly remove the node itself from the list of hosts 
it can fence. So I assume that if you have just one instance running, 
the cluster would not see any stonith device capable of fencing the server where the agent 
started initially. Would pacemaker move such a resource anyway, 
since it reported it can't fence the server in trouble?

Vadym




Re: [Pacemaker] Multiple independent two-node clusters side-by-side?

2010-10-28 Thread Vadym Chepkov

On Oct 28, 2010, at 2:53 AM, Dan Frincu wrote:

> Hi,
> 
> Andreas Ntaflos wrote:
>> 
>> Hi, 
>> 
>> first time poster, short time Pacemaker user. I don't think this is a 
>> very difficult question to answer but I seem to be feeding Google the 
>> wrong search terms. I am using Pacemaker 1.0.8 and Corosync 1.2.0 on 
>> Ubuntu 10.04.1 Server.
>> 
>> Short version: How do I configure multiple independent two-node clusters 
>> where the nodes are all on the same subnet? Only the two nodes that form 
>> the cluster should see that cluster's resources and not any other. 
>> 
>> Is this possible? Where should I look for more and detailed information?
>>   
> You need to specify different multicast sockets for this to work. Under the 
> /etc/corosync/corosync.conf you have the interface statements. Even if all 
> servers are in the same subnet, you can "split them apart" by defining unique 
> multicast sockets.
> An example should be useful. Let's say that you have only one interface 
> statement in the corosync file.
> interface {
> ringnumber: 0
> bindnetaddr: 192.168.1.0 
> mcastaddr: 239.192.168.1 
> mcastport: 5405 
> }
> The multicast socket in this case is 239.192.168.1:5405. All nodes that 
> should be in the same cluster should use the same multicast socket. In your 
> case, the first two nodes should use the same multicast socket. How about the 
> other two nodes? Use another unique multicast socket.
> interface {
> ringnumber: 0
> bindnetaddr: 192.168.1.0 
> mcastaddr: 239.192.168.112 
> mcastport: 5405 
> }
> Now the multicast socket is 239.192.168.112:5405. It's unique, the network 
> address is the same, but you add this config (edit according to your 
> environment, this is just an example) to your other two nodes. So you have 
> cluster1 formed out of node1 and node2 linked to 239.192.168.1:5405 and 
> cluster2 formed out of node3 and node4 linked to 239.192.168.112:5405.
> 
> This way, the clusters don't _see_ each other, so you can reuse the resource 
> ID's and see only two nodes per cluster.


Out of curiosity, RFC2365 defines "local scope" multicast space 239.255.0.0/16 
and "organizational local scope" 239.192.0.0/14.

Seems most examples for pacemaker clusters use the latter. But since most clusters 
are not spread across different subnets, wouldn't it be more appropriate to use 
the former?
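
In corosync.conf terms that would be something along these lines (addresses are
only an illustration):

interface {
        ringnumber: 0
        bindnetaddr: 192.168.1.0
        mcastaddr: 239.255.1.1
        mcastport: 5405
}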

Thanks,
Vadym




