[Pacemaker] Fail-count and failure timeout

2010-10-01 Thread Holger . Teutsch
Hi,
I observed the following in Pacemaker versions 1.1.3 and tip up to patch
10258.

In a small test environment set up to study fail-count behavior I have one
resource of type "anything" running "sleep 600", with a monitoring interval
of 10 secs.

The failure-timeout is 300.
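
For illustration, a configuration of this kind looks roughly like the
following in crm syntax (the resource id and the exact operation values are
illustrative, not a paste from my CIB):

    primitive sleeper ocf:heartbeat:anything \
            params binfile="sleep 600" \
            meta failure-timeout="300" \
            op monitor interval="10s" timeout="20s" on-fail="restart"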

I would expect to never see a failcount higher than 1.

I observed some sporadic clears, but mostly the count increases by 1
every 10 minutes.

Am I mistaken, or is this a bug?

Regards
Holger

-- complete cib for reference ---

[The CIB XML was stripped by the archive; a copy is still visible, quoted,
in the reply of 2010-10-05 below.]
___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker


Re: [Pacemaker] Dependency on either of two resources

2010-10-04 Thread Holger . Teutsch
Hi,
a similar or related use case that we tried to solve without success:
- a stretch cluster with two disk boxes
- a LUN on each disk box guarded by an individual SFEX
- a mirror (raid1 or clvm) that survives an outage of one disk box
- the mirror should be started if at least one SFEX can be obtained and 
the other one could not be obtained on a different node

IMHO sbd is not an alternative, as this introduces a SPOF.
 
Mit freundlichen Grüßen / Kind regards 

Holger Teutsch 





From:   Vladislav Bogdanov 
To: The Pacemaker cluster resource manager 

Date:   04.10.2010 06:33
Subject:[Pacemaker] Dependency on either of two resources



Hi all,

just wondering, is there a way to make a resource depend on (be colocated
with) either of two other resources?

The use case is an iSCSI initiator connection to an iSCSI target with two
portals. The idea is to have, e.g., a device-mapper multipath resource
depend on both iSCSI connection resources, but in a "soft" way, so that
failure of any single iSCSI connection will not cause the multipath
resource to stop, while failure of both connections will.

I am probably missing something, but I cannot find an answer as to whether
this is possible with current Pacemaker. Can someone shed some light?

Best,
Vladislav

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: 
http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker


Re: [Pacemaker] Fail-count and failure timeout

2010-10-05 Thread Holger . Teutsch
The resource failed when the sleep expired, i.e. every 600 secs.
Now I changed the resource to

sleep 7200, failure-timeout 3600

i.e. to values far beyond the recheck-interval of 15m.

Now everything behaves as expected.
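
In crm syntax the adjusted resource looks roughly like this (the resource
id is illustrative); alternatively one could lower cluster-recheck-interval,
e.g. "property cluster-recheck-interval=5min", so that fail-count expiry is
evaluated more often:

    primitive sleeper ocf:heartbeat:anything \
            params binfile="sleep 7200" \
            meta failure-timeout="3600" \
            op monitor interval="10s"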
 
Mit freundlichen Grüßen / Kind regards 

Holger Teutsch 





From:   Andrew Beekhof 
To: The Pacemaker cluster resource manager 

Date:   05.10.2010 11:09
Subject:Re: [Pacemaker] Fail-count and failure timeout



On Tue, Oct 5, 2010 at 11:07 AM, Andrew Beekhof  
wrote:
> On Fri, Oct 1, 2010 at 3:40 PM,   
wrote:
>> Hi,
>> I observed the following in pacemaker Versions 1.1.3 and tip up to 
patch
>> 10258.
>>
>> In a small test environment to study fail-count behavior I have one 
resource
>>
>> anything
>> doing sleep 600 with monitoring interval 10 secs.
>>
>> The failure-timeout is 300.
>>
>> I would expect to never see a failcount higher than 1.
>
> Why?
>
> The fail-count is only reset when the PE runs... which is on a failure
> and/or after the cluster-recheck-interval
> So I'd expect a maximum of two.

Actually this is wrong.
There is no maximum, because there needs to have been 300s since the
last failure when the PE runs.
And since it only runs when the resource fails, it is never reset.

>
>   cluster-recheck-interval = time [15min]
>  Polling interval for time based changes to options,
> resource parameters and constraints.
>
>  The Cluster is primarily event driven, however the
> configuration can have elements that change based on time. To ensure
> these changes take effect, we can optionally poll  the  cluster’s
>  status for changes. Allowed values: Zero disables
> polling. Positive values are an interval in seconds (unless other SI
> units are specified. eg. 5min)
>
>
>
>>
>> I observed some sporadic clears but mostly the count is increasing by 1 
each
>> 10 minutes.
>>
>> Am I mistaken or is this a bug ?
>
> Hard to say without logs.  What value did it reach?
>
>>
>> Regards
>> Holger
>>
>> -- complete cib for reference ---
>>
>> > validate-with="pacemaker-1.2" crm_feature_set="3.0.4" have-quorum="0"
>> cib-last-written="Fri Oct  1 14:17:31 2010" dc-uuid="hotlx">
>>   
>> 
>>   
>> > value="1.1.3-09640bd6069e677d5eed65203a6056d9bf562e67"/>
>> > name="cluster-infrastructure" value="openais"/>
>> > name="expected-quorum-votes" value="2"/>
>> > name="no-quorum-policy" value="ignore"/>
>> > name="stonith-enabled" value="false"/>
>> > name="start-failure-is-fatal" value="false"/>
>> > name="last-lrm-refresh" value="1285926879"/>
>>   
>> 
>> 
>>   
>> 
>> 
>>   
>> 
>>   > value="started"/>
>>   > name="failure-timeout" value="300"/>
>> 
>> 
>>   > on-fail="restart" timeout="20s"/>
>>   > on-fail="restart" timeout="20s"/>
>> 
>> 
>>   > value="sleep 600"/>
>> 
>>   
>> 
>> 
>>   
>> 
>>
>> ___
>> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
>> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>>
>> Project Home: http://www.clusterlabs.org
>> Getting started: 
http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>> Bugs:
>> 
http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker
>>
>>
>

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: 
http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker


___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker


[Pacemaker] Small document fix Clusters_from_Scratch

2010-12-03 Thread Holger . Teutsch
Hi,
found and fixed an error in Clusters from Scratch
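
(The corosync consensus timeout must be at least 1.2 * token; with the
example's token of 5000 ms that is 6000 ms, hence the change below.)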
- holger
# HG changeset patch
# User Holger Teutsch 
# Date 1291365790 -3600
# Node ID 567a8f8fb3efc4a4213ff9f0b518144bdd895bd1
# Parent  e99aa3451ce77e07f8230fabedbabba18e4e0501
Low: Clusters from Scratch: fix parameter consensus in corosync example
Must be >= 1.2 * token (currently 5000)

diff -r e99aa3451ce7 -r 567a8f8fb3ef doc/Clusters_from_Scratch/en-US/Ap-Corosync-Conf.xml
--- a/doc/Clusters_from_Scratch/en-US/Ap-Corosync-Conf.xml	Thu Dec 02 16:52:37 2010 +0100
+++ b/doc/Clusters_from_Scratch/en-US/Ap-Corosync-Conf.xml	Fri Dec 03 09:43:10 2010 +0100
@@ -38,7 +38,7 @@ totem {
 
         # How long to wait for consensus to be achieved before starting a new
         # round of membership configuration (ms)
-        consensus:      2500
+        consensus:      6000
 
         # Turn off the virtual synchrony filter
         vsftype:        none
diff -r e99aa3451ce7 -r 567a8f8fb3ef doc/Clusters_from_Scratch/it-IT/Ap-Corosync-Conf.po
--- a/doc/Clusters_from_Scratch/it-IT/Ap-Corosync-Conf.po	Thu Dec 02 16:52:37 2010 +0100
+++ b/doc/Clusters_from_Scratch/it-IT/Ap-Corosync-Conf.po	Fri Dec 03 09:43:10 2010 +0100
@@ -52,7 +52,7 @@ msgid ""
 "\n"
 "        # How long to wait for consensus to be achieved before starting a new\n"
 "        # round of membership configuration (ms)\n"
-"        consensus:      2500\n"
+"        consensus:      6000\n"
 "\n"
 "        # Turn off the virtual synchrony filter\n"
 "        vsftype:        none\n"
___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker


Re: [Pacemaker] Split-site cluster in two locations

2011-01-11 Thread Holger Teutsch
On Tue, 2011-01-11 at 10:21 +0100, Christoph Herrmann wrote:
> -----Original Message-----
> From: Andrew Beekhof 
> Sent: Tue 11.01.2011 09:01
> To: The Pacemaker cluster resource manager ; 
> CC: Michael Schwartzkopff ; 
> Subject: Re: [Pacemaker] Split-site cluster in two locations
> 
> > On Tue, Dec 28, 2010 at 10:21 PM, Anton Altaparmakov  
> > wrote:
> > > Hi,
> > >
> > > On 28 Dec 2010, at 20:32, Michael Schwartzkopff wrote:
> > >> Hi,
> > >>
> > >> I have four nodes in a split site scenario located in two computing 
> > >> centers.
> > >> STONITH is enabled.
> > >>
> > >> Is there and best practise how to deal with this setup? Does it make 
> > >> sense to
> > >> set expected-quorum-votes to "3" to make the whole setup still running 
> > >> with
> > >> one data center online? Is this possible at all?
> > >>
> > >> Is quorum needed with STONITH enabled?
> > >>
> > >> Is there a quorum server available already?
> > >
> > > I couldn't see a quorum server in Pacemaker so I have installed a third 
> > > dummy 
> > node which is not allowed to run any resources (using location constraints 
> > and 
> > setting the cluster to not be symmetric) which just acts as a third vote.  
> > I am 
> > hoping this effectively acts as a quorum server as a node that loses 
> > connectivity will lose quorum and shut down its services whilst the other 
> > real 
> > node will retain connectivity and thus quorum due to the dummy node still 
> > being 
> > present.
> > >
> > > Obviously this is quite wasteful of servers as you can only run a single 
> > Pacemaker instance on a server (as far as I know) so that is a lot of dummy 
> > servers when you run multiple pacemaker clusters...  Solution for us is to 
> > use 
> > virtualization - one physical server with VMs and each VM is a dummy node 
> > for a 
> > cluster...
> > 
> > With recent 1.1.x builds it should be possible to run just the
> > corosync piece (no pacemaker).
> > 
> 
> As long as you have only two computing centers it doesn't matter if you run a
> corosync-only piece or whatever on a physical or a virtual machine. The
> question is: how to configure a four-node (or six-node, any even number
> bigger than two) corosync/pacemaker cluster to continue services if you have
> a blackout in one computing center (you will always lose (at least) one half
> of your nodes), but to shut down everything if you have less than half of the
> nodes available. Are there any best practices on how to deal with clusters in
> two computing centers? Anything like an external quorum node or a quorum
> partition? I'd like to set expected-quorum-votes to "3" but this is not
> possible (with corosync-1.2.6 and pacemaker-1.1.2 on SLES11 SP1). Does
> anybody know why? Currently, the only way I can figure out is to run the
> cluster with no-quorum-policy="ignore". But I don't like that. Any
> suggestions?
> 
> 
> Best regards
> 
>   Christoph

Hi,
I assume the only solution is to work with manual intervention, i.e. the
stonith meatware module.
Whenever a site goes down, a human being has to confirm that it is lost
and pull the power cords or the inter-site links so that it will not come
back unintentionally.

Then confirm with meatclient on the healthy site that the no longer
reachable site can be considered gone.
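
(If I recall the syntax correctly, that is something like

    meatclient -c <name-of-the-lost-node>

run on a node of the surviving site.)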

In theory this can be configured with an additional meatware stonith
resource with lower priority. The intention is to let your regular
stonith resources do the work, with meatware as a last resort.
However, I was not able to get this running with the versions packaged
with SLES11 SP1: the priority was not honored and a lot of zombie meatware
processes were left over.
I found some patches in the upstream repositories that seem to address
these problems but I didn't follow up.
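
For reference, the kind of setup I had in mind looks roughly like this in
crm syntax (agent parameters and priority values are illustrative, and as
said above I could not get the prioritisation to work reliably):

    primitive st-ipmi-a stonith:external/ipmi \
            params hostname=nodea ipaddr=10.0.0.1 userid=admin passwd=secret \
            meta priority="10"
    primitive st-meat stonith:meatware \
            params hostlist="nodea nodeb" \
            meta priority="1"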

Regards
Holger


 


___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker


Re: [Pacemaker] Howto write a STONITH agent

2011-01-15 Thread Holger Teutsch
On Fri, 2011-01-14 at 17:10 +0100, Christoph Herrmann wrote:
> -----Original Message-----
> From: Dejan Muhamedagic 
> Sent: Fri 14.01.2011 12:31
> To: The Pacemaker cluster resource manager ; 
> Subject: Re: [Pacemaker] Howto write a STONITH agent
> 
> > Hi,
> > 
> > On Thu, Jan 13, 2011 at 09:09:38PM +0100, Christoph Herrmann wrote:
> > > Hi,
> > > 
> > > I have some brand new HP Blades with ILO Boards (iLO 2 Standard Blade 
> > > Edition 
> > 1.81 ...)
> > > But I'm not able to connect with them via the external/riloe agent.
> > > When i try:
> > > 
> > > stonith -t external/riloe -p "hostlist=node1 ilo_hostname=ilo1  
> > ilo_user=ilouser ilo_password=ilopass ilo_can_reset=1 ilo_protocol=2.0 
> > ilo_powerdown_method=power" -S
> > 
> > Try this:
> > 
> > stonith -t external/riloe hostlist=node1 ilo_hostname=ilo1  
> > ilo_user=ilouser 
> > ilo_password=ilopass ilo_can_reset=1 ilo_protocol=2.0 
> > ilo_powerdown_method=power -S
> 
> that's much better (looks like PEBKAC ;-), thanks! But it is not reliable.
> I've tested it about 10 times and 5 times it hangs. That's not what I want.
I had the same experience. iLO is _extremely_ slow and unreliable.

Go for external/ipmi.

That works very fast and reliably. It is available with iLO 2.x
firmware.
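
From memory, a quick test looks something like this (the parameter values
are placeholders; interface may need to be lan or lanplus depending on the
firmware):

    stonith -t external/ipmi hostname=node1 ipaddr=<ilo-address> \
            userid=ilouser passwd=ilopass interface=lanplus -S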

- holger


___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker


[Pacemaker] Packages for Opensuse 11.3 don't build / install

2011-02-15 Thread Holger Teutsch
Hi,
the packages from rpm-next (64 bit) for openSUSE 11.3 do not install there
(at least true for 1.1.4 and 1.1.5).

The plugin is in
./usr/lib/lcrso/pacemaker.lcrso

but should be in
./usr/lib64/lcrso/pacemaker.lcrso

I think the patch below (borrowed from the 'official' packages) cures this.
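
(A quick check on a built package, e.g.

    rpm -qlp pacemaker-*.x86_64.rpm | grep lcrso

should then list /usr/lib64/lcrso/pacemaker.lcrso.)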
Regards
Holger


diff -r 43a11c0daae4 pacemaker.spec
--- a/pacemaker.specMon Feb 14 15:25:13 2011 +0100
+++ b/pacemaker.specTue Feb 15 17:50:27 2011 +0100
@@ -1,3 +1,7 @@
+%if 0%{?suse_version}
+%define _libexecdir %{_libdir}
+%endif
+
 %global gname haclient
 %global uname hacluster
 %global pcmk_docdir %{_docdir}/%{name}



___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker


[Pacemaker] rpm packages in Pacemaker's rpm-next do not install

2011-02-17 Thread Holger Teutsch
At least for openSUSE 11.3 x86_64. Maybe the title of my previous mail
was misleading.
-h
On Tue, 2011-02-15 at 17:57 +0100, Holger Teutsch wrote:
> Hi,
> the packages from rpm-next(64bit) for opensuse 11.3 do not install there (at 
> least true for 1.1.4 and 1.1.5).
> 
> The plugin is in
> ./usr/lib/lcrso/pacemaker.lcrso
> 
> but should be in
> ./usr/lib64/lcrso/pacemaker.lcrso
> 
> I think the patch below (borrowed from the 'official' packages) cures.
> Regards
> Holger
> 
> 
> diff -r 43a11c0daae4 pacemaker.spec
> --- a/pacemaker.spec  Mon Feb 14 15:25:13 2011 +0100
> +++ b/pacemaker.spec  Tue Feb 15 17:50:27 2011 +0100
> @@ -1,3 +1,7 @@
> +%if 0%{?suse_version}
> +%define _libexecdir %{_libdir}
> +%endif
> +
>  %global gname haclient
>  %global uname hacluster
>  %global pcmk_docdir %{_docdir}/%{name}
> 
> 
> 
> ___
> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
> 
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: 
> http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker



___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker


[Pacemaker] Patch: Doc correction "Pacemaker explained"

2011-03-01 Thread Holger Teutsch
Hi,
small fix -> reload is implemented
- holger

diff -r cf4e9febed8e doc/Pacemaker_Explained/en-US/Ap-OCF.xml
--- a/doc/Pacemaker_Explained/en-US/Ap-OCF.xml  Wed Feb 23 14:52:34 2011 +0100
+++ b/doc/Pacemaker_Explained/en-US/Ap-OCF.xml  Tue Mar 01 09:34:55 2011 +0100
@@ -81,9 +81,8 @@
   Must not fail. Must exit 0
 
   
-  Some actions specified in the OCF specs are not currently used by 
the cluster
+  One action specified in the OCF specs is not currently used by the 
cluster
   
-   reload - reload the configuration of the resource 
instance without disrupting the service 
recover - a variant of the start action, this should 
try to recover a resource locally.
   
   



___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker


[Pacemaker] Patch for bugzilla 2541: Shell should warn if parameter uniqueness is violated

2011-03-03 Thread Holger Teutsch
Hi,
I submit a patch for
"bugzilla 2541: Shell should warn if parameter uniqueness is violated"
for discussion.

devel1:~ # crm configure
crm(live)configure# primitive ip1a ocf:heartbeat:IPaddr2 params ip="1.2.3.4" 
meta target-role="stopped"
crm(live)configure# primitive ip1b ocf:heartbeat:IPaddr2 params ip="1.2.3.4" 
meta target-role="stopped"
crm(live)configure# primitive ip2a ocf:heartbeat:IPaddr2 params ip="1.2.3.5" 
meta target-role="stopped"
crm(live)configure# primitive ip2b ocf:heartbeat:IPaddr2 params ip="1.2.3.5" 
meta target-role="stopped"
crm(live)configure# primitive ip3 ocf:heartbeat:IPaddr2 params ip="1.2.3.6" 
meta target-role="stopped"
crm(live)configure# primitive dummy_1 ocf:heartbeat:Dummy params fake="abc" 
meta target-role="stopped"
crm(live)configure# primitive dummy_2 ocf:heartbeat:Dummy params fake="abc" 
meta target-role="stopped"
crm(live)configure# primitive dummy_3 ocf:heartbeat:Dummy meta 
target-role="stopped"
crm(live)configure# commit
WARNING: Violations of instance parameters with attribute unique detected:
  Agent "ocf:heartbeat:IPaddr2" parameter "ip" value "1.2.3.4" in resources
  ip1a
  ip1b

  Agent "ocf:heartbeat:IPaddr2" parameter "ip" value "1.2.3.5" in resources
  ip2a
  ip2b

Do you still want to commit? n
crm(live)configure#

The code now lives in ui.py. I'm not sure whether it should be considered
more CIB-related and be moved to some other module.

Regards
Holger

diff -r cf4e9febed8e -r 810c5ea83873 shell/modules/ui.py.in
--- a/shell/modules/ui.py.in	Wed Feb 23 14:52:34 2011 +0100
+++ b/shell/modules/ui.py.in	Thu Mar 03 10:24:51 2011 +0100
@@ -1509,6 +1509,60 @@
 return False
 set_obj = mkset_obj("xml")
 return ptestlike(set_obj.ptest,'vv',cmd,*args)
+
+def __check_unique_clash(self):
+'Check whether resource parameters with attribute "unique" clash'
+
+def process_primitive(prim, clash_dict):
+'''
+Update dict clash_dict with
+(ra_class, ra_provider, ra_type, name, value) -> [ resourcename ]
+if parameter "name" should be unique
+'''
+ra_class = prim.getAttribute("class")
+ra_provider = prim.getAttribute("provider")
+ra_type = prim.getAttribute("type")
+ra_id = prim.getAttribute("id")
+
+ra = RAInfo(ra_class, ra_type, ra_provider)
+if ra == None:
+return
+ra_params = ra.params()
+
+attributes = prim.getElementsByTagName("instance_attributes")
+if len(attributes) == 0:
+return
+
+for p in attributes[0].getElementsByTagName("nvpair"):
+name = p.getAttribute("name")
+if ra_params[ name ]['unique'] == '1':
+value = p.getAttribute("value")
+k = (ra_class, ra_provider, ra_type, name, value)
+try:
+clash_dict[k].append(ra_id)
+except:
+clash_dict[k] = [ra_id]
+return
+
+clash_dict = {}
+for p in cib_factory.mkobj_list("xml","type:primitive"):
+process_primitive(p.node, clash_dict)
+
+clash_msg = []
+for param, resources in clash_dict.items():
+if len(resources) > 1:
+tag = ':'.join(param[:3])
+clash_msg.append('  Agent "%s" parameter "%s" value "%s" in resources'\
+%(tag, param[3], param[4]))
+for r in sorted(resources):
+clash_msg.append("  %s"%r)
+clash_msg.append("")
+
+if len(clash_msg) > 0:
+common_warning("Violations of instance parameters with attribute unique detected:")
+print "\n".join(clash_msg)
+return 0
+return 1
 def commit(self,cmd,force = None):
 "usage: commit [force]"
 if force and force != "force":
@@ -1523,7 +1577,8 @@
 rc1 = cib_factory.is_current_cib_equal()
 rc2 = cib_factory.is_cib_empty() or \
 self._verify(mkset_obj("xml","changed"),mkset_obj("xml"))
-if rc1 and rc2:
+rc3 = self.__check_unique_clash()
+if rc1 and rc2 and rc3:
 return cib_factory.commit()
 if force or user_prefs.get_force():
 common_info("commit forced")
___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker


Re: [Pacemaker] Patch for bugzilla 2541: Shell should warn if parameter uniqueness is violated

2011-03-04 Thread Holger Teutsch
On Thu, 2011-03-03 at 10:55 +0100, Florian Haas wrote:
> On 2011-03-03 10:43, Holger Teutsch wrote:
> > Hi,
> > I submit a patch for
> > "bugzilla 2541: Shell should warn if parameter uniqueness is violated"
> > for discussion.
> 
> I'll leave it do Dejan to review the code, but I love the functionality.
> Thanks a lot for tackling this. My only suggestion for an improvement is
> to make the warning message a bit more terse, as in:
> 
> WARNING: Resources ip1a, ip1b violate uniqueness for parameter "ip":
> "1.2.3.4"
> 

Florian,
I see your point. Although my formatting allows for an unlimited number
of collisions ( 8-) ) in real life we will only have 2 or 3. Will change
this together with Dejan's hints.

> Cheers,
> Florian
> 
> ___
> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
> 
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: 
> http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker



___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker


Re: [Pacemaker] WG: time pressure - software raid cluster, raid1 ressource agent, help needed

2011-03-06 Thread Holger Teutsch
On Sun, 2011-03-06 at 12:40 +0100, patrik.rappo...@knapp.com wrote:
Hi,
I assume the basic problem is in your RAID configuration.

If you unmap one box the devices should not be in status FAIL but
degraded.

So what is the exit status of

mdadm --detail --test /dev/md0

after unmapping?
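
If I read the mdadm man page correctly, --detail --test encodes the array
state in the exit code, roughly:

    mdadm --detail --test /dev/md0 ; echo $?
    # 0 = clean, 1 = degraded, 2 = dead/unusable, 4 = device missing or error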

Furthermore, I would start with one isolated group containing the
RAID, LVM, and FS to keep it simple.

Regards
Holger

>  Hy, 
> 
> 
> does anyone have an idea to that? I only have the servers till next
> week friday, so to my regret I am under time pressure :(
> 
> 
> 
> Like I already wrote, I would appreciate and test any idea of you.
> Also if someone already made clusters with lvm-mirror, I would be
> happy to get a cib or some configuration examples.
> 
> Thank you very much in advance.
> 
> kr Patrik
> 
> patrik.rappo...@knapp.com
> 03.03.2011 15:11  Reply to: The Pacemaker cluster resource manager
> 
> To: pacemaker@oss.clusterlabs.org
> Subject: [Pacemaker] software raid cluster, raid1 ressource agent, help
> needed
> 
> 
> Good Day, 
> 
> I have a 2 node active/passive cluster which is connected to two  ibm
> 4700 storages. I configured 3 raids and I use the Raid1 ressource
> agent for managing the Raid1s in the cluster. 
> When I now disable the mapping of one storage, to simulate the fail of
> one storage, the Raid1 Ressources change to the State "FAILED" and the
> second node then takes over the ressources and is able to start the
> raid devices. 
> 
> So I am confused, why the active node can't keep the raid1 ressources
> and the former passive node takes them over and can start them
> correct. 
> 
> I would really appreciate your advice, or maybe someone already has a
> example configuration for Raid1 with two storages.
> 
> Thank you very much in advance. Attached you can find my cib.xml. 
> 
> kr Patrik 
> 
> 
> 
> Mit freundlichen Grüßen / Best Regards
> 
> Patrik Rapposch, BSc
> System Administration
> 
> KNAPP Systemintegration GmbH
> Waltenbachstraße 9
> 8700 Leoben, Austria 
> Phone: +43 3842 805-915
> Fax: +43 3842 805-500
> patrik.rappo...@knapp.com 
> www.KNAPP.com 
> 
> Commercial register number: FN 138870x
> Commercial register court: Leoben
> 
> The information in this e-mail (including any attachment) is
> confidential and intended to be for the use of the addressee(s) only.
> If you have received the e-mail by mistake, any disclosure, copy,
> distribution or use of the contents of the e-mail is prohibited, and
> you must delete the e-mail from your system. As e-mail can be changed
> electronically KNAPP assumes no responsibility for any alteration to
> this e-mail or its attachments. KNAPP has taken every reasonable
> precaution to ensure that any attachment to this e-mail has been swept
> for virus. However, KNAPP does not accept any liability for damage
> sustained as a result of such attachment being virus infected and
> strongly recommend that you carry out your own virus check before
> opening any attachment.
> ___
> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
> 
> Project Home: http://www.clusterlabs.org
> Getting started:
> http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs:
> http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker
> 
> 
> ___
> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
> 
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: 
> http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker



___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker


Re: [Pacemaker] Antwort: Re: WG: time pressure - software raid cluster, raid1 ressource agent, help needed

2011-03-07 Thread Holger Teutsch
Hi,
SAN drivers often have large timeouts configured, so are you patient
enough?
At least this demonstrates that the problem is currently not in the
cluster...
- holger
On Mon, 2011-03-07 at 11:04 +0100, patrik.rappo...@knapp.com wrote:
> Hy, 
> 
> thx for answer. I tested this now, the problem is, mdadm hangs totally
> when we simulate the fail of one storage. (we already tried two ways:
> 1. removing the mapping., 2. removing one path, and then disabling the
> remaining path through the port on the san switch - which is nearly
> the same like a total fail of the storage). 
> 
> So I can't get the output of mdadm, because it hangs. 
> 
> I think it must be a problem with mdadm. This is my mdadm.conf: 
> 
> "DEVICE /dev/mapper/3600a0b800050c94e07874d2e0028_part1 
> /dev/mapper/3600a0b8000511f5414b14d2df1b1_part1 
> /dev/mapper/3600a0b800050c94e07874d2e0028_part2 
> /dev/mapper/3600a0b8000511f5414b14d2df1b1_part2 
> /dev/mapper/3600a0b800050c94e07874d2e0028_part3 
> /dev/mapper/3600a0b8000511f5414b14d2df1b1_part3 
> ARRAY /dev/md0 metadata=0.90 UUID=c411c076:bb28916f:d50a93ef:800dd1f0 
> ARRAY /dev/md1 metadata=0.90 UUID=522279fa:f3cdbe3a:d50a93ef:800dd1f0 
> ARRAY /dev/md2 metadata=0.90
> UUID=01e07d7d:5305e46c:d50a93ef:800dd1f0" 
> 
> kr Patrik 
> 
> 
> Mit freundlichen Grüßen / Best Regards
> 
> Patrik Rapposch, BSc
> System Administration
> 
> KNAPP Systemintegration GmbH
> Waltenbachstraße 9
> 8700 Leoben, Austria 
> Phone: +43 3842 805-915
> Fax: +43 3842 805-500
> patrik.rappo...@knapp.com 
> www.KNAPP.com
> 
> Commercial register number: FN 138870x
> Commercial register court: Leoben
> 
> 
> 
> Holger Teutsch
> 06.03.2011 19:56
> Reply to: The Pacemaker cluster resource manager
> To: The Pacemaker cluster resource manager
> Subject: Re: [Pacemaker] WG: time pressure - software raid cluster, raid1
>         ressource agent, help needed
> 
> On Sun, 2011-03-06 at 12:40 +0100, patrik.rappo...@knapp.com wrote:
> Hi,
> assume the basic problem is in your raid configuration.
> 
> If you unmap one box the devices should not be in status FAIL but in
> degraded.
> 
> So what is the exit status of
> 
> mdadm --detail --test /dev/md0
> 
> after unmapping ?
> 
> Furthermore I would start start with one isolated group containing the
> raid, LVM, and FS to keep it simple.
> 
> Regards
> Holger
> 
> >  Hy, 
> > 
> > 
> > does anyone have an idea to that? I only have the servers till next
> > week friday, so to my regret I am under time pressure :(
> > 
> > 
> > 
> > Like I already wrote, I would appreciate and test any idea of you.
> > Also if someone already made clusters with lvm-mirror, I would be
> > happy to get a cib or some configuration examples.
> > 
> > Thank you very much in advance.
> > 
> > kr Patrik
> > 
> > patrik.rappo...@knapp.com
> > 03.03.2011 15:11  Reply to: The Pacemaker cluster resource manager
> > 
> > To: pacemaker@oss.clusterlabs.org
> > Subject: [Pacemaker] software raid cluster, raid1 ressource agent, help
> > needed
> > 
> > 
> > Good Day, 
> > 
> > I have a 2 node active/passive cluster which is connected to two
>  ibm
> > 4700 storages. I configured 3 raids and I use the Raid1 ressource
> > agent for managing the Raid1s in the cluster. 
> > When I now disable the mapping of one storage, to 

Re: [Pacemaker] Patch for bugzilla 2541: Shell should warn if parameter uniqueness is violated

2011-03-08 Thread Holger Teutsch
On Fri, 2011-03-04 at 13:06 +0100, Holger Teutsch wrote:
> On Thu, 2011-03-03 at 10:55 +0100, Florian Haas wrote:
> > On 2011-03-03 10:43, Holger Teutsch wrote:
> > > Hi,
> > > I submit a patch for
> > > "bugzilla 2541: Shell should warn if parameter uniqueness is violated"
> > > for discussion.
> > 
> > I'll leave it do Dejan to review the code, but I love the functionality.
> > Thanks a lot for tackling this. My only suggestion for an improvement is
> > to make the warning message a bit more terse, as in:
> > 
> > WARNING: Resources ip1a, ip1b violate uniqueness for parameter "ip":
> > "1.2.3.4"
> > 
> 
> Florian,
> I see your point. Although my formatting allows for an unlimited number
> of collisions ( 8-) ) in real life we will only have 2 or 3. Will change
> this together with Dejan's hints.
> 
> > Cheers,
> > Florian
> > 
Florian + Dejan,
here the version with terse output. The code got terser as well.
- holger

crm(live)configure# primitive ip1a ocf:heartbeat:IPaddr2 params ip="1.2.3.4" 
meta target-role="stopped"
crm(live)configure# primitive ip1b ocf:heartbeat:IPaddr2 params ip="1.2.3.4" 
meta target-role="stopped"
crm(live)configure# primitive ip2a ocf:heartbeat:IPaddr2 params ip="1.2.3.5" 
meta target-role="stopped"
crm(live)configure# primitive ip2b ocf:heartbeat:IPaddr2 params ip="1.2.3.5" 
meta target-role="stopped"
crm(live)configure# primitive ip3 ocf:heartbeat:IPaddr2 params ip="1.2.3.6" 
meta target-role="stopped"
crm(live)configure# primitive dummy_1 ocf:heartbeat:Dummy params fake="abc" 
meta target-role="stopped"
crm(live)configure# primitive dummy_2 ocf:heartbeat:Dummy params fake="abc" 
meta target-role="stopped"
crm(live)configure# primitive dummy_3 ocf:heartbeat:Dummy meta 
target-role="stopped"
crm(live)configure# commit
WARNING: Resources ip1a,ip1b violate uniqueness for parameter "ip": "1.2.3.4"
WARNING: Resources ip2a,ip2b violate uniqueness for parameter "ip": "1.2.3.5"
Do you still want to commit? 


diff -r cf4e9febed8e shell/modules/ui.py.in
--- a/shell/modules/ui.py.in	Wed Feb 23 14:52:34 2011 +0100
+++ b/shell/modules/ui.py.in	Tue Mar 08 09:11:38 2011 +0100
@@ -1509,6 +1509,55 @@
 return False
 set_obj = mkset_obj("xml")
 return ptestlike(set_obj.ptest,'vv',cmd,*args)
+
+def __check_unique_clash(self):
+'Check whether resource parameters with attribute "unique" clash'
+
+def process_primitive(prim, clash_dict):
+'''
+Update dict clash_dict with
+(ra_class, ra_provider, ra_type, name, value) -> [ resourcename ]
+if parameter "name" should be unique
+'''
+ra_class = prim.getAttribute("class")
+ra_provider = prim.getAttribute("provider")
+ra_type = prim.getAttribute("type")
+ra_id = prim.getAttribute("id")
+
+ra = RAInfo(ra_class, ra_type, ra_provider)
+if ra == None:
+return
+ra_params = ra.params()
+
+attributes = prim.getElementsByTagName("instance_attributes")
+if len(attributes) == 0:
+return
+
+for p in attributes[0].getElementsByTagName("nvpair"):
+name = p.getAttribute("name")
+if ra_params[ name ]['unique'] == '1':
+value = p.getAttribute("value")
+k = (ra_class, ra_provider, ra_type, name, value)
+try:
+clash_dict[k].append(ra_id)
+except:
+clash_dict[k] = [ra_id]
+return
+
+clash_dict = {}
+for p in cib_factory.mkobj_list("xml","type:primitive"):
+process_primitive(p.node, clash_dict)
+
+no_clash = 1
+for param, resources in clash_dict.items():
+if len(resources) > 1:
+no_clash = 0
+msg = 'Resources %s violate uniqueness for parameter "%s": "%s"' %\
+(",".join(sorted(resources)), param[3], param[4])
+common_warning(msg)
+
+return no_clash
+
 def commit(self,cmd,force = None):
 "usage: commit [force]"
 if force and force != "force":
@@ -1523,7 +1572,8 @@
 rc1 = cib_factory.is_current_cib_equal()
 rc2 = cib_factory.is_cib_empty() or \
 se

Re: [Pacemaker] Patch for bugzilla 2541: Shell should warn if parameter uniqueness is violated

2011-03-08 Thread Holger Teutsch
Hi Dejan,

On Tue, 2011-03-08 at 12:07 +0100, Dejan Muhamedagic wrote:
> Hi Holger,
> 
> On Tue, Mar 08, 2011 at 09:15:01AM +0100, Holger Teutsch wrote:
> > On Fri, 2011-03-04 at 13:06 +0100, Holger Teutsch wrote:
> > > On Thu, 2011-03-03 at 10:55 +0100, Florian Haas wrote:
> > > > On 2011-03-03 10:43, Holger Teutsch wrote:
> > > > > Hi,
> > > > > I submit a patch for
> > > > > "bugzilla 2541: Shell should warn if parameter uniqueness is violated"
> > > > > for discussion.
> > > > 
...
> It looks good, just a few notes. The check function should move
> to the CibObjectSetRaw class and be invoked from

Will move it there.

> semantic_check(). There's
> 
> rc1 = set_obj_verify.verify()
> if user_prefs.check_frequency != "never":
>   rc2 = set_obj_semantic.semantic_check()
> 
> The last should be changed to:
> 
>   rc2 = set_obj_semantic.semantic_check(set_obj_verify)
> 
> set_obj_verify always contains all CIB elements (well, that means
> that its name should probably be changed too :). Now, the code
> should check _only_ new and changed primitives which are
> contained in set_obj_semantic. That's because we don't want to
> repeatedly print warnings for all objects on commit, but only for
> those which were added/changed in the meantime. On the other
> hand, verify is an explicit check and in that case the whole CIB
> is always verified.
> 
> > 
> > +ra_class = prim.getAttribute("class")
> > +ra_provider = prim.getAttribute("provider")
> > +ra_type = prim.getAttribute("type")
> > +ra_id = prim.getAttribute("id")
> > +
> > +ra = RAInfo(ra_class, ra_type, ra_provider)
> 
> There's a convenience function get_ra(node) for this.
> 

I did not use this as I need all the ra_XXX values anyhow later in the code
for building k.

> > +if ra == None:
> > +return
> > +ra_params = ra.params()
> > +
> > +attributes = prim.getElementsByTagName("instance_attributes")
> > +if len(attributes) == 0:
> > +return
> > +
> > +for p in attributes[0].getElementsByTagName("nvpair"):
> > +name = p.getAttribute("name")
> > +if ra_params[ name ]['unique'] == '1':
> > +value = p.getAttribute("value")
> > +k = (ra_class, ra_provider, ra_type, name, value)
> > +try:
> > +clash_dict[k].append(ra_id)
> > +except:
> > +clash_dict[k] = [ra_id]
> > +return
> > +
> > +clash_dict = {}
> > +for p in cib_factory.mkobj_list("xml","type:primitive"):
> 
> This would become:
> 
>for p in all_obj_list: # passed from _verify()
>  if is_primitive(p.node):
> 
> > +process_primitive(p.node, clash_dict)
> 
> Or perhaps to loop through self.obj_list and build clash_dict
> against all elements? Otherwise, you'll need to skip elements
> which don't pass the check but are not new/changed (in
> self.obj_list).
> 

The typical occurrences of clashes will originate from "old" objects and
"new/changed" objects.

I think I have to loop over all objects to build the clash_dict and
then ... 

> > +
> > +no_clash = 1
> > +for param, resources in clash_dict.items():
> > +if len(resources) > 1:

... only emit a warning if the intersection of a "clash set" with
"changed objects" is not empty. 

> > +no_clash = 0
> > +msg = 'Resources %s violate uniqueness for parameter "%s": 
> > "%s"' %\
> > +(",".join(sorted(resources)), param[3], param[4])
> > +common_warning(msg)
> > +
> > +return no_clash
> > +

I will submit an updated version later this week.

-holger


___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker


Re: [Pacemaker] Patch for bugzilla 2541: Shell should warn if parameter uniqueness is violated

2011-03-09 Thread Holger Teutsch
Hi Dejan,

On Tue, 2011-03-08 at 19:26 +0100, Holger Teutsch wrote:
> Hi Dejan,
> 
> On Tue, 2011-03-08 at 12:07 +0100, Dejan Muhamedagic wrote:
> > Hi Holger,
> > 
> > On Tue, Mar 08, 2011 at 09:15:01AM +0100, Holger Teutsch wrote:
> > > On Fri, 2011-03-04 at 13:06 +0100, Holger Teutsch wrote:
> > > > On Thu, 2011-03-03 at 10:55 +0100, Florian Haas wrote:
> > > > > On 2011-03-03 10:43, Holger Teutsch wrote:
> > > > > > Hi,
> > > > > > I submit a patch for
> > > > > > "bugzilla 2541: Shell should warn if parameter uniqueness is 
> > > > > > violated"
> > > > > > for discussion.
> > > > > 
> ...
> > It looks good, just a few notes. The check function should move
> > to the CibObjectSetRaw class and be invoked from
> 
> Will move it there.
> 
> > semantic_check(). There's
> > 
> > rc1 = set_obj_verify.verify()
> > if user_prefs.check_frequency != "never":
> > rc2 = set_obj_semantic.semantic_check()
> > 
> > The last should be changed to:
> > 
> > rc2 = set_obj_semantic.semantic_check(set_obj_verify)
> > 
> > set_obj_verify always contains all CIB elements (well, that means
> > that its name should probably be changed too :). Now, the code
> > should check _only_ new and changed primitives which are
> > contained in set_obj_semantic. That's because we don't want to
> > repeatedly print warnings for all objects on commit, but only for
> > those which were added/changed in the meantime. On the other
> > hand, verify is an explicit check and in that case the whole CIB
> > is always verified.
> > 
> > > 
> > > +ra_class = prim.getAttribute("class")
> > > +ra_provider = prim.getAttribute("provider")
> > > +ra_type = prim.getAttribute("type")
> > > +ra_id = prim.getAttribute("id")
> > > +
> > > +ra = RAInfo(ra_class, ra_type, ra_provider)
> > 
> > There's a convenience function get_ra(node) for this.
> > 
> 
> I did not use this as I need all ra_XXX value anyhow later in the code
> for building k.
> 
> > > +if ra == None:
> > > +return
> > > +ra_params = ra.params()
> > > +
> > > +attributes = prim.getElementsByTagName("instance_attributes")
> > > +if len(attributes) == 0:
> > > +return
> > > +
> > > +for p in attributes[0].getElementsByTagName("nvpair"):
> > > +name = p.getAttribute("name")
> > > +if ra_params[ name ]['unique'] == '1':
> > > +value = p.getAttribute("value")
> > > +k = (ra_class, ra_provider, ra_type, name, value)
> > > +try:
> > > +clash_dict[k].append(ra_id)
> > > +except:
> > > +clash_dict[k] = [ra_id]
> > > +return
> > > +
> > > +clash_dict = {}
> > > +for p in cib_factory.mkobj_list("xml","type:primitive"):
> > 
> > This would become:
> > 
> >for p in all_obj_list: # passed from _verify()
> >if is_primitive(p.node):
> > 
> > > +process_primitive(p.node, clash_dict)
> > 
> > Or perhaps to loop through self.obj_list and build clash_dict
> > against all elements? Otherwise, you'll need to skip elements
> > which don't pass the check but are not new/changed (in
> > self.obj_list).
> > 
> 

I did not pass "set_obj_verify" in semantic check as this variable "only
by chance" contains the right values.

- holger

Output:
crm(live)configure# primitive ip1a ocf:heartbeat:IPaddr2 params ip="1.2.3.4" 
meta target-role="stopped"
crm(live)configure# primitive ip1b ocf:heartbeat:IPaddr2 params ip="1.2.3.4" 
meta target-role="stopped"
crm(live)configure# commit
WARNING: Resources ip1a,ip1b violate uniqueness for parameter "ip": "1.2.3.4"
Do you still want to commit? y
crm(live)configure# primitive ip2a ocf:heartbeat:IPaddr2 params ip="1.2.3.5" 
meta target-role="stopped"
crm(live)configure# commit
crm(live)configure# primitive ip2b ocf:heartbeat:IPa

Re: [Pacemaker] Patch for bugzilla 2541: Shell should warn if parameter uniqueness is violated

2011-03-09 Thread Holger Teutsch
Hi Dejan,

On Wed, 2011-03-09 at 14:00 +0100, Dejan Muhamedagic wrote:
> Hi Holger,

> > > > 
> > > > This would become:
> > > > 
> > > >for p in all_obj_list: # passed from _verify()
> > > >if is_primitive(p.node):
> > > > 
> > > > > +process_primitive(p.node, clash_dict)
> > > > 
> > > > Or perhaps to loop through self.obj_list and build clash_dict
> > > > against all elements? Otherwise, you'll need to skip elements
> > > > which don't pass the check but are not new/changed (in
> > > > self.obj_list).
> > > > 
> > > 
> > 
> > I did not pass "set_obj_verify" in semantic check as this variable "only
> > by chance" contains the right values.
> 
> But it's not by chance. As I wrote earlier, it always contains
> the whole CIB. It has to, otherwise crm_verify wouldn't work. It
> should actually be renamed to set_obj_all or similar. Since we
> already have that list created, it's better to reuse it than to
> create another one from scratch. Further, we may need to add more
> semantic checks which would require looking at the whole CIB.
> 

OK, I implemented it this way.

In order to show the intention of the arguments more clearly:

Instead of

    def _verify(self, set_obj_semantic, set_obj_all = None):
        if not set_obj_all:
            set_obj_all = set_obj_semantic
        rc1 = set_obj_all.verify()
        if user_prefs.check_frequency != "never":
            rc2 = set_obj_semantic.semantic_check(set_obj_all)
        else:
            rc2 = 0
        return rc1 and rc2 <= 1
    def verify(self,cmd):
        "usage: verify"
        if not cib_factory.is_cib_sane():
            return False
        return self._verify(mkset_obj("xml"))

This way (always passing both args):

    def _verify(self, set_obj_semantic, set_obj_all):
        rc1 = set_obj_all.verify()
        if user_prefs.check_frequency != "never":
            rc2 = set_obj_semantic.semantic_check(set_obj_all)
        else:
            rc2 = 0
        return rc1 and rc2 <= 1
    def verify(self,cmd):
        "usage: verify"
        if not cib_factory.is_cib_sane():
            return False
        set_obj_all = mkset_obj("xml")
        return self._verify(set_obj_all, set_obj_all)


> My only remaining concern is performance. Though the meta-data is
> cached, perhaps it will pay off to save the RAInfo instance with
> the element. But we can worry about that later.
> 

I can work on this as next step.

> Cheers,
> 
> Dejan
> 

- holger

diff -r a35d8d6d0ab1 shell/modules/cibconfig.py
--- a/shell/modules/cibconfig.py	Wed Mar 09 11:21:03 2011 +0100
+++ b/shell/modules/cibconfig.py	Wed Mar 09 19:53:50 2011 +0100
@@ -230,11 +230,68 @@
 See below for specific implementations.
 '''
 pass
-def semantic_check(self):
+
+def __check_unique_clash(self, set_obj_all):
+'Check whether resource parameters with attribute "unique" clash'
+
+def process_primitive(prim, clash_dict):
+'''
+Update dict clash_dict with
+(ra_class, ra_provider, ra_type, name, value) -> [ resourcename ]
+if parameter "name" should be unique
+'''
+ra_class = prim.getAttribute("class")
+ra_provider = prim.getAttribute("provider")
+ra_type = prim.getAttribute("type")
+ra_id = prim.getAttribute("id")
+
+ra = RAInfo(ra_class, ra_type, ra_provider)
+if ra == None:
+return
+ra_params = ra.params()
+
+attributes = prim.getElementsByTagName("instance_attributes")
+if len(attributes) == 0:
+return
+
+for p in attributes[0].getElementsByTagName("nvpair"):
+name = p.getAttribute("name")
+if ra_params[ name ]['unique'] == '1':
+value = p.getAttribute("value")
+k = (ra_class, ra_provider, ra_type, name, value)
+try:
+clash_dict[k].append(ra_id)
+except:
+clash_dict[k] = [ra_id]
+return
+
+# we check the whole CIB for clashes as a clash may originate between
+# an object already committed and a new one
+clash_dict = {}
+for obj in set_obj_all.obj_list:
+node = obj.node
+if is_primitive(node):
+process_primitive(node, clash_dict)
+
+# but we only warn if a 'new' object is involved 
+check_set = set([o.node.getAttribute("id") for o in self.obj_list if is_primitive(o.node)])
+
+rc = 0
+for param, resources in clash_dict.items():
+# at least one new object must be involved
+if len(resources) > 1 and len(set(resources) & check_set) > 0:
+rc = 2
+msg = 'Resources %s violate uniqueness for parameter "%s": "%s"' %\
+(",".join(sorted(resources

Re: [Pacemaker] Patch for bugzilla 2541: Shell should warn if parameter uniqueness is violated

2011-03-10 Thread Holger Teutsch
Hi Dejan,
On Thu, 2011-03-10 at 10:14 +0100, Dejan Muhamedagic wrote:
> Hi Holger,
> 
> On Wed, Mar 09, 2011 at 07:58:02PM +0100, Holger Teutsch wrote:
> > Hi Dejan,
> > 
> > On Wed, 2011-03-09 at 14:00 +0100, Dejan Muhamedagic wrote:
> > > Hi Holger,
> > 

> > 
> > In order to show the intention of the arguments clearer:
> > 
> > Instead of
> > 
> > def _verify(self, set_obj_semantic, set_obj_all = None):
> > if not set_obj_all:
> > set_obj_all = set_obj_semantic
> > rc1 = set_obj_all.verify()
> > if user_prefs.check_frequency != "never":
> > rc2 = set_obj_semantic.semantic_check(set_obj_all)
> > else:
> > rc2 = 0
> > return rc1 and rc2 <= 1
> > def verify(self,cmd):
> > "usage: verify"
> > if not cib_factory.is_cib_sane():
> > return False
> > return self._verify(mkset_obj("xml"))
> > 
> > This way (always passing both args):
> > 
> > def _verify(self, set_obj_semantic, set_obj_all):
> > rc1 = set_obj_all.verify()
> > if user_prefs.check_frequency != "never":
> > rc2 = set_obj_semantic.semantic_check(set_obj_all)
> > else:
> > rc2 = 0
> > return rc1 and rc2 <= 1
> > def verify(self,cmd):
> > "usage: verify"
> > if not cib_factory.is_cib_sane():
> > return False
> > set_obj_all = mkset_obj("xml")
> > return self._verify(set_obj_all, set_obj_all)

See patch set_obj_all.diff

> 
> > > My only remaining concern is performance. Though the meta-data is
> > > cached, perhaps it will pay off to save the RAInfo instance with
> > > the element. But we can worry about that later.
> > > 
> > 
> > I can work on this as next step.
> 
> I'll do some testing on really big configurations and try to
> gauge the impact.

OK

> 
> The patch makes some regression tests blow:
> 
> +  File "/usr/lib64/python2.6/site-packages/crm/ui.py", line 1441, in verify
> +return self._verify(mkset_obj("xml"))
> +  File "/usr/lib64/python2.6/site-packages/crm/ui.py", line 1433, in _verify
> +rc2 = set_obj_semantic.semantic_check(set_obj_all)
> +  File "/usr/lib64/python2.6/site-packages/crm/cibconfig.py", line 294, in 
> semantic_check
> +rc = self.__check_unique_clash(set_obj_all)
> +  File "/usr/lib64/python2.6/site-packages/crm/cibconfig.py", line 274, in 
> __check_unique_clash
> +process_primitive(node, clash_dict)
> +  File "/usr/lib64/python2.6/site-packages/crm/cibconfig.py", line 259, in 
> process_primitive
> +if ra_params[ name ]['unique'] == '1':
> +KeyError: 'OCF_CHECK_LEVEL'
> 
> Can't recall why OCF_CHECK_LEVEL appears here. There must be some
> good explanation :)

The good explanation is: not only resource params end up in the
instance_attributes nvpairs; OCF_CHECK_LEVEL (set within an op's
instance_attributes) does as well.

The latest version no longer blows the test -> semantic_check.diff

Regards
Holger
# HG changeset patch
# User Holger Teutsch 
# Date 1299775617 -3600
# Branch hot
# Node ID 30730ccc0aa09c3a476a18c6d95c680b3595
# Parent  9fa61ee6e35ef190f4126e163e9bfe6911e35541
Low: Shell: Rename variable set_obj_verify to set_obj_all as it always contains all objects
Simplify usage of this var in [_]verify, pass to CibObjectSet.semantic_check

diff -r 9fa61ee6e35e -r 30730ccc0aa0 shell/modules/cibconfig.py
--- a/shell/modules/cibconfig.py	Wed Mar 09 13:41:27 2011 +0100
+++ b/shell/modules/cibconfig.py	Thu Mar 10 17:46:57 2011 +0100
@@ -230,7 +230,7 @@
 See below for specific implementations.
 '''
 pass
-def semantic_check(self):
+def semantic_check(self, set_obj_all):
 '''
 Test objects for sanity. This is about semantics.
 '''
diff -r 9fa61ee6e35e -r 30730ccc0aa0 shell/modules/ui.py.in
--- a/shell/modules/ui.py.in	Wed Mar 09 13:41:27 2011 +0100
+++ b/shell/modules/ui.py.in	Thu Mar 10 17:46:57 2011 +0100
@@ -1425,12 +1425,10 @@
 set_obj = mkset_obj(*args)
 err_buf.release() # show them, but get an ack from the user
 return set_obj.edit()
-def _verify(self, set_obj_semantic, set_obj_verify = None):
-if not set_obj_verify:
-set_obj_verify = set_obj_semantic
-rc1 = set_obj_verify.verify()
+def _verify(self, set_obj_semantic, set_obj_all):
+rc1 = set_obj_all.verify()
 if user_prefs.check_frequency != "never":
-rc2 = set_obj_s

Re: [Pacemaker] Failing back a multi-state resource eg. DRBD

2011-03-11 Thread Holger Teutsch
On Mon, 2011-03-07 at 14:21 +0100, Dejan Muhamedagic wrote:
> Hi,
> 
> On Fri, Mar 04, 2011 at 09:12:46AM -0500, David McCurley wrote:
> > Are you wanting to move all the resources back or just that one resource?
> > 
> > I'm still learning, but one simple way I move all resources back from nodeb 
> > to nodea is like this:
> > 
> > # on nodeb
> > sudo crm node standby
> > # now services migrate to nodea
> > # still on nodeb
> > sudo crm node online
> > 
> > This may be a naive way to do it but it works for now :)
> 
> Yes, that would work. Though that would also make all other
> resources move from the standby node.
> 
> > There is also a "crm resource migrate" to migrate individual resources.  
> > For that, see here:
> 
> resource migrate has no option to move ms resources, i.e. to make
> another node the master.
> 
> What would work right now is to create a temporary location
> constraint:
> 
> location tmp1 ms-drbd0 \
> rule $id="tmp1-rule" $role="Master" inf: #uname eq nodea
> 
> Then, once the drbd got promoted on nodea, just remove the
> constraint:
> 
> crm configure delete tmp1
> 
> Obviously, we'd need to make some improvements here. "resource
> migrate" uses crm_resource to insert the location constraint,
> perhaps we should update it to also accept the role parameter.
> 
> Can you please make an enhancement bugzilla report so that this
> doesn't get lost.
> 
> Thanks,
> 
> Dejan

Hi Dejan,
it seems that the original author did not file the bug.
I entered it as

http://developerbugs.linux-foundation.org/show_bug.cgi?id=2567

Regards
Holger



___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker


[Pacemaker] [PATCH]Bug 2567 - crm resource migrate should support an optional "role" parameter

2011-03-18 Thread Holger Teutsch
Hi,
I would like to submit 2 patches of an initial implementation for
discussion.

Patch 1 implements migration of the Master role of an m/s resource to
another node in crm_resource
Patch 2 adds support for the shell.

crm_resource does this with options

"crm_resource --move --resource ms_test --master --node devel2"

The shell does the same with

"crm resource migrate ms_test:master devel2"

crm_resource insists on the options "--master --node xxx" if dealing with
m/s resources.

It is not easy to assess the expectations that a move command should
fulfill for something more complex than a group.

To recall:

crm_resource --move resource
creates a "standby" rule that moves the resource off the currently
active node

while

crm_resource --move resource --node newnode
creates a "prefer" rule that moves the resource to the new node.

When dealing with clones and masters the behavior was random as the code
only considers the node where the first instance of the clone was
started.
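
For reference, the generated constraints look roughly like this in crm
syntax; the ids are from memory and the resource/node names are made up.
"--move" without -N:

    location cli-standby-myrsc myrsc \
            rule -inf: #uname eq current-node

"--move -N newnode":

    location cli-prefer-myrsc myrsc \
            rule inf: #uname eq newnode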

The new code behaves consistently for the master role of an m/s
resource. The options "--master" and "rsc:master" are somewhat redundant,
as a "slave" move is not supported. Currently they serve more as an
acknowledgement from the user.

On the other hand it is desirable (and was requested several times on
the ML) to stop a single resource instance of a clone or master on a
specific node.

Should that be implemented by something like
 
"crm_resource --move-off --resource myresource --node devel2" ?

or should

crm_resource refuse to work on clones

and/or should moving the master role be the default for m/s resources
and the "--master" option discarded ?


Regards
Holger
# HG changeset patch
# User Holger Teutsch 
# Date 1300439791 -3600
# Branch mig
# Node ID dac1a4eae844f0bd857951b1154a171c80c25772
# Parent  b4f456380f60bd308acdc462215620f5bf530854
crm_resource.c: Add support for move of Master role of a m/s resource

diff -r b4f456380f60 -r dac1a4eae844 tools/crm_resource.c
--- a/tools/crm_resource.c	Thu Mar 17 09:41:25 2011 +0100
+++ b/tools/crm_resource.c	Fri Mar 18 10:16:31 2011 +0100
@@ -52,6 +52,7 @@
 const char *prop_id = NULL;
 const char *prop_set = NULL;
 char *move_lifetime = NULL;
+int move_master = 0;
 char rsc_cmd = 'L';
 char *our_pid = NULL;
 IPC_Channel *crmd_channel = NULL;
@@ -192,6 +193,32 @@
 return 0;
 }
 
+/* is m/s resource in master role on a host? */
+static int
+is_master_on(resource_t *rsc, const char *check_uname)
+{
+GListPtr lpc = NULL;
+
+if(rsc->variant > pe_native) {
+/* recursively call down */
+	GListPtr gIter = rsc->children;
+	for(; gIter != NULL; gIter = gIter->next) {
+	   if(is_master_on(gIter->data, check_uname))
+   return 1;
+}
+	return 0;
+}
+
+for(lpc = rsc->running_on; lpc != NULL; lpc = lpc->next) {
+	node_t *node = (node_t*)lpc->data;
+	if(rsc->variant == pe_native && rsc->role == RSC_ROLE_MASTER
+   && safe_str_eq(node->details->uname, check_uname)) {
+return 1;
+}
+}
+return 0;
+}
+
 #define cons_string(x) x?x:"NA"
 static void
 print_cts_constraints(pe_working_set_t *data_set) 
@@ -797,6 +824,7 @@
 static int
 move_resource(
 const char *rsc_id,
+int move_master,
 const char *existing_node, const char *preferred_node,
 cib_t *	cib_conn) 
 {
@@ -935,6 +963,10 @@
 	crm_xml_add(rule, XML_ATTR_ID, id);
 	crm_free(id);
 
+if(move_master) {
+crm_xml_add(rule, XML_RULE_ATTR_ROLE, "Master");
+}
+
 	crm_xml_add(rule, XML_RULE_ATTR_SCORE, INFINITY_S);
 	crm_xml_add(rule, XML_RULE_ATTR_BOOLEAN_OP, "and");
 	
@@ -1093,6 +1125,8 @@
 crm_free(prefix);
 }	
 
+/* out of single letter options */
+#define OPT_MASTER (256 + 'm')
 static struct crm_option long_options[] = {
 /* Top-level Options */
 {"help",0, 0, '?', "\t\tThis text"},
@@ -1120,10 +1154,10 @@
 {"get-property",1, 0, 'G', "Display the 'class', 'type' or 'provider' of a resource", 1},
 {"set-property",1, 0, 'S', "(Advanced) Set the class, type or provider of a resource", 1},
 {"move",0, 0, 'M',
- "\t\tMove a resource from its current location, optionally specifying a destination (-N) and/or a period for which it should take effect (-u)"
+ "\t\tMove a resource from its current location, optionally specifying a role (--master), a destination (-N) and/or a period for which it should take effect (-u)"
  "\n\t\t\t\tIf -N is not specified, the cluster will force the resource to move by creating a rule for the current location and a score of -INFINITY"
  "\n\t\t\t\tNOTE: This will pr

Re: [Pacemaker] [PATCH]Bug 2567 - crm resource migrate should support an optional "role" parameter

2011-03-19 Thread Holger Teutsch
Hi Dejan,

On Fri, 2011-03-18 at 14:24 +0100, Dejan Muhamedagic wrote:
> Hi,
> 
> On Fri, Mar 18, 2011 at 12:21:40PM +0100, Holger Teutsch wrote:
> > Hi,
> > I would like to submit 2 patches of an initial implementation for
> > discussion.
..
> > To recall:
> > 
> > crm_resource --move resource
> > creates a "standby" rule that moves the resource off the currently
> > active node
> > 
> > while
> > 
> > crm_resource --move resource --node newnode
> > creates a "prefer" rule that moves the resource to the new node.
> > 
> > When dealing with clones and masters the behavior was random as the code
> > only considers the node where the first instance of the clone was
> > started.
> > 
> > The new code behaves consistently for the master role of an m/s
> > resource. The options "--master" and "rsc:master" are somewhat redundant
> > as a "slave" move is not supported. Currently it's more an
> > acknowledgement of the user.
> > 
> > On the other hand it is desirable (and was requested several times on
> > the ML) to stop a single resource instance of a clone or master on a
> > specific node.
> > 
> > Should that be implemented by something like
> >  
> > "crm_resource --move-off --resource myresource --node devel2" ?
> > 
> > or should
> > 
> > crm_resource refuse to work on clones
> > 
> > and/or should moving the master role be the default for m/s resources
> > and the "--master" option discarded ?
> 
> I think that we also need to consider the case when clone-max is
> less than the number of nodes. If I understood correctly what you
> were saying. So, all of move slave and move master and move clone
> should be possible.
> 

I think the following use cases cover what can be done with this kind of
interface:

crm_resource --moveoff --resource myresource --node mynode
   -> all resource variants: check whether active on mynode, then create 
standby constraint

crm_resource --move --resource myresource
   -> primitive/group: convert to --moveoff --node `current_node`
   -> clone/master: refused

crm_resource --move --resource myresource --node mynode
  -> primitive/group: create prefer constraint
  -> clone/master: refused

crm_resource --move --resource myresource --master --node mynode
  -> master: create prefer constraint for master role
  -> others: refused

They should work (with a foreseeable outcome!) regardless of the setting of
clone-max.

Regards
Holger


> Cheers,
> 
> Dejan
> 
> > Regards
> > Holger
> 



___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker


Re: [Pacemaker] [PATCH]Bug 2567 - crm resource migrate should support an optional "role" parameter

2011-03-21 Thread Holger Teutsch
Hi Dejan,

On Mon, 2011-03-21 at 16:11 +0100, Dejan Muhamedagic wrote:
> Hi Holger,
> 
> On Sat, Mar 19, 2011 at 11:55:57AM +0100, Holger Teutsch wrote:
> > Hi Dejan,
> > 
> > On Fri, 2011-03-18 at 14:24 +0100, Dejan Muhamedagic wrote:
> > > Hi,
> > > 
> > > On Fri, Mar 18, 2011 at 12:21:40PM +0100, Holger Teutsch wrote:
> > > > Hi,
> > > > I would like to submit 2 patches of an initial implementation for
> > > > discussion.
> > ..
> > > > To recall:
> > > > 
> > > > crm_resource --move resource
> > > > creates a "standby" rule that moves the resource off the currently
> > > > active node
> > > > 
> > > > while
> > > > 
> > > > crm_resource --move resource --node newnode
> > > > creates a "prefer" rule that moves the resource to the new node.
> > > > 
> > > > When dealing with clones and masters the behavior was random as the code
> > > > only considers the node where the first instance of the clone was
> > > > started.
> > > > 
> > > > The new code behaves consistently for the master role of an m/s
> > > > resource. The options "--master" and "rsc:master" are somewhat redundant
> > > > as a "slave" move is not supported. Currently it's more an
> > > > acknowledgement of the user.
> > > > 
> > > > On the other hand it is desirable (and was requested several times on
> > > > the ML) to stop a single resource instance of a clone or master on a
> > > > specific node.
> > > > 
> > > > Should that be implemented by something like
> > > >  
> > > > "crm_resource --move-off --resource myresource --node devel2" ?
> > > > 
> > > > or should
> > > > 
> > > > crm_resource refuse to work on clones
> > > > 
> > > > and/or should moving the master role be the default for m/s resources
> > > > and the "--master" option discarded ?
> > > 
> > > I think that we also need to consider the case when clone-max is
> > > less than the number of nodes. If I understood correctly what you
> > > were saying. So, all of move slave and move master and move clone
> > > should be possible.
> > > 
> > 
> > I think the following use cases cover what can be done with such kind of
> > interface:
> > 
> > crm_resource --moveoff --resource myresource --node mynode
> >-> all resource variants: check whether active on mynode, then create 
> > standby constraint
> > 
> > crm_resource --move --resource myresource
> >-> primitive/group: convert to --moveoff --node `current_node`
> >-> clone/master: refused
> > 
> > crm_resource --move --resource myresource --node mynode
> >   -> primitive/group: create prefer constraint
> >   -> clone/master: refused
> > 
> > crm_resource --move --resource myresource --master --node mynode
> >   -> master: create prefer constraint for master role
> >   -> others: refused
> > 
> > They should work (witch foreseeable outcome!) regardless of the setting of 
> > clone-max.
> 
> This seems quite complicated to me. Took me a while to figure
> out what's what and where :) Why bother doing the thinking for

I'm afraid the matter *is* complicated. The current implementation of 

crm_resource --move --resource myResource

(without a node name) moves the resource off the node it is currently
active on by creating a standby constraint. For clones and masters there
is no such *single* active node for which the constraint can be
constructed.

Consider this use case:
I have 2 nodes and a clone or master and would like to safely get rid of
one instance on a particular node (e.g. with agents 1.0.5 the slave of a
DB2 HADR pair 8-) ). No idea how that should be done without a move-off
functionality. 
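
By hand one can approximate it today with a location constraint, which is
what a move-off command would automate. A sketch with made-up resource and
node names (quote $role and #uname when calling crm from a shell):

# keep every instance of the clone/ms resource away from node2
crm configure location ms_db2-off-node2 ms_db2 -inf: node2

# or chase away only the slave instance of an m/s resource
crm configure location ms_db2-slave-off-node2 ms_db2 rule '$role=Slave' -inf: '#uname' eq node2

# clean up again afterwards
crm configure delete ms_db2-off-node2
crm configure delete ms_db2-slave-off-node2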

> users? The only case which seems to me worth considering is
> refusing setting role for non-ms resources. Otherwise, let's let
> the user move things around and enjoy the consequences.

Definitely not true for production clusters. The tools should follow the
principle of least surprise.
  
> 
> Cheers,
> 

Over the weekend I implemented the above-mentioned functionality. Drop
me a note if you want to play with an early snapshot 8-)

Regards
Holger 


___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker


Re: [Pacemaker] [PATCH]Bug 2567 - crm resource migrate should support an optional "role" parameter

2011-04-04 Thread Holger Teutsch
On Mon, 2011-04-04 at 11:05 +0200, Andrew Beekhof wrote:
> On Sat, Mar 19, 2011 at 11:55 AM, Holger Teutsch  
> wrote:
> > Hi Dejan,
> >
> > On Fri, 2011-03-18 at 14:24 +0100, Dejan Muhamedagic wrote:
> >> Hi,
> >>
> >> On Fri, Mar 18, 2011 at 12:21:40PM +0100, Holger Teutsch wrote:
> >> > Hi,
> >> > I would like to submit 2 patches of an initial implementation for
> >> > discussion.
> > ..
> >> > To recall:
> >> >
> >> > crm_resource --move resource
> >> > creates a "standby" rule that moves the resource off the currently
> >> > active node
> >> >
> >> > while
> >> >
> >> > crm_resource --move resource --node newnode
> >> > creates a "prefer" rule that moves the resource to the new node.
> >> >
> >> > When dealing with clones and masters the behavior was random as the code
> >> > only considers the node where the first instance of the clone was
> >> > started.
> >> >
> >> > The new code behaves consistently for the master role of an m/s
> >> > resource. The options "--master" and "rsc:master" are somewhat redundant
> >> > as a "slave" move is not supported. Currently it's more an
> >> > acknowledgement of the user.
> >> >
> >> > On the other hand it is desirable (and was requested several times on
> >> > the ML) to stop a single resource instance of a clone or master on a
> >> > specific node.
> >> >
> >> > Should that be implemented by something like
> >> >
> >> > "crm_resource --move-off --resource myresource --node devel2" ?
> >> >
> >> > or should
> >> >
> >> > crm_resource refuse to work on clones
> >> >
> >> > and/or should moving the master role be the default for m/s resources
> >> > and the "--master" option discarded ?
> >>
> >> I think that we also need to consider the case when clone-max is
> >> less than the number of nodes. If I understood correctly what you
> >> were saying. So, all of move slave and move master and move clone
> >> should be possible.
> >>
> >
> > I think the following use cases cover what can be done with such kind of
> > interface:
> >
> > crm_resource --moveoff --resource myresource --node mynode
> >   -> all resource variants: check whether active on mynode, then create 
> > standby constraint
> >
> > crm_resource --move --resource myresource
> >   -> primitive/group: convert to --moveoff --node `current_node`
> >   -> clone/master: refused
> >
> > crm_resource --move --resource myresource --node mynode
> >  -> primitive/group: create prefer constraint
> >  -> clone/master: refused
> 
> Not sure this needs to be refused.

I see the problem that the node the resource instance should be moved
off from has to be specified as well to get predictable behavior.

Consider a 2-way clone on a 3-node cluster.
If the clone is active on A and B, what should

crm_resource --move --resource myClone --node C

do ? This would require an additional --from-node or similar.

> Other than that the proposal looks sane.
> 
> My first thought was to make --move behave like --move-off if the
> resource is a clone or /ms, but since the semantics are the exact
> opposite, that might introduce introduce more problems than it solves.

That was my perception as well.

> 
> Does the original crm_resource patch implement this?

No, I will submit an updated version later this week.

- holger


___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker


Re: [Pacemaker] [PATCH]Bug 2567 - crm resource migrate should support an optional "role" parameter

2011-04-04 Thread Holger Teutsch
On Mon, 2011-04-04 at 15:24 +0200, Andrew Beekhof wrote:
> On Mon, Apr 4, 2011 at 2:43 PM, Holger Teutsch  wrote:
> > On Mon, 2011-04-04 at 11:05 +0200, Andrew Beekhof wrote:
> >> On Sat, Mar 19, 2011 at 11:55 AM, Holger Teutsch  
> >> wrote:
> >> > Hi Dejan,
> >> >
> >> > On Fri, 2011-03-18 at 14:24 +0100, Dejan Muhamedagic wrote:
> >> >> Hi,
> >> >>
> >> >> On Fri, Mar 18, 2011 at 12:21:40PM +0100, Holger Teutsch wrote:
> >> >> > Hi,
> >> >> > I would like to submit 2 patches of an initial implementation for
> >> >> > discussion.
> >> > ..
> >> >> > To recall:
> >> >> >
> >> >> > crm_resource --move resource
> >> >> > creates a "standby" rule that moves the resource off the currently
> >> >> > active node
> >> >> >
> >> >> > while
> >> >> >
> >> >> > crm_resource --move resource --node newnode
> >> >> > creates a "prefer" rule that moves the resource to the new node.
> >> >> >
> >> >> > When dealing with clones and masters the behavior was random as the 
> >> >> > code
> >> >> > only considers the node where the first instance of the clone was
> >> >> > started.
> >> >> >
> >> >> > The new code behaves consistently for the master role of an m/s
> >> >> > resource. The options "--master" and "rsc:master" are somewhat 
> >> >> > redundant
> >> >> > as a "slave" move is not supported. Currently it's more an
> >> >> > acknowledgement of the user.
> >> >> >
> >> >> > On the other hand it is desirable (and was requested several times on
> >> >> > the ML) to stop a single resource instance of a clone or master on a
> >> >> > specific node.
> >> >> >
> >> >> > Should that be implemented by something like
> >> >> >
> >> >> > "crm_resource --move-off --resource myresource --node devel2" ?
> >> >> >
> >> >> > or should
> >> >> >
> >> >> > crm_resource refuse to work on clones
> >> >> >
> >> >> > and/or should moving the master role be the default for m/s resources
> >> >> > and the "--master" option discarded ?
> >> >>
> >> >> I think that we also need to consider the case when clone-max is
> >> >> less than the number of nodes. If I understood correctly what you
> >> >> were saying. So, all of move slave and move master and move clone
> >> >> should be possible.
> >> >>
> >> >
> >> > I think the following use cases cover what can be done with such kind of
> >> > interface:
> >> >
> >> > crm_resource --moveoff --resource myresource --node mynode
> >> >   -> all resource variants: check whether active on mynode, then create 
> >> > standby constraint
> >> >
> >> > crm_resource --move --resource myresource
> >> >   -> primitive/group: convert to --moveoff --node `current_node`
> >> >   -> clone/master: refused
> >> >
> >> > crm_resource --move --resource myresource --node mynode
> >> >  -> primitive/group: create prefer constraint
> >> >  -> clone/master: refused
> >>
> >> Not sure this needs to be refused.
> >
> > I see the problem that the node where the resource instance should be
> > moved off had to be specified as well to get predictable behavior.
> >
> > Consider a a 2 way clone on a 3 node cluster.
> > If the clone is active on A and B what should
> >
> > crm_resource --move --resource myClone --node C
> >
> > do ?
> 
> I would expect it to create the +inf constraint for C but no
> contraint(s) for the current location(s)

You are right. These are different and valid use cases.

crm_resource --move --resource myClone --node C
   -> I want an instance on C, regardless of which node it is moved off from

crm_resource --move-off --resource myClone --node C
   -> I want the instance moved off C, regardless of where it is moved to
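
A quick way to see the difference on a 3-node cluster (nodes A, B, C) with
clone-max=2 - the --move-off form is of course only the proposed/patched
syntax, the resource name is an example:

crm_resource -W -r myClone                            # instances currently on, say, A and B
crm_resource --move --resource myClone --node C       # "I want an instance on C"
crm_resource -W -r myClone                            # expect C plus one of A/B
crm_resource -U -r myClone                            # drop the constraint again
crm_resource --move-off --resource myClone --node C   # "I want no instance on C"
crm_resource -W -r myClone                            # expect A and B only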

I tried them out with a reimplementation of the patch on a 3 node
cluster with a resource with clone-max=2. The behavior appears logical
(at least 

Re: [Pacemaker] [PATCH]Bug 2567 - crm resource migrate should support an optional "role" parameter

2011-04-05 Thread Holger Teutsch
On Mon, 2011-04-04 at 21:31 +0200, Holger Teutsch wrote:
> On Mon, 2011-04-04 at 15:24 +0200, Andrew Beekhof wrote:
> > On Mon, Apr 4, 2011 at 2:43 PM, Holger Teutsch  
> > wrote:
> > > On Mon, 2011-04-04 at 11:05 +0200, Andrew Beekhof wrote:
> > >> On Sat, Mar 19, 2011 at 11:55 AM, Holger Teutsch  
> > >> wrote:
> > >> > Hi Dejan,
> > >> >
> > >> > On Fri, 2011-03-18 at 14:24 +0100, Dejan Muhamedagic wrote:
> > >> >> Hi,
> > >> >>
> > >> >> On Fri, Mar 18, 2011 at 12:21:40PM +0100, Holger Teutsch wrote:
> > >> >> > Hi,
> > >> >> > I would like to submit 2 patches of an initial implementation for
> > >> >> > discussion.
> > >> > ..
> > >> >> > To recall:
> > >> >> >
> > >> >> > crm_resource --move resource
> > >> >> > creates a "standby" rule that moves the resource off the currently
> > >> >> > active node
> > >> >> >
> > >> >> > while
> > >> >> >
> > >> >> > crm_resource --move resource --node newnode
> > >> >> > creates a "prefer" rule that moves the resource to the new node.
> > >> >> >
> > >> >> > When dealing with clones and masters the behavior was random as the 
> > >> >> > code
> > >> >> > only considers the node where the first instance of the clone was
> > >> >> > started.
> > >> >> >
> > >> >> > The new code behaves consistently for the master role of an m/s
> > >> >> > resource. The options "--master" and "rsc:master" are somewhat 
> > >> >> > redundant
> > >> >> > as a "slave" move is not supported. Currently it's more an
> > >> >> > acknowledgement of the user.
> > >> >> >
> > >> >> > On the other hand it is desirable (and was requested several times 
> > >> >> > on
> > >> >> > the ML) to stop a single resource instance of a clone or master on a
> > >> >> > specific node.
> > >> >> >
> > >> >> > Should that be implemented by something like
> > >> >> >
> > >> >> > "crm_resource --move-off --resource myresource --node devel2" ?
> > >> >> >
> > >> >> > or should
> > >> >> >
> > >> >> > crm_resource refuse to work on clones
> > >> >> >
> > >> >> > and/or should moving the master role be the default for m/s 
> > >> >> > resources
> > >> >> > and the "--master" option discarded ?
> > >> >>
> > >> >> I think that we also need to consider the case when clone-max is
> > >> >> less than the number of nodes. If I understood correctly what you
> > >> >> were saying. So, all of move slave and move master and move clone
> > >> >> should be possible.
> > >> >>
> > >> >
> > >> > I think the following use cases cover what can be done with such kind 
> > >> > of
> > >> > interface:
> > >> >
> > >> > crm_resource --moveoff --resource myresource --node mynode
> > >> >   -> all resource variants: check whether active on mynode, then 
> > >> > create standby constraint
> > >> >
> > >> > crm_resource --move --resource myresource
> > >> >   -> primitive/group: convert to --moveoff --node `current_node`
> > >> >   -> clone/master: refused
> > >> >
> > >> > crm_resource --move --resource myresource --node mynode
> > >> >  -> primitive/group: create prefer constraint
> > >> >  -> clone/master: refused
> > >>
> > >> Not sure this needs to be refused.
> > >
> > > I see the problem that the node where the resource instance should be
> > > moved off had to be specified as well to get predictable behavior.
> > >
> > > Consider a a 2 way clone on a 3 node cluster.
> > > If the clone is active on A and B what should
> > >
> > > crm_resource --move --resource myClone --node C
> > >
&

Re: [Pacemaker] [PATCH]Bug 2567 - crm resource migrate should support an optional "role" parameter

2011-04-05 Thread Holger Teutsch
Hi Dejan,

On Tue, 2011-04-05 at 11:48 +0200, Dejan Muhamedagic wrote:
> Hi Holger,
> 
> On Mon, Apr 04, 2011 at 09:31:02PM +0200, Holger Teutsch wrote:
> > On Mon, 2011-04-04 at 15:24 +0200, Andrew Beekhof wrote:
> [...]
> > 
> > crm_resource --move-off --resource myClone --node C
> >-> I want the instance moved off C, regardless where it is moved on
> 
> What is the difference between move-off and unmigrate (-U)?

--move-off -> create a constraint that a resource should *not* run on
the specific node (partly as before --move without --node)

-U: zap all migration constraints (as before) 

Regards
Holger
> 
> Cheers,
> 
> Dejan
> 
> ___
> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
> 
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: 
> http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker



___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker


Re: [Pacemaker] [PATCH]Bug 2567 - crm resource migrate should support an optional "role" parameter

2011-04-05 Thread Holger Teutsch
Hi Dejan,

On Tue, 2011-04-05 at 12:27 +0200, Dejan Muhamedagic wrote:
> On Tue, Apr 05, 2011 at 12:10:48PM +0200, Holger Teutsch wrote:
> > Hi Dejan,
> > 
> > On Tue, 2011-04-05 at 11:48 +0200, Dejan Muhamedagic wrote:
> > > Hi Holger,
> > > 
> > > On Mon, Apr 04, 2011 at 09:31:02PM +0200, Holger Teutsch wrote:
> > > > On Mon, 2011-04-04 at 15:24 +0200, Andrew Beekhof wrote:
> > > [...]
> > > > 
> > > > crm_resource --move-off --resource myClone --node C
> > > >-> I want the instance moved off C, regardless where it is moved on
> > > 
> > > What is the difference between move-off and unmigrate (-U)?
> > 
> > --move-off -> create a constraint that a resource should *not* run on
> > the specific node (partly as before --move without --node)
> > 
> > -U: zap all migration constraints (as before) 
> 
> Ah, right, sorry, wanted to ask about the difference between
> move-off and move. The description looks the same as for move. Is
> it that in this case it is for clones so crm_resource needs an
> extra node parameter? You wrote in the doc:
> 
>   +Migrate a resource (-instance for clones/masters) off the specified 
> node.
> 
> The '-instance' looks somewhat funny. Why not say "Move/migrate a
> clone or master/slave instance away from the specified node"?

Moving away works for all kinds of resources so the text now looks like:

diff -r b4f456380f60 doc/crm_cli.txt
--- a/doc/crm_cli.txt   Thu Mar 17 09:41:25 2011 +0100
+++ b/doc/crm_cli.txt   Tue Apr 05 13:08:10 2011 +0200
@@ -818,10 +818,25 @@
 running on the current node. Additionally, you may specify a
 lifetime for the constraint---once it expires, the location
 constraint will no longer be active.
+For a master resource specify :master to move the master role.
 
 Usage:
 ...
-migrate <rsc> [<node>] [<lifetime>] [force]
+migrate <rsc>[:master] [<node>] [<lifetime>] [force]
+...
+
+[[cmdhelp_resource_migrateoff,migrate a resource off the specified node]]
+ `migrateoff` (`moveoff`)
+
+Migrate a resource away from the specified node. 
+The resource is migrated by creating a constraint which prevents it from
+running on the specified node. Additionally, you may specify a
+lifetime for the constraint---once it expires, the location
+constraint will no longer be active.
+
+Usage:
+...
+migrateoff <rsc> <node> [<lifetime>] [force]
 ...
 
 [[cmdhelp_resource_unmigrate,unmigrate a resource to another node]]

> 
> I must say that I still find all this quite confusing, i.e. now
> we have "move", "unmove", and "move-off", but it's probably just me :)

Think of "move" == "move-to" then it is simpler 8-)

... keeping in mind that for backward compatibility

crm_resource --move --resource myResource

is equivalent to

crm_resource --move-off --resource myResource --node $(current node)

But as there is no "current node" for clones / masters the old
implementation did some random movements...
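
Spelled out as commands (hypothetical resource name, and assuming the usual
"is running on: <node>" output of -W for a primitive or group):

# legacy form, implicit "current node"
crm_resource --move --resource myResource

# explicit equivalent with the new syntax
node=$(crm_resource -W -r myResource | awk '{print $NF}')
crm_resource --move-off --resource myResource --node "$node"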

Regards
Holger

> 
> Cheers,
> 
> Dejan
> 



___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker


Re: [Pacemaker] [PATCH]Bug 2567 - crm resource migrate should support an optional "role" parameter

2011-04-05 Thread Holger Teutsch
Hi Dejan,

On Tue, 2011-04-05 at 13:40 +0200, Dejan Muhamedagic wrote:
> Hi Holger,
> 
> On Tue, Apr 05, 2011 at 01:19:56PM +0200, Holger Teutsch wrote:
> > Hi Dejan,
> > 
> > On Tue, 2011-04-05 at 12:27 +0200, Dejan Muhamedagic wrote:
> > > On Tue, Apr 05, 2011 at 12:10:48PM +0200, Holger Teutsch wrote:
> > > > Hi Dejan,
> > > > 
> > > > On Tue, 2011-04-05 at 11:48 +0200, Dejan Muhamedagic wrote:
> > > > > Hi Holger,
> > > > > 
> > > > > On Mon, Apr 04, 2011 at 09:31:02PM +0200, Holger Teutsch wrote:
> > > > > > On Mon, 2011-04-04 at 15:24 +0200, Andrew Beekhof wrote:
> > > > > [...]
> > > > > > 
> > > > > > crm_resource --move-off --resource myClone --node C
> > > > > >-> I want the instance moved off C, regardless where it is moved 
> > > > > > on
> > > > > 
> > > > > What is the difference between move-off and unmigrate (-U)?
> > > > 
> > > > --move-off -> create a constraint that a resource should *not* run on
> > > > the specific node (partly as before --move without --node)
> > > > 
> > > > -U: zap all migration constraints (as before) 
> > > 
> > > Ah, right, sorry, wanted to ask about the difference between
> > > move-off and move. The description looks the same as for move. Is
> > > it that in this case it is for clones so crm_resource needs an
> > > extra node parameter? You wrote in the doc:
> > > 
> > >   +Migrate a resource (-instance for clones/masters) off the specified 
> > > node.
> > > 
> > > The '-instance' looks somewhat funny. Why not say "Move/migrate a
> > > clone or master/slave instance away from the specified node"?
> > 
> > Moving away works for all kinds of resources so the text now looks like:
> > 
> > diff -r b4f456380f60 doc/crm_cli.txt
> > --- a/doc/crm_cli.txt   Thu Mar 17 09:41:25 2011 +0100
> > +++ b/doc/crm_cli.txt   Tue Apr 05 13:08:10 2011 +0200
> > @@ -818,10 +818,25 @@
> >  running on the current node. Additionally, you may specify a
> >  lifetime for the constraint---once it expires, the location
> >  constraint will no longer be active.
> > +For a master resource specify :master to move the master role.
> >  
> >  Usage:
> >  ...
> > -migrate  [] [] [force]
> > +migrate [:master] [] [] [force]
> > +...
> > +
> > +[[cmdhelp_resource_migrateoff,migrate a resource off the specified
> > node]]
> > + `migrateoff` (`moveoff`)
> > +
> > +Migrate a resource away from the specified node. 
> > +The resource is migrated by creating a constraint which prevents it
> > from
> > +running on the specified node. Additionally, you may specify a
> > +lifetime for the constraint---once it expires, the location
> > +constraint will no longer be active.
> > +
> > +Usage:
> > +...
> > +migrateoff   [] [force]
> >  ...
> >  
> >  [[cmdhelp_resource_unmigrate,unmigrate a resource to another node]]
> > 
> > > 
> > > I must say that I still find all this quite confusing, i.e. now
> > > we have "move", "unmove", and "move-off", but it's probably just me :)
> > 
> > Think of "move" == "move-to" then it is simpler 8-)
> > 
> > ... keeping in mind that for backward compatibility
> > 
> > crm_resource --move --resource myResource
> > 
> > is equivalent
> > 
> > crm_resource --move-off --resource myResource --node $(current node)
> > 
> > But as there is no "current node" for clones / masters the old
> > implementation did some random movements...
> 
> OK. Thanks for the clarification. I'd like to revise my previous
> comment about restricting use of certain constructs. For
> instance, in this case, if the command would result in a random
> movement then the shell should at least issue a warning about it.
> Or perhaps refuse to do that completely. I didn't take a look yet
> at the code, perhaps you've already done that.
> 
> Thanks,
> 
> Dejan
> 
> 

I admit you have to specify more verbosely what you want to achieve, but
then the patched versions (based on the patches I submitted today around
10:01) execute consistently and without surprises - at least for my test
cases.

Regards
Holger



___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker


Re: [Pacemaker] [PATCH]Bug 2567 - crm resource migrate should support an optional "role" parameter

2011-04-06 Thread Holger Teutsch
On Wed, 2011-04-06 at 15:38 +0200, Dejan Muhamedagic wrote:
> On Wed, Apr 06, 2011 at 01:00:36PM +0200, Andrew Beekhof wrote:
> > On Tue, Apr 5, 2011 at 12:27 PM, Dejan Muhamedagic  
> > wrote:
> > > Ah, right, sorry, wanted to ask about the difference between
> > > move-off and move. The description looks the same as for move. Is
> > > it that in this case it is for clones so crm_resource needs an
> > > extra node parameter? You wrote in the doc:
> > >
> > >+Migrate a resource (-instance for clones/masters) off the 
> > > specified node.
> > >
> > > The '-instance' looks somewhat funny. Why not say "Move/migrate a
> > > clone or master/slave instance away from the specified node"?
> > >
> > > I must say that I still find all this quite confusing, i.e. now
> > > we have "move", "unmove", and "move-off", but it's probably just me :)
> > 
> > Not just you.  The problem is that we didn't fully understand all the
> > use case permutations at the time.
> > 
> > I think, not withstanding legacy computability, "move" should probably
> > be renamed to "move-to" and this new option be called "move-from".
> > That seems more obvious and syntactically consistent with the rest of
> > the system.
> 
> Yes, move-to and move-from seem more consistent than other
> options. The problem is that the old "move" is at times one and
> then at times another.
> 
> > In the absence of a host name, each uses the current location for the
> > named group/primitive resource and complains for clones.
> > 
> > The biggest question in my mind is what to call "unmove"...
> > "move-cleanup" perhaps?
> 
> move-remove? :D
> Actually, though the word is a bit awkward, unmove sounds fine
> to me.

I would vote for "move-cleanup". It's consistent to move-XXX and to my
(german) ears "unmove" seems to stand for the previous "move" being
undone and the stuff comes back.

BTW: Has someone already tried out the code or do you trust me 8-D ?

Stay tuned for updated patches...

- holger
> 
> Thanks,
> 
> Dejan
> 
> > ___
> > Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
> > http://oss.clusterlabs.org/mailman/listinfo/pacemaker
> > 
> > Project Home: http://www.clusterlabs.org
> > Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> > Bugs: 
> > http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker
> 
> ___
> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
> 
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: 
> http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker



___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker


Re: [Pacemaker] [PATCH]Bug 2567 - crm resource migrate should support an optional "role" parameter

2011-04-07 Thread Holger Teutsch
On Thu, 2011-04-07 at 08:57 +0200, Andrew Beekhof wrote:
> On Wed, Apr 6, 2011 at 5:48 PM, Holger Teutsch  wrote:
> > On Wed, 2011-04-06 at 15:38 +0200, Dejan Muhamedagic wrote:
> >> On Wed, Apr 06, 2011 at 01:00:36PM +0200, Andrew Beekhof wrote:
> >> > On Tue, Apr 5, 2011 at 12:27 PM, Dejan Muhamedagic  
> >> > wrote:
> >> > > Ah, right, sorry, wanted to ask about the difference between
> >> > > move-off and move. The description looks the same as for move. Is
> >> > > it that in this case it is for clones so crm_resource needs an
> >> > > extra node parameter? You wrote in the doc:
> >> > >
> >> > >+Migrate a resource (-instance for clones/masters) off the 
> >> > > specified node.
> >> > >
> >> > > The '-instance' looks somewhat funny. Why not say "Move/migrate a
> >> > > clone or master/slave instance away from the specified node"?
> >> > >
> >> > > I must say that I still find all this quite confusing, i.e. now
> >> > > we have "move", "unmove", and "move-off", but it's probably just me :)
> >> >
> >> > Not just you.  The problem is that we didn't fully understand all the
> >> > use case permutations at the time.
> >> >
> >> > I think, not withstanding legacy computability, "move" should probably
> >> > be renamed to "move-to" and this new option be called "move-from".
> >> > That seems more obvious and syntactically consistent with the rest of
> >> > the system.
> >>
> >> Yes, move-to and move-from seem more consistent than other
> >> options. The problem is that the old "move" is at times one and
> >> then at times another.
> >>
> >> > In the absence of a host name, each uses the current location for the
> >> > named group/primitive resource and complains for clones.
> >> >
> >> > The biggest question in my mind is what to call "unmove"...
> >> > "move-cleanup" perhaps?
> >>
> >> move-remove? :D
> >> Actually, though the word is a bit awkward, unmove sounds fine
> >> to me.
> >
> > I would vote for "move-cleanup". It's consistent to move-XXX and to my
> > (german) ears "unmove" seems to stand for the previous "move" being
> > undone and the stuff comes back.
> >
> > BTW: Has someone already tried out the code or do you trust me 8-D ?
> 
> I trust no-one - which is why we have regression tests :-)
> 
> >
> > Stay tuned for updated patches...

Now, after an additional discussion round I propose the following:
Please note that for consistency the "--node" argument is optional for
"--move-from".

New syntax:
---

crm_resource --move-from --resource myresource --node mynode
   -> all resource variants: check whether active on mynode, then create 
standby constraint

crm_resource --move-from --resource myresource
   -> primitive/group: set --node `current_node`, then create standby constraint
   -> clone/master: refused

crm_resource --move-to --resource myresource --node mynode
  -> all resource variants: create prefer constraint

crm_resource --move-to --resource myresource --master --node mynode
  -> master: check whether active as slave on mynode, then create prefer 
constraint for master role
  -> others: refused

crm_resource --move-cleanup --resource myresource
  -> zap constraints

As we are already short on meaningful single letter options I vote for long 
options only.
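
A short walk-through of the proposed long options on an m/s resource (names
and the ISO8601 lifetime are examples; --lifetime is the existing
-u/--lifetime option):

# move the master role to devel2 for one hour, then let the constraint expire
crm_resource --move-to --resource ms_test --master --node devel2 --lifetime PT1H

# push whatever instance is on devel1 away from that node
crm_resource --move-from --resource ms_test --node devel1

# finally drop all constraints generated by the calls above
crm_resource --move-cleanup --resource ms_test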

Backwards Compatibility:


crm_resource {-M|--move} --resource myresource
  -> output deprecation warning
  -> treat as crm_resource --move-from --resource myresource

crm_resource {-M|--move} --resource myresource --node mynode
  -> output deprecation warning
  -> treat as crm_resource --move-to --resource myresource --node mynode

crm_resource {-U|--unmove} --resource myresource
  -> output deprecation warning
  -> treat as crm_resource --move-cleanup --resource myresource

For the shell:
Should we go for similar commands or keep "migrate-XXX"?


Coming back to Dejan's proposal of "move-remove":

That could be implemented by re-executing the last move (a remove).
Reimplementing "unmove" as an undo of the last move, you would have
shortcuts for your favorite move operation:

move
move-unmove -> back
move-remove -> and forth

Just kidding...
 

> >
> > - holger
> >>



___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker


Re: [Pacemaker] [PATCH]Bug 2567 - crm resource migrate should support an optional "role" parameter

2011-04-08 Thread Holger Teutsch
On Thu, 2011-04-07 at 12:33 +0200, Dejan Muhamedagic wrote:
> > New syntax:
> > ---
> > 
> > crm_resource --move-from --resource myresource --node mynode
> >-> all resource variants: check whether active on mynode, then create 
> > standby constraint
> > 
> > crm_resource --move-from --resource myresource
> >-> primitive/group: set --node `current_node`, then create standby 
> > constraint
> >-> clone/master: refused
> > 
> > crm_resource --move-to --resource myresource --node mynode
> >   -> all resource variants: create prefer constraint
> > 
> > crm_resource --move-to --resource myresource --master --node mynode
> >   -> master: check whether active as slave on mynode, then create prefer 
> > constraint for master role
> >   -> others: refused
> > 
> > crm_resource --move-cleanup --resource myresource
> >   -> zap constraints
> > 
> > As we are already short on meaningful single letter options I vote for long 
> > options only.
> > 
> > Backwards Compatibility:
> > 
> > 
> > crm_resource {-M|--move} --resource myresource
> >   -> output deprecation warning
> >   -> treat as crm_resource --move-from --resource myresource
> > 
> > crm_resource {-M|--move} --resource myresource --node mynode
> >   -> output deprecation warning
> >   -> treat as crm_resource --move-to --resource myresource --node mynode
> > 
> > crm_resource {-U|--unmove} --resource myresource
> >   -> output deprecation warning
> >   -> treat as crm_resource --move-cleanup --resource myresource
> 
> All looks fine to me.
> 
> > For the shell:
> > Should we go for similar commands or keep "migrate-XXX"
> 
> migrate is a bit of a misnomer, could be confused with the
> migrate operation. I'd vote to leave old migrate/unmigrate
> as deprecated and introduce just move-from/to/cleanup variants.
> 

Deajn & Andrew,
please find attached the patches that implement these commands for
review. The require the patch
 
Low: lib/common/utils.c: Don't try to print unprintable option values in 
crm_help

that I sent separately because it is not directly related to the
movement stuff.

I think that the preceding discussions were very valuable for fully
understanding the issues and implications, and I'm confident that the new
command set is consistent and behaves with a predictable outcome.

Regards
Holger


diff -r b4f456380f60 doc/crm_cli.txt
--- a/doc/crm_cli.txt	Thu Mar 17 09:41:25 2011 +0100
+++ b/doc/crm_cli.txt	Fri Apr 08 14:23:59 2011 +0200
@@ -810,28 +810,44 @@
 unmanage 
 ...
 
-[[cmdhelp_resource_migrate,migrate a resource to another node]]
- `migrate` (`move`)
-
-Migrate a resource to a different node. If node is left out, the
-resource is migrated by creating a constraint which prevents it from
-running on the current node. Additionally, you may specify a
+[[cmdhelp_resource_move-to,move a resource to another node]]
+ `move-to`
+
+Move a resource to a different node. The resource is moved by creating
+a constraint which forces it to run on the specified node.
+Additionally, you may specify a lifetime for the constraint---once it
+expires, the location constraint will no longer be active.
+For a master resource specify :master to move the master role.
+
+Usage:
+...
+move-to <rsc>[:master] <node> [<lifetime>] [force]
+...
+
+[[cmdhelp_resource_move-from,move a resource away from the specified node]]
+ `move-from`
+
+Move a resource away from the specified node. 
+If node is left out, the node where the resource is currently active
+is used.
+The resource is moved by creating a constraint which prevents it from
+running on the specified node. Additionally, you may specify a
 lifetime for the constraint---once it expires, the location
 constraint will no longer be active.
 
 Usage:
 ...
-migrate <rsc> [<node>] [<lifetime>] [force]
+move-from <rsc> [<node>] [<lifetime>] [force]
 ...
 
-[[cmdhelp_resource_unmigrate,unmigrate a resource to another node]]
- `unmigrate` (`unmove`)
-
-Remove the constraint generated by the previous migrate command.
+[[cmdhelp_resource_move-cleanup,Cleanup previously created move constraint]]
+ `move-cleanup`
+
+Remove the constraint generated by the previous move-to/move-from command.
 
 Usage:
 ...
-unmigrate <rsc>
+move-cleanup <rsc>
 ...
 
 [[cmdhelp_resource_param,manage a parameter of a resource]]
diff -r b4f456380f60 tools/crm_resource.c
--- a/tools/crm_resource.c	Thu Mar 17 09:41:25 2011 +0100
+++ b/tools/crm_resource.c	Fri Apr 08 15:02:39 2011 +0200
@@ -52,7 +52,8 @@
 const char *prop_id = NULL;
 const char *prop_set = NULL;
 char *move_lifetime = NULL;
-char rsc_cmd = 'L';
+int move_master = 0;
+int rsc_cmd = 'L';
 char *our_pid = NULL;
 IPC_Channel *crmd_channel = NULL;
 char *xml_file = NULL;
@@ -192,6 +193,33 @@
 return 0;
 }
 
+/* return role of resource on node */
+static int
+role_on_node(resource_t *rsc, const char *node_uname)
+{
+GListPtr lpc = NULL;
+
+if(rsc->variant > pe_nat

[Pacemaker] [PATCH]Low: lib/common/utils.c: Don't try to print unprintable option values in crm_help

2011-04-08 Thread Holger Teutsch
Hi,
during work on the move-XXX stuff I discovered this.
Regards
Holger

# HG changeset patch
# User Holger Teutsch 
# Date 1302259903 -7200
# Branch mig
# Node ID caed31174dc966450a31da048b640201980870a8
# Parent  9451c288259b7b9fd6f32f5df01d47569e570c58
Low: lib/common/utils.c: Don't try to print unprintable option values in crm_help

diff -r 9451c288259b -r caed31174dc9 lib/common/utils.c
--- a/lib/common/utils.c	Tue Apr 05 13:24:21 2011 +0200
+++ b/lib/common/utils.c	Fri Apr 08 12:51:43 2011 +0200
@@ -2281,7 +2281,13 @@
 		fprintf(stream, "%s\n", crm_long_options[i].desc);
 		
 	} else {
-		fprintf(stream, " -%c, --%s%c%s\t%s\n", crm_long_options[i].val, crm_long_options[i].name,
+/* is val printable as char ? */
+if(crm_long_options[i].val <= UCHAR_MAX) {
+fprintf(stream, " -%c,", crm_long_options[i].val);
+} else {
+fputs("", stream);
+}
+		fprintf(stream, " --%s%c%s\t%s\n", crm_long_options[i].name,
 			crm_long_options[i].has_arg?'=':' ',crm_long_options[i].has_arg?"value":"",
 			crm_long_options[i].desc?crm_long_options[i].desc:"");
 	}
___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker


Re: [Pacemaker] [PATCH]Bug 2567 - crm resource migrate should support an optional "role" parameter

2011-04-24 Thread Holger Teutsch
On Mon, 2011-04-11 at 20:50 +0200, Andrew Beekhof wrote:
> why?
> CMD_ERR("Resource %s not moved:"
> " specifying --master is not supported for
> --move-from\n", rsc_id);
> 
it did not look sensible to me but I can't recall the exact reasons 8-)
It's now implemented.
> also the legacy handling is a little off - do a make install and run
> tools/regression.sh and you'll see what i mean.

The remaining diffs seem to be unrelated to my changes.

> other than that the crm_resource part looks pretty good.
> can you add some regression testcases in tools/ too please?
> 
Will add them once the code is in the repo.
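
Until then, a rough sketch of the kind of smoke test such a case could
cover, using the patched shell commands (the Dummy resource and node name
are made up; the grep is only for eyeballing the generated constraint):

crm configure primitive dummy ocf:pacemaker:Dummy op monitor interval=10s
crm resource move-to dummy devel2
cibadmin -Q -o constraints | grep dummy    # expect the generated prefer rule for devel2
crm resource move-cleanup dummy
cibadmin -Q -o constraints | grep dummy    # expect no output any more
crm resource stop dummy
crm configure delete dummy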

Latest diffs are attached.

-holger

diff -r b4f456380f60 shell/modules/ui.py.in
--- a/shell/modules/ui.py.in	Thu Mar 17 09:41:25 2011 +0100
+++ b/shell/modules/ui.py.in	Sun Apr 24 16:18:59 2011 +0200
@@ -738,8 +738,9 @@
 rsc_status = "crm_resource -W -r '%s'"
 rsc_showxml = "crm_resource -q -r '%s'"
 rsc_setrole = "crm_resource --meta -r '%s' -p target-role -v '%s'"
-rsc_migrate = "crm_resource -M -r '%s' %s"
-rsc_unmigrate = "crm_resource -U -r '%s'"
+rsc_move_to = "crm_resource --move-to -r '%s' %s"
+rsc_move_from = "crm_resource --move-from -r '%s' %s"
+rsc_move_cleanup = "crm_resource --move-cleanup -r '%s'"
 rsc_cleanup = "crm_resource -C -r '%s' -H '%s'"
 rsc_cleanup_all = "crm_resource -C -r '%s'"
 rsc_param =  {
@@ -776,8 +777,12 @@
 self.cmd_table["demote"] = (self.demote,(1,1),0)
 self.cmd_table["manage"] = (self.manage,(1,1),0)
 self.cmd_table["unmanage"] = (self.unmanage,(1,1),0)
+# the next two commands are deprecated
 self.cmd_table["migrate"] = (self.migrate,(1,4),0)
 self.cmd_table["unmigrate"] = (self.unmigrate,(1,1),0)
+self.cmd_table["move-to"] = (self.move_to,(2,4),0)
+self.cmd_table["move-from"] = (self.move_from,(1,4),0)
+self.cmd_table["move-cleanup"] = (self.move_cleanup,(1,1),0)
 self.cmd_table["param"] = (self.param,(3,4),1)
 self.cmd_table["meta"] = (self.meta,(3,4),1)
 self.cmd_table["utilization"] = (self.utilization,(3,4),1)
@@ -846,9 +851,67 @@
 if not is_name_sane(rsc):
 return False
 return set_deep_meta_attr("is-managed","false",rsc)
+def move_to(self,cmd,*args):
+"""usage: move-to [:master]  [] [force]"""
+elem = args[0].split(':')
+rsc = elem[0]
+master = False
+if len(elem) > 1:
+master = elem[1]
+if master != "master":
+common_error("%s is invalid, specify 'master'" % master)
+return False
+master = True
+if not is_name_sane(rsc):
+return False
+node = args[1]
+lifetime = None
+force = False
+if len(args) == 3:
+if args[2] == "force":
+force = True
+else:
+lifetime = args[2]
+elif len(args) == 4:
+if args[2] == "force":
+force = True
+lifetime = args[3]
+elif args[3] == "force":
+force = True
+lifetime = args[2]
+else:
+syntax_err((cmd,force))
+return False
+
+opts = ''
+if node:
+opts = "--node='%s'" % node
+if lifetime:
+opts = "%s --lifetime='%s'" % (opts,lifetime)
+if force or user_prefs.get_force():
+opts = "%s --force" % opts
+if master:
+opts = "%s --master" % opts
+return ext_cmd(self.rsc_move_to % (rsc,opts)) == 0
+
 def migrate(self,cmd,*args):
-"""usage: migrate  [] [] [force]"""
-rsc = args[0]
+"""Deprecated: migrate  [] [] [force]"""
+common_warning("migrate is deprecated, use move-to or move-from")
+if len(args) >= 2 and args[1] in listnodes():
+return self.move_to(cmd, *args)
+return self.move_from(cmd, *args)
+
+def move_from(self,cmd,*args):
+"""usage: move-from [:master] [] [] [force]"""
+elem = args[0].split(':')
+rsc = elem[0]
+master = False
+if len(elem) > 1:
+master = elem[1]
+if master != "master":
+common_error("%s is invalid, specify 'master'" % master)
+return False
+master = True
 if not is_name_sane(rsc):
 return False
 node = None
@@ -888,12 +951,18 @@
 opts = "%s --lifetime='%s'" % (opts,lifetime)
 if force or user_prefs.get_force():
 opts = "%s --force" % opts
-return ext_cmd(self.rsc_migrate % (rsc,opts)) == 0
-def unmigrate(self,cmd,rsc):
-"usage: unmigrate "
+if master:
+opts = "%s --master" % opts
+return ext_cmd(self.rsc_move_from % (rsc,opts)) == 0
+def move_cleanup(self,cmd,rsc):
+"usage: move_cleanup 

Re: [Pacemaker] [PATCH]Bug 2567 - crm resource migrate should support an optional "role" parameter

2011-04-28 Thread Holger Teutsch
Hi Dejan,

On Tue, 2011-04-26 at 17:58 +0200, Dejan Muhamedagic wrote:
> Hi Holger,
> 
> On Sun, Apr 24, 2011 at 04:31:33PM +0200, Holger Teutsch wrote:
> > On Mon, 2011-04-11 at 20:50 +0200, Andrew Beekhof wrote:
> > > why?
> > > CMD_ERR("Resource %s not moved:"
> > > " specifying --master is not supported for
> > > --move-from\n", rsc_id);
> > > 
> > it did not look sensible to me but I can't recall the exact reasons 8-)
> > It's now implemented.
> > > also the legacy handling is a little off - do a make install and run
> > > tools/regression.sh and you'll see what i mean.
> > 
> > Remaining diffs seem to be not related to my changes.
> > 
> > > other than that the crm_resource part looks pretty good.
> > > can you add some regression testcases in tools/ too please?
> > > 
> > Will add them once the code is in the repo.
> > 
> > Latest diffs are attached.
> 
> The diffs seem to be against the 1.1 code, but this should go
> into the devel repository. Can you please rebase the patches
> against the devel code.
> 
Unfortunately the devel code does not run at all in my environment so I
have to fix this first.

- holger
> Cheers,
> 
> Dejan
> 
> > -holger
> > 
> 



___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker


Re: [Pacemaker] [PATCH]Bug 2567 - crm resource migrate should support an optional "role" parameter

2011-05-05 Thread Holger Teutsch
On Fri, 2011-04-29 at 09:41 +0200, Andrew Beekhof wrote:
> >>
> > Unfortunately the devel code does not run at all in my environment so I
> > have to fix this first.
> 
> Oh?  I ran CTS on it the other day and it was fine here.
> 

I installed pacemaker-devel on top of a compilation of pacemaker-1.1. In
addition I tried "make uninstall" for both versions and then again
"make install" for devel. Pacemaker does not come up, crm_mon shows
nodes as offline.

I suspect the reason is:
May  5 17:09:34 devel1 crmd: [5942]: notice: crmd_peer_update: Status update: 
Client devel1/crmd now has status [online] (DC=)
May  5 17:09:34 devel1 crmd: [5942]: info: crm_update_peer: Node devel1: 
id=1790093504 state=unknown addr=(null) votes=0 born=0 seen=0 
proc=00111312 (new)
May  5 17:09:34 devel1 crmd: [5942]: info: pcmk_quorum_notification: Membership 
0: quorum retained (0)
May  5 17:09:34 devel1 crmd: [5942]: debug: do_fsa_action: actions:trace: 
#011// A_STARTED
May  5 17:09:34 devel1 crmd: [5942]: info: do_started: Delaying start, no 
membership data (0010)
   
^
May  5 17:09:34 devel1 crmd: [5942]: debug: register_fsa_input_adv: Stalling 
the FSA pending further input: cause=C_FSA_INTERNAL

Any ideas ?
-holger




May  5 17:09:33 devel1 pacemakerd: [5929]: info: Invoked: pacemakerd 
May  5 17:09:33 devel1 pacemakerd: [5929]: info: crm_log_init_worker: Changed active directory to /var/lib/heartbeat/cores/root
May  5 17:09:33 devel1 pacemakerd: [5929]: info: get_cluster_type: Cluster type is: 'corosync'
May  5 17:09:33 devel1 pacemakerd: [5929]: info: read_config: Reading configure for stack: corosync
May  5 17:09:33 devel1 corosync[2101]:  [CONFDB] lib_init_fn: conn=0x6872f0
May  5 17:09:33 devel1 pacemakerd: [5929]: info: config_find_next: Processing additional logging options...
May  5 17:09:33 devel1 pacemakerd: [5929]: info: get_config_opt: Found 'on' for option: debug
May  5 17:09:33 devel1 pacemakerd: [5929]: info: get_config_opt: Found 'no' for option: to_logfile
May  5 17:09:33 devel1 pacemakerd: [5929]: info: get_config_opt: Found 'yes' for option: to_syslog
May  5 17:09:33 devel1 pacemakerd: [5929]: info: get_config_opt: Found 'daemon' for option: syslog_facility
May  5 17:09:33 devel1 corosync[2101]:  [CONFDB] exit_fn for conn=0x6872f0
May  5 17:09:33 devel1 pacemakerd: [5931]: info: crm_log_init_worker: Changed active directory to /var/lib/heartbeat/cores/root
May  5 17:09:33 devel1 pacemakerd: [5931]: info: main: Starting Pacemaker 1.1.5 (Build: unknown):  ncurses corosync-quorum corosync
May  5 17:09:33 devel1 pacemakerd: [5931]: info: main: Maximum core file size is: 18446744073709551615
May  5 17:09:33 devel1 pacemakerd: [5931]: debug: cluster_connect_cfg: Our nodeid: 1790093504
May  5 17:09:33 devel1 pacemakerd: [5931]: debug: cluster_connect_cfg: Adding fd=5 to mainloop
May  5 17:09:33 devel1 corosync[2101]:  [CPG   ] lib_init_fn: conn=0x68bfc0, cpd=0x6903f0
May  5 17:09:33 devel1 pacemakerd: [5931]: debug: cluster_connect_cpg: Our nodeid: 1790093504
May  5 17:09:33 devel1 pacemakerd: [5931]: debug: cluster_connect_cpg: Adding fd=6 to mainloop
May  5 17:09:33 devel1 pacemakerd: [5931]: info: update_node_processes: 0x60d920 Node 1790093504 now known as devel1 (was: (null))
May  5 17:09:33 devel1 pacemakerd: [5931]: info: update_node_processes: Node devel1 now has process list: 0002 (was )
May  5 17:09:33 devel1 corosync[2101]:  [TOTEM ] mcasted message added to pending queue
May  5 17:09:33 devel1 corosync[2101]:  [CPG   ] got mcast request on 0x68bfc0
May  5 17:09:33 devel1 corosync[2101]:  [TOTEM ] Delivering 24 to 25
May  5 17:09:33 devel1 corosync[2101]:  [TOTEM ] Delivering MCAST message with seq 25 to pending delivery queue
May  5 17:09:33 devel1 corosync[2101]:  [CPG   ] got procjoin message from cluster node 1790093504
May  5 17:09:33 devel1 corosync[2101]:  [TOTEM ] Received ringid(192.168.178.106:332) seq 25
May  5 17:09:33 devel1 corosync[2101]:  [TOTEM ] Received ringid(192.168.178.106:332) seq 25
May  5 17:09:33 devel1 corosync[2101]:  [TOTEM ] mcasted message added to pending queue
May  5 17:09:33 devel1 corosync[2101]:  [TOTEM ] Delivering 25 to 26
May  5 17:09:33 devel1 corosync[2101]:  [TOTEM ] Delivering MCAST message with seq 26 to pending delivery queue
May  5 17:09:33 devel1 corosync[2101]:  [TOTEM ] Received ringid(192.168.178.106:332) seq 26
May  5 17:09:33 devel1 corosync[2101]:  [TOTEM ] Received ringid(192.168.178.106:332) seq 26
May  5 17:09:33 devel1 corosync[2101]:  [TOTEM ] releasing messages up to and including 25
May  5 17:09:33 devel1 pacemakerd: [5931]: info: start_child: Forked child 5935 for process stonith-ng
May  5 17:09:33 devel1 pacemakerd: [5931]: info: update_node_processes: Node devel1 now has process list: 0012 (was 00

Re: [Pacemaker] [PATCH]Bug 2567 - crm resource migrate should support an optional "role" parameter

2011-05-06 Thread Holger Teutsch
On Fri, 2011-05-06 at 11:03 +0200, Andrew Beekhof wrote:
> On Fri, May 6, 2011 at 9:53 AM, Andrew Beekhof  wrote:
> > On Thu, May 5, 2011 at 5:43 PM, Holger Teutsch  
> > wrote:
> >> On Fri, 2011-04-29 at 09:41 +0200, Andrew Beekhof wrote:
> >>> >>
> >>> > Unfortunately the devel code does not run at all in my environment so I
> >>> > have to fix this first.
> >>>
> >>> Oh?  I ran CTS on it the other day and it was fine here.
> >>>
> >>
> >> I installed pacemaker-devel on top of a compilation of pacemaker-1.1. In
> >> addition I tried "make uninstall" for both versions and then again
> >> "make install" for devel. Pacemaker does not come up, crm_mon shows
> >> nodes as offline.
> >>
> >> I suspect reason is
> >> May  5 17:09:34 devel1 crmd: [5942]: notice: crmd_peer_update: Status 
> >> update: Client devel1/crmd now has status [online] (DC=)
> >> May  5 17:09:34 devel1 crmd: [5942]: info: crm_update_peer: Node devel1: 
> >> id=1790093504 state=unknown addr=(null) votes=0 born=0 seen=0 
> >> proc=00111312 (new)
> >> May  5 17:09:34 devel1 crmd: [5942]: info: pcmk_quorum_notification: 
> >> Membership 0: quorum retained (0)
> >> May  5 17:09:34 devel1 crmd: [5942]: debug: do_fsa_action: actions:trace: 
> >> #011// A_STARTED
> >> May  5 17:09:34 devel1 crmd: [5942]: info: do_started: Delaying start, no 
> >> membership data (0010)
> >>   
> >> ^
> >> May  5 17:09:34 devel1 crmd: [5942]: debug: register_fsa_input_adv: 
> >> Stalling the FSA pending further input: cause=C_FSA_INTERNAL
> >>
> >> Any ideas ?
> >
> > Hg version?  Corosync config?
> > I'm running -devel here right now and things are fine.
> 
> Uh, I think I see now.
> Try http://hg.clusterlabs.org/pacemaker/1.1/rev/b94ce5673ce4
> 

Page not found.



___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker


Re: [Pacemaker] [PATCH]Bug 2567 - crm resource migrate should support an optional "role" parameter

2011-05-09 Thread Holger Teutsch
On Fri, 2011-05-06 at 16:15 +0200, Andrew Beekhof wrote:
> On Fri, May 6, 2011 at 12:28 PM, Holger Teutsch  wrote:
> > On Fri, 2011-05-06 at 11:03 +0200, Andrew Beekhof wrote:
> >> On Fri, May 6, 2011 at 9:53 AM, Andrew Beekhof  wrote:
> >> > On Thu, May 5, 2011 at 5:43 PM, Holger Teutsch  
> >> > wrote:
> >> >> On Fri, 2011-04-29 at 09:41 +0200, Andrew Beekhof wrote:
> >> >>> >>
> >> >>> > Unfortunately the devel code does not run at all in my environment 
> >> >>> > so I
> >> >>> > have to fix this first.
> >> >>>
> >> >>> Oh?  I ran CTS on it the other day and it was fine here.
> >> >>>
> >> >>
> >> >> I installed pacemaker-devel on top of a compilation of pacemaker-1.1. In
> >> >> addition I tried "make uninstall" for both versions and then again
> >> >> "make install" for devel. Pacemaker does not come up, crm_mon shows
> >> >> nodes as offline.
> >> >>
> >> >> I suspect reason is
> >> >> May  5 17:09:34 devel1 crmd: [5942]: notice: crmd_peer_update: Status 
> >> >> update: Client devel1/crmd now has status [online] (DC=)
> >> >> May  5 17:09:34 devel1 crmd: [5942]: info: crm_update_peer: Node 
> >> >> devel1: id=1790093504 state=unknown addr=(null) votes=0 born=0 seen=0 
> >> >> proc=00111312 (new)
> >> >> May  5 17:09:34 devel1 crmd: [5942]: info: pcmk_quorum_notification: 
> >> >> Membership 0: quorum retained (0)
> >> >> May  5 17:09:34 devel1 crmd: [5942]: debug: do_fsa_action: 
> >> >> actions:trace: #011// A_STARTED
> >> >> May  5 17:09:34 devel1 crmd: [5942]: info: do_started: Delaying start, 
> >> >> no membership data (0010)
> >> >>   
> >> >> ^
> >> >> May  5 17:09:34 devel1 crmd: [5942]: debug: register_fsa_input_adv: 
> >> >> Stalling the FSA pending further input: cause=C_FSA_INTERNAL
> >> >>
> >> >> Any ideas ?
> >> >
> >> > Hg version?  Corosync config?
> >> > I'm running -devel here right now and things are fine.
> >>
> >> Uh, I think I see now.
> >> Try http://hg.clusterlabs.org/pacemaker/1.1/rev/b94ce5673ce4
> >>
> 
> Yeah, I realized afterwards that it was specific to devel.
> What does your corosync config look like?
I'm running corosync-1.3.0-3.1.x86_64.
It's exactly the same config that worked with
pacemaker 1.1 rev 10608:b4f456380f60.



# Please read the corosync.conf.5 manual page
compatibility: whitetank

aisexec {
# Run as root - this is necessary to be able to manage
# resources with Pacemaker
user:   root
group:  root
}

service {
# Load the Pacemaker Cluster Resource Manager
ver:1
name:   pacemaker
use_mgmtd:  yes
use_logd:   yes
}

totem {
# The only valid version is 2
version:2

# How long before declaring a token lost (ms)
token:  5000

# How many token retransmits before forming a new configuration
token_retransmits_before_loss_const: 10

# How long to wait for join messages in the membership protocol (ms)
join:   60

# How long to wait for consensus to be achieved before starting
# a new round of membership configuration (ms)
consensus:  6000

# Turn off the virtual synchrony filter
vsftype:none

# Number of messages that may be sent by one processor on
# receipt of the token
max_messages:   20

# Limit generated nodeids to 31-bits (positive signed integers)
clear_node_high_bit: yes

# Disable encryption
secauth:off

# How many threads to use for encryption/decryption
threads:0

# Optionally assign a fixed node id (integer)
# nodeid:   1234

rrp_mode:   active

interface {
ringnumber: 0

# The following values need to be set based on your environment
bindnetaddr:192.168.178.0
mcastaddr:  226.94.40.1
mcastport:  5409
}

interface {
ringnumber: 1

# The following values need to be set based on your environment
bindnetaddr:10.1.1.0
   

Re: [Pacemaker] [PATCH]Bug 2567 - crm resource migrate should support an optional "role" parameter

2011-05-09 Thread Holger Teutsch
I had 1.1, but Dejan asked me to rebase my patches on devel.

Long story short: devel now works after upgrading to the rev you
mentioned, and I got back to working on my patches.
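
For reference, a minimal sketch of the update steps this implies, assuming a
local hg clone of the devel repository (the clone path below is an assumption;
the changeset id is the one quoted in Andrew's message further down):

    # sketch only -- adjust the clone path to your checkout
    cd /path/to/pacemaker-devel
    hg pull http://hg.clusterlabs.org/pacemaker/devel
    hg update -r 84ef5401322f
    make && sudo make install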

Thanx
Holger

On Mon, 2011-05-09 at 10:58 +0200, Andrew Beekhof wrote:
> I thought you said you were running 1.1?
> 
> May  5 17:09:33 devel1 pacemakerd: [5929]: info: read_config: Reading
> configure for stack: corosync
> 
> This message is specific to the devel branch.
> 
> Update to get the following fix and you should be fine:
> http://hg.clusterlabs.org/pacemaker/devel/rev/84ef5401322f
> 
> ___
> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
> 
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: 
> http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker



___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker


Re: [Pacemaker] [PATCH]Bug 2567 - crm resource migrate should support an optional "role" parameter

2011-05-09 Thread Holger Teutsch
On Wed, 2011-04-27 at 13:25 +0200, Andrew Beekhof wrote:
> On Sun, Apr 24, 2011 at 4:31 PM, Holger Teutsch  wrote:
...
> > Remaining diffs seem to be not related to my changes.
> 
> Unlikely I'm afraid.  We run the regression tests after every commit
> and complain loudly if they fail.
> What is the regression test output?

That's the output of tools/regression.sh from pacemaker-devel *without* my
patches applied:
Version: parent: 10731:bf7b957f4cbe tip

see attachment
-holger


Using local binaries from: .
* Passed: cibadmin   - Require --force for CIB erasure
* Passed: cibadmin   - Allow CIB erasure with --force
* Passed: cibadmin   - Query CIB
* Passed: crm_attribute  - Set cluster option
* Passed: cibadmin   - Query new cluster option
* Passed: cibadmin   - Query cluster options
* Passed: cibadmin   - Delete nvpair
* Passed: cibadmin   - Create operaton should fail with: -21, The object 
already exists
* Passed: cibadmin   - Modify cluster options section
* Passed: cibadmin   - Query updated cluster option
* Passed: crm_attribute  - Set duplicate cluster option
* Passed: crm_attribute  - Setting multiply defined cluster option should fail 
with -216, Could not set cluster option
* Passed: crm_attribute  - Set cluster option with -s
* Passed: crm_attribute  - Delete cluster option with -i
* Passed: cibadmin   - Create node entry
* Passed: cibadmin   - Create node status entry
* Passed: crm_attribute  - Create node attribute
* Passed: cibadmin   - Query new node attribute
* Passed: cibadmin   - Digest calculation
* Passed: cibadmin   - Replace operation should fail with: -45, Update was 
older than existing configuration
* Passed: crm_standby- Default standby value
* Passed: crm_standby- Set standby status
* Passed: crm_standby- Query standby value
* Passed: crm_standby- Delete standby value
* Passed: cibadmin   - Create a resource
* Passed: crm_resource   - Create a resource meta attribute
* Passed: crm_resource   - Query a resource meta attribute
* Passed: crm_resource   - Remove a resource meta attribute
* Passed: crm_resource   - Create a resource attribute
* Passed: crm_resource   - List the configured resources
* Passed: crm_resource   - Set a resource's fail-count
* Passed: crm_resource   - Require a destination when migrating a resource that 
is stopped
* Passed: crm_resource   - Don't support migration to non-existant locations
* Passed: crm_resource   - Migrate a resource
* Passed: crm_resource   - Un-migrate a resource
--- ./regression.exp2011-05-09 20:26:27.669381187 +0200
+++ ./regression.out2011-05-09 20:38:27.112098949 +0200
@@ -616,7 +616,7 @@
   
 
 * Passed: crm_resource   - List the configured resources
-
+
   
 
   
@@ -642,19 +642,13 @@
 
   
   
-
-  
-
-  
-
-  
-
+
   
 
 * Passed: crm_resource   - Set a resource's fail-count
 Resource dummy not moved: not-active and no preferred location specified.
 Error performing operation: cib object missing
-
+
   
 
   
@@ -680,19 +674,13 @@
 
   
   
-
-  
-
-  
-
-  
-
+
   
 
 * Passed: crm_resource   - Require a destination when migrating a resource 
that is stopped
 Error performing operation: i.dont.exist is not a known node
 Error performing operation: The object/attribute does not exist
-
+
   
 
   
@@ -718,13 +706,7 @@
 
   
   
-
-  
-
-  
-
-  
-
+
   
 
 * Passed: crm_resource   - Don't support migration to non-existant locations
@@ -760,13 +742,7 @@
 
   
   
-
-  
-
-  
-
-  
-
+
   
 
 * Passed: crm_resource   - Migrate a resource
@@ -796,13 +772,7 @@
 
   
   
-
-  
-
-  
-
-  
-
+
   
 
 * Passed: crm_resource   - Un-migrate a resource
Tests passed but diff failed
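
The failing step here is only the harness's final output comparison; judging
from the diff header above it amounts to roughly the following (a sketch, not
the script's actual code -- the file names are taken from the diff header):

    # Each test already printed "* Passed: ..."; the final verdict comes
    # from comparing the collected output with the expected transcript.
    if diff -u ./regression.exp ./regression.out; then
        echo "All tests passed"
    else
        echo "Tests passed but diff failed"
    fi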
___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker


Re: [Pacemaker] [PATCH]Bug 2567 - crm resource migrate should support an optional "role" parameter

2011-05-10 Thread Holger Teutsch
On Tue, 2011-05-10 at 08:24 +0200, Andrew Beekhof wrote:
> On Mon, May 9, 2011 at 8:44 PM, Holger Teutsch  wrote:
> > On Wed, 2011-04-27 at 13:25 +0200, Andrew Beekhof wrote:
> >> On Sun, Apr 24, 2011 at 4:31 PM, Holger Teutsch  
> >> wrote:
> > ...
> >> > Remaining diffs seem to be not related to my changes.
> >>
> >> Unlikely I'm afraid.  We run the regression tests after every commit
> >> and complain loudly if they fail.
> >> What is the regression test output?
> >
> > That's the output of tools/regression.sh of pacemaker-devel *without* my
> > patches:
> > Version: parent: 10731:bf7b957f4cbe tip
> >
> > see attachment
> 
> There seems to be something not quite right with your environment.
> Had you built the tools directory before running the test?
Yes, + install
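
(For context, the build/install sequence that answer implies is roughly the
usual autotools run -- a sketch only, with configure flags omitted and
depending on the local setup:)

    ./autogen.sh
    ./configure
    make
    sudo make install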

> In a clean chroot it passes on both opensuse and fedora:
> 
> http://build.clusterlabs.org:8010/builders/opensuse-11.3-i386-devel/builds/48/steps/cli_test/logs/stdio
> and
> 
> http://build.clusterlabs.org:8010/builders/fedora-13-x86_64-devel/builds/48/steps/cli_test/logs/stdio
> 
> What distro are you on?
> 
Opensuse 11.4

> Could you try running it as:
> /full/path/to/pacemaker/sources/tools/regression.sh
> 
> The PATH magic that allows the tests to be run from the source
> directory may not be fully functional.
> 
Did not help, will do further investigation.
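
One thing worth checking at this point would be which binaries the script
actually exercises -- the run above reports "Using local binaries from: .".
A sketch (the paths are illustrative, not taken from the actual setup):

    # compare the in-tree tool with the installed copy
    which crm_resource
    ./tools/crm_resource --version
    /usr/sbin/crm_resource --version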
-holger



___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker