Re: [Pacemaker] Pacemaker Explained -- Issues

2011-05-09 Thread Andrew Beekhof
On Sat, May 7, 2011 at 10:03 AM, KIA posei...@mail.ru wrote:
 Hi Andrew,

 I've found a consistency issue in subj.

 In section 1.4.1 DC is Designated Co-ordinator.

 In section 2.3 DC is Designated Controller.

 In all following occurrences of DC we have been left alone to guess what
 it means here. :(

Both mean the same thing.


 A decision should be made, and the issue should be fixed.

 Sincerely,

 Igor



___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker


Re: [Pacemaker] [PATCH]Bug 2567 - crm resource migrate should support an optional role parameter

2011-05-09 Thread Holger Teutsch
On Fri, 2011-05-06 at 16:15 +0200, Andrew Beekhof wrote:
 On Fri, May 6, 2011 at 12:28 PM, Holger Teutsch holger.teut...@web.de wrote:
  On Fri, 2011-05-06 at 11:03 +0200, Andrew Beekhof wrote:
  On Fri, May 6, 2011 at 9:53 AM, Andrew Beekhof and...@beekhof.net wrote:
   On Thu, May 5, 2011 at 5:43 PM, Holger Teutsch holger.teut...@web.de 
   wrote:
   On Fri, 2011-04-29 at 09:41 +0200, Andrew Beekhof wrote:
   
Unfortunately the devel code does not run at all in my environment so I
have to fix this first.
  
   Oh?  I ran CTS on it the other day and it was fine here.
  
  
   I installed pacemaker-devel on top of a compilation of pacemaker-1.1. In
   addition I tried make uninstall for both versions and then again
   make install for devel. Pacemaker does not come up, crm_mon shows
   nodes as offline.
  
   I suspect reason is
   May  5 17:09:34 devel1 crmd: [5942]: notice: crmd_peer_update: Status 
   update: Client devel1/crmd now has status [online] (DC=null)
   May  5 17:09:34 devel1 crmd: [5942]: info: crm_update_peer: Node 
   devel1: id=1790093504 state=unknown addr=(null) votes=0 born=0 seen=0 
   proc=00111312 (new)
   May  5 17:09:34 devel1 crmd: [5942]: info: pcmk_quorum_notification: 
   Membership 0: quorum retained (0)
   May  5 17:09:34 devel1 crmd: [5942]: debug: do_fsa_action: 
   actions:trace: #011// A_STARTED
   May  5 17:09:34 devel1 crmd: [5942]: info: do_started: Delaying start, 
   no membership data (0010)
 
   ^
   May  5 17:09:34 devel1 crmd: [5942]: debug: register_fsa_input_adv: 
   Stalling the FSA pending further input: cause=C_FSA_INTERNAL
  
   Any ideas ?
  
   Hg version?  Corosync config?
   I'm running -devel here right now and things are fine.
 
  Uh, I think I see now.
  Try http://hg.clusterlabs.org/pacemaker/1.1/rev/b94ce5673ce4
 
 
 Yeah, I realized afterwards that it was specific to devel.
 What does your corosync config look like?
I run corosync-1.3.0-3.1.x86_64.
It's exactly the same config that worked with
pacemaker 1.1 rev 10608:b4f456380f60



# Please read the corosync.conf.5 manual page
compatibility: whitetank

aisexec {
        # Run as root - this is necessary to be able to manage
        # resources with Pacemaker
        user:   root
        group:  root
}

service {
        # Load the Pacemaker Cluster Resource Manager
        ver:        1
        name:       pacemaker
        use_mgmtd:  yes
        use_logd:   yes
}

totem {
        # The only valid version is 2
        version:    2

        # How long before declaring a token lost (ms)
        token:      5000

        # How many token retransmits before forming a new configuration
        token_retransmits_before_loss_const: 10

        # How long to wait for join messages in the membership protocol (ms)
        join:       60

        # How long to wait for consensus to be achieved before starting
        # a new round of membership configuration (ms)
        consensus:  6000

        # Turn off the virtual synchrony filter
        vsftype:    none

        # Number of messages that may be sent by one processor on
        # receipt of the token
        max_messages:   20

        # Limit generated nodeids to 31-bits (positive signed integers)
        clear_node_high_bit: yes

        # Disable encryption
        secauth:    off

        # How many threads to use for encryption/decryption
        threads:    0

        # Optionally assign a fixed node id (integer)
        # nodeid:   1234

        rrp_mode:   active

        interface {
                ringnumber: 0

                # The following values need to be set based on your environment
                bindnetaddr: 192.168.178.0
                mcastaddr:   226.94.40.1
                mcastport:   5409
        }

        interface {
                ringnumber: 1

                # The following values need to be set based on your environment
                bindnetaddr: 10.1.1.0
                mcastaddr:   226.94.41.1
                mcastport:   5411
        }
}

logging {
        fileline:   off
        to_stderr:  no
        to_logfile: no
        to_syslog:  yes
        syslog_facility: daemon
        debug:      on
        timestamp:  off
}

amf {
        mode: disabled
}



Re: [Pacemaker] [PATCH]Bug 2567 - crm resource migrate should support an optional role parameter

2011-05-09 Thread Andrew Beekhof
On Mon, May 9, 2011 at 9:58 AM, Holger Teutsch holger.teut...@web.de wrote:
 On Fri, 2011-05-06 at 16:15 +0200, Andrew Beekhof wrote:
 On Fri, May 6, 2011 at 12:28 PM, Holger Teutsch holger.teut...@web.de 
 wrote:
  On Fri, 2011-05-06 at 11:03 +0200, Andrew Beekhof wrote:
  On Fri, May 6, 2011 at 9:53 AM, Andrew Beekhof and...@beekhof.net wrote:
   On Thu, May 5, 2011 at 5:43 PM, Holger Teutsch holger.teut...@web.de 
   wrote:
   On Fri, 2011-04-29 at 09:41 +0200, Andrew Beekhof wrote:
   
Unfortunately the devel code does not run at all in my environment so I
have to fix this first.
  
   Oh?  I ran CTS on it the other day and it was fine here.
  
  
   I installed pacemaker-devel on top of a compilation of pacemaker-1.1. 
   In
   addition I tried make uninstall for both versions and then again
   make install for devel. Pacemaker does not come up, crm_mon shows
   nodes as offline.
  
   I suspect reason is
   May  5 17:09:34 devel1 crmd: [5942]: notice: crmd_peer_update: Status 
   update: Client devel1/crmd now has status [online] (DC=null)
   May  5 17:09:34 devel1 crmd: [5942]: info: crm_update_peer: Node 
   devel1: id=1790093504 state=unknown addr=(null) votes=0 born=0 seen=0 
   proc=00111312 (new)
   May  5 17:09:34 devel1 crmd: [5942]: info: pcmk_quorum_notification: 
   Membership 0: quorum retained (0)
   May  5 17:09:34 devel1 crmd: [5942]: debug: do_fsa_action: 
   actions:trace: #011// A_STARTED
   May  5 17:09:34 devel1 crmd: [5942]: info: do_started: Delaying start, 
   no membership data (0010)
                                                         
   ^
   May  5 17:09:34 devel1 crmd: [5942]: debug: register_fsa_input_adv: 
   Stalling the FSA pending further input: cause=C_FSA_INTERNAL
  
   Any ideas ?
  
   Hg version?  Corosync config?
   I'm running -devel here right now and things are fine.
 
  Uh, I think I see now.
  Try http://hg.clusterlabs.org/pacemaker/1.1/rev/b94ce5673ce4
 

 Yeah, I realized afterwards that it was specific to devel.
 What does your corosync config look like?
 I run corosync-1.3.0-3.1.x86_64.
 It's exactly the same config that worked with
 pacemaker 1.1 rev 10608:b4f456380f60

If it works for that version, then it should work for any of the 7
commits since.
None of them could produce:

May  5 17:09:33 devel1 pacemakerd: [5929]: info: get_cluster_type:
Cluster type is: 'corosync'

Going back through the commit logs, it looks like the following is
when things broke:
   http://hg.clusterlabs.org/pacemaker/1.1/rev/44e5f4e760f6

Specifically this needs to be removed:
+setenv("HA_cluster_type", "corosync", 1);

Could you see if that change helps?
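
For readers following along: the third argument of POSIX setenv() is the
overwrite flag, so a call with 1 replaces whatever value was set before it.
The sketch below mimics that behaviour on a plain dictionary. It is only an
illustration of the flag's semantics, not Pacemaker code; the helper name
set_cluster_type and the prior value "openais" are hypothetical.

```python
def set_cluster_type(env, value, overwrite):
    """Mimic POSIX setenv(name, value, overwrite) on a dict-like environment."""
    name = "HA_cluster_type"
    # With overwrite set, an existing value is replaced unconditionally;
    # without it, an existing value is left untouched.
    if overwrite or name not in env:
        env[name] = value
    return env[name]


# Hypothetical starting point: some earlier step already chose a value.
initial = {"HA_cluster_type": "openais"}

forced = set_cluster_type(dict(initial), "corosync", overwrite=True)
kept = set_cluster_type(dict(initial), "corosync", overwrite=False)

print("overwrite=1 ->", forced)
print("overwrite=0 ->", kept)
```

Whether an overwritten value is what actually caused the symptom is not
stated in this thread; the sketch only shows why an unconditional setenv can
mask an earlier setting.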



Re: [Pacemaker] [PATCH]Bug 2567 - crm resource migrate should support an optional role parameter

2011-05-09 Thread Andrew Beekhof
I thought you said you were running 1.1?

May  5 17:09:33 devel1 pacemakerd: [5929]: info: read_config: Reading
configure for stack: corosync

This message is specific to the devel branch.

Update to get the following fix and you should be fine:
http://hg.clusterlabs.org/pacemaker/devel/rev/84ef5401322f



Re: [Pacemaker] [pacemaker][patch 3/4] Simple changes for Pacemaker Explained, Chapter 6 CH_Constraints.xml

2011-05-09 Thread Andrew Beekhof
On Fri, May 6, 2011 at 12:29 PM, Dejan Muhamedagic deja...@fastmail.fm wrote:
 On Fri, May 06, 2011 at 09:47:29AM +0200, Andrew Beekhof wrote:
 On Thu, May 5, 2011 at 5:20 PM, Dejan Muhamedagic deja...@fastmail.fm 
 wrote:
  On Thu, May 05, 2011 at 12:02:01PM +0200, Andrew Beekhof wrote:
  On Thu, May 5, 2011 at 11:37 AM, Dejan Muhamedagic deja...@fastmail.fm 
  wrote:
   On Thu, May 05, 2011 at 09:07:05AM +0200, Andrew Beekhof wrote:
   On Wed, May 4, 2011 at 7:15 PM, Dejan Muhamedagic 
   deja...@fastmail.fm wrote:
Hi,
   
On Wed, May 04, 2011 at 12:49:03PM +0200, Andrew Beekhof wrote:
Tick tock.  I'm going to push this soon unless someone raises an 
objection RSN.
   
On Fri, Apr 15, 2011 at 4:55 PM, Andrew Beekhof 
and...@beekhof.net wrote:
 On Fri, Apr 15, 2011 at 3:00 PM, Lars Marowsky-Bree 
 l...@novell.com wrote:
 On 2011-04-13T08:37:12, Andrew Beekhof and...@beekhof.net 
 wrote:

  Before:
 
        <rsc_colocation id="coloc-set" score="INFINITY">
          <resource_set id="coloc-set-0">
            <resource_ref id="dummy2"/>
            <resource_ref id="dummy3"/>
          </resource_set>
          <resource_set id="coloc-set-1" sequential="false" role="Master">
            <resource_ref id="dummy0"/>
            <resource_ref id="dummy1"/>
          </resource_set>
        </rsc_colocation>
        <rsc_order id="order-set" score="INFINITY">
          <resource_set id="order-set-0" role="Master">
            <resource_ref id="dummy0"/>
            <resource_ref id="dummy1"/>
          </resource_set>
          <resource_set id="order-set-1" sequential="false">
            <resource_ref id="dummy2"/>
            <resource_ref id="dummy3"/>
          </resource_set>
        </rsc_order>
 
 
 
  After:

 So I am understanding this properly - we're getting rid of the
 sequential attribute, yes?

 Absolutely.
   
So, the internal-collocation replaces the sequential attribute?
  
   Yes.
  
What are the possible and/or meaningful values for
internal-collocation? It looks like that would be 0 or INFINITY
only, which would translate to old sequential false and true,
right?
  
   No.
  
                 <choice>
                   <data type="integer"/>
                   <value>INFINITY</value>
                   <value>+INFINITY</value>
                   <value>-INFINITY</value>
                 </choice>
  
   I saw that, but wonder what makes sense in this context. What's
   the difference between values 0, INF, 50, -50, 100? Are all those
   necessary?
 
  Just as necessary as for colocation constraints not involving sets.
  You're setting up the colocation score between elements of the set.
 
  OK.
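
To make the schema fragment above concrete: a score between set members is
either an integer or one of the three INFINITY spellings. A minimal sketch of
how such a value could be normalized follows. It assumes INFINITY is
represented as 1000000 and that larger integers are clamped to that range;
neither detail is given in this thread. The name parse_score is hypothetical
and is not Pacemaker's API.

```python
INFINITY = 1000000  # assumed numeric stand-in for INFINITY


def parse_score(text):
    """Turn a score string from the schema's choice into an integer."""
    text = text.strip()
    if text in ("INFINITY", "+INFINITY"):
        return INFINITY
    if text == "-INFINITY":
        return -INFINITY
    # Anything else must be a plain integer; int() raises ValueError otherwise.
    value = int(text)
    # Assumed behaviour: clamp into [-INFINITY, INFINITY].
    return max(-INFINITY, min(INFINITY, value))


for raw in ("0", "50", "-50", "100", "INFINITY", "-INFINITY"):
    print(raw, "->", parse_score(raw))
```

Read this way, 0, 50, -50 and 100 are simply weaker or negative preferences,
while the INFINITY forms express "must" and "must not", as with colocation
constraints that do not involve sets.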
 
Looking at the schema, the ordering constraint lost score
  
   Score was being mapped to kind inside the PE anyway.
  
and is
using only the kind attribute which can have one of:
   
     <value>None</value>
     <value>Optional</value>
     <value>Mandatory</value>
     <value>Serialize</value>
   
But then, the kind attribute is optional. If missing, how's
that different from value None?
  
   If it's missing you get the default, which IIRC is Mandatory, not None.
  
What does Serialize mean? (in orders)
  
   Same as it did before, this is not new.
  
What does score-attribute-mangle mean? (in collocations)
  
   As above.  Not new.
  
   Where are these two documented? Couldn't find anything in the
   docs.
 
  Looks to be just an alias for XML_RULE_ATTR_SCORE_ATTRIBUTE dating back 
  to 2005.
  So there is probably a reason I didn't document it.
 
  So, it's obsolete then? The crm shell actually never supported
  it :-|  And I can't recall that I've ever seen it in a
  configuration.
 
  Serialize is newer.  It's like Optional but for a set - no member will
  start or stop at the same time as another.
 
  OK.
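
The rule discussed above (kind is optional, and a missing kind falls back to
a default rather than to None) can be sketched as a small lookup. This is an
illustration only: the default of Mandatory is taken from the "IIRC" reply in
this thread, and the function name effective_kind and the constraint ids are
hypothetical.

```python
import xml.etree.ElementTree as ET

# The four values listed in the schema excerpt above.
VALID_KINDS = ("None", "Optional", "Mandatory", "Serialize")
# Default as recalled in the thread ("IIRC is Mandatory").
DEFAULT_KIND = "Mandatory"


def effective_kind(constraint_xml):
    """Return the kind that applies to an ordering constraint element."""
    elem = ET.fromstring(constraint_xml)
    kind = elem.get("kind", DEFAULT_KIND)
    if kind not in VALID_KINDS:
        raise ValueError("unknown kind: %s" % kind)
    return kind


print(effective_kind('<rsc_order id="o1"/>'))
print(effective_kind('<rsc_order id="o2" kind="Serialize"/>'))
print(effective_kind('<rsc_order id="o3" kind="None"/>'))
```

The point of the sketch is that an absent attribute and kind="None" are two
different inputs and give two different results.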
 
I think that it'd be good to clarify the shell syntax before
applying these changes.

 Actually I'm going to flip-flop here... there's really no need for this.
 Until the shell understands the new syntax, it will just show xml right?

 Right. But in my experience trying things out in shell syntax
 sometimes reveals design shortcomings. That was so with the
 resource sets.

 Going back to the example you've shown earlier in this thread ...

 Before:

 collocation c inf: ( dummy0:Master dummy1:Master ) dummy2 dummy3
 order o Mandatory: dummy0:promote dummy1:promote ( dummy2 dummy3 )

 After(1):

 collocation_set c inf: 0:[dummy0:Master dummy1:Master] inf:[dummy2 dummy3]
 order_set o Mandatory: Mandatory:[dummy0:promote dummy1:promote] 
 Optional:[dummy2 dummy3]

 After(2):

 collocation_set c inf: 0:[dummy0:Master dummy1:Master] dummy2 dummy3
 order_set o Mandatory: dummy0:promote dummy1:promote Optional:[dummy2 dummy3]

 The second version removes redundant specification, i.e. for the
 sets which have the same kind/score as the constraint.

 Would this kind 

Re: [Pacemaker] filesystem could not mount after reboot

2011-05-09 Thread Viacheslav Biriukov
Hello.
I think some resource failed and was then locked out by its failcount, or
you have a wrong location score.


Re: [Pacemaker] [PATCH]Bug 2567 - crm resource migrate should support an optional role parameter

2011-05-09 Thread Holger Teutsch
I had 1.1 but Dejan asked me to rebase my patches on devel.

So long story short: devel now works after upgrading to the rev you
mentioned and I got back to working on my patches.

Thanx
Holger

On Mon, 2011-05-09 at 10:58 +0200, Andrew Beekhof wrote:
 I thought you said you were running 1.1?
 
 May  5 17:09:33 devel1 pacemakerd: [5929]: info: read_config: Reading
 configure for stack: corosync
 
 This message is specific to the devel branch.
 
 Update to get the following fix and you should be fine:
 http://hg.clusterlabs.org/pacemaker/devel/rev/84ef5401322f
 


Re: [Pacemaker] [PATCH]Bug 2567 - crm resource migrate should support an optional role parameter

2011-05-09 Thread Holger Teutsch
On Wed, 2011-04-27 at 13:25 +0200, Andrew Beekhof wrote:
 On Sun, Apr 24, 2011 at 4:31 PM, Holger Teutsch holger.teut...@web.de wrote:
...
  Remaining diffs seem to be not related to my changes.
 
 Unlikely I'm afraid.  We run the regression tests after every commit
 and complain loudly if they fail.
 What is the regression test output?

That's the output of tools/regression.sh of pacemaker-devel *without* my
patches:
Version: parent: 10731:bf7b957f4cbe tip

see attachment
-holger


Using local binaries from: .
* Passed: cibadmin   - Require --force for CIB erasure
* Passed: cibadmin   - Allow CIB erasure with --force
* Passed: cibadmin   - Query CIB
* Passed: crm_attribute  - Set cluster option
* Passed: cibadmin   - Query new cluster option
* Passed: cibadmin   - Query cluster options
* Passed: cibadmin   - Delete nvpair
* Passed: cibadmin   - Create operaton should fail with: -21, The object already exists
* Passed: cibadmin   - Modify cluster options section
* Passed: cibadmin   - Query updated cluster option
* Passed: crm_attribute  - Set duplicate cluster option
* Passed: crm_attribute  - Setting multiply defined cluster option should fail with -216, Could not set cluster option
* Passed: crm_attribute  - Set cluster option with -s
* Passed: crm_attribute  - Delete cluster option with -i
* Passed: cibadmin   - Create node entry
* Passed: cibadmin   - Create node status entry
* Passed: crm_attribute  - Create node attribute
* Passed: cibadmin   - Query new node attribute
* Passed: cibadmin   - Digest calculation
* Passed: cibadmin   - Replace operation should fail with: -45, Update was older than existing configuration
* Passed: crm_standby- Default standby value
* Passed: crm_standby- Set standby status
* Passed: crm_standby- Query standby value
* Passed: crm_standby- Delete standby value
* Passed: cibadmin   - Create a resource
* Passed: crm_resource   - Create a resource meta attribute
* Passed: crm_resource   - Query a resource meta attribute
* Passed: crm_resource   - Remove a resource meta attribute
* Passed: crm_resource   - Create a resource attribute
* Passed: crm_resource   - List the configured resources
* Passed: crm_resource   - Set a resource's fail-count
* Passed: crm_resource   - Require a destination when migrating a resource that is stopped
* Passed: crm_resource   - Don't support migration to non-existant locations
* Passed: crm_resource   - Migrate a resource
* Passed: crm_resource   - Un-migrate a resource
--- ./regression.exp	2011-05-09 20:26:27.669381187 +0200
+++ ./regression.out	2011-05-09 20:38:27.112098949 +0200
@@ -616,7 +616,7 @@
   /status
 /cib
 * Passed: crm_resource   - List the configured resources
-cib epoch=16 num_updates=2 admin_epoch=0 validate-with=pacemaker-1.2 
+cib epoch=16 num_updates=1 admin_epoch=0 validate-with=pacemaker-1.2 
   configuration
 crm_config
   cluster_property_set id=cib-bootstrap-options/
@@ -642,19 +642,13 @@
 constraints/
   /configuration
   status
-node_state id=clusterNode-UUID uname=clusterNode-UNAME
-  transient_attributes id=clusterNode-UUID
-instance_attributes id=status-clusterNode-UUID
-  nvpair id=status-clusterNode-UUID-fail-count-dummy 
name=fail-count-dummy value=10/
-/instance_attributes
-  /transient_attributes
-/node_state
+node_state id=clusterNode-UUID uname=clusterNode-UNAME/
   /status
 /cib
 * Passed: crm_resource   - Set a resource's fail-count
 Resource dummy not moved: not-active and no preferred location specified.
 Error performing operation: cib object missing
-cib epoch=16 num_updates=2 admin_epoch=0 validate-with=pacemaker-1.2 
+cib epoch=16 num_updates=1 admin_epoch=0 validate-with=pacemaker-1.2 
   configuration
 crm_config
   cluster_property_set id=cib-bootstrap-options/
@@ -680,19 +674,13 @@
 constraints/
   /configuration
   status
-node_state id=clusterNode-UUID uname=clusterNode-UNAME
-  transient_attributes id=clusterNode-UUID
-instance_attributes id=status-clusterNode-UUID
-  nvpair id=status-clusterNode-UUID-fail-count-dummy 
name=fail-count-dummy value=10/
-/instance_attributes
-  /transient_attributes
-/node_state
+node_state id=clusterNode-UUID uname=clusterNode-UNAME/
   /status
 /cib
 * Passed: crm_resource   - Require a destination when migrating a resource 
that is stopped
 Error performing operation: i.dont.exist is not a known node
 Error performing operation: The object/attribute does not exist
-cib epoch=16 num_updates=2 admin_epoch=0 validate-with=pacemaker-1.2 
+cib epoch=16 num_updates=1 admin_epoch=0 validate-with=pacemaker-1.2 
   configuration
 crm_config
   cluster_property_set id=cib-bootstrap-options/
@@ -718,13 +706,7 @@
 constraints/
   /configuration
   status
-node_state id=clusterNode-UUID uname=clusterNode-UNAME
-  transient_attributes id=clusterNode-UUID
-