Re: [Pacemaker] Same host displayed twice in crm status

2013-04-02 Thread Nicolas J.
I've already tried to remove the node following the document:
http://clusterlabs.org/doc/en-US/Pacemaker/1.0/html/Pacemaker_Explained/s-node-delete.html

with the following commands:
crm_node -R COROSYNC_ID
cibadmin --delete --obj_type nodes --crm_xml '<node uname="VMTESTORADG2.it.dbi-services.com"/>'
cibadmin --delete --obj_type status --crm_xml '<node_state uname="VMTESTORADG2.it.dbi-services.com"/>'

I also tried to delete the status before deleting the node with the same
results.

I have the same issue: the node is deleted, but when corosync is restarted
on any node, the deleted name appears again.
The server with the changed hostname has been rebooted, so I don't know
where the reference to the old name could be stored except in the cluster.
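For completeness, a quick way to double-check where the old FQDN might still
be referenced (a sketch only; the paths below are the usual defaults for this
kind of setup and may differ):

# look for the old name in the live CIB and in the usual places on disk
cibadmin -Q | grep -i 'VMTESTORADG2.it.dbi-services.com'
grep -i 'VMTESTORADG2.it.dbi-services.com' /etc/corosync/corosync.conf /etc/hosts 2>/dev/null
uname -n    # confirm the node now reports the new hostname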

Regarding the version, here are the details:
- Corosync 1.2.7-1.1.el5
- Pacemaker 1.1.5-1.1.el5


2013/4/1 David Vossel dvos...@redhat.com

 - Original Message -
  From: Nicolas J. nikkro70+pacema...@gmail.com
  To: pacemaker@oss.clusterlabs.org
  Sent: Friday, March 29, 2013 8:55:30 AM
  Subject: [Pacemaker] Same host displayed twice in crm status
 
  Hi,
 
  I have a problem with a Corosync/Pacemaker configuration.
  One host of the cluster has been renamed and now the host is displayed
 twice
  in the configuration.
 
  When I try to remove the host from the configuration it works but if
 corosync
  is restarted on one node, the old host appears again.
  I tried several ways to delete the host with no effect.
 
  How can I delete the wrong host?

 For the pacemaker version you are using, try deleting the node from the
 configuration in both the node and status sections, then use the crm_node
 -R option to remove the node from the cluster's internal cache.  In
 pacemaker versions >= 1.1.8 only the crm_node -R option is required to
 remove a node.

 -- Vossel

  I checked the Linux configuration and there is no place where the old
 name is
  referenced.
  It's an OEL/Red Hat linux.
 
  Output
  -
  [root@vmtestoradg2 ~]# crm status
  
  Last updated: Fri Mar 29 14:51:56 2013
  Stack: openais
  Current DC: vmtestoradg1 - partition with quorum
  Version: 1.1.5-1.1.el5-01e86afaaa6d4a8c4836f68df80ababd6ca3902f
  4 Nodes configured, 3 expected votes
  1 Resources configured.
  
 
  Online: [ vmtestoradg1 vmtestora10g01 vmtestoradg2 ]
  OFFLINE: [ VMTESTORADG2.it.dbi-services.com ]
 
  DG_IP (ocf::heartbeat:IPaddr2): Started vmtestoradg1
 
  [root@vmtestoradg2 ~]# crm node clearstate
 VMTESTORADG2.it.dbi-services.com
  Do you really want to drop state for node
 VMTESTORADG2.it.dbi-services.com ?
  y
  [root@vmtestoradg2 ~]# crm node delete VMTESTORADG2.it.dbi-services.com
  INFO: node VMTESTORADG2.it.dbi-services.com not found by crm_node
  INFO: node VMTESTORADG2.it.dbi-services.com deleted
 
  Thanks in advance
 
  Best Regards,
 
  Nicolas J.
 
  ___
  Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
  http://oss.clusterlabs.org/mailman/listinfo/pacemaker
 
  Project Home: http://www.clusterlabs.org
  Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
  Bugs: http://bugs.clusterlabs.org
 

 ___
 Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
 http://oss.clusterlabs.org/mailman/listinfo/pacemaker

 Project Home: http://www.clusterlabs.org
 Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
 Bugs: http://bugs.clusterlabs.org

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


[Pacemaker] [PATCH] Use correct OCF_ROOT_DIR in include/crm/services.h.

2013-04-02 Thread Andrei Belov

Previously, libcrmservice always had OCF_ROOT_DIR defined as /usr/lib/ocf,
despite the fact that another path was defined in glue_config.h.

Caught on SunOS 5.11 while configuring cluster-glue and pacemaker with a
non-standard prefix.
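
One way to see the mismatch (a sketch; the install prefix and header
locations below are assumptions for a non-standard build):

# compare the OCF root each project was built with
grep 'OCF_ROOT_DIR' /opt/cluster/include/heartbeat/glue_config.h
grep 'OCF_ROOT_DIR' /opt/cluster/include/pacemaker/crm/services.h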
---
 lib/services/Makefile.am |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/lib/services/Makefile.am b/lib/services/Makefile.am
index 3ee3347..8d44dad 100644
--- a/lib/services/Makefile.am
+++ b/lib/services/Makefile.am
@@ -25,7 +25,7 @@ noinst_HEADERS  = upstart.h systemd.h services_private.h
 
 libcrmservice_la_SOURCES = services.c services_linux.c
 libcrmservice_la_LDFLAGS = -version-info 1:0:0
-libcrmservice_la_CFLAGS  = $(GIO_CFLAGS)
+libcrmservice_la_CFLAGS  = -DOCF_ROOT_DIR=\"@OCF_ROOT_DIR@\" $(GIO_CFLAGS)
 libcrmservice_la_LIBADD   = $(GIO_LIBS)
 
 if BUILD_UPSTART

-- 
Andrei

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] [Problem][crmsh]The designation of the 'ordered' attribute becomes the error.

2013-04-02 Thread Dejan Muhamedagic
Hi,

On Mon, Apr 01, 2013 at 09:19:51PM +0200, Andreas Kurz wrote:
 Hi Dejan,
 
 On 2013-03-06 11:59, Dejan Muhamedagic wrote:
  Hi Hideo-san,
  
  On Wed, Mar 06, 2013 at 10:37:44AM +0900, renayama19661...@ybb.ne.jp wrote:
  Hi Dejan,
  Hi Andrew,
 
   In the crm shell, the meta attribute check was revised by the
   following patch:
 
   * http://hg.savannah.gnu.org/hgweb/crmsh/rev/d1174f42f4b3
 
   This patch was backported to Pacemaker 1.0.13:
 
   * 
  https://github.com/ClusterLabs/pacemaker-1.0/commit/fa1a99ab36e0ed015f1bcbbb28f7db962a9d1abc#shell/modules/cibconfig.py
 
   However, the ordered and colocated attributes of a group resource are
   treated as an error when I use a crm shell that includes this patch.
 
  --
  (snip)
  ### Group Configuration ###
  group master-group \
  vip-master \
  vip-rep \
  meta \
  ordered=false
  (snip)
 
  [root@rh63-heartbeat1 ~]# crm configure load update test2339.crm 
  INFO: building help index
  crm_verify[20028]: 2013/03/06_17:57:18 WARN: unpack_nodes: Blind faith: 
  not fencing unseen nodes
  WARNING: vip-master: specified timeout 60s for start is smaller than the 
  advised 90
  WARNING: vip-master: specified timeout 60s for stop is smaller than the 
  advised 100
  WARNING: vip-rep: specified timeout 60s for start is smaller than the 
  advised 90
  WARNING: vip-rep: specified timeout 60s for stop is smaller than the 
  advised 100
  ERROR: master-group: attribute ordered does not exist  - WHY?
  Do you still want to commit? y
  --
 
   If I answer `yes` to the confirmation message, the change is applied, but
   it is a problem that the error message is displayed at all.
    * The same error occurs when I specify the colocated attribute.
   I also noticed that Pacemaker's online help has no explanation of
   ordered/colocated for group resources.
  
   I think that specifying the ordered/colocated attributes on a group
   resource should not be treated as an error.
   In addition, I think ordered/colocated should be added to the online help.
  
  These attributes are not listed in crmsh. Does the attached patch
  help?
 
 Dejan, will this patch for the missing ordered and collocated group
 meta-attribute be included in the next crmsh release? ... can't see the
 patch in the current tip.

The shell in pacemaker v1.0.x is in maintenance mode and shipped
along with the pacemaker code. The v1.1.x doesn't have the
ordered and collocated meta attributes.
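
For reference, one possible workaround on the 1.0.x shell (a sketch only,
untested; it writes the meta attribute through crm_resource and therefore
bypasses the crmsh check):

crm_resource --resource master-group --meta \
    --set-parameter ordered --parameter-value false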

Thanks,

Dejan


 Thanks & Regards,
 Andreas
 
  
  Thanks,
  
  Dejan
 
  Best Regards,
  Hideo Yamauchi.
 
 
  ___
  Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
  http://oss.clusterlabs.org/mailman/listinfo/pacemaker
 
  Project Home: http://www.clusterlabs.org
  Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
  Bugs: http://bugs.clusterlabs.org
 
 
  ___
  Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
  http://oss.clusterlabs.org/mailman/listinfo/pacemaker
 
  Project Home: http://www.clusterlabs.org
  Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
  Bugs: http://bugs.clusterlabs.org
 
 
 



 ___
 Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
 http://oss.clusterlabs.org/mailman/listinfo/pacemaker
 
 Project Home: http://www.clusterlabs.org
 Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
 Bugs: http://bugs.clusterlabs.org


___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


[Pacemaker] resource failover in active/active cluster

2013-04-02 Thread Charles Mean
Hello guys,

I am running corosync 1.4.2 and pacemaker 1.1.7 on Debian, trying to deploy
an active/active cluster with two nodes.
The main problem is that I have two resources that depend on each other: the
VIP is cloned over the two nodes and the nginx daemon depends on the VIP,
but when nginx goes down the VIP on the failed node is not removed from the
cluster and that node still answers requests.
Follow my current configuration:
node host01
node host02
primitive ClusterIP ocf:heartbeat:IPaddr2 \
        params ip="192.168.2.100" cidr_netmask="24" nic="eth0" clusterip_hash="sourceip" \
        op monitor interval="1s"
primitive WebSite lsb:nginx \
        op monitor interval="1s"
clone WebIP ClusterIP \
        meta globally-unique="true" clone-max="2" clone-node-max="2"
clone WebSiteClone WebSite
colocation website-with-ip inf: WebSiteClone WebIP
order nginx-after-ip inf: WebIP WebSiteClone
property $id="cib-bootstrap-options" \
        dc-version="1.1.7-ee0730e13d124c3d58f00016c3376a1de5323cff" \
        cluster-infrastructure="openais" \
        expected-quorum-votes="2" \
        stonith-enabled="false" \
        last-lrm-refresh="1364509824"

I have followed the Pacemaker doc
(http://clusterlabs.org/doc/en-US/Pacemaker/1.1-crmsh/pdf/Clusters_from_Scratch/Pacemaker-1.1-Clusters_from_Scratch-en-US.pdf)
but I haven't used the distributed file system because, in this case, nginx
does not share any configuration files and acts as a load balancer.
Can you tell me how to link the resources?
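
For reference, colocation direction matters in crm syntax: website-with-ip
places WebSiteClone where WebIP runs, so a WebSite failure does not pull the
WebIP instance off that node. One direction to explore, as an untested sketch
using the resource names above (it would replace the two constraints in the
configuration):

colocation ip-with-website inf: WebIP WebSiteClone
order website-before-ip inf: WebSiteClone WebIP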

Thank you
___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] Speeding up startup after migration

2013-04-02 Thread David Vossel
- Original Message -
 From: Lars Marowsky-Bree l...@suse.com
 To: pacemaker@oss.clusterlabs.org
 Sent: Monday, April 1, 2013 5:21:53 PM
 Subject: Re: [Pacemaker] Speeding up startup after migration
 
 On 2013-04-01T13:09:14, David Vossel dvos...@redhat.com wrote:
 
   So, if I understand correctly, new lrmd runs as many simultaneous jobs
   as possible. Unfortunately, in some circumstances this would result in
   the high node load and timeouts. Is there a way to some-how limit that
   load?
  Isn't that what the batch-limit option does?  or are you saying you want a
  batch limit type option that is node specific? Why are you concerned about
  this behavior living in the LRMD instead of at the transition processing
  level?
  
  I believe if we do any batch limiting type behavior at the LRMD level we're
  going to run into problems with the transition timers in the crmd.  The
  LRMD needs to always perform the actions it is given as soon as possible.
 
 Seriously, folks, the LRM rewrite may turn out not to be the best
 example of pacemaker's attention to detail ;-)


such is any re-write of poorly designed code ;-)  --- I included the smiley so 
my jab is acceptable and not in poor taste just like yours! :D --- I included 
this smiley because I think it looks funny.

 Yes, the previous LRM had a per-node concurrency limit. This avoided
 overloading the nodes via IO, which is why it was added. (And also
 smoothed out spikes in the monitoring calls should they happen to
 coincide.) Default limit of parallel executions was 4 or half the number
 of CPU cores, if memory serves.
 
 This turned out to actually improve performance (since it avoided said
 spikes), and avoid timeouts. (While it is true that, given a perfect
 scheduler, the total runtime of N_1..100 being kicked off all at once
 should be equal to N_1..100 being kicked off serially, it's quite
 likely that doing the former will mean at least a few of those 100
 operations hitting its *individual* timeout at the LRM level.)

I'm convinced this is useful.

I'll add PCMK_MAX_CHILDREN to the sysconfig documentation.  To be backwards 
compatible I'll have the lrmd internally interpret your LRMD_MAX_CHILDREN 
environment variable as well.
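
A minimal sketch of what that would look like on a node, assuming the setting
is read from /etc/sysconfig/pacemaker (the path and the final variable name
are assumptions until the documentation is written):

# /etc/sysconfig/pacemaker (path is an assumption)
# cap how many operations the lrmd executes in parallel on this node
PCMK_MAX_CHILDREN=4
# an existing LRMD_MAX_CHILDREN=4 setting would keep working, per the
# backwards-compatibility note above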

sound reasonable?

 
 The TE doesn't have enough knowledge to enforce this, since it doesn't
 know if monitors get scheduled. The transition timers weren't really a
 problem, since they had some lee-way accounted for.
 
 If we don't have this functionality right now anymore, I do believe we
 need it back.
 
 I do seem to recall that at the time, Andrew preferred it to be
 implemented at the LRM level, because it avoided a more complex
 transition graph logic (e.g., the batch-limit functionality on a
 per-node level, and doing something smart about monitors); but my memory
 is hazy on this detail.
 
 Nowadays, since we have the migration-threshold anyway, it may be
 possible to do something about it cleanly in the TE, but that still
 would leave the monitors unsolved ...

 
 Regards,
     Lars
 
 (PS: 1.1.8 really isn't turning out to be my favorite release. If I
 wasn't afraid it'd be received as a rant, I'd try to write up a post-mortem
 from my/our perspective to see what might be avoidable in the future.)

We should open this discussion at some point.  As long as it is constructive 
criticism I doubt it will be perceived as a rant.

I've mentioned to Andrew that we might need to consider doing release 
candidates. This would at least put some of the responsibility back on the 
community to verify the release with us before we officially tag it.  We 
definitely test our code, but it is impossible for us to test everyone's 
possible deployment use-case.

-- Vossel

 
 --
 Architect Storage/HA
 SUSE LINUX Products GmbH, GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer,
 HRB 21284 (AG Nürnberg)
 Experience is the name everyone gives to their mistakes. -- Oscar Wilde
 
 
 ___
 Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
 http://oss.clusterlabs.org/mailman/listinfo/pacemaker
 
 Project Home: http://www.clusterlabs.org
 Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
 Bugs: http://bugs.clusterlabs.org
 

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] Same host displayed twice in crm status

2013-04-02 Thread David Vossel




- Original Message -
 From: Nicolas J. nikkro70+pacema...@gmail.com
 To: The Pacemaker cluster resource manager pacemaker@oss.clusterlabs.org
 Sent: Tuesday, April 2, 2013 2:07:14 AM
 Subject: Re: [Pacemaker] Same host displayed twice in crm status
 
 I've already tried to remove the node following the document:
 http://clusterlabs.org/doc/en-US/Pacemaker/1.0/html/Pacemaker_Explained/s-node-delete.html
 
 with the following commands:
 crm_node -R COROSYNC_ID
 cibadmin --delete --obj_type nodes --crm_xml '<node uname="VMTESTORADG2.it.dbi-services.com"/>'
 cibadmin --delete --obj_type status --crm_xml '<node_state uname="VMTESTORADG2.it.dbi-services.com"/>'
 
 I also tried to delete the status before deleting the node with the same
 results.
 
 I have the same issue: the node is deleted, but when corosync is restarted on
 any node, the deleted name appears again.
 The server with the changed hostname has been rebooted, so I don't know where
 the reference to the old name could be stored except in the cluster.

Unfortunately this looks like a bug.  It sounds like the crm_node -R option
isn't properly discarding the node cache, which means every time a new policy
engine transition is generated that node slips back in.  I'm not sure what to
tell you. I'm guessing the only way to fix this is to shut down the cluster
entirely, making sure there is no mention of the old node in the config on
startup, or to try a newer version of pacemaker.  I'm sure neither of those
solutions is what you want to hear, though.  Maybe someone else who has
encountered this with your version has better advice.
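
A rough sketch of that full-stop approach (the service names and the on-disk
CIB path are assumptions for this el5 openais/corosync stack):

# on every node, stop the cluster stack completely:
service corosync stop    # or: service openais stop, depending on the init scripts
# with everything down, check whether the saved CIB still mentions the old
# name (the path is an assumption for this pacemaker version):
grep -i 'VMTESTORADG2.it.dbi-services.com' /var/lib/heartbeat/crm/cib.xml
# if it does, remove the stale <node>/<node_state> entries (and the matching
# cib.xml.sig, so the edited file is accepted) before starting the nodes again:
service corosync start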

-- Vossel

 Regarding the version, here are the details:
 - Corosync 1.2.7-1.1.el5
 - Pacemaker 1.1.5-1.1.el5
 
 
 2013/4/1 David Vossel  dvos...@redhat.com 
 
 
 
 - Original Message -
  From: Nicolas J.  nikkro70+pacema...@gmail.com 
  To: pacemaker@oss.clusterlabs.org
  Sent: Friday, March 29, 2013 8:55:30 AM
  Subject: [Pacemaker] Same host displayed twice in crm status
  
  Hi,
  
  I have a problem with a Corosync/Pacemaker configuration.
  One host of the cluster has been renamed and now the host is displayed
  twice
  in the configuration.
  
  When I try to remove the host from the configuration it works but if
  corosync
  is restarted on one node, the old host appears again.
  I tried several ways to delete the host with no effect.
  
  How can I delete the wrong host?
 
 For the pacemaker version you are using, try deleting the node from the
 configuration in both the node and status sections, then use the crm_node -R
 option to remove the node from the cluster's internal cache. In pacemaker
 versions >= 1.1.8 only the crm_node -R option is required to remove a node.
 
 -- Vossel
 
  I checked the Linux configuration and there is no place where the old name
  is
  referenced.
  It's an OEL/Red Hat linux.
  
  Output
  -
  [root@vmtestoradg2 ~]# crm status
  
  Last updated: Fri Mar 29 14:51:56 2013
  Stack: openais
  Current DC: vmtestoradg1 - partition with quorum
  Version: 1.1.5-1.1.el5-01e86afaaa6d4a8c4836f68df80ababd6ca3902f
  4 Nodes configured, 3 expected votes
  1 Resources configured.
  
  
  Online: [ vmtestoradg1 vmtestora10g01 vmtestoradg2 ]
  OFFLINE: [ VMTESTORADG2.it.dbi-services.com ]
  
  DG_IP (ocf::heartbeat:IPaddr2): Started vmtestoradg1
  
  [root@vmtestoradg2 ~]# crm node clearstate VMTESTORADG2.it.dbi-services.com
  Do you really want to drop state for node VMTESTORADG2.it.dbi-services.com
  ?
  y
  [root@vmtestoradg2 ~]# crm node delete VMTESTORADG2.it.dbi-services.com
  INFO: node VMTESTORADG2.it.dbi-services.com not found by crm_node
  INFO: node VMTESTORADG2.it.dbi-services.com deleted
  
  Thanks in advance
  
  Best Regards,
  
  Nicolas J.
  
  ___
  Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
  http://oss.clusterlabs.org/mailman/listinfo/pacemaker
  
  Project Home: http://www.clusterlabs.org
  Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
  Bugs: http://bugs.clusterlabs.org
  
 
 ___
 Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
 http://oss.clusterlabs.org/mailman/listinfo/pacemaker
 
 Project Home: http://www.clusterlabs.org
 Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
 Bugs: http://bugs.clusterlabs.org
 
 
 ___
 Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
 http://oss.clusterlabs.org/mailman/listinfo/pacemaker
 
 Project Home: http://www.clusterlabs.org
 Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
 Bugs: http://bugs.clusterlabs.org
 

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: