Re: [Pacemaker] timed out / exec error

2012-12-20 Thread Dejan Muhamedagic
Hi,

On Tue, Dec 18, 2012 at 10:58:18AM +, James Harper wrote:
 For the following failure:
 
 Failed actions:
 p_lvm_iscsi:0_monitor_1 (node=bitvs6, call=57, rc=-2,
 status=Timed Out): unknown exec error
 
 Is this the ra itself returning a Timed Out error, or is it
 the cluster software determining that the ra is taking too long
 and so killing it and declaring it failed? stonith kicks in

The latter.

 shortly after this happens so tracking it down is a bit of a
 pain.

Is it expected? Normally, a monitor failing should cause a
resource restart. If a resource fails to stop, it may be a
resource agent bug.
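
As an illustration of the behaviour described above, here is a minimal crm
shell sketch that spells the failure policies out explicitly. The resource
name p_example, the Dummy agent, and the interval and timeout values are
assumptions chosen for the example, not taken from the poster's configuration.

```
# Hypothetical resource; the on-fail values shown are the usual defaults
# (on-fail=fence for stop applies when stonith is enabled).
primitive p_example ocf:pacemaker:Dummy \
        op monitor interval="10s" timeout="30s" on-fail="restart" \
        op stop interval="0" timeout="60s" on-fail="fence"
```

With these settings a failed or timed-out monitor leads to a restart (stop,
then start) of the resource. Only if the stop itself fails or times out is
the node fenced, which is one way a monitor timeout can end in stonith.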

 It happens any time the system gets loaded (eg when making a
 config change)

What kind of change?

 and I can't seem to put my finger on what is
 causing it.

Which resource is that? Which version of resource agents do you
run?

Thanks,

Dejan

 Thanks
 
 James
 
 ___
 Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
 http://oss.clusterlabs.org/mailman/listinfo/pacemaker
 
 Project Home: http://www.clusterlabs.org
 Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
 Bugs: http://bugs.clusterlabs.org



Re: [Pacemaker] Split-brain on DRBD + Corosync/Pacemaker

2012-12-20 Thread Felipe Gutierrez
Thanks Soni,

I will discuss it with my professor.

Thanks,
Felipe

On Thu, Dec 20, 2012 at 12:26 AM, Soni Maula Harriz 
soni.har...@sangkuriang.co.id wrote:

 bonding in network




-- 
*--
-- Felipe Oliveira Gutierrez
-- felipe.o.gutier...@gmail.com
-- https://sites.google.com/site/lipe82/Home/diaadia*


Re: [Pacemaker] timed out / exec error

2012-12-20 Thread James Harper
 Hi,
 
 On Tue, Dec 18, 2012 at 10:58:18AM +, James Harper wrote:
  For the following failure:
 
  Failed actions:
  p_lvm_iscsi:0_monitor_1 (node=bitvs6, call=57, rc=-2,
  status=Timed Out): unknown exec error
 
  Is this the ra itself returning a Timed Out error, or is it the
  cluster software determining that the ra is taking too long and so
  killing it and declaring it failed? stonith kicks in
 
 The latter.
 
  shortly after this happens so tracking it down is a bit of a pain.
 
 Is it expected? Normally, a monitor failing should cause a resource restart.
 If a resource fails to stop, it may be a resource agent bug.
 
  It happens any time the system gets loaded (eg when making a config
  change)
 
 What kind of change?
 
  and I can't seem to put my finger on what is causing it.
 
 Which resource is that? Which version of resource agents do you run?
 

Any CIB change drives the system load up for 10-20 seconds, and then things
start timing out, even though I have set the timeouts well in excess of the
time it takes for Pacemaker to mark the resource as timed out.

All packages are from debian wheezy.

James



Re: [Pacemaker] timed out / exec error

2012-12-20 Thread Dejan Muhamedagic
On Thu, Dec 20, 2012 at 11:43:20AM +, James Harper wrote:
  Hi,
  
  On Tue, Dec 18, 2012 at 10:58:18AM +, James Harper wrote:
   For the following failure:
  
   Failed actions:
   p_lvm_iscsi:0_monitor_1 (node=bitvs6, call=57, rc=-2,
   status=Timed Out): unknown exec error
  
   Is this the ra itself returning a Timed Out error, or is it the
   cluster software determining that the ra is taking too long and so
   killing it and declaring it failed? stonith kicks in
  
  The latter.
  
   shortly after this happens so tracking it down is a bit of a pain.
  
  Is it expected? Normally, a monitor failing should cause a resource
  restart. If a resource fails to stop, it may be a resource agent bug.
  
   It happens any time the system gets loaded (eg when making a config
   change)
  
  What kind of change?
  
   and I can't seem to put my finger on what is causing it.
  
  Which resource is that? Which version of resource agents do you run?
  
 
 Any cib change throws the system load up for 10-20 seconds, and then things 
 start timing out, despite having set the timeouts well in excess of the time 
 it takes for pacemaker to mark the resource as timed out.

Hmm, unless your CIB (the configuration) is really huge, that
shouldn't be happening. I'd open a bugzilla with debian. Check
beforehand which processes go wild. Increase timeouts to prevent
resources failing and stonith.
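
Raising the timeouts could look like the following crm shell sketch. The
agent type, the volgrpname value, and all interval and timeout values are
assumptions for illustration; only the resource name p_lvm_iscsi comes from
the thread.

```
# Hypothetical definition; agent and parameters are placeholders.
primitive p_lvm_iscsi ocf:heartbeat:LVM \
        params volgrpname="vg_example" \
        op monitor interval="10s" timeout="300s" \
        op start interval="0" timeout="120s" \
        op stop interval="0" timeout="120s"
# Fallback timeout for operations that do not set their own.
op_defaults timeout="120s"
```

A timeout applies only to the operation it is attached to, so it is worth
checking that the monitor whose interval appears in the failed action is the
one carrying the larger value.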

 All packages are from debian wheezy.

I don't know which versions are currently in debian wheezy
(looks like 1.1.7).

Thanks,

Dejan

 James
 


Re: [Pacemaker] crm shell

2012-12-20 Thread Dejan Muhamedagic
Hi,

On Tue, Dec 18, 2012 at 02:19:18PM -0500, Jay Janssen wrote:
 Having learned pretty much everything I know about pacemaker (which isn't a 
 lot) using the crm shell, I am dismayed to find it isn't included in 
 pacemaker 1.1.8.  
 
 Since when is it a good development practice to deprecate (and not only 
 deprecate, but completely abandon and stop supporting altogether) features 
 that were in the previous dot release?  It's almost like you *want* us 
 annoying users to go use something else. /rant
 
 Seriously, how am I supposed to edit CRM configurations on the command line 
 with the provided tools?

The crm shell is available here:

http://savannah.nongnu.org/projects/crmsh/

The current version is 1.2.4, though it hasn't been announced
yet. But you can find all the relevant URLs in previous news
items.

Thanks,

Dejan

 Jay Janssen, MySQL Consulting Lead, Percona Inc.
 http://about.me/jay.janssen
 Percona Live in Santa Clara, CA  April 22nd-25th 2013
 http://www.percona.com/live/mysql-conference-2013/
 



Re: [Pacemaker] crm shell

2012-12-20 Thread David Vossel


- Original Message -
 From: Jay Janssen jay.jans...@percona.com
 To: pacemaker@oss.clusterlabs.org
 Sent: Tuesday, December 18, 2012 1:19:18 PM
 Subject: [Pacemaker] crm shell
 
 
 
 Having learned pretty much everything I know about pacemaker (which
 isn't a lot) using the crm shell, I am dismayed to find it isn't
 included in pacemaker 1.1.8.
 
 
 Since when is it a good development practice to deprecate (and not
 only deprecate, but completely abandon and stop supporting
 altogether) features that were in the previous dot release? It's

Whoa, crm shell didn't get deprecated, it got split out of pacemaker.  I 
understand your frustration though. There are multiple HA management tools 
available now which are completely maintained outside of pacemaker.

http://clusterlabs.org/#info

Take a look at the Configuration Tools Section, it will point you in the 
right direction.

-- Vossel

 almost like you *want* us annoying users to go use something else.
 /rant
 
 
 Seriously, how am I supposed to edit CRM configurations on the
 command line with the provided tools?
 
 
 
 
 
 
 
 
 
 
 Jay Janssen, MySQL Consulting Lead, Percona Inc.
 http://about.me/jay.janssen
 
 Percona Live in Santa Clara, CA April 22nd-25th 2013
 http://www.percona.com/live/mysql-conference-2013/
 


Re: [Pacemaker] Clone resource as a dependency

2012-12-20 Thread Attila Megyeri
Is this so difficult or so trivial, that no one responded? :)

I would appreciate a reference to some documentation as well.

Thank you,
Attila

From: Attila Megyeri [mailto:amegy...@minerva-soft.com]
Sent: Wednesday, December 19, 2012 10:05 AM
To: The Pacemaker cluster resource manager
Subject: [Pacemaker] Clone resource as a dependency

Hi,

How can I configure a resource (e.g. an apache) to depend on the start of a 
clone resource (e.g. a filesystem resource) for the given node?
I know how to arrange a primitive into a group, but in this particular case, 
the primitive must run on the passive node as well (performing some async 
offline operations), but apache may run only if the clone is started on the 
node where apache is about to start.

I tried by defining the clone resource and then by adding a mandatory order 
where apache depends on the filesystem resource, but apache keeps on running 
even if the filesystem runs only on a different node (stopped on the apache 
node).

BTW, the filesystem is glusterfs.

Thank you in advance!




Re: [Pacemaker] Clone resource as a dependency

2012-12-20 Thread Jake Smith
A colocation constraint in addition to the order constraint, so that apache
must run on the same node as a running clone instance, might do it. I'm not
quite sure how it behaves with a clone, though.

Doc reference would require some more info such as what version of pacemaker, 
etc. 

Including configuration helps get answers quicker. 

HTH 




Jake 


Re: [Pacemaker] Clone resource as a dependency

2012-12-20 Thread Attila Megyeri
Thanks Jake,

I have not tried the colocation constraint because the clone was running on all
nodes, but I will give it a try; I am not sure whether it works with a clone.
I am using Pacemaker 1.1.6 on a Debian system; the critical RAs are from the
latest GitHub version. The cluster is asymmetric.

The config itself is quite big so I won't paste it here, but the basic
requirement is very simple:


-  Primitive “fs” (filesystem)
-  Clone of “fs” with clone-max=4. It shall run on 4 of the 7 nodes.
-  Primitive apache, which is allowed to run on 2 of 7 nodes, but in one
   instance only

property $id=cib-bootstrap-options \
        dc-version=1.1.6-9971ebba4494012a93c03b40a2c58ec0eb60f50c \
        cluster-infrastructure=openais \
        expected-quorum-votes=7 \
        stonith-enabled=false \
        no-quorum-policy=stop \
        start-failure-is-fatal=false \
        stonith-action=reboot \
        symmetric-cluster=false \
        last-lrm-refresh=1355960642



The goal is to make sure that apache runs only if a FS clone is running on that 
node as well. At the same time, the FS clone must run on all 4 nodes.
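
In an asymmetric (opt-in) cluster a resource runs only on nodes where a
location constraint gives it a positive score, so the node restrictions above
might be expressed along these lines. The ids cl_fs and apache, the node names
node1 to node4, and the scores are assumptions, since the actual configuration
was not posted.

```
# Hypothetical: allow the fs clone on four of the seven nodes.
location loc-fs-node1 cl_fs 100: node1
location loc-fs-node2 cl_fs 100: node2
location loc-fs-node3 cl_fs 100: node3
location loc-fs-node4 cl_fs 100: node4
# Hypothetical: allow apache on two nodes, preferring node1.
location loc-apache-node1 apache 100: node1
location loc-apache-node2 apache 50: node2
```

These constraints only say where each resource may run; they do not tie
apache to a node where a clone instance is actually active.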

Thanks,
Attila





Re: [Pacemaker] Clone resource as a dependency

2012-12-20 Thread Jake Smith
- Original Message -

 From: Attila Megyeri amegy...@minerva-soft.com
 To: The Pacemaker cluster resource manager
 pacemaker@oss.clusterlabs.org
 Sent: Thursday, December 20, 2012 3:07:06 PM
 Subject: Re: [Pacemaker] Clone resource as a dependency

 Thanks Jake,

 I did not try with the collocation constraint as the clone was
 running on all nodes, but I will give it a try – not sure whether
 this would work with a clone.

If you set up the colocation so that apache depends on the fs clone, then fs
can run anywhere but apache can only run where fs is running. I think that
will take care of it for you.
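
A minimal crm shell sketch of that suggestion follows. The ids apache and
cl_fs and the constraint names are assumptions. An order constraint by itself
only sequences start and stop actions and does not require the two resources
to share a node, which matches the behaviour reported earlier in the thread;
the colocation constraint adds the placement requirement.

```
# Hypothetical ids: apache may only run where an instance of cl_fs is active.
colocation col-apache-with-fs inf: apache cl_fs
# Start the fs clone before apache (and stop apache first).
order ord-fs-before-apache inf: cl_fs apache
```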

 I am using pacemaker 1.1.6 on a debian system, the critical RAs are
 from latest github. The cluster is asymmetric.

 The config itself is quite big so I wouldn’t paste it here, but the
 basic requirement is very simple:

 - Primitive “fs” (filesystem)
 - Clone of “fs” with clone-max=4. It shall run on 4 of the 7 nodes.
 - primitive apache, which is allowed to run on 2 of 7 nodes, but in
   one instance only

 property $id=cib-bootstrap-options \
         dc-version=1.1.6-9971ebba4494012a93c03b40a2c58ec0eb60f50c \
         cluster-infrastructure=openais \
         expected-quorum-votes=7 \
         stonith-enabled=false \
         no-quorum-policy=stop \
         start-failure-is-fatal=false \
         stonith-action=reboot \
         symmetric-cluster=false \
         last-lrm-refresh=1355960642

 The goal is to make sure that apache runs only if a FS clone is
 running on that node as well. At the same time, the FS clone must
 run on all 4 nodes.

 Thanks,
 Attila



[Pacemaker] Newbie Pacemakerd on CentOS 5.8

2012-12-20 Thread Michael Papet
I may be attempting the impossible: getting a pacemaker+corosync cluster to
work on CentOS 5.8, building from source. I have some system constraints I
cannot ignore.

Corosync finds the nodes (kslinux1, kslinux2) just fine. SELinux and the
firewall are turned off.
Pacemakerd starts just fine on kslinux1. kslinux2 seems to be the problem.

Starting pacemakerd -f -V on kslinux2 returns:

Could not establish pacemakerd connection: Connection refused (111)
    info: crm_ipc_connect:  Could not establish pacemakerd connection: 
Connection refused (111)
    info: get_cluster_type: Detected an active 'corosync' cluster
    info: read_config:  Reading configure for stack: corosync
  notice: crm_add_logfile:  Additional logging available in 
/var/log/cluster/corosync.log
    info: read_config:  User configured file based logging and explicitly 
disabled syslog.
  notice: main: Starting Pacemaker 1.1.8 (Build: 3035414):  
generated-manpages agent-manpages ncurses libqb-logging libqb-ipc lha-fencing 
upstart systemd  corosync-native snmp
    info: main: Maximum core file size is: 4294967295
    info: qb_ipcs_us_publish:   server name: pacemakerd
  notice: corosync_node_name:   Unable to get node name for nodeid 0
  notice: get_local_node_name:  Defaulting to uname(2).nodename for the local 
corosync node name
  notice: update_node_processes:    0x9415ea0 Node  now known as 
kslinux2, was:
  notice: find_and_track_existing_processes:    Tracking existing lrmd process 
(pid=23794)
  notice: find_and_track_existing_processes:    Tracking existing cib process 
(pid=24068)
  notice: find_and_track_existing_processes:    Tracking existing attrd process 
(pid=24069)
    info: start_child:  Forked child 25857 for process stonith-ng
    info: start_child:  Forked child 25858 for process pengine
    info: start_child:  Forked child 25859 for process crmd
    info: main: Starting mainloop
  
And then this is in /var/log/cluster/corosync.log

Dec 20 15:42:02 [27261] kslinux2   crmd: info: crm_ipc_connect: 
Could not establish cib_shm connection: Connection refused (111)
Dec 20 15:42:02 [27261] kslinux2   crmd: info: do_cib_control:  Could 
not connect to the CIB service: Transport endpoint is not connected
Dec 20 15:42:02 [27261] kslinux2   crmd:  warning: do_cib_control:  
Couldn't complete CIB registration 16 times... pause and retry
Dec 20 15:42:04 [27261] kslinux2   crmd: info: crm_timer_popped:    
Wait Timer (I_NULL) just popped (2000ms)
Dec 20 15:42:04 [27261] kslinux2   crmd: info: crm_ipc_connect: 
Could not establish cib_shm connection: Connection refused (111)
Dec 20 15:42:05 [27261] kslinux2   crmd: info: crm_ipc_connect: 
Could not establish cib_shm connection: Connection refused (111)
Dec 20 15:42:05 [27261] kslinux2   crmd: info: do_cib_control:  Could 
not connect to the CIB service: Transport endpoint is not connected
Dec 20 15:42:05 [27261] kslinux2   crmd:  warning: do_cib_control:  
Couldn't complete CIB registration 17 times... pause and retry
Dec 20 15:42:07 [27261] kslinux2   crmd: info: crm_timer_popped:    
Wait Timer (I_NULL) just popped (2000ms)
Dec 20 15:42:07 [27261] kslinux2   crmd: info: crm_ipc_connect: 
Could not establish cib_shm connection: Connection refused (111)
Dec 20 15:42:08 [27261] kslinux2   crmd: info: crm_ipc_connect: 
Could not establish cib_shm connection: Connection refused (111)
Dec 20 15:42:08 [27261] kslinux2   crmd: info: do_cib_control:  Could 
not connect to the CIB service: Transport endpoint is not connected
Dec 20 15:42:08 [27261] kslinux2   crmd:  warning: do_cib_control:  
Couldn't complete CIB registration 18 times... pause and retry
Dec 20 15:42:10 [27261] kslinux2   crmd: info: crm_timer_popped:    
Wait Timer (I_NULL) just popped (2000ms)
Dec 20 15:42:10 [27261] kslinux2   crmd: info: crm_ipc_connect: 
Could not establish cib_shm connection: Connection refused (111)
Dec 20 15:42:11 [27261] kslinux2   crmd: info: crm_ipc_connect: 
Could not establish cib_shm connection: Connection refused (111)
Dec 20 15:42:11 [27261] kslinux2   crmd: info: do_cib_control:  Could 
not connect to the CIB service: Transport endpoint is not connected
Dec 20 15:42:11 [27261] kslinux2   crmd:  warning: do_cib_control:  
Couldn't complete CIB registration 19 times... pause and retry
Dec 20 15:42:13 [27261] kslinux2   crmd: info: crm_timer_popped:    
Wait Timer (I_NULL) just popped (2000ms)
Dec 20 15:42:13 [27261] kslinux2   crmd: info: crm_ipc_connect: 
Could not establish cib_shm connection: Connection refused (111)
Dec 20 15:42:14 [27261] kslinux2   crmd: info: crm_ipc_connect: 
Could not establish cib_shm connection: Connection refused (111)
Dec 20 15:42:14 [27261] kslinux2   crmd: info: 

Re: [Pacemaker] timed out / exec error

2012-12-20 Thread James Harper
 
  Any cib change throws the system load up for 10-20 seconds, and then
  things start timing out, despite having set the timeouts well in excess of
  the time it takes for pacemaker to mark the resource as timed out.
 
 Hmm, unless your CIB (the configuration) is really huge, that shouldn't be
 happening. I'd open a bugzilla with debian. Check beforehand which
 processes go wild. Increase timeouts to prevent resources failing and stonith.

Hopefully I can do some testing over the Christmas break and come up with
something meaningful to report. I've set up a test cluster of virtual machines
that is a copy of my main cluster with all the resources changed to dummy.
There is a bit of a load spike when I make a change, but that is to be
expected because all 5 VMs are running on one underpowered physical machine.
Otherwise it works fine and never times anything out.

The problem is that I've set the monitor timeouts to 5 minutes but the actual 
timeout seems to be happening within 30 seconds of making the changes to the 
configuration, which is why I was wondering if the resource was reporting its 
own timeout.

I'm assuming that 5 nodes and 61 resources isn't a particularly big cluster?

 
  All packages are from debian wheezy.
 
 I don't know which versions are currently in debian wheezy (looks like 1.1.7).
 

Yes that's right.

James




Re: [Pacemaker] booth is the state of started on pacemaker before booth write ticket info in cib.

2012-12-20 Thread Yuichi SEINO
Hi Jiaju,

2012/12/18 Jiaju Zhang jjzh...@suse.de:
 On Mon, 2012-12-17 at 10:40 +0900, Yuichi SEINO wrote:
 Hi Jiaju,

  
   Perhaps,  this problem didn't happen before the following commit.
   https://github.com/jjzhang/booth/commit/4b00d46480f45a205f2550ff0760c8b372009f7f
  
   Currently when all of the initialization (including loading the new
   ticket information) finished, booth should be regarded as ready. So if
   you encounter some problem here, I guess we should improve the RA to
   better reflect the booth startup status, but not moving the
   initialization order, since it may introduce other regression as we have
   encountered before;)
  
 
  I am still not sure whether we should fix the RA or booth.
 
  I suggest adding a new function to clear the old ticket info in the CIB,
  and calling that function when booth has just started but before it is
  daemonized. So, before booth_start in the RA returns, the stale data has
  been cleared. What do you think about this?;)
 

 Can you implement it so that it still works when the CIB info is needed?
 For example, when booth fails over locally, booth needs to get the ticket
 from the CIB. If that case is not a problem, I can agree to it.

 OK, I'll implement it;)

 Thanks,
 Jiaju



OK, thanks.
Are you going to implement it in the next development cycle?

Sincerely,
Yuichi

--
Yuichi SEINO
METROSYSTEMS CORPORATION
E-mail:seino.clust...@gmail.com
