Re: [Pacemaker] timed out / exec error
Hi,

On Tue, Dec 18, 2012 at 10:58:18AM +, James Harper wrote:
> For the following failure:
>
> Failed actions:
> p_lvm_iscsi:0_monitor_1 (node=bitvs6, call=57, rc=-2, status=Timed Out): unknown exec error
>
> Is this the ra itself returning a Timed Out error, or is it the cluster software determining that the ra is taking too long and so killing it and declaring it failed?

The latter.

> stonith kicks in shortly after this happens so tracking it down is a bit of a pain.

Is it expected? Normally, a monitor failing should cause a resource restart. If a resource fails to stop, it may be a resource agent bug.

> It happens any time the system gets loaded (eg when making a config change)

What kind of change?

> and I can't seem to put my finger on what is causing it.

Which resource is that? Which version of resource agents do you run?

Thanks,

Dejan

> Thanks
>
> James

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker
Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org
Re: [Pacemaker] Split-brain on DRBD + Corosync/Pacemaker
Thanks Soni, I will discuss it with my professor.

Thanks,
Felipe

On Thu, Dec 20, 2012 at 12:26 AM, Soni Maula Harriz soni.har...@sangkuriang.co.id wrote:
> bonding in network

--
Felipe Oliveira Gutierrez
felipe.o.gutier...@gmail.com
https://sites.google.com/site/lipe82/Home/diaadia
Re: [Pacemaker] timed out / exec error
> Hi,
>
> On Tue, Dec 18, 2012 at 10:58:18AM +, James Harper wrote:
> > For the following failure:
> >
> > Failed actions:
> > p_lvm_iscsi:0_monitor_1 (node=bitvs6, call=57, rc=-2, status=Timed Out): unknown exec error
> >
> > Is this the ra itself returning a Timed Out error, or is it the cluster software determining that the ra is taking too long and so killing it and declaring it failed?
>
> The latter.
>
> > stonith kicks in shortly after this happens so tracking it down is a bit of a pain.
>
> Is it expected? Normally, a monitor failing should cause a resource restart. If a resource fails to stop, it may be a resource agent bug.
>
> > It happens any time the system gets loaded (eg when making a config change)
>
> What kind of change?
>
> > and I can't seem to put my finger on what is causing it.
>
> Which resource is that? Which version of resource agents do you run?

Any cib change throws the system load up for 10-20 seconds, and then things start timing out, despite having set the timeouts well in excess of the time it takes for pacemaker to mark the resource as timed out.

All packages are from debian wheezy.

James
Re: [Pacemaker] timed out / exec error
On Thu, Dec 20, 2012 at 11:43:20AM +, James Harper wrote:
> > Hi,
> >
> > On Tue, Dec 18, 2012 at 10:58:18AM +, James Harper wrote:
> > > For the following failure:
> > >
> > > Failed actions:
> > > p_lvm_iscsi:0_monitor_1 (node=bitvs6, call=57, rc=-2, status=Timed Out): unknown exec error
> > >
> > > Is this the ra itself returning a Timed Out error, or is it the cluster software determining that the ra is taking too long and so killing it and declaring it failed?
> >
> > The latter.
> >
> > > stonith kicks in shortly after this happens so tracking it down is a bit of a pain.
> >
> > Is it expected? Normally, a monitor failing should cause a resource restart. If a resource fails to stop, it may be a resource agent bug.
> >
> > > It happens any time the system gets loaded (eg when making a config change)
> >
> > What kind of change?
> >
> > > and I can't seem to put my finger on what is causing it.
> >
> > Which resource is that? Which version of resource agents do you run?
>
> Any cib change throws the system load up for 10-20 seconds, and then things start timing out, despite having set the timeouts well in excess of the time it takes for pacemaker to mark the resource as timed out.

Hmm, unless your CIB (the configuration) is really huge, that shouldn't be happening. I'd open a bugzilla with debian. Check beforehand which processes go wild. Increase timeouts to prevent resources failing and stonith.

> All packages are from debian wheezy.

I don't know which versions are currently in debian wheezy (looks like 1.1.7).

Thanks,

Dejan

> James
Re: [Pacemaker] crm shell
Hi,

On Tue, Dec 18, 2012 at 02:19:18PM -0500, Jay Janssen wrote:
> Having learned pretty much everything I know about pacemaker (which isn't a lot) using the crm shell, I am dismayed to find it isn't included in pacemaker 1.1.8. Since when is it a good development practice to deprecate (and not only deprecate, but completely abandon and stop supporting altogether) features that were in the previous dot release? It's almost like you *want* us annoying users to go use something else. /rant
>
> Seriously, how am I supposed to edit CRM configurations on the command line with the provided tools?

The crm shell is available here:

http://savannah.nongnu.org/projects/crmsh/

The current version is 1.2.4, though it hasn't been announced yet. But you can find all the relevant URLs in previous news items.

Thanks,

Dejan

> Jay Janssen, MySQL Consulting Lead, Percona Inc.
> http://about.me/jay.janssen
> Percona Live in Santa Clara, CA April 22nd-25th 2013
> http://www.percona.com/live/mysql-conference-2013/
Re: [Pacemaker] crm shell
- Original Message -
> From: Jay Janssen jay.jans...@percona.com
> To: pacemaker@oss.clusterlabs.org
> Sent: Tuesday, December 18, 2012 1:19:18 PM
> Subject: [Pacemaker] crm shell
>
> Having learned pretty much everything I know about pacemaker (which isn't a lot) using the crm shell, I am dismayed to find it isn't included in pacemaker 1.1.8. Since when is it a good development practice to deprecate (and not only deprecate, but completely abandon and stop supporting altogether) features that were in the previous dot release? It's

Whoa, crm shell didn't get deprecated, it got split out of pacemaker. I understand your frustration though. There are multiple HA management tools available now which are completely maintained outside of pacemaker.

http://clusterlabs.org/#info

Take a look at the Configuration Tools Section, it will point you in the right direction.

-- Vossel

> almost like you *want* us annoying users to go use something else. /rant
>
> Seriously, how am I supposed to edit CRM configurations on the command line with the provided tools?
>
> Jay Janssen, MySQL Consulting Lead, Percona Inc.
> http://about.me/jay.janssen
> Percona Live in Santa Clara, CA April 22nd-25th 2013
> http://www.percona.com/live/mysql-conference-2013/
Re: [Pacemaker] Clone resource as a dependency
Is this so difficult or so trivial, that no one responded? :) I would appreciate a reference to some documentation as well.

Thank you,
Attila

From: Attila Megyeri [mailto:amegy...@minerva-soft.com]
Sent: Wednesday, December 19, 2012 10:05 AM
To: The Pacemaker cluster resource manager
Subject: [Pacemaker] Clone resource as a dependency

Hi,

How can I configure a resource (e.g. an apache) to depend on the start of a clone resource (e.g. a filesystem resource) for the given node? I know how to arrange a primitive into a group, but in this particular case, the primitive must run on the passive node as well (performing some async offline operations), but apache may run only if the clone is started on the node where apache is about to start.

I tried by defining the clone resource and then by adding a mandatory order where apache depends on the filesystem resource, but apache keeps on running even if the filesystem runs only on a different node (stopped on the apache node).

BTW, the filesystem is glusterfs.

Thank you in advance!
Re: [Pacemaker] Clone resource as a dependency
A collocation constraint as well as the order so it must run on the same node as a running clone might do it. Not quite sure with the clone though.

Doc reference would require some more info such as what version of pacemaker, etc. Including configuration helps get answers quicker.

HTH

Jake

- Original Message -
From: Attila Megyeri amegy...@minerva-soft.com
To: The Pacemaker cluster resource manager pacemaker@oss.clusterlabs.org
Sent: Thursday, December 20, 2012 1:23:07 PM
Subject: Re: [Pacemaker] Clone resource as a dependency

Is this so difficult or so trivial, that no one responded? :) I would appreciate a reference to some documentation as well.

Thank you,
Attila

From: Attila Megyeri [mailto:amegy...@minerva-soft.com]
Sent: Wednesday, December 19, 2012 10:05 AM
To: The Pacemaker cluster resource manager
Subject: [Pacemaker] Clone resource as a dependency

Hi,

How can I configure a resource (e.g. an apache) to depend on the start of a clone resource (e.g. a filesystem resource) for the given node? I know how to arrange a primitive into a group, but in this particular case, the primitive must run on the passive node as well (performing some async offline operations), but apache may run only if the clone is started on the node where apache is about to start.

I tried by defining the clone resource and then by adding a mandatory order where apache depends on the filesystem resource, but apache keeps on running even if the filesystem runs only on a different node (stopped on the apache node).

BTW, the filesystem is glusterfs.

Thank you in advance!
Re: [Pacemaker] Clone resource as a dependency
Thanks Jake,

I did not try with the collocation constraint as the clone was running on all nodes, but I will give it a try – not sure whether this would work with a clone.

I am using pacemaker 1.1.6 on a debian system, the critical RAs are from latest github. The cluster is asymmetric.

The config itself is quite big so I wouldn’t paste it here, but the basic requirement is very simple:

- Primitive “fs” (filesystem)
- Clone of “fs” with clone-max=4. It shall run on 4 of the 7 nodes.
- primitive apache, which is allowed to run on 2 of 7 nodes, but in one instance only

property $id=cib-bootstrap-options \
    dc-version=1.1.6-9971ebba4494012a93c03b40a2c58ec0eb60f50c \
    cluster-infrastructure=openais \
    expected-quorum-votes=7 \
    stonith-enabled=false \
    no-quorum-policy=stop \
    start-failure-is-fatal=false \
    stonith-action=reboot \
    symmetric-cluster=false \
    last-lrm-refresh=1355960642

The goal is to make sure that apache runs only if a FS clone is running on that node as well. At the same time, the FS clone must run on all 4 nodes.

Thanks,
Attila

From: Jake Smith [mailto:jsm...@argotec.com]
Sent: Thursday, December 20, 2012 8:37 PM
To: The Pacemaker cluster resource manager
Subject: Re: [Pacemaker] Clone resource as a dependency

A collocation constraint as well as the order so it must run on the same node as a running clone might do it. Not quite sure with the clone though.

Doc reference would require some more info such as what version of pacemaker, etc. Including configuration helps get answers quicker.

HTH

Jake

From: Attila Megyeri amegy...@minerva-soft.com
To: The Pacemaker cluster resource manager pacemaker@oss.clusterlabs.org
Sent: Thursday, December 20, 2012 1:23:07 PM
Subject: Re: [Pacemaker] Clone resource as a dependency

Is this so difficult or so trivial, that no one responded? ☺ I would appreciate a reference to some documentation as well.

Thank you,
Attila

From: Attila Megyeri [mailto:amegy...@minerva-soft.com]
Sent: Wednesday, December 19, 2012 10:05 AM
To: The Pacemaker cluster resource manager
Subject: [Pacemaker] Clone resource as a dependency

Hi,

How can I configure a resource (e.g. an apache) to depend on the start of a clone resource (e.g. a filesystem resource) for the given node? I know how to arrange a primitive into a group, but in this particular case, the primitive must run on the passive node as well (performing some async offline operations), but apache may run only if the clone is started on the node where apache is about to start.

I tried by defining the clone resource and then by adding a mandatory order where apache depends on the filesystem resource, but apache keeps on running even if the filesystem runs only on a different node (stopped on the apache node).

BTW, the filesystem is glusterfs.

Thank you in advance!
Re: [Pacemaker] Clone resource as a dependency
- Original Message -
> From: Attila Megyeri amegy...@minerva-soft.com
> To: The Pacemaker cluster resource manager pacemaker@oss.clusterlabs.org
> Sent: Thursday, December 20, 2012 3:07:06 PM
> Subject: Re: [Pacemaker] Clone resource as a dependency
>
> Thanks Jake,
>
> I did not try with the collocation constraint as the clone was running on all nodes, but I will give it a try – not sure whether this would work with a clone.

If you set up the collocation so apache depends upon the fs then the fs can run anywhere but apache can only run where fs is. I think that will take care of it for you.

> I am using pacemaker 1.1.6 on a debian system, the critical RAs are from latest github. The cluster is asymmetric.
>
> The config itself is quite big so I wouldn’t paste it here, but the basic requirement is very simple:
>
> - Primitive “fs” (filesystem)
> - Clone of “fs” with clone-max=4. It shall run on 4 of the 7 nodes.
> - primitive apache, which is allowed to run on 2 of 7 nodes, but in one instance only
>
> property $id=cib-bootstrap-options \
>     dc-version=1.1.6-9971ebba4494012a93c03b40a2c58ec0eb60f50c \
>     cluster-infrastructure=openais \
>     expected-quorum-votes=7 \
>     stonith-enabled=false \
>     no-quorum-policy=stop \
>     start-failure-is-fatal=false \
>     stonith-action=reboot \
>     symmetric-cluster=false \
>     last-lrm-refresh=1355960642
>
> The goal is to make sure that apache runs only if a FS clone is running on that node as well. At the same time, the FS clone must run on all 4 nodes.
>
> Thanks,
> Attila
>
> From: Jake Smith [mailto:jsm...@argotec.com]
> Sent: Thursday, December 20, 2012 8:37 PM
> To: The Pacemaker cluster resource manager
> Subject: Re: [Pacemaker] Clone resource as a dependency
>
> A collocation constraint as well as the order so it must run on the same node as a running clone might do it. Not quite sure with the clone though.
>
> Doc reference would require some more info such as what version of pacemaker, etc. Including configuration helps get answers quicker.
>
> HTH
>
> Jake
>
> - Original Message -
> From: Attila Megyeri amegy...@minerva-soft.com
> To: The Pacemaker cluster resource manager pacemaker@oss.clusterlabs.org
> Sent: Thursday, December 20, 2012 1:23:07 PM
> Subject: Re: [Pacemaker] Clone resource as a dependency
>
> Is this so difficult or so trivial, that no one responded? :) I would appreciate a reference to some documentation as well.
>
> Thank you,
> Attila
>
> From: Attila Megyeri [mailto:amegy...@minerva-soft.com]
> Sent: Wednesday, December 19, 2012 10:05 AM
> To: The Pacemaker cluster resource manager
> Subject: [Pacemaker] Clone resource as a dependency
>
> Hi,
>
> How can I configure a resource (e.g. an apache) to depend on the start of a clone resource (e.g. a filesystem resource) for the given node? I know how to arrange a primitive into a group, but in this particular case, the primitive must run on the passive node as well (performing some async offline operations), but apache may run only if the clone is started on the node where apache is about to start.
>
> I tried by defining the clone resource and then by adding a mandatory order where apache depends on the filesystem resource, but apache keeps on running even if the filesystem runs only on a different node (stopped on the apache node).
>
> BTW, the filesystem is glusterfs.
>
> Thank you in advance!
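The suggestion in this thread (a colocation constraint in addition to the mandatory order) could look roughly like the following in crm shell syntax. This is only a sketch, not a tested configuration: "fs" and "apache" are the primitive names used in the thread, while the clone id cl_fs and the constraint ids col_apache_with_fs and ord_fs_before_apache are hypothetical names chosen for illustration. The primitive definitions and the location constraints that an asymmetric cluster needs are assumed to exist already.

```
# Clone of the existing "fs" primitive, limited to 4 instances
# (cl_fs is a hypothetical id).
clone cl_fs fs \
    meta clone-max="4"

# Mandatory colocation: apache may only be placed on a node
# where an instance of cl_fs is active.
colocation col_apache_with_fs inf: apache cl_fs

# Mandatory order: start the local fs instance before apache.
order ord_fs_before_apache inf: cl_fs apache
```

The order constraint alone only sequences actions; it is the colocation constraint that ties the placement of apache to a node running a clone instance. Because the dependency points from apache to the clone, the clone instances on the other nodes are free to keep running wherever they are allowed.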
[Pacemaker] Newbie Pacemakerd on CentOS 5.8
I may be doing the impossible trying to get a pacemaker+corosync cluster to work on CentOS 5.8 building from source. I have some system constraints I cannot ignore.

Corosync finds the nodes just fine. (kslinux1, kslinux2)

SELinux and the firewall are turned off.

Pacemakerd starts just fine on kslinux1. kslinux2 seems to be the problem.

Starting pacemakerd -f -V on kslinux2 returns

Could not establish pacemakerd connection: Connection refused (111)
info: crm_ipc_connect: Could not establish pacemakerd connection: Connection refused (111)
info: get_cluster_type: Detected an active 'corosync' cluster
info: read_config: Reading configure for stack: corosync
notice: crm_add_logfile: Additional logging available in /var/log/cluster/corosync.log
info: read_config: User configured file based logging and explicitly disabled syslog.
notice: main: Starting Pacemaker 1.1.8 (Build: 3035414): generated-manpages agent-manpages ncurses libqb-logging libqb-ipc lha-fencing upstart systemd corosync-native snmp
info: main: Maximum core file size is: 4294967295
info: qb_ipcs_us_publish: server name: pacemakerd
notice: corosync_node_name: Unable to get node name for nodeid 0
notice: get_local_node_name: Defaulting to uname(2).nodename for the local corosync node name
notice: update_node_processes: 0x9415ea0 Node now known as kslinux2, was:
notice: find_and_track_existing_processes: Tracking existing lrmd process (pid=23794)
notice: find_and_track_existing_processes: Tracking existing cib process (pid=24068)
notice: find_and_track_existing_processes: Tracking existing attrd process (pid=24069)
info: start_child: Forked child 25857 for process stonith-ng
info: start_child: Forked child 25858 for process pengine
info: start_child: Forked child 25859 for process crmd
info: main: Starting mainloop

And then this is in /var/log/cluster/corosync.log

Dec 20 15:42:02 [27261] kslinux2 crmd: info: crm_ipc_connect: Could not establish cib_shm connection: Connection refused (111)
Dec 20 15:42:02 [27261] kslinux2 crmd: info: do_cib_control: Could not connect to the CIB service: Transport endpoint is not connected
Dec 20 15:42:02 [27261] kslinux2 crmd: warning: do_cib_control: Couldn't complete CIB registration 16 times... pause and retry
Dec 20 15:42:04 [27261] kslinux2 crmd: info: crm_timer_popped: Wait Timer (I_NULL) just popped (2000ms)
Dec 20 15:42:04 [27261] kslinux2 crmd: info: crm_ipc_connect: Could not establish cib_shm connection: Connection refused (111)
Dec 20 15:42:05 [27261] kslinux2 crmd: info: crm_ipc_connect: Could not establish cib_shm connection: Connection refused (111)
Dec 20 15:42:05 [27261] kslinux2 crmd: info: do_cib_control: Could not connect to the CIB service: Transport endpoint is not connected
Dec 20 15:42:05 [27261] kslinux2 crmd: warning: do_cib_control: Couldn't complete CIB registration 17 times... pause and retry
Dec 20 15:42:07 [27261] kslinux2 crmd: info: crm_timer_popped: Wait Timer (I_NULL) just popped (2000ms)
Dec 20 15:42:07 [27261] kslinux2 crmd: info: crm_ipc_connect: Could not establish cib_shm connection: Connection refused (111)
Dec 20 15:42:08 [27261] kslinux2 crmd: info: crm_ipc_connect: Could not establish cib_shm connection: Connection refused (111)
Dec 20 15:42:08 [27261] kslinux2 crmd: info: do_cib_control: Could not connect to the CIB service: Transport endpoint is not connected
Dec 20 15:42:08 [27261] kslinux2 crmd: warning: do_cib_control: Couldn't complete CIB registration 18 times... pause and retry
Dec 20 15:42:10 [27261] kslinux2 crmd: info: crm_timer_popped: Wait Timer (I_NULL) just popped (2000ms)
Dec 20 15:42:10 [27261] kslinux2 crmd: info: crm_ipc_connect: Could not establish cib_shm connection: Connection refused (111)
Dec 20 15:42:11 [27261] kslinux2 crmd: info: crm_ipc_connect: Could not establish cib_shm connection: Connection refused (111)
Dec 20 15:42:11 [27261] kslinux2 crmd: info: do_cib_control: Could not connect to the CIB service: Transport endpoint is not connected
Dec 20 15:42:11 [27261] kslinux2 crmd: warning: do_cib_control: Couldn't complete CIB registration 19 times... pause and retry
Dec 20 15:42:13 [27261] kslinux2 crmd: info: crm_timer_popped: Wait Timer (I_NULL) just popped (2000ms)
Dec 20 15:42:13 [27261] kslinux2 crmd: info: crm_ipc_connect: Could not establish cib_shm connection: Connection refused (111)
Dec 20 15:42:14 [27261] kslinux2 crmd: info: crm_ipc_connect: Could not establish cib_shm connection: Connection refused (111)
Dec 20 15:42:14 [27261] kslinux2 crmd: info:
Re: [Pacemaker] timed out / exec error
> > Any cib change throws the system load up for 10-20 seconds, and then things start timing out, despite having set the timeouts well in excess of the time it takes for pacemaker to mark the resource as timed out.
>
> Hmm, unless your CIB (the configuration) is really huge, that shouldn't be happening. I'd open a bugzilla with debian. Check beforehand which processes go wild. Increase timeouts to prevent resources failing and stonith.

Hopefully I can do some testing over the Christmas break and come up with something meaningful to report. I've set up a test cluster of virtual machines which is a copy of my main cluster with all the resources changed to dummy and while there is a bit of a spike when I make a change, that is to be expected because all 5 vm's are running on an underpowered physical machine. It otherwise works fine and never times anything out.

The problem is that I've set the monitor timeouts to 5 minutes but the actual timeout seems to be happening within 30 seconds of making the changes to the configuration, which is why I was wondering if the resource was reporting its own timeout.

I'm assuming that 5 nodes and 61 resources isn't a particularly big cluster?

> > All packages are from debian wheezy.
>
> I don't know which versions are currently in debian wheezy (looks like 1.1.7).

Yes that's right.

James
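For readers following the timeout discussion: in crm shell syntax the timeout is set per operation on the resource definition, so a 5-minute monitor timeout would appear on the op monitor line. The fragment below is only an illustrative sketch. The thread does not show the real definition of p_lvm_iscsi, so the agent (ocf:heartbeat:LVM), the volume group name, the monitor interval, the start/stop timeouts and the clone id are all assumptions; only the resource name and the 5-minute monitor timeout come from the messages above.

```
# Hypothetical definition; agent, params and interval are assumed.
primitive p_lvm_iscsi ocf:heartbeat:LVM \
    params volgrpname="vg_example" \
    op monitor interval="60s" timeout="300s" \
    op start timeout="120s" \
    op stop timeout="120s"

# The failed action is reported as p_lvm_iscsi:0, which suggests
# the primitive is cloned (cl_lvm_iscsi is a hypothetical id).
clone cl_lvm_iscsi p_lvm_iscsi
```

Running `crm configure show p_lvm_iscsi` prints the definition the cluster currently holds, which is one way to check that the operation timeouts stored in the CIB match what was intended.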
Re: [Pacemaker] booth is the state of started on pacemaker before booth write ticket info in cib.
Hi Jiaju,

2012/12/18 Jiaju Zhang jjzh...@suse.de:
On Mon, 2012-12-17 at 10:40 +0900, Yuichi SEINO wrote:

Hi Jiaju,

Perhaps, this problem didn't happen before the following commit.
https://github.com/jjzhang/booth/commit/4b00d46480f45a205f2550ff0760c8b372009f7f

Currently when all of the initialization (including loading the new ticket information) finished, booth should be regarded as ready. So if you encounter some problem here, I guess we should improve the RA to better reflect the booth startup status, but not moving the initialization order, since it may introduce other regression as we have encountered before;)

I am still not sure which we should fix, the RA or booth.

I suggest to add a new function to clear the old ticket info in the CIB, and call that function when booth just run but before daemonized. So, before booth_start in the RA returned, the stale data has been cleared. What do you think about this?;)

In the case of using cib info, can you implement it? For example, booth is fail-over on local. Then, booth needs to get the ticket in cib. If there is no such problem, I can agree to it.

OK, I'll implement it;)

Thanks,
Jiaju

OK, thanks. Are you going to implement it in the next development?

Sincerely,
Yuichi

--
Yuichi SEINO
METROSYSTEMS CORPORATION
E-mail:seino.clust...@gmail.com