Re: [Pacemaker] crmd restart due to internal error - pacemaker 1.1.8
On Fri, May 10, 2013 at 6:21 AM, Andrew Beekhof <and...@beekhof.net> wrote:
> On 08/05/2013, at 9:16 PM, pavan tc <pavan...@gmail.com> wrote:

Hi Andrew,

Thanks much for looking into this. I have some queries inline.

>> Hi, I have a two-node cluster with STONITH disabled.
> That's not a good idea.

Ok. I'll try and configure stonith. I am still running with the pcmk plugin as opposed to the recommended CMAN plugin.

> On rhel6?

Yes.

>> With 1.1.8, I see some messages (appended to this mail) once in a while. I do not understand some keywords here - there is a "Leave" action. I am not sure what that is.
> It means the cluster is not going to change the state of the resource.

Why did the cluster execute the Leave action at this point? Is there some other error that triggers this? Or is it a benign message?

>> And, there is a CIB update failure that leads to a RECOVER action. There is a message that says the RECOVER action is not supported. Finally this leads to a stop and start of my resource.
> Well, and also Pacemaker's crmd process. My guess... the node is overloaded, which is causing the cib queries to time out.

Is there a cib query timeout value that I can set? I was earlier getting the TOTEM timeout, so I set the token to a larger value (5 seconds) in corosync.conf and things were much better. But now I have started hitting this problem.

Thanks,
Pavan

>> I can copy the crm configure show output, but nothing special there. Thanks much. Pavan
>>
>> PS: The resource vha-bcd94724-3ec0-4a8d-8951-9d27be3a6acb is stale. The underlying device that represents this resource has been removed. However, the resource is still part of the CIB. All errors related to that resource can be ignored. But can this cause a node to be stopped/fenced?
> Not if fencing is disabled.
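[Editor's note] Enabling fencing, as suggested above, generally means defining a stonith resource and flipping stonith-enabled back on. A hedged sketch using crm shell - the external/ipmi agent, addresses, and credentials below are placeholders, and the right agent depends entirely on the hardware:

```shell
# Illustrative only: configure a fencing device and re-enable STONITH.
# "external/ipmi" and all parameter values are placeholders - pick the
# stonith agent that matches your hardware ("stonith -L" lists agents).
crm configure primitive fence-vsanqa3 stonith:external/ipmi \
    params hostname=vsanqa3 ipaddr=192.168.0.10 userid=admin passwd=secret \
    op monitor interval=60s
# Keep the fencing device off the node it is meant to fence.
crm configure location fence-vsanqa3-loc fence-vsanqa3 -inf: vsanqa3
crm configure property stonith-enabled=true
```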
___ Pacemaker mailing list: Pacemaker@oss.clusterlabs.org http://oss.clusterlabs.org/mailman/listinfo/pacemaker Project Home: http://www.clusterlabs.org Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf Bugs: http://bugs.clusterlabs.org
Re: [Pacemaker] crmd restart due to internal error - pacemaker 1.1.8
> Is there a cib query timeout value that I can set? I was earlier getting the TOTEM timeout. So, I set the token to a larger value (5 seconds) in corosync.conf and things were much better. But now, I have started hitting this problem.

I'll experiment with the cibadmin -t (--timeout) option to see if it helps. As far as I can see from the code, the default seems to be 30 ms. Is there a widely used default for systems with a high load, or is it found out the hard way for each setup?

Pavan
Re: [Pacemaker] crmd restart due to internal error - pacemaker 1.1.8
> I'll experiment with the cibadmin -t (--timeout) option to see if it helps. As far as I can see from the code, the default seems to be 30 ms. Is there a widely used default for systems with a high load, or is it found out the hard way for each setup?

Easier said than done. Can someone help with how to use the --timeout option in cibadmin?

Pavan
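[Editor's note] cibadmin takes the timeout alongside whatever operation it performs, e.g. a query. A sketch follows; note that the man page describes the --timeout value as the time to wait before declaring the operation failed, in seconds (so 30 here means 30 s, not 30 ms - worth verifying against your build):

```shell
# Query the whole CIB, allowing up to 30 seconds before the call is
# declared failed (instead of the default).
cibadmin --query --timeout=30

# The same thing with short options: -Q for --query, -t for --timeout.
cibadmin -Q -t 30
```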
Re: [Pacemaker] Pacemaker 1.1.8, Corosync, No CMAN, Promotion issues
> Another user hit the same issue and was able to reproduce. You can see the resolution at https://bugzilla.redhat.com/show_bug.cgi?id=951340

Thanks much for letting me know. I will watch the "Fixed in version" field and upgrade as necessary.

Pavan
Re: [Pacemaker] Pacemaker 1.1.8, Corosync, No CMAN, Promotion issues
> Yes, but looking at the code it should be impossible. Would it be possible for you to add:
>   export PCMK_trace_functions=peer_update_callback
> to /etc/sysconfig/pacemaker and re-test (and send me the new logs - probably in /var/log/pacemaker.log)?

Sorry about the delay. I have put these in place and am running tests now. The next time I hit this, I'll post the messages.
Re: [Pacemaker] Pacemaker 1.1.8, Corosync, No CMAN, Promotion issues
On Fri, Apr 12, 2013 at 9:27 AM, pavan tc <pavan...@gmail.com> wrote:
>>> Absolutely none in the syslog. Only the regular monitor logs from my resource agent, which continued to report as secondary.
>> This is very strange, because the thing that caused the I_PE_CALC is a timer that goes off every 15 minutes. Which would seem to imply that there was a transition of some kind about when the failure happened - but somehow it didn't go into the logs. Could you post the complete logs from 14:00 to 14:30?
> Sure. Here goes. Attached are two logs and corosync.conf:
> 1. syslog (edited; messages from other modules removed - I have not touched the pacemaker/corosync related messages)
> 2. corosync.log (unedited)
> 3. corosync.conf
>
> Wanted to mention a couple of things:
> -- 14:06 is when the system was coming back up from a reboot. I have started from the earliest message during boot to the point the I_PE_CALC timer popped and a promote was called.
> -- I see the following during boot up. Does that mean pacemaker did not start?
>    Apr 10 14:06:26 corosync [pcmk ] info: process_ais_conf: Enabling MCP mode: Use the Pacemaker init script to complete Pacemaker startup
> Could that contribute to any of this behaviour? I'll be glad to provide any other information.

Did anybody get a chance to look at the information attached in the previous email?

Thanks,
Pavan
Re: [Pacemaker] Pacemaker 1.1.8, Corosync, No CMAN, Promotion issues
Hi Andrew,

Thanks much for looking at this.

>> Then (after about 15 minutes), I see the following:
> There were no logs at all in between?

Absolutely none in the syslog. Only the regular monitor logs from my resource agent, which continued to report as secondary. I also checked /var/log/cluster/corosync.log. The only difference between this and the ones in syslog are the messages below.

From /var/log/cluster/corosync.log:
---
Apr 10 14:12:38 [3391] vsanqa4 crmd: notice: ais_dispatch_message: Membership 166060: quorum lost
Apr 10 14:12:38 [3386] vsanqa4 cib: notice: crm_update_peer_state: crm_update_ais_node: Node vsanqa3[1950617772] - state is now lost
Apr 10 14:12:38 [3391] vsanqa4 crmd: notice: crm_update_peer_state: crm_update_ais_node: Node vsanqa3[1950617772] - state is now lost
Apr 10 14:12:38 [3391] vsanqa4 crmd: info: peer_update_callback: vsanqa3 is now lost (was member)
Apr 10 14:12:38 corosync [CPG ] chosen downlist: sender r(0) ip(172.16.68.117) ; members(old:2 left:1)
Apr 10 14:12:38 corosync [MAIN ] Completed service synchronization, ready to provide service.
Apr 10 14:12:38 [3386] vsanqa4 cib: info: cib_process_request: Operation complete: op cib_modify for section nodes (origin=local/crmd/62, version=0.668.12): OK (rc=0)
Apr 10 14:12:38 [3386] vsanqa4 cib: info: cib_process_request: Operation complete: op cib_modify for section cib (origin=local/crmd/64, version=0.668.14): OK (rc=0)
Apr 10 14:12:38 [3391] vsanqa4 crmd: info: crmd_ais_dispatch: Setting expected votes to 2
Apr 10 14:12:38 [3386] vsanqa4 cib: info: cib_process_request: Operation complete: op cib_modify for section crm_config (origin=local/crmd/66, version=0.668.15): OK (rc=0)
---
The first six of the 10 messages above were seen in syslog too; adding them here for context.
The last four are the extra messages in corosync.log.

Apr 10 14:26:46 vsanqa4 crmd[3391]: notice: do_state_transition: State transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_TIMER_POPPED origin=crm_timer_popped ]
Apr 10 14:26:46 vsanqa4 pengine[3390]: notice: unpack_config: On loss of CCM Quorum: Ignore
Apr 10 14:26:46 vsanqa4 pengine[3390]: notice: LogActions: Promote vha-6f92a1f6-969c-4c41-b9ca-7eb6f83ace2e:0#011(Slave -> Master vsanqa4)
Apr 10 14:26:46 vsanqa4 pengine[3390]: notice: process_pe_message: Calculated Transition 3: /var/lib/pacemaker/pengine/pe-input-392.bz2

Thanks,
Pavan
[Pacemaker] Pacemaker 1.1.8, Corosync, No CMAN, Promotion issues
Hi,

[I did go through the mail thread titled "RHEL6 and clones: CMAN needed anyway?", but was not sure about some answers there]

I recently moved from pacemaker 1.1.7 to 1.1.8-7 on CentOS 6.2. I see the following in syslog:

corosync[2966]: [pcmk ] ERROR: process_ais_conf: You have configured a cluster using the Pacemaker plugin for Corosync. The plugin is not supported in this environment and will be removed very soon.
corosync[2966]: [pcmk ] ERROR: process_ais_conf: Please see Chapter 8 of 'Clusters from Scratch' (http://www.clusterlabs.org/doc) for details on using Pacemaker with CMAN

Does this mean that my current configuration is incorrect and will not work as it used to with pacemaker 1.1.7/Corosync? I looked at the Clusters from Scratch instructions and they talk mostly about GFS2. I don't have any filesystem requirements. In that case, can I live with Pacemaker/Corosync?

I do understand that this config is not recommended, but the reason I ask is because I am hitting a weird problem with this setup, which I will explain below. I just want to make sure that I don't start off with an erroneous setup.
I have a two-node multi-state resource configured with the following config:

[root@vsanqa4 ~]# crm configure show
node vsanqa3
node vsanqa4
primitive vha-6f92a1f6-969c-4c41-b9ca-7eb6f83ace2e ocf:heartbeat:vgc-cm-agent.ocf \
        params cluster_uuid=6f92a1f6-969c-4c41-b9ca-7eb6f83ace2e \
        op monitor interval=30s role=Master timeout=100s \
        op monitor interval=31s role=Slave timeout=100s
ms ms-6f92a1f6-969c-4c41-b9ca-7eb6f83ace2e vha-6f92a1f6-969c-4c41-b9ca-7eb6f83ace2e \
        meta clone-max=2 globally-unique=false target-role=Started
location ms-6f92a1f6-969c-4c41-b9ca-7eb6f83ace2e-nodes ms-6f92a1f6-969c-4c41-b9ca-7eb6f83ace2e \
        rule $id=ms-6f92a1f6-969c-4c41-b9ca-7eb6f83ace2e-nodes-rule -inf: #uname ne vsanqa4 and #uname ne vsanqa3
property $id=cib-bootstrap-options \
        dc-version=1.1.8-7.el6-394e906 \
        cluster-infrastructure="classic openais (with plugin)" \
        expected-quorum-votes=2 \
        stonith-enabled=false \
        no-quorum-policy=ignore
rsc_defaults $id=rsc-options \
        resource-stickiness=100

With this config, if I simulate a crash on the master with "echo c > /proc/sysrq-trigger", the slave does not get promoted for about 15 minutes. It does detect the peer going down, but does not seem to issue the promote immediately:

Apr 10 14:12:32 vsanqa4 corosync[2966]: [TOTEM ] A processor failed, forming new configuration.
Apr 10 14:12:38 vsanqa4 corosync[2966]: [pcmk ] notice: pcmk_peer_update: Transitional membership event on ring 166060: memb=1, new=0, lost=1
Apr 10 14:12:38 vsanqa4 corosync[2966]: [pcmk ] info: pcmk_peer_update: memb: vsanqa4 1967394988
Apr 10 14:12:38 vsanqa4 corosync[2966]: [pcmk ] info: pcmk_peer_update: lost: vsanqa3 1950617772
Apr 10 14:12:38 vsanqa4 corosync[2966]: [pcmk ] notice: pcmk_peer_update: Stable membership event on ring 166060: memb=1, new=0, lost=0
Apr 10 14:12:38 vsanqa4 corosync[2966]: [pcmk ] info: pcmk_peer_update: MEMB: vsanqa4 1967394988
Apr 10 14:12:38 vsanqa4 corosync[2966]: [pcmk ] info: ais_mark_unseen_peer_dead: Node vsanqa3 was not seen in the previous transition
Apr 10 14:12:38 vsanqa4 corosync[2966]: [pcmk ] info: update_member: Node 1950617772/vsanqa3 is now: lost
Apr 10 14:12:38 vsanqa4 corosync[2966]: [pcmk ] info: send_member_notification: Sending membership update 166060 to 2 children
Apr 10 14:12:38 vsanqa4 corosync[2966]: [TOTEM ] A processor joined or left the membership and a new membership was formed.
Apr 10 14:12:38 vsanqa4 cib[3386]: notice: ais_dispatch_message: Membership 166060: quorum lost
Apr 10 14:12:38 vsanqa4 crmd[3391]: notice: ais_dispatch_message: Membership 166060: quorum lost
Apr 10 14:12:38 vsanqa4 cib[3386]: notice: crm_update_peer_state: crm_update_ais_node: Node vsanqa3[1950617772] - state is now lost
Apr 10 14:12:38 vsanqa4 crmd[3391]: notice: crm_update_peer_state: crm_update_ais_node: Node vsanqa3[1950617772] - state is now lost
Apr 10 14:12:38 vsanqa4 corosync[2966]: [CPG ] chosen downlist: sender r(0) ip(172.16.68.117) ; members(old:2 left:1)
Apr 10 14:12:38 vsanqa4 corosync[2966]: [MAIN ] Completed service synchronization, ready to provide service.
Then (after about 15 minutes), I see the following:

Apr 10 14:26:46 vsanqa4 crmd[3391]: notice: do_state_transition: State transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_TIMER_POPPED origin=crm_timer_popped ]
Apr 10 14:26:46 vsanqa4 pengine[3390]: notice: unpack_config: On loss of CCM Quorum: Ignore
Apr 10 14:26:46 vsanqa4 pengine[3390]: notice: LogActions: Promote vha-6f92a1f6-969c-4c41-b9ca-7eb6f83ace2e:0#011(Slave -> Master vsanqa4)
Apr 10 14:26:46 vsanqa4 pengine[3390]: notice: process_pe_message: Calculated Transition 3: /var/lib/pacemaker/pengine/pe-input-392.bz2

Thanks,
Pavan
[Pacemaker] CentOS 6.2 and pacemaker versions
Hi,

I have installed pacemaker/corosync from the standard yum repositories on my CentOS 6.2 box. What I get is the following:

pacemaker-cli-1.1.7-6.el6.x86_64
pacemaker-cluster-libs-1.1.7-6.el6.x86_64
pacemaker-libs-1.1.7-6.el6.x86_64
pacemaker-1.1.7-6.el6.x86_64
corosynclib-1.4.1-7.el6_3.1.x86_64
corosync-1.4.1-7.el6_3.1.x86_64

In one of my earlier queries to this list, I was advised against using pacemaker version 1.1.7. But if I try to move to pacemaker 1.1.8, it has a dependency on glibc-2.14, whereas the default glibc shipped with CentOS 6.2 is glibc-2.12, and I'd prefer to stick with it. Is it possible for me to move to later versions of pacemaker in some way?

Thanks,
Pavan
Re: [Pacemaker] Pacemaker stop behaviour when underlying resource is unavailable
[..]

>> The idea is to make sure that stop does not fail when the underlying resource goes away. (Otherwise I see that the resource gets to an unmanaged state.) Also, the expectation is that when the resource comes back, it joins the cluster without much fuss. What I see is that pacemaker calls stop twice
> That would not be expected. Bug?

Are you pointing at stop getting called twice? If yes, I will confirm the behaviour once more and will raise a bug.

>> and if it finds that stop returns success, it does not continue with monitor any more. I also do not see an attempt to start.
> Anywhere? Or just on the same node?

On the same node. The resource does get promoted on the other node. My expectation was that if I kept returning OCF_NOT_RUNNING in monitor, then it should attempt a start-stop-monitor cycle till the resource came back. It seems this is not what the cluster manager does?

>> Is there a way to keep the monitor going in such circumstances?
> Not really. You can define a recurring monitor for the Stopped role though.

I did not want to go there if I could achieve it via the usual mechanisms. If that is not possible, I will explore this option in more detail.

> But why would it come back? You _really_ should not be starting services outside of the cluster - not least of all because we've probably started it somewhere else in the meantime.

Even if we started the resource elsewhere, we are running in degraded mode. (My bad - I did not mention this is a _two-node_ multi-state resource.) We would like to come back to the available mode as early as possible and with the least amount of manual intervention with the cluster.

Pavan

>> Am I using incorrect resource agent return codes?
Thanks,
Pavan
[Pacemaker] Pacemaker stop behaviour when underlying resource is unavailable
Hi,

I have structured my multi-state resource agent as below, for when the underlying resource becomes unavailable for some reason:

monitor() {
    state = get_primitive_resource_state()
    ...
    if ($state == unavailable)
        return $OCF_NOT_RUNNING
    ...
}

stop() {
    monitor()
    ret=$?
    if (ret == $OCF_NOT_RUNNING)
        return $OCF_SUCCESS
}

start() {
    start_primitive()
    if (start_primitive_failure)
        return $OCF_ERR_GENERIC
}

The idea is to make sure that stop does not fail when the underlying resource goes away. (Otherwise I see that the resource gets to an unmanaged state.) Also, the expectation is that when the resource comes back, it joins the cluster without much fuss.

What I see is that pacemaker calls stop twice, and if it finds that stop returns success, it does not continue with monitor any more. I also do not see an attempt to start. Is there a way to keep the monitor going in such circumstances? Am I using incorrect resource agent return codes?

Thanks,
Pavan
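[Editor's note] The pseudocode above can be turned into a minimal runnable sketch. get_primitive_resource_state is a stubbed stand-in for however the real agent probes its device (here it just reads $STATE_FILE); the return codes are the standard OCF values:

```shell
#!/bin/sh
# Minimal sketch of the agent structure described in the mail.
OCF_SUCCESS=0
OCF_ERR_GENERIC=1
OCF_NOT_RUNNING=7

get_primitive_resource_state() {
    # Stub for illustration: report the contents of $STATE_FILE,
    # or "unavailable" if the file is missing/unreadable.
    if [ -n "${STATE_FILE:-}" ] && [ -r "$STATE_FILE" ]; then
        cat "$STATE_FILE"
    else
        echo unavailable
    fi
}

agent_monitor() {
    state=$(get_primitive_resource_state)
    if [ "$state" = "unavailable" ]; then
        return $OCF_NOT_RUNNING
    fi
    return $OCF_SUCCESS
}

agent_stop() {
    agent_monitor
    if [ $? -eq $OCF_NOT_RUNNING ]; then
        # Already stopped: stop must report success,
        # otherwise the resource goes unmanaged.
        return $OCF_SUCCESS
    fi
    # Real teardown of the primitive would go here.
    return $OCF_SUCCESS
}
```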
[Pacemaker] Moving multi-state resources
Hi,

My requirement was to do some administration on one of the nodes where a 2-node multi-state resource was running. To effect a resource instance stoppage on one of the nodes, I added a resource constraint as below:

crm configure location ms_stop_res_on_node ms_resource rule -inf: \#uname eq `hostname`

The resource cleanly moved over to the other node. Incidentally, the resource was the master on this node and was successfully moved to a master state on the other node too.

Now, I want to bring the resource back onto the original node. But the above resource constraint seems to have a persistent behaviour. "crm resource unmigrate ms_resource" does not seem to undo the effects of the constraint addition. I think the location constraint is preventing the resource from starting on the original node. How do I delete this location constraint now?

Is there a more standard way of doing such administrative tasks? The requirement is that I do not want to offline the entire node while doing the administration, but rather would want to stop only the resource instance, do the admin work, and restart the resource instance on the node.

Thanks,
Pavan
Re: [Pacemaker] Moving multi-state resources
On Wed, Dec 12, 2012 at 6:46 PM, Dejan Muhamedagic <deja...@fastmail.fm> wrote:
> Hi,
>
> On Wed, Dec 12, 2012 at 03:50:01PM +0530, pavan tc wrote:
>> Hi, My requirement was to do some administration on one of the nodes where a 2-node multi-state resource was running. To effect a resource instance stoppage on one of the nodes, I added a resource constraint as below:
>>   crm configure location ms_stop_res_on_node ms_resource rule -inf: \#uname eq `hostname`
>> The resource cleanly moved over to the other node. Incidentally, the resource was the master on this node and was successfully moved to a master state on the other node too.
>> Now, I want to bring the resource back onto the original node. But the above resource constraint seems to have a persistent behaviour. crm resource unmigrate ms_resource does not seem to undo the effects of the constraint addition.
>
> You can try to remove your constraint:
>   crm configure delete ms_stop_res_on_node

That did the job. Thanks a ton!

Pavan

> migrate/unmigrate generate/remove special constraints.
>
> Thanks,
> Dejan
Thanks,
Pavan
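[Editor's note] The exchange above also explains why unmigrate had no effect: it only removes the constraints that migrate itself created (conventionally named cli-prefer-*/cli-standby-*), not hand-written ones. A sketch of the round trip, assuming migrate is permitted for the resource in question (names are illustrative):

```shell
# Move the instance away using crm shell's own constraint management,
# so that "unmigrate" knows which constraint to remove afterwards.
crm resource migrate ms_resource vsanqa4   # adds a cli-prefer-* constraint
# ... do the administrative work on the original node ...
crm resource unmigrate ms_resource         # removes that same constraint

# A hand-written location constraint, by contrast, must be deleted by name:
crm configure delete ms_stop_res_on_node
```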
[Pacemaker] Listing resources by attributes
Hi,

Is there a way in which resources can be listed based on some attributes? For example, listing resources running on a certain node, or listing ms resources.

The crm_resource manpage talks about the -N and -t options, which seem to address the requirements above, but they do not provide the expected result. "crm_resource --list" and "crm_resource --list-raw" give the same output regardless of whether -N or -t was provided. I had to do the following to pull out 'ms' resources, for example:

crm configure show | grep -w ^ms | awk '{print $2}'

Is there a cleaner way to list resources?

Thanks,
Pavan
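[Editor's note] The grep/awk pipeline above can be exercised against canned `crm configure show` output; the resource names below are illustrative:

```shell
# Demonstrate the filtering pipeline from the mail on canned
# "crm configure show" output; only the ms resource IDs should survive.
sample_config='node vsanqa3
node vsanqa4
primitive vha-res ocf:heartbeat:Dummy
ms ms-vha-res vha-res
property $id=cib-bootstrap-options stonith-enabled=false'

printf '%s\n' "$sample_config" | grep -w '^ms' | awk '{print $2}'
# -> ms-vha-res
```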
Re: [Pacemaker] Nodes OFFLINE with not in our membership messages
On Thu, Dec 6, 2012 at 5:21 PM, Nikita Michalko <michalko.sys...@a-i-p.com> wrote:
> Hi,
>
> did you already try to google on: "not in our membership"?
>
> E.g.: http://lists.linux-ha.org/pipermail/linux-ha/2007-February/023469.html
>
> Nikita Michalko

Not sure which part you were addressing. I mean, I did not pluck the github link out of thin air ;) And if it is the lack of information in my email that you are talking about: I think I pointed at the fix for the issue, so I presume the conditions under which it happens are known, and sending the same details from my setup is a little superfluous.

What I wanted to know was if there is a bug ID that describes the problem and the fix, and if I could address this issue by staying on Pacemaker 1.1.7.

Thanks,
Pavan
[Pacemaker] Difference between crm resource and crm_resource
Hi,

Can someone please explain how the commands

crm resource stop <resource name>

and

crm_resource --resource <resource name> --set-parameter target-role --meta --parameter-value Stopped

are different? Also, I see that crm has a -w option (which gives synchronous behaviour to the command). Is there something similar for crm_resource?

Thanks,
Pavan
Re: [Pacemaker] Difference between crm resource and crm_resource
> They are not. crm shell just provides a more coherent wrapper around the various commands.
>
>> Also, I see that crm has a -w option (which gives synchronous behaviour to the command). Is there something similar for crm_resource?
>
> No. crm shell then watches the DC until the transition triggered by the change has completed. crm_resource just modifies the configuration.

Thanks much.

Pavan

> Regards,
> Lars
>
> --
> Architect Storage/HA
> SUSE LINUX Products GmbH, GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer, HRB 21284 (AG Nürnberg)
> "Experience is the name everyone gives to their mistakes." -- Oscar Wilde
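[Editor's note] Since crm_resource has no equivalent of crm shell's -w, one workaround is to poll cluster state after making the change. A minimal sketch of such a wait loop; check_stopped here is a stub, and on a real cluster the predicate might instead grep `crm_mon -1` output for the resource's state:

```shell
# Poll a predicate until it succeeds or we run out of attempts - a rough
# stand-in for crm shell's -w behaviour when using crm_resource.
wait_until() {
    tries=$1; shift
    i=0
    while [ "$i" -lt "$tries" ]; do
        if "$@"; then
            return 0
        fi
        i=$((i + 1))
        sleep 0  # use a real interval (e.g. sleep 2) against a live cluster
    done
    return 1
}

# Stub predicate for illustration; on a real cluster this might be:
#   crm_mon -1 | grep -q "my-resource.*Stopped"
check_stopped() {
    [ -n "${PRETEND_STOPPED:-}" ]
}
```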
[Pacemaker] Nodes OFFLINE with not in our membership messages
Hi,

I have now hit this issue twice in my setup. I see the following github commit addressing this issue: https://github.com/ClusterLabs/pacemaker/commit/03f6105592281901cc10550b8ad19af4beb5f72f

From the patch, it appears there is an incorrect conclusion about the status of the membership of nodes. Is there a root cause analysis of this issue that I can read through?

I am currently using 1.1.7. Would the suggestion be to move to 1.1.8, or is there a workaround? (I have already done a good deal of testing with 1.1.7, and would like to live with it if possible.)

Thanks,
Pavan