[ClusterLabs] Serialize and symmetrical=true does not work together
Hi, I have Pacemaker clusters which were working fine with kind=Serialize ordering constraints and symmetrical=true together. After an upgrade, I see that Serialize no longer works with symmetrical=true. How do I make sure the resources in a Serialize constraint stop in the reverse order of start? All the clusters are already in production. Please help.

Thanks & Regards
Dileep Nair
Squad Lead - SAP Base
Togaf Certified Enterprise Architect
IBM Services for Managed Applications
+91 98450 22258 Mobile
dilen...@in.ibm.com
IBM Services

___
Manage your subscription: https://lists.clusterlabs.org/mailman/listinfo/users
ClusterLabs home: https://www.clusterlabs.org/
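For reference, a constraint of the kind being discussed looks roughly like this as a CIB XML fragment (a sketch only; the constraint id and resource names rsc1..rsc3 are illustrative):

```xml
<!-- Start rsc1, rsc2, rsc3 one at a time (Serialize); with
     symmetrical="true" the stop order is expected to be reversed. -->
<rsc_order id="serialize-order" kind="Serialize" symmetrical="true">
  <resource_set id="serialize-order-set" sequential="true">
    <resource_ref id="rsc1"/>
    <resource_ref id="rsc2"/>
    <resource_ref id="rsc3"/>
  </resource_set>
</rsc_order>
```

Checking `cibadmin --query` for how the upgraded cluster actually stored the constraint may show whether the attribute survived the upgrade.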
[ClusterLabs] Safe way to stop pacemaker on both nodes of a two node cluster
Hi, I am confused about the best way to stop Pacemaker on both nodes of a two-node cluster. The options I know of are:

1. Put the cluster in maintenance mode, stop the applications manually, and then stop Pacemaker on both nodes. For this I need the applications to be stopped manually.
2. Stop Pacemaker on one node, wait for all resources to come up on the second node, then stop Pacemaker on the second node. This might cause a significant delay because all resources have to come up on the second node first.

Is there any other way to stop Pacemaker on both nodes gracefully? Thanks in advance.

Thanks & Regards
Dileep Nair
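A sketch of option 1 using the crm shell (systemd unit names assumed; pcs-based clusters can use `pcs cluster stop --all` instead):

```shell
# Tell Pacemaker to stop managing (but not stop) all resources
crm configure property maintenance-mode=true

# Stop the cluster stack; run on each node. Resources are left running.
systemctl stop pacemaker corosync

# ... perform maintenance; stop applications manually if required ...

# Bring the stack back and re-enable resource management
systemctl start corosync pacemaker    # on each node
crm configure property maintenance-mode=false
```

With maintenance-mode set, no resource actions (including stops) are scheduled while the daemons shut down, so neither node tries to fail anything over.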
Re: [ClusterLabs] Strange behaviour of group resource
I see this behavior on Pacemaker 1.1.15 but do not see it on 1.1.15.

Thanks & Regards
Dileep Nair

From: Ken Gaillot
To: Cluster Labs - All topics related to open-source clustering welcomed
Date: 07/30/2019 08:03 PM
Subject: [EXTERNAL] Re: [ClusterLabs] Strange behaviour of group resource
Sent by: "Users"

On Tue, 2019-07-30 at 16:26 +0530, Dileep V Nair wrote:
> Thanks Ken for the response. I see the below errors. Not sure why it
> says target: 7 vs. rc: 0. Does that mean that Pacemaker expects the
> resource to be stopped and, since it is running, it is taking an
> action?
>
> Jul 30 10:08:59 dntstdb2s0703 cib[90848]: warning: A-Sync reply to crmd failed: No message of desired type
> Jul 30 10:09:04 dntstdb2s0703 crmd[90853]: warning: Action 16 (fs-sapdata4_monitor_0) on dntstdb2s0703 failed (target: 7 vs. rc: 0): Error
> Jul 30 10:09:04 dntstdb2s0703 crmd[90853]: notice: Transition 1445 aborted by operation fs-sapdata4_monitor_0 'modify' on dntstdb2s0703: Event failed

These actually aren't errors, and they're expected after a clean-up. I recently merged a change to make the message more accurate. As of the next release, it will look like:

notice: Transition 1445 action 5 (fs-sapdata4_monitor_0 on dntstdb2s0703): expected 'not running' but got 'ok'

Cleaning up a resource involves clearing its history. That makes the cluster expect that it is stopped. The cluster then runs probes to find out the actual status, and if a probe finds the resource running, the above situation happens. So that's not causing the restarts. An actual failure that could cause restarts would have a similar message, but the rc would be something other than 0 or 7.

> Jul 30 10:09:04 dntstdb2s0703 crmd[90853]: warning: Action 16 (fs-sapdata4_monitor_0) on dntstdb2s0703 failed (target: 7 vs. rc: 0): Error
> Jul 30 10:09:04 dntstdb2s0703 stonith-ng[90849]: notice: On loss of CCM Quorum: Ignore
> Jul 30 10:09:04 dntstdb2s0703 crmd[90853]: notice: Result of probe operation for fs-saptmp3 on dntstdb2s0703: 0 (ok)
> Jul 30 10:09:04 dntstdb2s0703 crmd[90853]: warning: Action 19 (fs-saptmp3_monitor_0) on dntstdb2s0703 failed (target: 7 vs. rc: 0): Error
> Jul 30 10:09:04 dntstdb2s0703 crmd[90853]: warning: Action 19 (fs-saptmp3_monitor_0) on dntstdb2s0703 failed (target: 7 vs. rc: 0): Error
>
> Thanks & Regards
> Dileep Nair

From: Ken Gaillot
Date: 07/30/2019 12:47 AM
Subject: [EXTERNAL] Re: [ClusterLabs] Strange behaviour of group resource

On Thu, 2019-07-25 at 20:51 +0530, Dileep V Nair wrote:
> > Hi,
> > I have around 10 filesystems in a group. When I do a crm resource
> > refresh, the filesystems are unmounted and remounted, starting from
> > the fourth resource in the group. Any idea what could be going on,
> > is it expected?

No, it sounds like some of the reprobes are failing. The logs may have more info. Each filesystem will have a probe like RSCNAME_monitor_0 on each node.

-- Ken Gaillot
Re: [ClusterLabs] Strange behaviour of group resource
Thanks Ken for the response. I see the below errors. Not sure why it says target: 7 vs. rc: 0. Does that mean that Pacemaker expects the resource to be stopped and, since it is running, it is taking an action?

Jul 30 10:08:59 dntstdb2s0703 cib[90848]: warning: A-Sync reply to crmd failed: No message of desired type
Jul 30 10:09:04 dntstdb2s0703 crmd[90853]: warning: Action 16 (fs-sapdata4_monitor_0) on dntstdb2s0703 failed (target: 7 vs. rc: 0): Error
Jul 30 10:09:04 dntstdb2s0703 crmd[90853]: notice: Transition 1445 aborted by operation fs-sapdata4_monitor_0 'modify' on dntstdb2s0703: Event failed
Jul 30 10:09:04 dntstdb2s0703 crmd[90853]: warning: Action 16 (fs-sapdata4_monitor_0) on dntstdb2s0703 failed (target: 7 vs. rc: 0): Error
Jul 30 10:09:04 dntstdb2s0703 stonith-ng[90849]: notice: On loss of CCM Quorum: Ignore
Jul 30 10:09:04 dntstdb2s0703 crmd[90853]: notice: Result of probe operation for fs-saptmp3 on dntstdb2s0703: 0 (ok)
Jul 30 10:09:04 dntstdb2s0703 crmd[90853]: warning: Action 19 (fs-saptmp3_monitor_0) on dntstdb2s0703 failed (target: 7 vs. rc: 0): Error
Jul 30 10:09:04 dntstdb2s0703 crmd[90853]: warning: Action 19 (fs-saptmp3_monitor_0) on dntstdb2s0703 failed (target: 7 vs. rc: 0): Error

Thanks & Regards
Dileep Nair

From: Ken Gaillot
To: Cluster Labs - All topics related to open-source clustering welcomed
Date: 07/30/2019 12:47 AM
Subject: [EXTERNAL] Re: [ClusterLabs] Strange behaviour of group resource
Sent by: "Users"

On Thu, 2019-07-25 at 20:51 +0530, Dileep V Nair wrote:
> Hi,
> I have around 10 filesystems in a group. When I do a crm resource
> refresh, the filesystems are unmounted and remounted, starting from
> the fourth resource in the group. Any idea what could be going on,
> is it expected?

No, it sounds like some of the reprobes are failing. The logs may have more info. Each filesystem will have a probe like RSCNAME_monitor_0 on each node.

-- Ken Gaillot
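To spot which probes misbehaved after a refresh, one can filter the log for the probe results Ken describes (a sketch; the log path varies by distribution and Pacemaker version):

```shell
# Probe operations are named RSCNAME_monitor_0; show their outcomes
grep -E "_monitor_0.*\(target: [0-9]+ vs\. rc: [0-9]+\)" /var/log/pacemaker.log

# Current failures and per-resource fail counts
crm_mon -1 --failcounts
```

An rc other than 0 (running) or 7 (not running) in those lines indicates a genuinely failed probe, which would explain resources being restarted.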
[ClusterLabs] Strange behaviour of group resource
Hi, I have around 10 filesystems in a group. When I do a crm resource refresh, the filesystems are unmounted and remounted, starting from the fourth resource in the group. Any idea what could be going on? Is it expected?

Thanks & Regards
Dileep Nair
Re: [ClusterLabs] Antw: Re: Issue with DB2 HADR cluster
Thanks a lot for all the information. I could see that the issue was indeed with the PEER_WINDOW. I increased it and now it is working.

Thanks & Regards
Dileep Nair

From: Valentin Vidic
To: users@clusterlabs.org
Date: 04/03/2019 01:26 PM
Subject: Re: [ClusterLabs] Antw: Re: Issue with DB2 HADR cluster
Sent by: "Users"

On Wed, Apr 03, 2019 at 10:36:52AM +0300, Andrei Borzenkov wrote:
> I assume this is path failover time? As I doubt storage latency can
> be that high?
>
> I wonder, does IBM have official guidelines for integrating SBD with
> their storage? Otherwise, where does this requirement come from?

Yes, we had problems with SBD when the timeouts were lower, so it is now configured based on this info:

# Set SCSI command timeout to 120s (default == 30 or 60) for IBM 2145 devices
https://www.ibm.com/support/knowledgecenter/ST3FR7_8.1.1/com.ibm.storwize.v7000.811.doc/svc_linux_settings.html

-- Valentin
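The peer-window fix mentioned above is a database-side change. A sketch of how it is typically adjusted (the database name SAMPLE and the 300-second value are illustrative; HADR changes generally require a deactivate/activate or restart to take effect):

```shell
# Raise the HADR peer window so the primary tolerates longer standby outages
db2 update db cfg for SAMPLE using HADR_PEER_WINDOW 300

# Verify the configured value
db2 get db cfg for SAMPLE | grep HADR_PEER_WINDOW
```

The value should comfortably exceed the cluster's worst-case failure-detection plus fencing time, otherwise takeover-by-force can be refused.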
Re: [ClusterLabs] Change SBD Disk to VCenter Stonith
Hello Klaus,

Thanks for the suggestion. I tried both ways, but even then the pacemaker service is not starting because there is a dependency on the sbd service, which does not start without the SBD disk. I am planning to try uninstalling the sbd package itself to see if that removes the dependency. Any suggestion on how to remove the dependency without actually uninstalling the service would be helpful.

Thanks & Regards
Dileep Nair

From: Klaus Wenninger
To: Cluster Labs - All topics related to open-source clustering welcomed, Dileep V Nair
Date: 02/22/2019 12:25 PM
Subject: Re: [ClusterLabs] Change SBD Disk to VCenter Stonith

On 02/22/2019 06:24 AM, Dileep V Nair wrote:
> Hi, I have a running cluster with stonith configured as SBD. Now I
> would like to remove the SBD disks and move to vCenter stonith. After
> removing the SBD disk, I am not able to start pacemaker because of
> the dependent sbd service which was configured during
> ha-cluster-init. What is the best way to remove the SBD disk from the
> VM?

The other way round would probably be more handy: first remove the SBD configuration and afterwards remove the disk from the VM. But I guess it would probably be quickest to remove the disk from the SBD config file (e.g. /etc/sysconfig/sbd) to get the pacemaker service up again. Or you just disable the sbd service (e.g. systemctl disable sbd). In both cases the SBD fencing resource would probably moan on monitoring, but that shouldn't prevent you from adapting the stonith configuration.

Klaus

___
Users mailing list: Users@clusterlabs.org
https://lists.clusterlabs.org/mailman/listinfo/users
Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org
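A sketch of the config-side approach Klaus describes (the path is the SUSE default; verify the file location and variable name on your distribution before editing):

```shell
# Comment out the shared-disk device so sbd no longer waits for it
sed -i 's/^SBD_DEVICE=/#SBD_DEVICE=/' /etc/sysconfig/sbd

# Alternatively, drop the unit dependency by disabling sbd entirely
systemctl disable sbd

# Pacemaker should now start without the disk
systemctl start pacemaker
```

Note that disabling sbd also disables watchdog self-fencing, so a replacement stonith device (e.g. the vCenter agent) should be configured and tested before relying on the cluster again.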
[ClusterLabs] Change SBD Disk to VCenter Stonith
Hi, I have a running cluster with stonith configured as SBD. Now I would like to remove the SBD disks and move to vCenter stonith. After removing the SBD disk, I am not able to start Pacemaker because of the dependent sbd service, which was configured during ha-cluster-init. What is the best way to remove the SBD disk from the VM?

Thanks & Regards
Dileep Nair

___
Users mailing list: Users@clusterlabs.org
https://lists.clusterlabs.org/mailman/listinfo/users
Re: [ClusterLabs] Pacemaker log showing time mismatch after
Thanks Ken for the prompt response. Yes, it was at system boot. I have yet to find out what caused the reboot; there was no stonith or any other error in the pacemaker log.

Thanks & Regards
Dileep Nair

From: Ken Gaillot
To: Cluster Labs - All topics related to open-source clustering welcomed
Date: 01/28/2019 09:18 PM
Subject: Re: [ClusterLabs] Pacemaker log showing time mismatch after
Sent by: "Users"

On Mon, 2019-01-28 at 18:04 +0530, Dileep V Nair wrote:
> Hi,
>
> I am seeing a log entry showing that the Recheck Timer popped and the
> time in pacemaker.log went back in time. Around the same time the
> resources also failed over (slave became master). Does anyone know
> why this happens?
>
> Jan 23 01:16:48 [9383] pn4ushleccp1 lrmd: notice: operation_finished: db_cp1_monitor_2:32476:stderr [ /usr/bin/.: Permission denied. ]
> Jan 23 01:16:48 [9383] pn4ushleccp1 lrmd: notice: operation_finished: db_cp1_monitor_2:32476:stderr [ /usr/bin/.: Permission denied. ]
> Jan 22 20:17:03 [9386] pn4ushleccp1 crmd: info: crm_timer_popped: PEngine Recheck Timer (I_PE_CALC) just popped (90ms)

Pacemaker can handle the clock jumping forward, but not backward. The recheck timer here is unrelated to the clock jump; it's just the first log message to appear since it jumped.

You definitely want to find out what's changing the clock. If this is at system boot, likely the hardware clock is wrong and some time manager (ntp, etc.) is adjusting it. Pacemaker's systemd unit file has "After=time-sync.target" to try to ensure that it doesn't start until after this has happened, but unfortunately you often have to take extra steps to make time managers use that target (e.g. enable chronyd-wait.service if you're using chronyd), and of course if you're not using systemd it's not any help. But the basic idea is you want to ensure Pacemaker starts after the time has been adjusted at boot.

If this isn't at boot, then your host has something weird going on. Check the system log around the time of the jump, etc.

> Jan 22 20:17:03 [9386] pn4ushleccp1 crmd: notice: do_state_transition: State transition S_IDLE -> S_POLICY_ENGINE | input=I_PE_CALC cause=C_TIMER_POPPED origin=crm_timer_popped
> Jan 22 20:17:03 [9386] pn4ushleccp1 crmd: info: do_state_transition: Progressed to state S_POLICY_ENGINE after C_TIMER_POPPED
> Jan 22 20:17:03 [9385] pn4ushleccp1 pengine: info: process_pe_message: Input has not changed since last time, not saving to disk
> Jan 22 20:17:03 [9385] pn4ushleccp1 pengine: notice: unpack_config: Relying on watchdog integration for fencing
> Jan 22 20:17:03 [9385] pn4ushleccp1 pengine: info: determine_online_status_fencing: Node pn4us7leccp1 is active
> Jan 22 20:17:03 [9385] pn4ushleccp1 pengine: info: determine_online_status: Node pn4us7leccp1 is online
> Jan 22 20:17:03 [9385] pn4ushleccp1 pengine: info: determine_online_status_fencing: Node pn4ushleccp1 is active
> Jan 22 20:17:03 [9385] pn4ushleccp1 pengine: info: determine_online_status: Node pn4ushleccp1 is online
> Jan 22 20:17:03 [9385] pn4ushleccp1 pengine: info: determine_op_status: Operation monitor found resource db_cp1:0 active on pn4us7leccp1
> Jan 22 20:17:03 [9385] pn4ushleccp1 pengine: info: determine_op_status: Operation monitor found resource TSM_DB2 active on pn4us7leccp1
> Jan 22 20:17:03 [9385] pn4ushleccp1 pengine: info: determine_op_status: Operation monitor found resource TSM_DB2 active on pn4us7leccp1
> Jan 22 20:17:03 [9385] pn4ushleccp1 pengine: info: determine_op_status: Operation monitor found resource ip_cp1 active on pn4ushleccp1
> Jan 22 20:17:03 [9385] pn4ushleccp1 pengine: info: determine_op_status: Operation monitor found resource db_cp1:1 active in master mode on pn4ushleccp1
> Jan 22 20:17:03 [9385] pn4ushleccp1 pengine: info: determine_op_status: Operation monitor found resource TSM_DB2log active on pn4ushleccp1
> Jan 22 20:17:03 [9385] pn4ushleccp1 pengine: info: determine_op_status: Operation monitor found resource KUD_DB2 active on pn4ushleccp1
> Jan 22 20:17:03 [9385] pn4ushleccp1 pengine: info: native_print: stonith-sbd (stonith:external/sbd): Started pn4ushleccp1
> Jan 22 20:17:03 [9385] pn4ushleccp1 pengine: info: native_print: ip_cp1 (ocf::heartbeat:IPaddr2): Started pn4us7leccp1
> Jan 22 20:17:03 [9385] pn4ushleccp1 pengine: info: clone_print: Master/Slave Set: ms_db2_cp1 [db_cp1]
> Jan 22 20:17:03 [9385] pn4ushleccp1 pengine: info: short_print: Masters: [ pn4us7leccp1 ]
> Jan 22 20:17:03 [9385] p
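A sketch of the boot-ordering fix Ken mentions (the wait unit is named chrony-wait.service or chronyd-wait.service depending on the distribution; verify which one your chrony package ships):

```shell
# Make time-sync.target actually wait for clock synchronization
systemctl enable --now chronyd.service
systemctl enable chrony-wait.service    # or chronyd-wait.service

# Confirm Pacemaker orders itself after the target
systemctl cat pacemaker | grep -i time-sync
```

With the wait service enabled, time-sync.target is only reached after chrony has stepped the clock, so Pacemaker never sees the backward jump at boot.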
[ClusterLabs] Pacemaker log showing time mismatch after
Hi, I am seeing a log entry showing that the Recheck Timer popped and the time in pacemaker.log went back in time. After some time, the time went back to normal. Around the same time the resources also failed over (slave became master). Does anyone know why this happens?

Jan 23 01:16:48 [9383] pn4ushleccp1 lrmd: notice: operation_finished: db_cp1_monitor_2:32476:stderr [ /usr/bin/.: Permission denied. ]
Jan 23 01:16:48 [9383] pn4ushleccp1 lrmd: notice: operation_finished: db_cp1_monitor_2:32476:stderr [ /usr/bin/.: Permission denied. ]
Jan 22 20:17:03 [9386] pn4ushleccp1 crmd: info: crm_timer_popped: PEngine Recheck Timer (I_PE_CALC) just popped (90ms)
Jan 22 20:17:03 [9386] pn4ushleccp1 crmd: notice: do_state_transition: State transition S_IDLE -> S_POLICY_ENGINE | input=I_PE_CALC cause=C_TIMER_POPPED origin=crm_timer_popped
Jan 22 20:17:03 [9386] pn4ushleccp1 crmd: info: do_state_transition: Progressed to state S_POLICY_ENGINE after C_TIMER_POPPED
Jan 22 20:17:03 [9385] pn4ushleccp1 pengine: info: process_pe_message: Input has not changed since last time, not saving to disk
Jan 22 20:17:03 [9385] pn4ushleccp1 pengine: notice: unpack_config: Relying on watchdog integration for fencing
Jan 22 20:17:03 [9385] pn4ushleccp1 pengine: info: determine_online_status_fencing: Node pn4us7leccp1 is active
Jan 22 20:17:03 [9385] pn4ushleccp1 pengine: info: determine_online_status: Node pn4us7leccp1 is online
Jan 22 20:17:03 [9385] pn4ushleccp1 pengine: info: determine_online_status_fencing: Node pn4ushleccp1 is active
Jan 22 20:17:03 [9385] pn4ushleccp1 pengine: info: determine_online_status: Node pn4ushleccp1 is online
Jan 22 20:17:03 [9385] pn4ushleccp1 pengine: info: determine_op_status: Operation monitor found resource db_cp1:0 active on pn4us7leccp1
Jan 22 20:17:03 [9385] pn4ushleccp1 pengine: info: determine_op_status: Operation monitor found resource TSM_DB2 active on pn4us7leccp1
Jan 22 20:17:03 [9385] pn4ushleccp1 pengine: info: determine_op_status: Operation monitor found resource TSM_DB2 active on pn4us7leccp1
Jan 22 20:17:03 [9385] pn4ushleccp1 pengine: info: determine_op_status: Operation monitor found resource ip_cp1 active on pn4ushleccp1
Jan 22 20:17:03 [9385] pn4ushleccp1 pengine: info: determine_op_status: Operation monitor found resource db_cp1:1 active in master mode on pn4ushleccp1
Jan 22 20:17:03 [9385] pn4ushleccp1 pengine: info: determine_op_status: Operation monitor found resource TSM_DB2log active on pn4ushleccp1
Jan 22 20:17:03 [9385] pn4ushleccp1 pengine: info: determine_op_status: Operation monitor found resource KUD_DB2 active on pn4ushleccp1
Jan 22 20:17:03 [9385] pn4ushleccp1 pengine: info: native_print: stonith-sbd (stonith:external/sbd): Started pn4ushleccp1
Jan 22 20:17:03 [9385] pn4ushleccp1 pengine: info: native_print: ip_cp1 (ocf::heartbeat:IPaddr2): Started pn4us7leccp1
Jan 22 20:17:03 [9385] pn4ushleccp1 pengine: info: clone_print: Master/Slave Set: ms_db2_cp1 [db_cp1]
Jan 22 20:17:03 [9385] pn4ushleccp1 pengine: info: short_print: Masters: [ pn4us7leccp1 ]
Jan 22 20:17:03 [9385] pn4ushleccp1 pengine: info: short_print: Slaves: [ pn4ushleccp1 ]
Jan 22 20:17:03 [9385] pn4ushleccp1 pengine: info: native_print: TSM_DB2 (systemd:dsmcad_db2): Started pn4us7leccp1
Jan 22 20:17:03 [9385] pn4ushleccp1 pengine: info: native_print: TSM_DB2log (systemd:dsmcad_db2log): Started pn4us7leccp1
Jan 22 20:17:03 [9385] pn4ushleccp1 pengine: info: native_print: KUD_DB2 (systemd:kuddb2_db2): Started pn4us7leccp1
Jan 22 20:17:03 [9385] pn4ushleccp1 pengine: info: rsc_merge_weights: ms_db2_cp1: Breaking dependency loop at ms_db2_cp1
Jan 22 20:17:03 [9385] pn4ushleccp1 pengine: info: master_color: Promoting db_cp1:0 (Master pn4us7leccp1)
Jan 22 20:17:03 [9385] pn4ushleccp1 pengine: info: master_color: ms_db2_cp1: Promoted 1 instances of a possible 1 to master
Jan 22 20:17:03 [9385] pn4ushleccp1 pengine: info: LogActions: Leave ip_cp1 (Started pn4us7leccp1)

After the transition, the date was shifted back to normal:

Jan 22 20:47:03 [9386] pn4ushleccp1 crmd: info: do_log: Input I_TE_SUCCESS received in state S_TRANSITION_ENGINE from notify_crmd
Jan 22 20:47:03 [9386] pn4ushleccp1 crmd: notice: do_state_transition: State transition S_TRANSITION_ENGINE -> S_IDLE | input=I_TE_SUCCESS cause=C_FSA_INTERNAL origin=notify_crmd
Jan 23 01:47:22 [9383] pn4ushleccp1 lrmd: notice: operation_finished: db_cp1_monitor_2:19518:stderr [ /usr/bin/.: Permission denied. ]
Jan 23 01:47:22 [9383] pn4ushleccp1 lrmd: notice: operation_finished: db_cp1_monitor_2:19518:stderr [ /usr/bin/.: Permission denied. ]

Thanks & Regards
Dileep Nair
Squad Lead - SAP Base
Togaf Certified
[ClusterLabs] Anyone have a document on how to configure VMWare fencing on Suse Linux
Hi, I am using Pacemaker for my clusters with a shared SBD disk as the stonith mechanism. Now I have an issue because I am using VMware SRM for DR, and that does not support shared disks. So I am thinking of configuring external/vcenter as the stonith mechanism. Is there a document I can refer to for configuring this? Are there specific settings or configurations to be done on the vCenter side? Any help is highly appreciated.

Thanks & Regards
Dileep Nair

___
Users mailing list: Users@clusterlabs.org
https://lists.clusterlabs.org/mailman/listinfo/users
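For reference, a stonith primitive of the kind being asked about typically looks something like this in crm shell syntax (a sketch; the server name, credential-store path, and the hostname-to-VM map in HOSTLIST are illustrative and must match your vCenter inventory, and the agent requires the VMware vSphere CLI/Perl SDK on the cluster nodes):

```
primitive vcenter-fence stonith:external/vcenter \
    params VI_SERVER="vcenter.example.com" \
           VI_CREDSTORE="/root/.vmware/credstore/vicredentials.xml" \
           HOSTLIST="node1=VM_node1;node2=VM_node2" \
           RESETPOWERON="0" \
    op monitor interval="60s"
```

On the vCenter side, the account stored in the credential store needs power-operation privileges on the cluster VMs; test with `stonith_admin --reboot <node>` against a non-production node before relying on it.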
Re: [ClusterLabs] Floating IP active in both nodes
Hello Gabriel,

I have a similar cluster configuration running fine. I am using the virtual IP to NFS-mount a filesystem from node 1 on node 2. The differences I could see from your configuration:

> primitive site_one_ip ocf:heartbeat:IPaddr \
>   params ip="192.168.2.200" cidr_netmask="255.255.252.0" nic="eth0" \
>   op monitor interval="40s" timeout="20s"

I use ocf:heartbeat:IPaddr2. I have given only the IP parameter, no netmask and nic. I have a virtual hostname associated with the IP address using /etc/hosts and use the virtual hostname to connect.

Thanks & Regards
Dileep Nair

From: Gabriel Buades
To: users@clusterlabs.org
Date: 10/26/2018 06:34 PM
Subject: Re: [ClusterLabs] Floating IP active in both nodes
Sent by: "Users"

Hello Andrei. I did not add lvs_support at first. I added it later, when I noticed the problem, to test if something changed, but I got the same result.

Gabriel

El vie., 26 oct. 2018 a las 11:47, Andrei Borzenkov () escribió:
On 26.10.2018 11:14, Gabriel Buades wrote:
> Dear Cluster Labs team.
>
> I previously configured a two-node cluster with replicated MariaDB.
> To use one database as the active and the other one as failover, I
> configured a cluster using heartbeat:
>
> root@logpmgid01v:~$ sudo crm configure show
> node $id="59bbdb76-be67-4be0-aedb-9e27d65f371e" logpmgid01v
> node $id="adbc5972-c491-4fc4-b87d-8170e1b2d4d0" logpmgid02v \
>   attributes standby="off"
> primitive site_one_ip ocf:heartbeat:IPaddr \
>   params ip="192.168.2.200" cidr_netmask="255.255.252.0" nic="eth0" \
>   op monitor interval="40s" timeout="20s"
> location site_one_ip_pref site_one_ip 100: logpmgid01v
> property $id="cib-bootstrap-options" \
>   dc-version="1.1.10-42f2063" \
>   cluster-infrastructure="heartbeat" \
>   stonith-enabled="false"
>
> Now, I've done a similar setup using corosync:
> root@908soffid02:~# crm configure show
> node 1: 908soffid01
> node 2: 908soffid02
> primitive site_one_ip IPaddr \
>   params ip=10.6.12.118 cidr_netmask=255.255.0.0 nic=ens160 lvs_support=true \

What is the reason you added lvs_support? The previous configuration did not have it.

>   meta target-role=Started is-managed=true
> location cli-prefer-site_one_ip site_one_ip role=Started inf: 908soffid01
> location site_one_ip_pref site_one_ip 100: 908soffid01
> property cib-bootstrap-options: \
>   have-watchdog=false \
>   dc-version=1.1.14-70404b0 \
>   cluster-infrastructure=corosync \
>   cluster-name=debian \
>   stonith-enabled=false \
>   no-quorum-policy=ignore \
>   maintenance-mode=false
>
> Apparently it works fine, and the floating IP address is active on node 1:
> root@908soffid02:~# crm_mon -1
> Last updated: Fri Oct 26 10:06:12 2018  Last change: Fri Oct 26 10:02:53 2018 by root via cibadmin on 908soffid02
> Stack: corosync
> Current DC: 908soffid01 (version 1.1.14-70404b0) - partition with quorum
> 2 nodes and 1 resource configured
>
> Online: [ 908soffid01 908soffid02 ]
>
> site_one_ip (ocf::heartbeat:IPaddr): Started 908soffid01
>
> But when node 2 tries to connect to the floating IP address, it gets
> connected to itself, despite the IP address being bound to the first node:
> root@908soffid02:~# ssh root@10.6.12.118 hostname
> root@soffiddb's password:
> 908soffid02
>
> I'd like the second node to connect to the actual floating IP address,
> but I cannot see how to set it up. Any help is welcome.
>
> I am using pacemaker 1.1.14-2ubuntu1.4 and corosync 2.3.5-3ubuntu2.1
>
> Kind regards.
>
> Gabriel Buades
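The variant Dileep describes (IPaddr2 with only the address given, letting the agent pick interface and netmask from the routing table) would look roughly like this in crm shell syntax (a sketch; the address is illustrative):

```
primitive site_one_ip ocf:heartbeat:IPaddr2 \
    params ip="10.6.12.118" \
    op monitor interval="40s" timeout="20s"
```

Dropping lvs_support matters here: with lvs_support=true the agent also places the address on a loopback interface on the passive node for LVS director setups, which is exactly why the standby node resolves the floating IP to itself.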
[ClusterLabs] Encrypted passwords for Resource Agent Scripts
Hi, I have written heartbeat resource agent scripts for Oracle and Sybase. Both scripts take user passwords as parameters. Is there a way to encrypt the passwords so that the plain-text passwords are not visible in the primitive definition?

Thanks & Regards
Dileep Nair

___
Users mailing list: Users@clusterlabs.org
https://lists.clusterlabs.org/mailman/listinfo/users
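One common pattern (an illustrative sketch, not a feature of any shipped agent: the path and the `passfile` parameter name are hypothetical) is to keep the secret out of the CIB entirely by having the primitive take a path to a root-only file, which the agent reads at runtime:

```shell
# On each node: store the secret in a file only root can read
install -m 600 -o root -g root /dev/null /etc/mycluster/ora_pass
echo 'S3cret' > /etc/mycluster/ora_pass

# Inside the resource agent, read it when needed
# (passfile is an illustrative OCF parameter, not a standard one)
ORA_PASS="$(cat "${OCF_RESKEY_passfile}")"
```

The primitive then only carries `params passfile="/etc/mycluster/ora_pass"`, so `crm configure show` and the CIB XML never expose the password itself.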
[ClusterLabs] Pacemaker not restarting Resource on same node
Hi, I have a cluster with DB2 running in HADR mode, using the db2 resource agent. My problem is that whenever DB2 fails on the primary, it migrates to the secondary node. Ideally it should restart three times first (migration-threshold is set to 3), but that is not happening. This is causing extra downtime for the customer. Are there any other settings or parameters which need to be set? Did anyone face a similar issue? I am on Pacemaker version 1.1.15-21.1.

Dileep V Nair
dilen...@in.ibm.com
IBM Services
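For reference, the relevant knobs in crm shell syntax (a sketch; the resource name db_cp1 and the parameter values are illustrative, to be merged into the existing db2 primitive rather than taken verbatim):

```
primitive db_cp1 ocf:heartbeat:db2 \
    op monitor interval="30s" on-fail="restart" \
    meta migration-threshold=3 failure-timeout=600

# migration-threshold only counts monitor/stop-style failures; a failed
# *start* moves the resource immediately unless this property is changed:
property start-failure-is-fatal=false
```

So if the failures here are start (or promote) failures rather than monitor failures, the threshold of 3 is never consulted, which would explain the immediate migration.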
Re: [ClusterLabs] Sybase HADR Resource Agent
Thanks for the response. I tried that, but I think it does not take care of the HADR setup.

Regards,
Dileep V Nair
E-mail: dilen...@in.ibm.com
Outer Ring Road, Embassy Manya
Bangalore, KA 560045, India

From: Oyvind Albrigtsen <oalbr...@redhat.com>
To: Cluster Labs - All topics related to open-source clustering welcomed <users@clusterlabs.org>
Date: 02/22/2018 04:07 PM
Subject: Re: [ClusterLabs] Sybase HADR Resource Agent
Sent by: "Users" <users-boun...@clusterlabs.org>

On 21/02/18 21:41 +0530, Dileep V Nair wrote:
> Hi,
> I am trying to configure Pacemaker to automate a Sybase HADR setup.
> Is anyone aware of a Resource Agent which I can use for this?

There's a Sybase ASE agent available at:
https://github.com/ClusterLabs/resource-agents/pull/

> Regards,
> Dileep V Nair
[ClusterLabs] Sybase HADR Resource Agent
Hi, I am trying to configure Pacemaker to automate a Sybase HADR setup. Is anyone aware of a Resource Agent which I can use for this? Regards, Dileep V Nair Senior AIX Administrator Cloud Managed Services Delivery (MSD), India IBM Cloud E-mail: dilen...@in.ibm.com Outer Ring Road, Embassy Manya Bangalore, KA 560045 India
Re: [ClusterLabs] Issues with DB2 HADR Resource Agent
Hello Ondrej, I am still having issues with my DB2 HADR on Pacemaker. When I do a db2_kill on the Primary for testing, initially it restarts DB2 on the same node. But if I let it run for some days and then try the same test, it goes into fencing and reboots the Primary node. I am not sure how exactly it should behave when DB2 crashes on the Primary. Also, if I crash Node 1 (the node itself, not only DB2), it promotes Node 2 to Primary, but once Pacemaker is started again on Node 1, the DB on Node 1 is also promoted to Primary. Is that expected behaviour? Regards, Dileep V Nair Senior AIX Administrator Cloud Managed Services Delivery (MSD), India IBM Cloud E-mail: dilen...@in.ibm.com Outer Ring Road, Embassy Manya Bangalore, KA 560045 India From: Ondrej Famera <ofam...@redhat.com> To: Dileep V Nair <dilen...@in.ibm.com> Cc: Cluster Labs - All topics related to open-source clustering welcomed <users@clusterlabs.org> Date: 02/12/2018 11:46 AM Subject: Re: [ClusterLabs] Issues with DB2 HADR Resource Agent On 02/01/2018 07:24 PM, Dileep V Nair wrote: > Thanks Ondrej for the response. I have set the PEER_WINDOW to 1000, which > I guess is a reasonable value. What I am noticing is it does not wait > for the PEER_WINDOW. Before that itself the DB goes into a > REMOTE_CATCHUP_PENDING state and Pacemaker gives an error saying a DB in > STANDBY/REMOTE_CATCHUP_PENDING/DISCONNECTED can never be promoted. > > > Regards, > > *Dileep V Nair* Hi Dileep, sorry for the late response. The DB2 should not get into the 'REMOTE_CATCHUP' phase, or the DB2 resource agent will indeed not promote. From my experience it usually gets into that state when the DB2 on standby was restarted during or after the PEER_WINDOW timeout. When the primary DB2 fails, the standby should end up in some state that matches the one on line 770 of the DB2 resource agent, and the promote operation is attempted.
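[When debugging which HADR state the resource agent actually sees at promote time, it can help to inspect the state directly on each node. A minimal sketch, assuming a DB2 instance owner account; the database name SAMPLE is a placeholder.]

```shell
# Run as the DB2 instance owner on each node.
# HADR_ROLE, HADR_STATE and HADR_CONNECT_STATUS are what determine whether
# the agent's promote case (e.g. STANDBY/PEER/DISCONNECTED) will match.
db2pd -db SAMPLE -hadr

# Older DB2 releases expose the same information via a snapshot:
db2 get snapshot for database on SAMPLE | grep -i hadr
```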
770 STANDBY/*PEER/DISCONNECTED|Standby/DisconnectedPeer) https://github.com/ClusterLabs/resource-agents/blob/master/heartbeat/db2#L770 The DB2 on standby can get restarted when the 'promote' operation times out, so you can try increasing the 'promote' timeout to something higher if this was the case. So if you see that DB2 was restarted after the Primary failed, increase the promote timeout. If DB2 was not restarted, then the question is why DB2 has decided to change the status in this way. Let me know if the above helped. -- Ondrej Faměra @Red Hat
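[Raising the promote timeout as Ondrej suggests can be sketched as below. The resource name db2-hadr and the 120-second value are illustrative, not from the thread; size the timeout to how long your takeover actually needs, and keep it below your fencing escalation time.]

```shell
# pcs example; 'db2-hadr' is a hypothetical (master/slave) resource name.
# Give 'promote' enough time for the HADR takeover so the standby is not
# restarted mid-takeover when the operation times out.
pcs resource update db2-hadr op promote interval=0s timeout=120s

# Verify the operation settings afterwards:
pcs resource show db2-hadr
```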
Re: [ClusterLabs] Issues with DB2 HADR Resource Agent
Thanks Ondrej for the response. I also figured out the same, and reducing the HADR_TIMEOUT and increasing the promote timeout helped in resolving the issue. Regards, Dileep V Nair Senior AIX Administrator Cloud Managed Services Delivery (MSD), India IBM Cloud E-mail: dilen...@in.ibm.com Outer Ring Road, Embassy Manya Bangalore, KA 560045 India From: Ondrej Famera <ofam...@redhat.com> To: Dileep V Nair <dilen...@in.ibm.com> Cc: Cluster Labs - All topics related to open-source clustering welcomed <users@clusterlabs.org> Date: 02/12/2018 11:46 AM Subject: Re: [ClusterLabs] Issues with DB2 HADR Resource Agent
Re: [ClusterLabs] Issues with DB2 HADR Resource Agent
Thanks Ondrej for the response. I have set the PEER_WINDOW to 1000, which I guess is a reasonable value. What I am noticing is it does not wait for the PEER_WINDOW. Before that itself the DB goes into a REMOTE_CATCHUP_PENDING state and Pacemaker gives an error saying a DB in STANDBY/REMOTE_CATCHUP_PENDING/DISCONNECTED can never be promoted. Regards, Dileep V Nair Senior AIX Administrator Cloud Managed Services Delivery (MSD), India IBM Cloud E-mail: dilen...@in.ibm.com Outer Ring Road, Embassy Manya Bangalore, KA 560045 India From: Ondrej Famera <ofam...@redhat.com> To: Dileep V Nair <dilen...@in.ibm.com> Cc: Cluster Labs - All topics related to open-source clustering welcomed <users@clusterlabs.org> Date: 02/01/2018 02:48 PM Subject: Re: [ClusterLabs] Issues with DB2 HADR Resource Agent On 02/01/2018 05:57 PM, Dileep V Nair wrote:
> Now the second issue I am facing is that when I crash the node where the DB
> is primary, the STANDBY DB is not getting promoted to PRIMARY. I could
> fix that by adding the below lines in db2_promote()
>
> 773 *)
> 774 # must take over forced
> 775 force="by force"
> 776
> 777 ;;
>
> But I am not sure of the implications that this can cause.
>
> Can someone suggest whether what I am doing is correct OR will this lead
> to any data loss.
Hi Dileep, as for the 'by force' implications you may check the documentation on what it brings. In short: the data can get corrupted. https://www.ibm.com/support/knowledgecenter/SSEPGG_11.1.0/com.ibm.db2.luw.admin.cmd.doc/doc/r0011553.html#r0011553__byforce The original 'by force peer window only' limits the takeover to the period when DB2 is within the PEER_WINDOW, which gives a bit more safety. (The table in the link above also explains how much safer it is.) Instead of changing the resource agent I would rather suggest checking the PEER_WINDOW and HADR_TIMEOUT variables in DB2. They determine how long it is possible to do a takeover 'by force peer window only'.
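[Checking and adjusting the two database configuration parameters Ondrej mentions can be sketched as below. SAMPLE and the values shown are placeholders; HADR_PEER_WINDOW is the actual DB cfg parameter name behind the PEER_WINDOW discussed in the thread.]

```shell
# As the DB2 instance owner; SAMPLE is a placeholder database name.
# Inspect the current HADR-related settings:
db2 get db cfg for SAMPLE | grep -i hadr

# HADR_PEER_WINDOW (seconds): how long after losing the primary the standby
# still permits 'takeover by force peer window only'.
# HADR_TIMEOUT (seconds): how long before a lost partner is declared down.
db2 update db cfg for SAMPLE using HADR_PEER_WINDOW 300
db2 update db cfg for SAMPLE using HADR_TIMEOUT 120
# The databases must be deactivated and reactivated for the change to apply.
```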
-- Ondrej Faměra @Red Hat
[ClusterLabs] Issues with DB2 HADR Resource Agent
Hi, I am facing multiple issues with the DB2 Resource Agent. The first issue was that when I start Pacemaker, DB2 is also started and immediately after that DB2 gets stopped. I fixed that by removing the \ before $ in line numbers 686, 689 and 691.

684 CMD="if db2 connect to $db;
685 then
686 db2 select * from sysibm.sysversions ; rc=$?;
687 db2 terminate;
688 else
689 rc=$?;
690 fi;
691 exit $rc"

Now the second issue I am facing is that when I crash the node where the DB is primary, the STANDBY DB is not getting promoted to PRIMARY. I could fix that by adding the below lines in db2_promote():

773 *)
774 # must take over forced
775 force="by force"
776
777 ;;

But I am not sure of the implications this can cause. Can someone suggest whether what I am doing is correct, or will this lead to any data loss? Regards, Dileep V Nair Senior AIX Administrator Cloud Managed Services Delivery (MSD), India IBM Cloud E-mail: dilen...@in.ibm.com Outer Ring Road, Embassy Manya Bangalore, KA 560045 India
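[The difference that removing the backslashes makes can be reproduced in plain shell, independent of DB2. The variable names below are illustrative. With `\$rc`, expansion is deferred to the shell that eventually runs the command string (as when the agent passes CMD to `su ... -c`); without the backslash, the outer shell expands the variable while building the string.]

```shell
#!/bin/sh
rc=7                                   # set in the outer shell

# Backslash keeps the literal text '$rc' in the string, so it expands in
# whichever shell eventually executes the command:
deferred="rc=3; exit \$rc"
sh -c "$deferred"; echo "deferred exit: $?"    # the inner rc=3 wins -> 3

# Without the backslash, $rc expands immediately in the outer shell:
immediate="rc=3; exit $rc"
sh -c "$immediate"; echo "immediate exit: $?"  # the outer rc=7 wins -> 7
```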