[ClusterLabs] Unable to perform resource failover.
Hi All,

I am new to Pacemaker and Corosync.

I have created a simple environment with 2 nodes (Active/Passive) and 2 resources.

Resources:
One resource is a VIP.
The other resource is the Httpd Apache service.

[root@node1 ~]# pcs resource show Httpd
 Resource: Httpd (class=ocf provider=heartbeat type=apache)
  Attributes: configfile=/etc/httpd/conf/httpd.conf
  Operations: monitor interval=30s (Httpd-monitor-interval-30s)
              start interval=0s timeout=40s (Httpd-start-interval-0s)
              stop interval=0s timeout=60s (Httpd-stop-interval-0s)

[root@node1 ~]# pcs resource show Cluster_VIP
 Resource: Cluster_VIP (class=ocf provider=heartbeat type=IPaddr2)
  Attributes: cidr_netmask=32 ip=10.0.4.99
  Operations: monitor interval=20s (Cluster_VIP-monitor-interval-20s)
              start interval=0s timeout=20s (Cluster_VIP-start-interval-0s)
              stop interval=0s timeout=20s (Cluster_VIP-stop-interval-0s)

[root@node1 ~]# pcs status
Cluster name: Cluster
Stack: corosync
Current DC: node2 (version 1.1.16-12.el7_4.4-94ff4df) - partition with quorum
Last updated: Tue Nov 7 15:09:40 2017
Last change: Tue Nov 7 15:03:22 2017 by root via cibadmin on node1

2 nodes configured
2 resources configured

Online: [ node1 node2 ]

Full list of resources:

 Cluster_VIP  (ocf::heartbeat:IPaddr2):  Started node1
 Httpd        (ocf::heartbeat:apache):   Started node1

Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled

I looked up the process ID (pid) of httpd with the command below and then killed the parent process:

[root@node1 ~]# ps -aef | grep httpd
root      4392     1  0 15:03 ?  00:00:00 /sbin/httpd -DSTATUS -f /etc/httpd/conf/httpd.conf -c PidFile /var/run//httpd.pid
apache    4393  4392  0 15:03 ?  00:00:00 /sbin/httpd -DSTATUS -f /etc/httpd/conf/httpd.conf -c PidFile /var/run//httpd.pid
apache    4394  4392  0 15:03 ?  00:00:00 /sbin/httpd -DSTATUS -f /etc/httpd/conf/httpd.conf -c PidFile /var/run//httpd.pid
apache    4395  4392  0 15:03 ?  00:00:00 /sbin/httpd -DSTATUS -f /etc/httpd/conf/httpd.conf -c PidFile /var/run//httpd.pid
apache    4396  4392  0 15:03 ?  00:00:00 /sbin/httpd -DSTATUS -f /etc/httpd/conf/httpd.conf -c PidFile /var/run//httpd.pid
apache    4397  4392  0 15:03 ?  00:00:00 /sbin/httpd -DSTATUS -f /etc/httpd/conf/httpd.conf -c PidFile /var/run//httpd.pid

[root@node1 ~]# kill -9 4392

I am trying to trigger a resource failover by killing the pid of httpd.

Observation:
Resource failover does not happen after killing the pid. The status of the resource (Httpd) remains "Started" on node1.

We don't want to use the resource move command ("pcs resource move Httpd") or the resource disable command ("pcs resource disable httpd") for this.

Query:
What is the issue in our approach?
How can we achieve a resource failover?

Further, I will use this environment for testing migration-threshold. Any suggestions regarding this are also welcome.

TIA

Regards,
Garima

___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org
Re: [ClusterLabs] Unable to perform resource failover.
> I am trying to do resource failover by killing pid of httpd.
>
> Observation:
>
> I observed that resource failover is not happening after killing the pid.
> Status of resource(Httpd) remain started on node1.
>
> We don't want to use resource move "pcs resource move Httpd" and resource
> disable "pcs resource disable httpd" command for this.

There are many things to check. First of all, check if the service is being restarted by systemd or another process manager.

Regards,

Alberto Mijares
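
A minimal sketch of the check suggested above, assuming a systemd-based node like the ones in this thread. The helper name unit_managed_by_systemd is hypothetical and not part of systemd or pcs; only the standard "systemctl is-enabled" and "systemctl is-active" subcommands are used. The idea is that a unit which is enabled, or which systemd itself reports as active, may be competing with Pacemaker for control of the service.

```shell
#!/bin/sh
# Sketch only: unit_managed_by_systemd is a hypothetical helper name.
# Returns 0 if systemd appears to be managing the unit, 1 otherwise.
unit_managed_by_systemd() {
    unit="$1"
    # "is-enabled" prints e.g. "enabled" or "disabled" (starts at boot or not).
    enabled=$(systemctl is-enabled "$unit" 2>/dev/null) || true
    # "is-active" prints e.g. "active" or "inactive" (started by systemd or not).
    active=$(systemctl is-active "$unit" 2>/dev/null) || true
    echo "$unit: enabled=${enabled:-unknown} active=${active:-unknown}"
    if [ "$enabled" = "enabled" ] || [ "$active" = "active" ]; then
        return 0
    fi
    return 1
}
```

Calling "unit_managed_by_systemd httpd.service" on each node prints one summary line per node; a zero exit status suggests the unit should be disabled and stopped in systemd before testing failover again.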
Re: [ClusterLabs] Unable to perform resource failover.
Hi,

>> There are many things to check. First of all, check if the service is being
>> restarted by systemd or another process manager.

We restarted httpd and the cluster services using the commands below, and also restarted the cluster nodes:

systemctl restart httpd.service
systemctl restart pacemaker.service
systemctl restart corosync.service
systemctl restart pcsd.service

Does this have any impact on the cluster?

Regards,
Garima
Re: [ClusterLabs] Unable to perform resource failover.
> We restarted httpd and the cluster services using the commands below,
> and also restarted the cluster nodes:
>
> systemctl restart httpd.service
> systemctl restart pacemaker.service
> systemctl restart corosync.service
> systemctl restart pcsd.service
>
> Does this have any impact on the cluster?

You should run

systemctl disable httpd.service && systemctl stop httpd.service

on all nodes of the cluster where Apache is installed. Then start the resource using any utility, for example

pcs resource enable Httpd

If no more specific configuration has been done, that should start Apache. Then test killing it again.

Regards,

Alberto Mijares
Re: [ClusterLabs] Unable to perform resource failover.
Hi,

>> systemctl disable httpd.service && systemctl stop httpd.service in all nodes
>> of the cluster where Apache is installed

I executed both commands on the nodes. The output is given below.

[root@node2 ~]# systemctl disable httpd.service
[root@node2 ~]# systemctl stop httpd.service
[root@node2 ~]# systemctl status httpd.service
● httpd.service - The Apache HTTP Server
   Loaded: loaded (/usr/lib/systemd/system/httpd.service; disabled; vendor preset: disabled)
   Active: inactive (dead)
     Docs: man:httpd(8)
           man:apachectl(8)

Nov 07 16:59:17 node1 systemd[1]: Starting The Apache HTTP Server...
Nov 07 16:59:17 node1 httpd[15462]: AH00558: httpd: Could not reliably determine the server's fully qualified domain na...message
Nov 07 16:59:17 node1 systemd[1]: Started The Apache HTTP Server.
Nov 07 16:59:37 node1 systemd[1]: Stopping The Apache HTTP Server...
Nov 07 16:59:38 node1 systemd[1]: Stopped The Apache HTTP Server.
Nov 07 17:01:10 node1 systemd[1]: Starting The Apache HTTP Server...
Nov 07 17:01:10 node1 httpd[15720]: AH00558: httpd: Could not reliably determine the server's fully qualified domain na...message
Nov 07 17:01:10 node1 systemd[1]: Started The Apache HTTP Server.
Nov 07 17:02:14 node1 systemd[1]: Stopping The Apache HTTP Server...
Nov 07 17:02:15 node1 systemd[1]: Stopped The Apache HTTP Server.

[root@node2 ~]# pcs resource enable Httpd
[root@node2 ~]# pcs status
Cluster name: Cluster
Stack: corosync
Current DC: node2 (version 1.1.16-12.el7_4.4-94ff4df) - partition with quorum
Last updated: Tue Nov 7 17:06:58 2017
Last change: Tue Nov 7 16:02:05 2017 by root via crm_resource on node1

2 nodes configured
2 resources configured

Online: [ node1 node2 ]

Full list of resources:

 Cluster_VIP  (ocf::heartbeat:IPaddr2):  Started node2
 Httpd        (ocf::heartbeat:apache):   Stopped

Failed Actions:
* Httpd_monitor_3 on node1 'not running' (7): call=21, status=complete, exitreason='none',
    last-rc-change='Tue Nov 7 15:26:27 2017', queued=0ms, exec=0ms

Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled

There is no change in the resource status.

TIA

Regards,
Garima
Re: [ClusterLabs] Unable to perform resource failover.
On Tue, 2017-11-07 at 10:30 +, Garima wrote:
> [...]
> [root@node1 ~]# kill -9 4392
>
> I am trying to do resource failover by killing pid of httpd.
> Observation:
> I observed that resource failover is not happening after killing the
> pid. Status of resource(Httpd) remain started on node1.
> We don't want to use resource move "pcs resource move Httpd" and
> resource disable "pcs resource disable httpd" command for this.
>
> Query:
> What is the issue in our approach ?

Pacemaker's default recovery behavior for service failures is not failover, but restart. Chances are, Pacemaker restarted httpd in the above situation, and the outage was short enough that you didn't notice it. You could check the pid of httpd afterward to see if it's the same or a new one.

As discussed elsewhere in this thread, you also want to make sure that your operating system is not managing the httpd process (via systemd, upstart, lsb init, etc.).

> How we can achieve a resources failover?

Set migration-threshold=1 on the resource, so that it is moved to the other node after its first failure.

> Further I will use this environment for testing the migration-threshold.
> Any suggestions regarding this also welcome.
>
> TIA
>
> Regards,
> Garima

--
Ken Gaillot
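
The two suggestions above (compare the httpd pid before and after the kill, and set migration-threshold=1) could be sketched as the shell helpers below. This is only an illustration under stated assumptions: the helper names are hypothetical, the default PidFile path is taken from the ps output earlier in the thread, and the pcs subcommands ("resource meta", "resource failcount show", "resource cleanup") are assumed to behave as in the pcs 0.9.x series that ships with this Pacemaker version. The last helper is an addition not mentioned in the thread: with migration-threshold=1, a node that has recorded a failure stays ineligible for the resource until its fail count is cleared.

```shell
#!/bin/sh
# Sketch only: all function names here are hypothetical helpers.

# Print the parent httpd pid from the PidFile (path assumed from the ps output).
httpd_pid() {
    cat "${1:-/var/run/httpd.pid}" 2>/dev/null
}

# Succeed if a pid exists now and differs from the one recorded before the
# kill, which indicates the service was restarted in place.
was_restarted() {
    old_pid="$1"
    new_pid="$2"
    [ -n "$new_pid" ] && [ "$old_pid" != "$new_pid" ]
}

# Set the migration-threshold meta-attribute on a resource
# (a value of 1 means: move away after the first failure).
set_migration_threshold() {
    pcs resource meta "$1" migration-threshold="$2"
}

# Show the resource's fail count, then clear it so that the failed node
# becomes eligible to run the resource again.
reset_failures() {
    pcs resource failcount show "$1"
    pcs resource cleanup "$1"
}
```

A possible test run on node1: record the pid with before=$(httpd_pid), kill it, wait longer than the 30s monitor interval, then call was_restarted "$before" "$(httpd_pid)". After set_migration_threshold Httpd 1, the same kill should instead leave Httpd started on node2 in "pcs status".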