Re: [Pacemaker] [ha-wg] [ha-wg-technical] [Linux-HA] [RFC] Organizing HA Summit 2015
On 11/5/2014 4:16 PM, Lars Ellenberg wrote:
> On Sat, Nov 01, 2014 at 01:19:35AM -0400, Digimer wrote:
> > All the cool kids will be there. You want to be a cool kid, right?
>
> Well, no. ;-) But I'll still be there, and a few other Linbit'ers as well.
> Fabio, let us know what we could do to help make it happen.

I appreciate the offer. Assuming we achieve quorum to do the event, I'd say that I'll take care of the meeting-room/hotel logistics and one lunch-and-learn pizza event. It would be nice if others could organize a dinner event.

Cheers,
Fabio

> On 01/11/14 01:06 AM, Fabio M. Di Nitto wrote:
> > Just a kind reminder.
> >
> > On 9/8/2014 12:30 PM, Fabio M. Di Nitto wrote:
> > > All,
> > > it's been almost 6 years since we had a face-to-face meeting for all developers and vendors involved in Linux HA. I'd like to try to organize a new event and piggy-back on DevConf in Brno [1]. DevConf starts Friday the 6th of Feb 2015 in the Red Hat Brno offices. My suggestion would be a dedicated 2-day HA summit on the 4th and 5th of February. The goal of this meeting, besides getting to know each other and the social side of such events, is to tune the directions of the various HA projects and explore common areas of improvement. I am also very open to the idea of extending it to 3 days, 1 dedicated to customers/users and 2 dedicated to developers, by starting on the 3rd.
> > > Thoughts?
> > > Fabio
> > > PS: Please hit reply-all or include me in CC, just to make sure I'll see your answer :)
> > > [1] http://devconf.cz/
> >
> > Could you please let me know by the end of November whether you are interested or not? I have heard from only a few people so far.
> > Cheers, Fabio

___
ha-wg mailing list
ha...@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/ha-wg
___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker
Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org
Re: [Pacemaker] Daemon Start attempt on wrong Server
You should use an opt-in cluster. Set the cluster option symmetric-cluster=false. This tells Pacemaker not to place a resource anywhere in the cluster unless a location rule explicitly tells it where that resource should run. Pacemaker will still probe the SQL resources on the www hosts and report rc 5, but this is expected and harmless.

On 11 Nov 2014 13:22, Hauke Homburg hhomb...@w3-creative.de wrote:
> Hello,
>
> I am installing a 6-node Pacemaker cluster: 3 nodes for Apache, 3 nodes for Postgres. My cluster config is:
>
> node kvm-node1
> node sql-node1
> node sql-node2
> node sql-node3
> node www-node1
> node www-node2
> node www-node3
> primitive pri_kvm_ip ocf:heartbeat:IPaddr2 \
>         params ip=10.0.6.41 cidr_netmask=255.255.255.0 \
>         op monitor interval=10s timeout=20s
> primitive pri_sql_ip ocf:heartbeat:IPaddr2 \
>         params ip=10.0.6.31 cidr_netmask=255.255.255.0 \
>         op monitor interval=10s timeout=20s
> primitive pri_www_ip ocf:heartbeat:IPaddr2 \
>         params ip=10.0.6.21 cidr_netmask=255.255.255.0 \
>         op monitor interval=10s timeout=20s
> primitive res_apache ocf:heartbeat:apache \
>         params configfile=/etc/apache2/apache2.conf \
>         op start interval=0 timeout=40 \
>         op stop interval=0 timeout=60 \
>         op monitor interval=60 timeout=120 start-delay=0 \
>         meta target-role=Started
> primitive res_pgsql ocf:heartbeat:pgsql \
>         params pgctl=/usr/lib/postgresql/9.1/bin/pg_ctl psql=/usr/bin/psql start_opt= pgdata=/var/lib/postgresql/9.1/main config=/etc/postgresql/9.1/main/postgresql.conf pgdba=postgres \
>         op start interval=0 timeout=120s \
>         op stop interval=0 timeout=120s \
>         op monitor interval=30s timeout=30s depth=0
> location loc_kvm_ip_node1 pri_kvm_ip 10001: kvm-node1
> location loc_sql_ip_node1 pri_sql_ip inf: sql-node1
> location loc_sql_ip_node2 pri_sql_ip inf: sql-node2
> location loc_sql_ip_node3 pri_sql_ip inf: sql-node3
> location loc_sql_srv_node1 res_pgsql inf: sql-node1
> location loc_sql_srv_node2 res_pgsql inf: sql-node2
> location loc_sql_srv_node3 res_pgsql inf: sql-node3
> location loc_www_ip_node1 pri_www_ip inf: www-node1
> location loc_www_ip_node2 pri_www_ip inf: www-node2
> location loc_www_ip_node3 pri_www_ip inf: www-node3
> location loc_www_srv_node1 res_apache inf: www-node1
> location loc_www_srv_node2 res_apache inf: www-node2
> location loc_www_srv_node3 res_apache inf: www-node3
> property $id=cib-bootstrap-options \
>         dc-version=1.1.7-ee0730e13d124c3d58f00016c3376a1de5323cff \
>         cluster-infrastructure=openais \
>         expected-quorum-votes=7 \
>         stonith-enabled=false
>
> Why do I see the following output in crm_mon?
>
> Failed actions:
>     res_pgsql_start_0 (node=www-node1, call=16, rc=5, status=complete): not installed
>     res_pgsql_start_0 (node=www-node2, call=13, rc=5, status=complete): not installed
>     pri_www_ip_monitor_1 (node=www-node3, call=22, rc=7, status=complete): not running
>     res_pgsql_start_0 (node=www-node3, call=13, rc=5, status=complete): not installed
>     res_apache_start_0 (node=sql-node2, call=18, rc=5, status=complete): not installed
>     res_pgsql_start_0 (node=sql-node2, call=12, rc=5, status=complete): not installed
>     res_apache_start_0 (node=sql-node3, call=12, rc=5, status=complete): not installed
>     res_pgsql_start_0 (node=sql-node3, call=10, rc=5, status=complete): not installed
>     res_apache_start_0 (node=kvm-node1, call=12, rc=5, status=complete): not installed
>     res_pgsql_start_0 (node=kvm-node1, call=20, rc=5, status=complete): not installed
>
> I set infinity for pgsql on all 3 sql nodes, but not(!) on the www nodes. Why does Pacemaker try to start the PostgreSQL server on the www nodes, for example?
Thanks for your help.
Greetings,
Hauke
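A minimal crm-shell sketch of Alexandre's suggestion, using the resource and node names from Hauke's configuration (a sketch, not a tested configuration):

```
# Opt-in approach: with symmetric-cluster=false, a resource may only run
# on nodes that a location constraint explicitly allows.
property symmetric-cluster=false
location loc_sql_srv_node1 res_pgsql inf: sql-node1
location loc_sql_srv_node2 res_pgsql inf: sql-node2
location loc_sql_srv_node3 res_pgsql inf: sql-node3
```

With this in place the existing inf: constraints become whitelists, so res_pgsql is never placed on the www nodes (probes on those nodes still run and return rc 5).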
Re: [Pacemaker] Daemon Start attempt on wrong Server
On 11.11.2014 13:34, Alexandre wrote:
> [quoted advice and full configuration snipped; see the previous message]

Hello Alexandre,

Why doesn't setting infinity for the SQL server nodes make the SQL daemon start only on the sql nodes? I thought that would be all that is needed?

Greetings,
Hauke
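The short answer to Hauke's question: in Pacemaker's default opt-out (symmetric) mode, a positive inf: score only expresses a preference; it does not forbid the other nodes. To keep a resource off particular nodes in that mode you must ban it explicitly. A hypothetical crm-shell sketch with the names from this thread (constraint IDs are invented):

```
# In an opt-out (symmetric-cluster=true, the default) cluster, inf: alone
# does not exclude other nodes; ban pgsql from the www nodes with -inf:.
location ban_pgsql_www1 res_pgsql -inf: www-node1
location ban_pgsql_www2 res_pgsql -inf: www-node2
location ban_pgsql_www3 res_pgsql -inf: www-node3
```

The alternative is switching to an opt-in cluster with symmetric-cluster=false, as suggested earlier in the thread.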
Re: [Pacemaker] Daemon Start attempt on wrong Server
On Tue, 11 Nov 2014 16:19:56 +0100, Hauke Homburg hhomb...@w3-creative.de writes:
> [full quote of the previous messages and configuration snipped]
Re: [Pacemaker] Daemon Start attempt on wrong Server
On 11.11.2014 16:25, Andrei Borzenkov wrote:
> [full quote of the previous messages and configuration snipped]
Re: [Pacemaker] DRBD with Pacemaker on CentOs 6.5
Hi,

I'm fluent in English, so I doubt it's a language barrier. I have reasonable user experience with Linux, though not extensive experience with the various system commands, and I have zero experience with HA. I'm in fact trying to keep things as simple as possible by following the Clusters from Scratch guide step by step, and only modifying/omitting steps when they don't work.

I know a block device (like /dev/sda) is simply a device (such as a hard disk) that appears as a file in Linux, allowing users buffered access to the device. I know a file system is something like FAT/NTFS/ext2. I know a mount point is a directory onto which you can mount a file system; once mounted, it is as if the entire file system has the mount point as its root directory.

I set up DRBD almost exactly as in Chapter 7 of Clusters from Scratch. The only differences are in our setups: the guide assumes Fedora 13 and DRBD 8.3, while I'm using CentOS 6.5 and DRBD 8.4. Since I was following the guide from start to finish, /var/www/html already has an index.html in it. node01 has its own index.html and node02 has its own index.html, each with different content. The guide did not instruct me to delete these files, and it configures the mount point as /var/www/html (Chapter 7.4) with an ext4 file system, hence mounting the image onto a directory that already has files in it. Is this a problem?

On Tue, Nov 11, 2014 at 6:07 PM, Lars Ellenberg lars.ellenb...@linbit.com wrote:
> On Tue, Nov 11, 2014 at 12:27:23PM +0800, Sihan Goi wrote:
> > Hi, DocumentRoot is still set to /var/www/html.
> > ls -al /var/www/html shows different things on the 2 nodes.
> > node01:
> > total 28
> > drwxr-xr-x. 3 root root  4096 Nov 11 12:25 .
> > drwxr-xr-x. 6 root root  4096 Jul 23 22:18 ..
> > -rw-r--r--. 1 root root    50 Oct 28 18:00 index.html
> > drwx------. 2 root root 16384 Oct 28 17:59 lost+found
> > node02 only has index.html, no lost+found, and it's a different version of the file.
>
> I'm unsure if there is just a language barrier, or if you just don't have enough experience with Linux in general, or if you are trying to make things more complicated than they are.
> Do you know
> * what a block device is?
> * what a file system is?
> * what a mount point is?
> * that a mount point may not be empty, even though it typically is?
> * what it means to mount a file system on a mount point?
> Assuming you set up DRBD in a sane way, and it is mounted on *one* node (the node where it is Primary), then on the *other* node, where it is NOT mounted, you will only see the mount point, and whatever happens to be in there.
> You probably should clear out the contents of that mount point, so that you'd have an empty mount point. Or, if you like, replace it with some dummy content that clearly shows that this is the mount point, and not the file system that is intended to be mounted there.
>
> > Status URL is enabled in both nodes.
>
> As for the "DocumentRoot must be a directory": please double-check for typos...
>
> > On Oct 30, 2014 11:14 AM, Andrew Beekhof and...@beekhof.net wrote:
> > > On 29 Oct 2014, at 1:01 pm, Sihan Goi gois...@gmail.com wrote:
> > > > Hi, I've never used crm_report before. I just read the man page and generated a tarball from 1-2 hours before I reconfigured all the DRBD-related resources. I've put the tarball here: https://www.dropbox.com/s/suj9pttjp403msv/unexplained-apache-failure.tar.bz2?dl=0 Hope you can help figure out what I'm doing wrong. Thanks for the help!
> > >
> > > Oct 28 18:13:38 node02 Filesystem(WebFS)[29940]: INFO: Running start for /dev/drbd/by-res/wwwdata on /var/www/html
> > > Oct 28 18:13:39 node02 kernel: EXT4-fs (drbd1): mounted filesystem with ordered data mode. Opts:
> > > Oct 28 18:13:39 node02 crmd[9870]: notice: process_lrm_event: LRM operation WebFS_start_0 (call=164, rc=0, cib-update=298, confirmed=true) ok
> > > Oct 28 18:13:39 node02 crmd[9870]: notice: te_rsc_command: Initiating action 7: start WebSite_start_0 on node02 (local)
> > > Oct 28 18:13:39 node02 apache(WebSite)[30007]: ERROR: Syntax error on line 292 of /etc/httpd/conf/httpd.conf: DocumentRoot must be a directory
> > >
> > > Is DocumentRoot still set to /var/www/html? If so, what happens if you run 'ls -al /var/www/html' in a shell?
> > >
> > > Oct 28 18:13:39 node02 apache(WebSite)[30007]: INFO: apache not running
> > > Oct 28 18:13:39 node02 apache(WebSite)[30007]: INFO: waiting for apache /etc/httpd/conf/httpd.conf to come up
> > >
> > > Did you enable the status url? http://clusterlabs.org/doc/en-US/Pacemaker/1.1-plugin/html/Clusters_from_Scratch/_enable_the_apache_status_url.html
>
> --
> : Lars Ellenberg
> : http://www.LINBIT.com | Your Way to High Availability
> : DRBD, Linux-HA and Pacemaker support and consulting
>
> DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
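Lars's point can be checked mechanically: on the node where the DRBD device is Secondary, /var/www/html is just a bare directory, not a mounted filesystem. A small sketch (path taken from the thread; `mountpoint` is part of util-linux):

```shell
# Report whether the DocumentRoot is a mounted filesystem (i.e. this node is
# the DRBD Primary with the fs mounted) or just the bare mount point directory.
DOCROOT=/var/www/html
if mountpoint -q "$DOCROOT" 2>/dev/null; then
    echo "$DOCROOT: mounted filesystem"
else
    echo "$DOCROOT: not mounted here; you are looking at the underlying directory"
fi
```

Running this on both nodes makes it obvious which copy of index.html apache would actually serve on each node.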
[Pacemaker] Split Brain on DRBD Dual Primary
Hi All,

In the October archives I saw the issue reported by Felix Zachlod at http://oss.clusterlabs.org/pipermail/pacemaker/2014-October/022653.html, and the same thing is now happening to me on a dual-primary DRBD setup. My OS is RHEL 6.6, and the software versions I use are:

pacemaker-1.1.12-4.el6.x86_64
corosync-1.4.7-1.el6.x86_64
cman-3.0.12.1-68.el6.x86_64
drbd84-utils-8.9.1-1.el6.elrepo.x86_64
kmod-drbd84-8.4.5-2.el6.elrepo.x86_64
gfs2-utils-3.0.12.1-68.el6.x86_64

First, I will explain my existing resources. I have 3 resources: drbd, dlm for gfs2, and HomeFS.

Master: HomeDataClone
 Meta Attrs: master-max=2 master-node-max=1 clone-max=2 clone-node-max=1 notify=true interval=0s
 Resource: HomeData (class=ocf provider=linbit type=drbd)
  Attributes: drbd_resource=homedata
  Operations: start interval=0s timeout=240 (HomeData-start-timeout-240)
              promote interval=0s (HomeData-promote-interval-0s)
              demote interval=0s timeout=90 (HomeData-demote-timeout-90)
              stop interval=0s timeout=100 (HomeData-stop-timeout-100)
              monitor interval=60s (HomeData-monitor-interval-60s)
Clone: HomeFS-clone
 Meta Attrs: start-delay=30s target-role=Stopped
 Resource: HomeFS (class=ocf provider=heartbeat type=Filesystem)
  Attributes: device=/dev/drbd/by-res/homedata directory=/home fstype=gfs2
  Operations: start interval=0s timeout=60 (HomeFS-start-timeout-60)
              stop interval=0s timeout=60 (HomeFS-stop-timeout-60)
              monitor interval=20 timeout=40 (HomeFS-monitor-interval-20)
Clone: dlm-clone
 Meta Attrs: clone-max=2 clone-node-max=1 start-delay=0s
 Resource: dlm (class=ocf provider=pacemaker type=controld)
  Operations: start interval=0s timeout=90 (dlm-start-timeout-90)
              stop interval=0s timeout=100 (dlm-stop-timeout-100)
              monitor interval=60s (dlm-monitor-interval-60s)

But when I start the cluster under normal conditions, it causes a split brain on DRBD on each node.
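One thing worth verifying with a layout like the one above: a GFS2 filesystem must not be mounted until DLM is running locally and DRBD has been promoted on that node. A hypothetical pcs sketch using the resource names above (untested; check the exact syntax against your pcs version):

```
# GFS2 needs DLM locally, and the DRBD Master role, before the mount may start.
pcs constraint order start dlm-clone then HomeFS-clone
pcs constraint colocation add HomeFS-clone with dlm-clone INFINITY
pcs constraint order promote HomeDataClone then start HomeFS-clone
pcs constraint colocation add HomeFS-clone with master HomeDataClone INFINITY
```

Ordering alone does not fix the promotion race discussed below, but without these constraints the mount can additionally race against DLM startup.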
From the log I can see it is the same case as Felix's, caused by Pacemaker promoting DRBD to Primary while it was still waiting for the handshake connection on each node:

Nov 12 11:37:32 node002 kernel: block drbd1: disk( Attaching -> UpToDate )
Nov 12 11:37:32 node002 kernel: block drbd1: attached to UUIDs C9630089EC3B58CC::B4653C665EBC0DBB:B4643C665EBC0DBA
Nov 12 11:37:32 node002 kernel: drbd homedata: conn( StandAlone -> Unconnected )
Nov 12 11:37:32 node002 kernel: drbd homedata: Starting receiver thread (from drbd_w_homedata [22531])
Nov 12 11:37:32 node002 kernel: drbd homedata: receiver (re)started
Nov 12 11:37:32 node002 kernel: drbd homedata: conn( Unconnected -> WFConnection )
Nov 12 11:37:32 node002 attrd[22340]: notice: attrd_trigger_update: Sending flush op to all hosts for: master-HomeData (1000)
Nov 12 11:37:32 node002 attrd[22340]: notice: attrd_perform_update: Sent update 17: master-HomeData=1000
Nov 12 11:37:32 node002 crmd[22342]: notice: process_lrm_event: Operation HomeData_start_0: ok (node=node002, call=18, rc=0, cib-update=13, confirmed=true)
Nov 12 11:37:33 node002 crmd[22342]: notice: process_lrm_event: Operation HomeData_notify_0: ok (node=node002, call=19, rc=0, cib-update=0, confirmed=true)
Nov 12 11:37:33 node002 crmd[22342]: notice: process_lrm_event: Operation HomeData_notify_0: ok (node=node002, call=20, rc=0, cib-update=0, confirmed=true)
Nov 12 11:37:33 node002 kernel: block drbd1: role( Secondary -> Primary )
Nov 12 11:37:33 node002 kernel: block drbd1: new current UUID 58F02AE0E03C1C91:C9630089EC3B58CC:B4653C665EBC0DBB:B4643C665EBC0DBA
Nov 12 11:37:33 node002 crmd[22342]: notice: process_lrm_event: Operation HomeData_promote_0: ok (node=node002, call=21, rc=0, cib-update=14, confirmed=true)
Nov 12 11:37:33 node002 attrd[22340]: notice: attrd_trigger_update: Sending flush op to all hosts for: master-HomeData (1)
Nov 12 11:37:33 node002 attrd[22340]: notice: attrd_perform_update: Sent update 23: master-HomeData=1
Nov 12 11:37:33 node002 crmd[22342]: notice: process_lrm_event: Operation HomeData_notify_0: ok (node=node002, call=22, rc=0, cib-update=0, confirmed=true)
Nov 12 11:37:33 node002 kernel: drbd homedata: Handshake successful: Agreed network protocol version 101
Nov 12 11:37:33 node002 kernel: drbd homedata: Agreed to support TRIM on protocol level
Nov 12 11:37:33 node002 kernel: drbd homedata: Peer authenticated using 20 bytes HMAC
Nov 12 11:37:33 node002 kernel: drbd homedata: conn( WFConnection -> WFReportParams )
Nov 12 11:37:33 node002 kernel: drbd homedata: Starting asender thread (from drbd_r_homedata [22543])
Nov 12 11:37:33 node002 kernel: block drbd1: drbd_sync_handshake:
Nov 12 11:37:33 node002 kernel: block drbd1: self 58F02AE0E03C1C91:C9630089EC3B58CC:B4653C665EBC0DBB:B4643C665EBC0DBA bits:0 flags:0
Nov 12 11:37:33 node002 kernel: block drbd1: peer
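The race visible in the log (role( Secondary -> Primary ) while the connection is still in WFConnection) is why DRBD's Pacemaker integration recommends a fencing policy plus the crm-fence-peer handlers: without them, each node can promote with an unknown peer state and the data sets diverge. A hypothetical drbd.conf fragment for the 8.4 series used here (verify the option placement against your drbd.conf man page):

```
resource homedata {
  disk {
    # Refuse to promote/write while the peer's state is unknown,
    # until the fence-peer handler has constrained the peer.
    fencing resource-and-stonith;
  }
  handlers {
    # Add/remove a Master-role location constraint in the CIB
    # on connection loss and after resync completes.
    fence-peer "/usr/lib/drbd/crm-fence-peer.sh";
    after-resync-target "/usr/lib/drbd/crm-unfence-peer.sh";
  }
}
```

This does not remove the need for working STONITH in the Pacemaker layer; resource-and-stonith assumes the cluster can actually fence the peer.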
Re: [Pacemaker] DRBD with Pacemaker on CentOs 6.5
On 11.11.2014 07:27, Sihan Goi wrote:
> Hi, DocumentRoot is still set to /var/www/html.
> ls -al /var/www/html shows different things on the 2 nodes.
> node01:
> total 28
> drwxr-xr-x. 3 root root  4096 Nov 11 12:25 .
> drwxr-xr-x. 6 root root  4096 Jul 23 22:18 ..
> -rw-r--r--. 1 root root    50 Oct 28 18:00 index.html
> drwx------. 2 root root 16384 Oct 28 17:59 lost+found
> node02 only has index.html, no lost+found, and it's a different version of the file.

It looks like apache is unable to stat its document root. Could you please show the output of two commands on both nodes, while the fs is mounted on one of them?

getenforce
ls -dZ /var/www/html

If you see 'Enforcing', and the last part of the SELinux context of the mounted fs root is not httpd_sys_content_t, then run 'restorecon -R /var/www/html' on that node.

> [rest of the quoted thread snipped]
Re: [Pacemaker] Loosing corosync communication clusterwide
On 11 Nov 2014, at 10:12 pm, Daniel Dehennin daniel.dehen...@baby-gnu.org wrote:
> Andrew Beekhof and...@beekhof.net writes:
> [...]
> > > I have fencing configured and working, modulo fencing VMs on a dead host [1].
> >
> > Are you saying that the host and the VMs running inside it are both part of the same cluster?
>
> Yes, one of the VMs needs to access the GFS2 filesystem like the nodes; the other VM is a quorum node (standby=on).

That sounds like a recipe for disaster, to be honest. If you want VMs to be part of a cluster, it would be advisable to have their host(s) be in a different one.