Re: [ClusterLabs] corosync service not automatically started
On 10/11/2017 09:00 AM, Ferenc Wágner wrote:
> Václav Mach <ma...@cesnet.cz> writes:
>
>> allow-hotplug eth0
>> iface eth0 inet dhcp
>
> Try replacing allow-hotplug with auto. Ifupdown simply runs ifup -a
> before network-online.target, which excludes allow-hotplug interfaces.
> That means allow-hotplug interfaces are not waited for before corosync
> is started during boot.

That did the trick for the network config using DHCP. Thanks for the
clarification.

Do you know why allow-hotplug interfaces are excluded? It's obvious
that ifup, when run as 'ifup -a', ignores them (according to its man
page), but I don't get why allow-hotplug interfaces should be ignored
by the init system.

--
Václav Mach
CESNET, z.s.p.o.
www.cesnet.cz

___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org
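For reference, the suggested change amounts to this stanza in /etc/network/interfaces (a sketch; only the allow-hotplug keyword changes):

```
# 'auto' instead of 'allow-hotplug': networking.service (ifup -a) then
# configures eth0 before network-online.target is reached, so corosync
# no longer starts before the interface has an address.
auto eth0
iface eth0 inet dhcp
```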
Re: [ClusterLabs] corosync service not automatically started
On 10/10/2017 11:40 AM, Valentin Vidic wrote:
> On Tue, Oct 10, 2017 at 11:26:24AM +0200, Václav Mach wrote:
>> # The primary network interface
>> allow-hotplug eth0
>> iface eth0 inet dhcp
>>
>> # This is an autoconfigured IPv6 interface
>> iface eth0 inet6 auto
>
> allow-hotplug or dhcp could be causing problems. You can try disabling
> corosync and pacemaker so they don't start on boot and start them
> manually after a few minutes when the network is stable. If it works
> then you have some kind of a timing issue. You can try using
> 'auto eth0' or a static IP address to see if it helps...

It seems that static network configuration really solved this issue.
No further modifications of services were necessary. Thanks for the
help.

--
Václav Mach
CESNET, z.s.p.o.
www.cesnet.cz
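A static variant of the stanza would look roughly like this (a sketch: the address is taken from the corosync.conf bindnetaddr seen in this thread, while the netmask and gateway are placeholders to be adjusted to the actual subnet):

```
# Static configuration avoids waiting on a DHCP lease at boot.
# netmask/gateway below are assumed values, not from the thread.
auto eth0
iface eth0 inet static
    address 78.128.211.51
    netmask 255.255.255.0
    gateway 78.128.211.1
```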
Re: [ClusterLabs] corosync service not automatically started
On 10/10/2017 11:04 AM, Valentin Vidic wrote:
> On Tue, Oct 10, 2017 at 10:35:17AM +0200, Václav Mach wrote:
>> Oct 10 10:27:05 r1nren.et.cesnet.cz corosync[709]: [QB] Denied connection, is not ready (709-1337-18)
>> Oct 10 10:27:06 r1nren.et.cesnet.cz corosync[709]: [QB] Denied connection, is not ready (709-1337-18)
>> Oct 10 10:27:07 r1nren.et.cesnet.cz corosync[709]: [QB] Denied connection, is not ready (709-1337-18)
>> Oct 10 10:27:08 r1nren.et.cesnet.cz corosync[709]: [QB] Denied connection, is not ready (709-1337-18)
>> Oct 10 10:27:09 r1nren.et.cesnet.cz corosync[709]: [QB] Denied connection, is not ready (709-1337-18)
>
> Could it be that the network or the firewall takes some time to start
> on boot?

I'm not sure about that. It seems to me that this should not be the
issue - a few lines earlier in the log from my previous mail, the first
line says the network interface is up:

Oct 10 10:27:03 r1nren.et.cesnet.cz corosync[709]: [TOTEM ] The network interface [78.128.211.51] is now up.
Oct 10 10:27:03 r1nren.et.cesnet.cz corosync[709]: [TOTEM ] adding new UDPU member {78.128.211.51}
Oct 10 10:27:03 r1nren.et.cesnet.cz corosync[709]: [TOTEM ] adding new UDPU member {78.128.211.52}
Oct 10 10:27:03 r1nren.et.cesnet.cz corosync[709]: [QB] Denied

Network configuration (same for r2):

root@r1nren:~# cat /etc/network/interfaces
# This file describes the network interfaces available on your system
# and how to activate them. For more information, see interfaces(5).

source /etc/network/interfaces.d/*

# The loopback network interface
auto lo
iface lo inet loopback

# The primary network interface
allow-hotplug eth0
iface eth0 inet dhcp

# This is an autoconfigured IPv6 interface
iface eth0 inet6 auto

--
Václav Mach
CESNET, z.s.p.o.
www.cesnet.cz
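A quick way to spot the configuration at fault is to look for allow-hotplug stanzas, which 'ifup -a' (and therefore network-online.target) will not bring up or wait for. A minimal self-contained sketch using a sample file (the /tmp path is just for illustration; on a real host you would grep /etc/network/interfaces and /etc/network/interfaces.d/* instead):

```shell
# Reproduce the stanza above in a sample file, then count the
# allow-hotplug lines that ifup -a would skip.
cat > /tmp/interfaces.sample <<'EOF'
auto lo
iface lo inet loopback
allow-hotplug eth0
iface eth0 inet dhcp
EOF
grep -c '^allow-hotplug' /tmp/interfaces.sample   # prints 1
```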
[ClusterLabs] corosync service not automatically started
, is not ready (709-1337-18)
Oct 10 10:27:10 r1nren.et.cesnet.cz corosync[709]: corosync: votequorum.c:2065: message_handler_req_exec_votequorum_nodeinfo: Assertion `sender_node != NULL' failed.
Oct 10 10:27:10 r1nren.et.cesnet.cz systemd[1]: corosync.service: Main process exited, code=killed, status=6/ABRT
Oct 10 10:27:10 r1nren.et.cesnet.cz systemd[1]: Failed to start Corosync Cluster Engine.
Oct 10 10:27:10 r1nren.et.cesnet.cz systemd[1]: corosync.service: Unit entered failed state.
Oct 10 10:27:10 r1nren.et.cesnet.cz systemd[1]: corosync.service: Failed with result 'signal'.

corosync configuration:

root@r1nren:~# cat /etc/corosync/corosync.conf
totem {
        version: 2
        transport: udpu
        cluster_name: eduroam.cz
        token: 3000
        token_retransmits_before_loss_const: 10
        clear_node_high_bit: yes
        crypto_cipher: aes256
        crypto_hash: sha256

        interface {
                ringnumber: 0
                bindnetaddr: 78.128.211.51
                ttl: 1
        }
}

logging {
        fileline: off
        to_stderr: no
        to_logfile: no
        to_syslog: yes
        syslog_facility: daemon
        debug: off
        timestamp: on

        logger_subsys {
                subsys: QUORUM
                debug: off
        }
}

quorum {
        provider: corosync_votequorum
        expected_votes: 2
        two_node: 1
}

nodelist {
        node {
                ring0_addr: 78.128.211.51
        }
        node {
                ring0_addr: 78.128.211.52
        }
}

Let me know if I can provide any more information about this (are there
any corosync logs?).
View from r2:

root@r2nren:~# crm status
Stack: corosync
Current DC: r2nren.et.cesnet.cz (version 1.1.16-94ff4df) - partition with quorum
Last updated: Tue Oct 10 10:29:45 2017
Last change: Tue Oct 10 10:25:32 2017 by root via crm_attribute on r1nren.et.cesnet.cz

2 nodes configured
8 resources configured

Online: [ r2nren.et.cesnet.cz ]
OFFLINE: [ r1nren.et.cesnet.cz ]

Full list of resources:

 Clone Set: clone_ping_gw [ping_gw]
     Started: [ r2nren.et.cesnet.cz ]
     Stopped: [ r1nren.et.cesnet.cz ]
 Resource Group: group_eduroam.cz
     standby_ip   (ocf::heartbeat:IPaddr2):       Started r2nren.et.cesnet.cz
     offline_file (systemd:offline_file):         Started r2nren.et.cesnet.cz
     racoon       (systemd:racoon):               Started r2nren.et.cesnet.cz
     radiator     (systemd:radiator):             Started r2nren.et.cesnet.cz
     eduroam_ping (systemd:eduroam_ping):         Started r2nren.et.cesnet.cz
     mailto       (ocf::heartbeat:MailTo):        Started r2nren.et.cesnet.cz

What could be the problem I encountered?

Thanks for help.

Regards,
Vaclav

--
Václav Mach
CESNET, z.s.p.o.
www.cesnet.cz
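One generic mitigation worth noting (a sketch, independent of the ifupdown fix discussed later in the thread): explicitly order corosync after network-online.target with a systemd drop-in. The drop-in path is illustrative, and this only helps if the init system actually waits for the interface, i.e. it is configured as 'auto' rather than 'allow-hotplug'; newer corosync packages may already ship such ordering.

```
# /etc/systemd/system/corosync.service.d/wait-online.conf (hypothetical drop-in)
[Unit]
Wants=network-online.target
After=network-online.target
```

Apply with 'systemctl daemon-reload' and verify the ordering with 'systemctl show corosync.service -p After'.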
[ClusterLabs] strange cluster state
Hello,

I am trying to set up a simple 2 node cluster. The setup is done with
ansible. The whole project is available on github at
https://github.com/lager1/cesnet_HA (the README is written in Czech,
but other parts may be relevant).

The cluster consists of two servers - r1nren.et.cesnet.cz (r1, r1nren)
and r2nret.et.cesnet.cz (r2, r2nren). The configuration uses a group
for the resources to utilize the created dependencies and colocation
rules. The resources are:

- ping_gw
- standby_ip
- offline_file
- radiator
- racoon
- eduroam_ping
- mailto

Resource ping_gw is cloned to be run on both nodes. All the remaining
resources are added to the group.

When testing cluster behavior I've managed to get the cluster into a
strange state:

Node r2nren.et.cesnet.cz: standby
Online: [ r1nren.et.cesnet.cz ]

Full list of resources:

 Clone Set: clone_ping_gw [ping_gw]
     Started: [ r1nren.et.cesnet.cz ]
     Stopped: [ r2nren.et.cesnet.cz ]
 Resource Group: group_eduroam.cz
     standby_ip   (ocf::heartbeat:IPaddr2):       Started r2nren.et.cesnet.cz
     offline_file (systemd:offline_file):         Stopped
     radiator     (systemd:radiator):             Started r1nren.et.cesnet.cz
     racoon       (systemd:racoon):               Stopped
     eduroam_ping (systemd:eduroam_ping):         Stopped
     mailto       (ocf::heartbeat:MailTo):        Started r1nren.et.cesnet.cz

How is this state even possible? According to the docs, a node may not
run any resources while it is in standby state. Also, all the resources
in the group should run on the same node and should be started in the
defined order. The output above does not match that.

I'm not totally sure if the attached logs were created when this
problem occurred, but I think they should have been.

Thanks for help.

Regards,
Vaclav

--
Václav Mach
CESNET, z.s.p.o.
www.cesnet.cz

ha_files.tar.gz
Description: application/gzip
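When a group ends up split like this, a few standard commands help narrow it down (a sketch; they require a running cluster, and output is omitted here):

```
crm_mon -1          # one-shot snapshot of cluster and resource state
crm configure show  # dump the group definition, ordering and colocation constraints
crm_simulate -sL    # show placement scores and pending actions from the live CIB
```

In particular, 'crm_simulate -sL' shows why the policy engine placed (or refused to place) each member of the group, which is usually where a stuck group member such as offline_file shows up.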