Re: [ClusterLabs] corosync service not automatically started

2017-10-11 Thread Václav Mach



On 10/11/2017 09:00 AM, Ferenc Wágner wrote:

Václav Mach <ma...@cesnet.cz> writes:


allow-hotplug eth0
iface eth0 inet dhcp


Try replacing allow-hotplug with auto.  Ifupdown simply runs ifup -a
before network-online.target, which excludes allow-hotplug interfaces.
That means allow-hotplug interfaces are not waited for before corosync
is started during boot.



That did the trick for network config using DHCP. Thanks for clarification.
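
For reference, a minimal sketch of the resulting stanza (only eth0 shown; the 
rest of /etc/network/interfaces stays as quoted above):

# 'auto' interfaces are brought up by 'ifup -a' during boot, so they are
# waited for before network-online.target is reached
auto eth0
iface eth0 inet dhcp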

Do you know what the reason is for excluding allow-hotplug interfaces? 
It's obvious that if ifup is run as 'ifup -a' (according to its man page) 
it ignores them, but I don't get why allow-hotplug interfaces should be 
ignored by the init system.
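
Not an answer, but one way to inspect the ordering on a Debian/systemd host 
(assuming ifupdown's networking.service is what runs 'ifup -a' at boot):

# show the unit that runs 'ifup -a' for the 'auto' interfaces
systemctl cat networking.service
# list what network-online.target actually waits for
systemctl list-dependencies network-online.target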


--
Václav Mach
CESNET, z.s.p.o.
www.cesnet.cz





Re: [ClusterLabs] corosync service not automatically started

2017-10-10 Thread Václav Mach


On 10/10/2017 11:40 AM, Valentin Vidic wrote:

On Tue, Oct 10, 2017 at 11:26:24AM +0200, Václav Mach wrote:

# The primary network interface
allow-hotplug eth0
iface eth0 inet dhcp
# This is an autoconfigured IPv6 interface
iface eth0 inet6 auto


allow-hotplug or dhcp could be causing problems.  You can try
disabling corosync and pacemaker so they don't start on boot
and start them manually after a few minutes when the network
is stable.  If it works then you have some kind of timing
issue.  You can try using 'auto eth0' or a static IP address
to see if it helps...
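
A minimal sketch of that test, assuming the standard corosync.service and 
pacemaker.service units:

# keep the cluster stack from starting at boot
systemctl disable corosync pacemaker
# reboot, wait until the network is demonstrably up, then start by hand
systemctl start corosync pacemaker
# re-enable once the timing question is settled
systemctl enable corosync pacemaker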



It seems that static network configuration really solved this issue. No 
further modifications of services were necessary.
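
For the record, a sketch of what the static variant of the stanza looks like; 
the address is the one used elsewhere in this thread, while the netmask and 
gateway below are placeholders, not the real values:

# /etc/network/interfaces - static variant (netmask/gateway are examples)
auto eth0
iface eth0 inet static
    address 78.128.211.51
    netmask 255.255.255.0
    gateway 78.128.211.1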


Thanks for help.

--
Václav Mach
CESNET, z.s.p.o.
www.cesnet.cz





Re: [ClusterLabs] corosync service not automatically started

2017-10-10 Thread Václav Mach


On 10/10/2017 11:04 AM, Valentin Vidic wrote:

On Tue, Oct 10, 2017 at 10:35:17AM +0200, Václav Mach wrote:

Oct 10 10:27:05 r1nren.et.cesnet.cz corosync[709]:   [QB] Denied
connection, is not ready (709-1337-18)
Oct 10 10:27:06 r1nren.et.cesnet.cz corosync[709]:   [QB] Denied
connection, is not ready (709-1337-18)
Oct 10 10:27:07 r1nren.et.cesnet.cz corosync[709]:   [QB] Denied
connection, is not ready (709-1337-18)
Oct 10 10:27:08 r1nren.et.cesnet.cz corosync[709]:   [QB] Denied
connection, is not ready (709-1337-18)
Oct 10 10:27:09 r1nren.et.cesnet.cz corosync[709]:   [QB] Denied
connection, is not ready (709-1337-18)


Could it be that the network or the firewall takes some time to start
on boot?
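
One way to check, assuming systemd and a journal covering the boot in 
question, is to compare when the network came up with when corosync started:

# timestamps of networking and corosync for the current boot
journalctl -b -u networking.service -u corosync.service
# which units delayed the boot, and what corosync had to wait for
systemd-analyze blame
systemd-analyze critical-chain corosync.service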



I'm not sure about that. It seems to me that this should not be the 
issue - a few lines above in the log from my previous mail, the first 
line says the network interface is up:


Oct 10 10:27:03 r1nren.et.cesnet.cz corosync[709]:   [TOTEM ] The 
network interface [78.128.211.51] is now up.
Oct 10 10:27:03 r1nren.et.cesnet.cz corosync[709]:   [TOTEM ] adding new 
UDPU member {78.128.211.51}
Oct 10 10:27:03 r1nren.et.cesnet.cz corosync[709]:   [TOTEM ] adding new 
UDPU member {78.128.211.52}

Oct 10 10:27:03 r1nren.et.cesnet.cz corosync[709]:   [QB] Denied

Network configuration (same for r2):
root@r1nren:~# cat /etc/network/interfaces
# This file describes the network interfaces available on your system
# and how to activate them. For more information, see interfaces(5).

source /etc/network/interfaces.d/*

# The loopback network interface
auto lo
iface lo inet loopback

# The primary network interface
allow-hotplug eth0
iface eth0 inet dhcp
# This is an autoconfigured IPv6 interface
iface eth0 inet6 auto

--
Václav Mach
CESNET, z.s.p.o.
www.cesnet.cz





[ClusterLabs] corosync service not automatically started

2017-10-10 Thread Václav Mach
, is not ready (709-1337-18)
Oct 10 10:27:10 r1nren.et.cesnet.cz corosync[709]: corosync: 
votequorum.c:2065: message_handler_req_exec_votequorum_nodeinfo: 
Assertion `sender_node != NULL' failed.
Oct 10 10:27:10 r1nren.et.cesnet.cz systemd[1]: corosync.service: Main 
process exited, code=killed, status=6/ABRT
Oct 10 10:27:10 r1nren.et.cesnet.cz systemd[1]: Failed to start Corosync 
Cluster Engine.
Oct 10 10:27:10 r1nren.et.cesnet.cz systemd[1]: corosync.service: Unit 
entered failed state.
Oct 10 10:27:10 r1nren.et.cesnet.cz systemd[1]: corosync.service: Failed 
with result 'signal'.


corosync configuration:
root@r1nren:~# cat /etc/corosync/corosync.conf
totem {
    version: 2
    transport: udpu
    cluster_name: eduroam.cz
    token: 3000
    token_retransmits_before_loss_const: 10
    clear_node_high_bit: yes
    crypto_cipher: aes256
    crypto_hash: sha256
    interface {
        ringnumber: 0
        bindnetaddr: 78.128.211.51
        ttl: 1
    }
}

logging {
    fileline: off
    to_stderr: no
    to_logfile: no
    to_syslog: yes
    syslog_facility: daemon
    debug: off
    timestamp: on
    logger_subsys {
        subsys: QUORUM
        debug: off
    }
}

quorum {
    provider: corosync_votequorum
    expected_votes: 2
    two_node: 1
}

nodelist {
    node {
        ring0_addr: 78.128.211.51
    }
    node {
        ring0_addr: 78.128.211.52
    }
}


Let me know if I can provide any more information about this (are there 
any corosync logs?).
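
Regarding corosync logs: with the configuration above everything goes to 
syslog only. A hedged sketch of the logging block with a dedicated log file 
added (the path is just an example; the directory must exist and be writable 
by corosync):

logging {
    fileline: off
    to_stderr: no
    # additionally write a copy to a dedicated file
    to_logfile: yes
    logfile: /var/log/corosync/corosync.log
    to_syslog: yes
    syslog_facility: daemon
    debug: off
    timestamp: on
}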


View from r2:
root@r2nren:~# crm status
Stack: corosync
Current DC: r2nren.et.cesnet.cz (version 1.1.16-94ff4df) - partition 
with quorum

Last updated: Tue Oct 10 10:29:45 2017
Last change: Tue Oct 10 10:25:32 2017 by root via crm_attribute on 
r1nren.et.cesnet.cz


2 nodes configured
8 resources configured

Online: [ r2nren.et.cesnet.cz ]
OFFLINE: [ r1nren.et.cesnet.cz ]

Full list of resources:

 Clone Set: clone_ping_gw [ping_gw]
     Started: [ r2nren.et.cesnet.cz ]
     Stopped: [ r1nren.et.cesnet.cz ]
 Resource Group: group_eduroam.cz
     standby_ip     (ocf::heartbeat:IPaddr2):       Started r2nren.et.cesnet.cz
     offline_file   (systemd:offline_file):         Started r2nren.et.cesnet.cz
     racoon         (systemd:racoon):               Started r2nren.et.cesnet.cz
     radiator       (systemd:radiator):             Started r2nren.et.cesnet.cz
     eduroam_ping   (systemd:eduroam_ping):         Started r2nren.et.cesnet.cz
     mailto         (ocf::heartbeat:MailTo):        Started r2nren.et.cesnet.cz


What could be the cause of the problem I encountered?

Thanks for help.

Regards,
Vaclav

--
Václav Mach
CESNET, z.s.p.o.
www.cesnet.cz





[ClusterLabs] strange cluster state

2017-09-29 Thread Václav Mach

Hello,

I am trying to set up a simple 2-node cluster. The setup is done with 
ansible. The whole project is available on github at 
https://github.com/lager1/cesnet_HA (the README is written in Czech, but 
other parts may be relevant).


The cluster consists of two servers - r1nren.et.cesnet.cz (r1, r1nren) 
and r2nren.et.cesnet.cz (r2, r2nren). The configuration uses a resource 
group so that the implied ordering and colocation constraints apply to 
the resources.


The resources are:
- ping_gw
- standby_ip
- offline_file
- radiator
- racoon
- eduroam_ping
- mailto

Resource ping_gw is cloned so that it runs on both nodes.
All the remaining resources are added to a group; a configuration sketch 
follows below.
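
A sketch of that part of the configuration in crmsh syntax (primitive 
definitions omitted; resource names are the ones listed above, and the exact 
commands in the ansible playbooks may differ):

# clone ping_gw so it runs on every node
crm configure clone clone_ping_gw ping_gw
# a group implies ordering and colocation for its members
crm configure group group_eduroam.cz standby_ip offline_file radiator \
    racoon eduroam_ping mailto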

While testing cluster behavior I've managed to get the cluster into a 
strange state:


Node r2nren.et.cesnet.cz: standby
Online: [ r1nren.et.cesnet.cz ]

Full list of resources:

 Clone Set: clone_ping_gw [ping_gw]
     Started: [ r1nren.et.cesnet.cz ]
     Stopped: [ r2nren.et.cesnet.cz ]
 Resource Group: group_eduroam.cz
     standby_ip     (ocf::heartbeat:IPaddr2):       Started r2nren.et.cesnet.cz
     offline_file   (systemd:offline_file):         Stopped
     radiator       (systemd:radiator):             Started r1nren.et.cesnet.cz
     racoon         (systemd:racoon):               Stopped
     eduroam_ping   (systemd:eduroam_ping):         Stopped
     mailto         (ocf::heartbeat:MailTo):        Started r1nren.et.cesnet.cz

How is this state even possible?
According to the docs, a node in standby may not run any resources. Also, 
all the resources in the group should run on the same node and should be 
started in the defined order. The output above matches neither of those 
expectations.
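
Some commands that may help to see why the cluster placed things this way 
(assuming pacemaker's standard CLI tools are available):

# one-shot status including inactive resources and fail counts
crm_mon -1rf
# dump the full configuration, including any constraints crmsh created
crm configure show
# let the policy engine explain its placement scores for the live CIB
crm_simulate -sL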


I'm not totally sure whether the attached logs were created when this 
problem occurred, but I think they were.


Thanks for help.

Regards,
Vaclav

--
Václav Mach
CESNET, z.s.p.o.
www.cesnet.cz




ha_files.tar.gz
Description: application/gzip


___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org