Re: [ClusterLabs] Could not initialize corosync configuration API error 2

2023-04-03 Thread Jan Friesse

Hi,

On 31/03/2023 11:36, S Sathish S wrote:

Hi Team,

Please find the corosync version.

[root@node2 ~]# rpm -qa corosync
corosync-2.4.4-2.el7.x86_64.


RHEL 7 never got 2.4.4 - there was 2.4.3 in RHEL 7.7 and 2.4.5 in RHEL 
7.8/7.9. Is this a self-compiled version? If so, please consider updating 
to the distro-provided package - the RHEL 7 package IS actively maintained.
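
For reference, a quick way to check where the installed package came from and to 
pull in the distro build (illustrative commands, assuming a standard yum-based 
RHEL 7 host):

[root@node2 ~]# rpm -qi corosync | grep -E 'Vendor|Build Host|Signature'
[root@node2 ~]# yum info corosync        # compare installed vs. available distro version
[root@node2 ~]# yum update corosync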





The firewall is disabled.

Please find the debug and trace logs:

Mar 31 10:07:30 [17684] node2 corosync notice  [MAIN  ] Corosync Cluster Engine 
('UNKNOWN'): started and ready to provide service.
Mar 31 10:07:30 [17684] node2 corosync info[MAIN  ] Corosync built-in 
features: pie relro bindnow
Mar 31 10:07:30 [17684] node2 corosync warning [MAIN  ] Could not set SCHED_RR 
at priority 99: Operation not permitted (1)


This is weird - is corosync running as root?
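
A quick way to check (illustrative commands) which user the corosync process runs as:

[root@node2 ~]# ps -o user,pid,cmd -C corosync
[root@node2 ~]# systemctl show corosync --property=User    # an empty value means the unit runs as root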


Mar 31 10:07:30 [17684] node2 corosync debug   [QB] shm size:8388621; 
real_size:8392704; rb->word_size:2098176
Mar 31 10:07:30 [17684] node2 corosync debug   [MAIN  ] Corosync TTY detached
Mar 31 10:07:30 [17684] node2 corosync debug   [TOTEM ] waiting_trans_ack 
changed to 1
Mar 31 10:07:30 [17684] node2 corosync debug   [TOTEM ] Token Timeout (5550 ms)



...


Mar 31 10:07:30 [17684] node2 corosync debug   [TOTEM ] entering GATHER state 
from 11(merge during join).



This is important. Usually it means there is a forgotten node somewhere 
trying to connect to the existing cluster, or the config files differ 
between nodes. The solution is (a rough sketch of both steps follows after this list):

1. Check that corosync.conf is identical on all nodes
2. Update to the distro package (2.4.5), which contains the block_unlisted_ips 
functionality/option (enabled by default), and/or generate a new crypto 
key, distribute it only to the nodes within the cluster (so node1 .. node9) and 
turn on crypto.
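
A rough sketch of both steps (illustrative only; the node1 .. node9 hostnames are 
taken from the config below, and root SSH access between the nodes is assumed):

[root@node1 ~]# for n in node{1..9}; do ssh "$n" md5sum /etc/corosync/corosync.conf; done

[root@node1 ~]# corosync-keygen        # writes /etc/corosync/authkey (add -l to read from /dev/urandom)
[root@node1 ~]# for n in node{2..9}; do scp -p /etc/corosync/authkey "$n":/etc/corosync/; done

Then enable crypto in the totem section on every node (in 2.x, secauth: on turns on 
both encryption and authentication) and restart corosync:

totem {
 version: 2
 cluster_name: OCC
 secauth: on
 transport: udpu
}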




Mar 31 10:07:30 [17684] node2 corosync debug   [TOTEM ] entering GATHER state 
from 11(merge during join).
Mar 31 10:07:30 [17684] node2 corosync debug   [TOTEM ] entering GATHER state from 


...






Please find the corosync conf file.

[root@node2 ~]# cat /etc/corosync/corosync.conf
totem {
 version: 2
 cluster_name: OCC
 secauth: off


it's a really good idea to turn on crypto (see the crypto sketch after the solution list above)


 transport: udpu
}



nodelist {
 node {
 ring0_addr: node1
 nodeid: 1
 }



 node {
 ring0_addr: node2
 nodeid: 2
 }



 node {
 ring0_addr: node3
 nodeid: 3
 }



 node {
 ring0_addr: node4
 nodeid: 4
 }



 node {
 ring0_addr: node5
 nodeid: 5
 }



 node {
 ring0_addr: node6
 nodeid: 6
 }



 node {
 ring0_addr: node7
 nodeid: 7
 }



 node {
 ring0_addr: node8
 nodeid: 8
 }



 node {
 ring0_addr: node9
 nodeid: 9
 }
}



quorum {
 provider: corosync_votequorum
}



logging {
 to_logfile: yes
 logfile: /var/log/cluster/corosync.log
 to_syslog: no
 timestamp: on
}



Regards,
  Honza


Thanks and Regards,
S Sathish S





Re: [ClusterLabs] Could not initialize corosync configuration API error 2

2023-03-31 Thread S Sathish S via Users
Hi Team,

Please find the corosync version.

[root@node2 ~]# rpm -qa corosync
corosync-2.4.4-2.el7.x86_64.

The firewall is disabled.

Please find the debug and trace logs:

Mar 31 10:07:30 [17684] node2 corosync notice  [MAIN  ] Corosync Cluster Engine 
('UNKNOWN'): started and ready to provide service.
Mar 31 10:07:30 [17684] node2 corosync info[MAIN  ] Corosync built-in 
features: pie relro bindnow
Mar 31 10:07:30 [17684] node2 corosync warning [MAIN  ] Could not set SCHED_RR 
at priority 99: Operation not permitted (1)
Mar 31 10:07:30 [17684] node2 corosync debug   [QB] shm size:8388621; 
real_size:8392704; rb->word_size:2098176
Mar 31 10:07:30 [17684] node2 corosync debug   [MAIN  ] Corosync TTY detached
Mar 31 10:07:30 [17684] node2 corosync debug   [TOTEM ] waiting_trans_ack 
changed to 1
Mar 31 10:07:30 [17684] node2 corosync debug   [TOTEM ] Token Timeout (5550 ms) 
retransmit timeout (1321 ms)
Mar 31 10:07:30 [17684] node2 corosync debug   [TOTEM ] token hold (1046 ms) 
retransmits before loss (4 retrans)
Mar 31 10:07:30 [17684] node2 corosync debug   [TOTEM ] join (50 ms) send_join 
(0 ms) consensus (6660 ms) merge (200 ms)
Mar 31 10:07:30 [17684] node2 corosync debug   [TOTEM ] downcheck (1000 ms) 
fail to recv const (2500 msgs)
Mar 31 10:07:30 [17684] node2 corosync debug   [TOTEM ] seqno unchanged const 
(30 rotations) Maximum network MTU 1401
Mar 31 10:07:30 [17684] node2 corosync debug   [TOTEM ] window size per 
rotation (50 messages) maximum messages per rotation (17 messages)
Mar 31 10:07:30 [17684] node2 corosync debug   [TOTEM ] missed count const (5 
messages)
Mar 31 10:07:30 [17684] node2 corosync debug   [TOTEM ] send threads (0 threads)
Mar 31 10:07:30 [17684] node2 corosync debug   [TOTEM ] RRP token expired 
timeout (1321 ms)
Mar 31 10:07:30 [17684] node2 corosync debug   [TOTEM ] RRP token problem 
counter (2000 ms)
Mar 31 10:07:30 [17684] node2 corosync debug   [TOTEM ] RRP threshold (10 
problem count)
Mar 31 10:07:30 [17684] node2 corosync debug   [TOTEM ] RRP multicast threshold 
(100 problem count)
Mar 31 10:07:30 [17684] node2 corosync debug   [TOTEM ] RRP automatic recovery 
check timeout (1000 ms)
Mar 31 10:07:30 [17684] node2 corosync debug   [TOTEM ] RRP mode set to none.
Mar 31 10:07:30 [17684] node2 corosync debug   [TOTEM ] 
heartbeat_failures_allowed (0)
Mar 31 10:07:30 [17684] node2 corosync debug   [TOTEM ] max_network_delay (50 
ms)
Mar 31 10:07:30 [17684] node2 corosync debug   [TOTEM ] HeartBeat is Disabled. 
To enable set heartbeat_failures_allowed > 0
Mar 31 10:07:30 [17684] node2 corosync notice  [TOTEM ] Initializing transport 
(UDP/IP Unicast).
Mar 31 10:07:30 [17684] node2 corosync notice  [TOTEM ] Initializing 
transmit/receive security (NSS) crypto: none hash: none
Mar 31 10:07:30 [17684] node2 corosync trace   [QB] grown poll array to 2 
for FD 8
Mar 31 10:07:30 [17684] node2 corosync notice  [TOTEM ] The network interface 
[10.33.59.175] is now up.
Mar 31 10:07:30 [17684] node2 corosync debug   [TOTEM ] Created or loaded 
sequence id 540.10.33.59.175 for this ring.
Mar 31 10:07:30 [17684] node2 corosync notice  [SERV  ] Service engine loaded: 
corosync configuration map access [0]
Mar 31 10:07:30 [17684] node2 corosync debug   [MAIN  ] Initializing IPC on 
cmap [0]
Mar 31 10:07:30 [17684] node2 corosync debug   [MAIN  ] No configured 
qb.ipc_type. Using native ipc
Mar 31 10:07:30 [17684] node2 corosync info[QB] server name: cmap
Mar 31 10:07:30 [17684] node2 corosync trace   [QB] grown poll array to 3 
for FD 9
Mar 31 10:07:30 [17684] node2 corosync notice  [SERV  ] Service engine loaded: 
corosync configuration service [1]
Mar 31 10:07:30 [17684] node2 corosync debug   [MAIN  ] Initializing IPC on cfg 
[1]
Mar 31 10:07:30 [17684] node2 corosync debug   [MAIN  ] No configured 
qb.ipc_type. Using native ipc
Mar 31 10:07:30 [17684] node2 corosync info[QB] server name: cfg
Mar 31 10:07:30 [17684] node2 corosync trace   [QB] grown poll array to 4 
for FD 10
Mar 31 10:07:30 [17684] node2 corosync notice  [SERV  ] Service engine loaded: 
corosync cluster closed process group service v1.01 [2]
Mar 31 10:07:30 [17684] node2 corosync debug   [MAIN  ] Initializing IPC on cpg 
[2]
Mar 31 10:07:30 [17684] node2 corosync debug   [MAIN  ] No configured 
qb.ipc_type. Using native ipc
Mar 31 10:07:30 [17684] node2 corosync info[QB] server name: cpg
Mar 31 10:07:30 [17684] node2 corosync trace   [QB] grown poll array to 5 
for FD 11
Mar 31 10:07:30 [17684] node2 corosync notice  [SERV  ] Service engine loaded: 
corosync profile loading service [4]
Mar 31 10:07:30 [17684] node2 corosync debug   [MAIN  ] NOT Initializing IPC on 
pload [4]
Mar 31 10:07:30 [17684] node2 corosync notice  [QUORUM] Using quorum provider 
corosync_votequorum
Mar 31 10:07:30 [17684] node2 corosync trace   [VOTEQ ] ENTERING 
votequorum_init()
Mar 31 10:07:30 [17684] node2 corosync trace   [VOTEQ ] ENTERING 
votequorum_exec_init_fn()
...

Re: [ClusterLabs] Could not initialize corosync configuration API error 2

2023-03-31 Thread Jan Friesse

Hi,
more information would be needed to find out the real reason, so:
- double check corosync.conf (IP addresses)
- check the firewall (mainly the local one)
- what is the version of corosync?
- try to set debug: on (or trace) - see the sketch after this list
- paste the config file
- paste the full log - since corosync was started
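
For example (illustrative commands; debug/trace are standard corosync 2.x logging 
options that can be added to the logging section of corosync.conf):

[root@node1 ~]# firewall-cmd --state        # or inspect iptables -S directly
[root@node1 ~]# rpm -qa corosync

logging {
 to_logfile: yes
 logfile: /var/log/cluster/corosync.log
 to_syslog: no
 debug: on        # or: debug: trace for even more detail
 timestamp: on
}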

Also keep in mind that if it is version 2.x, it is no longer supported upstream 
and you have to contact your distribution provider's support.


Regards,
  Honza

On 30/03/2023 12:08, S Sathish S via Users wrote:

Hi Team,

We are unable to start the corosync service on a node that is already part of an 
existing cluster and had been running fine for a long time. Now the corosync
server is unable to join: "Could not initialize corosync configuration API error 
2". Please find the logs below.

[root@node1 ~]# systemctl status corosync
● corosync.service - Corosync Cluster Engine
Loaded: loaded (/usr/lib/systemd/system/corosync.service; enabled; vendor 
preset: disabled)
Active: failed (Result: exit-code) since Thu 2023-03-30 10:49:58 WAT; 7min 
ago
  Docs: man:corosync
man:corosync.conf
man:corosync_overview
   Process: 9922 ExecStop=/usr/share/corosync/corosync stop (code=exited, 
status=0/SUCCESS)
   Process: 9937 ExecStart=/usr/share/corosync/corosync start (code=exited, 
status=1/FAILURE)



Mar 30 10:48:57 node1 systemd[1]: Starting Corosync Cluster Engine...
Mar 30 10:49:58 node1 corosync[9937]: Starting Corosync Cluster Engine 
(corosync): [FAILED]
Mar 30 10:49:58 node1 systemd[1]: corosync.service: control process exited, 
code=exited status=1
Mar 30 10:49:58 node1 systemd[1]: Failed to start Corosync Cluster Engine.
Mar 30 10:49:58 node1 systemd[1]: Unit corosync.service entered failed state.
Mar 30 10:49:58 node1 systemd[1]: corosync.service failed.

Please find the corosync log errors:

Mar 30 10:49:52 [9947] node1 corosync debug   [MAIN  ] Denied connection, 
corosync is not ready
Mar 30 10:49:52 [9947] node1 corosync warning [QB] Denied connection, is 
not ready (9948-10497-23)
Mar 30 10:49:52 [9947] node1 corosync debug   [MAIN  ] 
cs_ipcs_connection_destroyed()
Mar 30 10:49:52 [9947] node1 corosync debug   [MAIN  ] Denied connection, 
corosync is not ready
Mar 30 10:49:57 [9947] node1 corosync debug   [MAIN  ] 
cs_ipcs_connection_destroyed()
Mar 30 10:49:58 [9947] node1 corosync notice  [MAIN  ] Node was shut down by a 
signal
Mar 30 10:49:58 [9947] node1 corosync notice  [SERV  ] Unloading all Corosync 
service engines.
Mar 30 10:49:58 [9947] node1 corosync info[QB] withdrawing server 
sockets
Mar 30 10:49:58 [9947] node1 corosync debug   [QB] qb_ipcs_unref() - 
destroying
Mar 30 10:49:58 [9947] node1 corosync notice  [SERV  ] Service engine unloaded: 
corosync vote quorum service v1.0
Mar 30 10:49:58 [9947] node1 corosync info[QB] withdrawing server 
sockets
Mar 30 10:49:58 [9947] node1 corosync debug   [QB] qb_ipcs_unref() - 
destroying
Mar 30 10:49:58 [9947] node1 corosync notice  [SERV  ] Service engine unloaded: 
corosync configuration map access
Mar 30 10:49:58 [9947] node1 corosync info[QB] withdrawing server 
sockets
Mar 30 10:49:58 [9947] node1 corosync debug   [QB] qb_ipcs_unref() - 
destroying
Mar 30 10:49:58 [9947] node1 corosync notice  [SERV  ] Service engine unloaded: 
corosync configuration service
Mar 30 10:49:58 [9947] node1 corosync info[QB] withdrawing server 
sockets
Mar 30 10:49:58 [9947] node1 corosync debug   [QB] qb_ipcs_unref() - 
destroying
Mar 30 10:49:58 [9947] node1 corosync notice  [SERV  ] Service engine unloaded: 
corosync cluster closed process group service v1.01
Mar 30 10:49:58 [9947] node1 corosync info[QB] withdrawing server 
sockets
Mar 30 10:49:58 [9947] node1 corosync debug   [QB] qb_ipcs_unref() - 
destroying
Mar 30 10:49:58 [9947] node1 corosync notice  [SERV  ] Service engine unloaded: 
corosync cluster quorum service v0.1
Mar 30 10:49:58 [9947] node1 corosync notice  [SERV  ] Service engine unloaded: 
corosync profile loading service
Mar 30 10:49:58 [9947] node1 corosync debug   [TOTEM ] sending join/leave 
message
Mar 30 10:49:58 [9947] node1 corosync notice  [MAIN  ] Corosync Cluster Engine 
exiting normally


While trying to start the corosync service manually, we also get the error below.


[root@node1 ~]# bash -x /usr/share/corosync/corosync start
+ desc='Corosync Cluster Engine'
+ prog=corosync
+ PATH=/sbin:/bin:/usr/sbin:/usr/bin:/usr/sbin
+ '[' -f /etc/sysconfig/corosync ']'
+ . /etc/sysconfig/corosync
++ COROSYNC_INIT_TIMEOUT=60
++ COROSYNC_OPTIONS=
+ case '/etc/sysconfig' in
+ '[' -f /etc/init.d/functions ']'
+ . /etc/init.d/functions
++ TEXTDOMAIN=initscripts
++ umask 022
++ PATH=/sbin:/usr/sbin:/bin:/usr/bin
++ export PATH
++ '[' 28864 -ne 1 -a -z '' ']'
++ '[' -d /run/systemd/system ']'
++ case "$0" in
++ '[' -z '' ']'
++ COLUMNS=80
++ '[' -z '' ']'
++ '[' -c /dev/stderr -a -r /dev/stderr ']'
+++ /sbin/consoletype
++ CONSOLETYPE=pty
++ '[' -z '' ']'
++ '[' -z '' ']'
...

[ClusterLabs] Could not initialize corosync configuration API error 2

2023-03-30 Thread S Sathish S via Users
Hi Team,

We are unable to start the corosync service on a node that is already part of an 
existing cluster and had been running fine for a long time. Now the corosync
server is unable to join: "Could not initialize corosync configuration API error 
2". Please find the logs below.

[root@node1 ~]# systemctl status corosync
● corosync.service - Corosync Cluster Engine
   Loaded: loaded (/usr/lib/systemd/system/corosync.service; enabled; vendor 
preset: disabled)
   Active: failed (Result: exit-code) since Thu 2023-03-30 10:49:58 WAT; 7min 
ago
 Docs: man:corosync
   man:corosync.conf
   man:corosync_overview
  Process: 9922 ExecStop=/usr/share/corosync/corosync stop (code=exited, 
status=0/SUCCESS)
  Process: 9937 ExecStart=/usr/share/corosync/corosync start (code=exited, 
status=1/FAILURE)



Mar 30 10:48:57 node1 systemd[1]: Starting Corosync Cluster Engine...
Mar 30 10:49:58 node1 corosync[9937]: Starting Corosync Cluster Engine 
(corosync): [FAILED]
Mar 30 10:49:58 node1 systemd[1]: corosync.service: control process exited, 
code=exited status=1
Mar 30 10:49:58 node1 systemd[1]: Failed to start Corosync Cluster Engine.
Mar 30 10:49:58 node1 systemd[1]: Unit corosync.service entered failed state.
Mar 30 10:49:58 node1 systemd[1]: corosync.service failed.

Please find the corosync log errors:

Mar 30 10:49:52 [9947] node1 corosync debug   [MAIN  ] Denied connection, 
corosync is not ready
Mar 30 10:49:52 [9947] node1 corosync warning [QB] Denied connection, is 
not ready (9948-10497-23)
Mar 30 10:49:52 [9947] node1 corosync debug   [MAIN  ] 
cs_ipcs_connection_destroyed()
Mar 30 10:49:52 [9947] node1 corosync debug   [MAIN  ] Denied connection, 
corosync is not ready
Mar 30 10:49:57 [9947] node1 corosync debug   [MAIN  ] 
cs_ipcs_connection_destroyed()
Mar 30 10:49:58 [9947] node1 corosync notice  [MAIN  ] Node was shut down by a 
signal
Mar 30 10:49:58 [9947] node1 corosync notice  [SERV  ] Unloading all Corosync 
service engines.
Mar 30 10:49:58 [9947] node1 corosync info[QB] withdrawing server 
sockets
Mar 30 10:49:58 [9947] node1 corosync debug   [QB] qb_ipcs_unref() - 
destroying
Mar 30 10:49:58 [9947] node1 corosync notice  [SERV  ] Service engine unloaded: 
corosync vote quorum service v1.0
Mar 30 10:49:58 [9947] node1 corosync info[QB] withdrawing server 
sockets
Mar 30 10:49:58 [9947] node1 corosync debug   [QB] qb_ipcs_unref() - 
destroying
Mar 30 10:49:58 [9947] node1 corosync notice  [SERV  ] Service engine unloaded: 
corosync configuration map access
Mar 30 10:49:58 [9947] node1 corosync info[QB] withdrawing server 
sockets
Mar 30 10:49:58 [9947] node1 corosync debug   [QB] qb_ipcs_unref() - 
destroying
Mar 30 10:49:58 [9947] node1 corosync notice  [SERV  ] Service engine unloaded: 
corosync configuration service
Mar 30 10:49:58 [9947] node1 corosync info[QB] withdrawing server 
sockets
Mar 30 10:49:58 [9947] node1 corosync debug   [QB] qb_ipcs_unref() - 
destroying
Mar 30 10:49:58 [9947] node1 corosync notice  [SERV  ] Service engine unloaded: 
corosync cluster closed process group service v1.01
Mar 30 10:49:58 [9947] node1 corosync info[QB] withdrawing server 
sockets
Mar 30 10:49:58 [9947] node1 corosync debug   [QB] qb_ipcs_unref() - 
destroying
Mar 30 10:49:58 [9947] node1 corosync notice  [SERV  ] Service engine unloaded: 
corosync cluster quorum service v0.1
Mar 30 10:49:58 [9947] node1 corosync notice  [SERV  ] Service engine unloaded: 
corosync profile loading service
Mar 30 10:49:58 [9947] node1 corosync debug   [TOTEM ] sending join/leave 
message
Mar 30 10:49:58 [9947] node1 corosync notice  [MAIN  ] Corosync Cluster Engine 
exiting normally


While trying to start the corosync service manually, we also get the error below.


[root@node1 ~]# bash -x /usr/share/corosync/corosync start
+ desc='Corosync Cluster Engine'
+ prog=corosync
+ PATH=/sbin:/bin:/usr/sbin:/usr/bin:/usr/sbin
+ '[' -f /etc/sysconfig/corosync ']'
+ . /etc/sysconfig/corosync
++ COROSYNC_INIT_TIMEOUT=60
++ COROSYNC_OPTIONS=
+ case '/etc/sysconfig' in
+ '[' -f /etc/init.d/functions ']'
+ . /etc/init.d/functions
++ TEXTDOMAIN=initscripts
++ umask 022
++ PATH=/sbin:/usr/sbin:/bin:/usr/bin
++ export PATH
++ '[' 28864 -ne 1 -a -z '' ']'
++ '[' -d /run/systemd/system ']'
++ case "$0" in
++ '[' -z '' ']'
++ COLUMNS=80
++ '[' -z '' ']'
++ '[' -c /dev/stderr -a -r /dev/stderr ']'
+++ /sbin/consoletype
++ CONSOLETYPE=pty
++ '[' -z '' ']'
++ '[' -z '' ']'
++ '[' -f /etc/sysconfig/i18n -o -f /etc/locale.conf ']'
++ . /etc/profile.d/lang.sh
++ unset LANGSH_SOURCED
++ '[' -z '' ']'
++ '[' -f /etc/sysconfig/init ']'
++ . /etc/sysconfig/init
+++ BOOTUP=color
+++ RES_COL=60
+++ MOVE_TO_COL='echo -en \033[60G'
+++ SETCOLOR_SUCCESS='echo -en \033[0;32m'
+++ SETCOLOR_FAILURE='echo -en \033[0;31m'
+++ SETCOLOR_WARNING='echo -en \033[0;33m'
+++ SETCOLOR_NORMAL='echo -en \033[0;39m'
++ '[' pty = serial ']'
...