On 12/10/2015 12:45 PM, Louis Munro wrote:
> Hello all,
>
> I am trying to get a Corosync 2 cluster going on CentOS 6.7, but I am running
> into a bit of a problem with either Corosync or Pacemaker.
> crm reports that all my nodes are offline and the stack is unknown (I am not
> sure if that is relevant).
>
> I believe both nodes are actually present and seen in corosync, but they may
> not be considered as such by pacemaker.
> I have messages in the logs saying that the processes cannot get the node
> name and default to uname -n:
>
> Dec 10 13:38:53 [2236] hack1.example.com crmd: info: corosync_node_name: Unable to get node name for nodeid 739513528
> Dec 10 13:38:53 [2236] hack1.example.com crmd: notice: get_node_name: Defaulting to uname -n for the local corosync node name
> Dec 10 13:38:53 [2236] hack1.example.com crmd: info: crm_get_peer: Node 739513528 is now known as hack1.example.com
>
> The uname -n is correct as far as that is concerned.
>
> Does this mean anything to anyone here?
>
> [Lots of details to follow]...
>
> I compiled my own versions of Corosync, Pacemaker, crm and the
> resource-agents, seemingly without problems.
>
> Here is what I currently have installed:
>
> # corosync -v
> Corosync Cluster Engine, version '2.3.5'
> Copyright (c) 2006-2009 Red Hat, Inc.
>
> # pacemakerd -F
> Pacemaker 1.1.13 (Build: 5b41ae1)
> Supporting v3.0.10: generated-manpages agent-manpages ascii-docs ncurses
> libqb-logging libqb-ipc lha-fencing upstart nagios corosync-native
> atomic-attrd libesmtp acls
>
> # crm --version
> crm 2.2.0-rc3
>
> Here is the output of crm status:
>
> # crm status
> Last updated: Thu Dec 10 12:47:50 2015
> Last change: Thu Dec 10 12:02:33 2015 by root via cibadmin on hack1.example.com
> Stack: unknown
> Current DC: NONE
> 2 nodes and 0 resources configured
>
> OFFLINE: [ hack1.example.com hack2.example.com ]
>
> Full list of resources:
>
> {nothing to see here}
>
> # corosync-cmapctl | grep members
> runtime.totem.pg.mrp.srp.members.739513528.config_version (u64) = 0
> runtime.totem.pg.mrp.srp.members.739513528.ip (str) = r(0) ip(172.20.20.184)
> runtime.totem.pg.mrp.srp.members.739513528.join_count (u32) = 1
> runtime.totem.pg.mrp.srp.members.739513528.status (str) = joined
> runtime.totem.pg.mrp.srp.members.739513590.config_version (u64) = 0
> runtime.totem.pg.mrp.srp.members.739513590.ip (str) = r(0) ip(172.20.20.246)
> runtime.totem.pg.mrp.srp.members.739513590.join_count (u32) = 1
> runtime.totem.pg.mrp.srp.members.739513590.status (str) = joined
>
> # uname -n
> hack1.example.com
>
> # corosync-cfgtool -s
> Printing ring status.
> Local node ID 739513528
> RING ID 0
>         id      = 172.20.20.184
>         status  = ring 0 active with no faults
>
> # uname -n
> hack2.example.com
>
> # corosync-cfgtool -s
> Printing ring status.
> Local node ID 739513590
> RING ID 0
>         id      = 172.20.20.246
>         status  = ring 0 active with no faults
>
> Shouldn’t I see both nodes in the same ring?
They are in the same ring, but corosync-cfgtool -s only prints the local node's ID and ring status, so each node will show only itself there.

> My corosync config is currently defined as:
>
> # egrep -v '#' /etc/corosync/corosync.conf
> totem {
>     version: 2
>
>     crypto_cipher: none
>     crypto_hash: none
>     clear_node_high_bit: yes
>     cluster_name: hack_cluster
>     interface {
>         ringnumber: 0
>         bindnetaddr: 172.20.0.0
>         mcastaddr: 239.255.1.1
>         mcastport: 5405
>         ttl: 1
>     }
> }
>
> logging {
>     fileline: on
>     to_stderr: no
>     to_logfile: yes
>     logfile: /var/log/cluster/corosync.log
>     to_syslog: yes
>     debug: off
>     timestamp: on
>     logger_subsys {
>         subsys: QUORUM
>         debug: off
>     }
> }
>
> # cat /etc/corosync/service.d/pacemaker
> service {
>     name: pacemaker
>     ver: 1
> }

You don't want this section if you're using corosync 2. That's the old "plugin" mechanism used with corosync 1; with corosync 2, pacemaker runs as its own set of daemons, so this file should be removed.

> And here is my pacemaker configuration:
>
> # crm config show xml
> <?xml version="1.0" ?>
> <cib num_updates="0" update-origin="hack1.example.com"
>      crm_feature_set="3.0.10" validate-with="pacemaker-2.4"
>      update-client="cibadmin" epoch="13" admin_epoch="0" update-user="root"
>      cib-last-written="Thu Dec 10 13:35:06 2015">
>   <configuration>
>     <crm_config>
>       <cluster_property_set id="cib-bootstrap-options">
>         <nvpair name="stonith-enabled" value="false" id="cib-bootstrap-options-stonith-enabled"/>
>         <nvpair name="no-quorum-policy" value="ignore" id="cib-bootstrap-options-no-quorum-policy"/>
>       </cluster_property_set>
>     </crm_config>
>     <nodes>
>       <node uname="hack1.example.com" id="hack1.example.com">
>         <instance_attributes id="hack1.example.com-instance_attributes">
>           <nvpair name="standby" value="off" id="hack1.example.com-instance_attributes-standby"/>
>         </instance_attributes>
>       </node>
>       <node uname="hack2.example.com" id="hack2.example.com">
>         <instance_attributes id="hack2.example.com-instance_attributes">
>           <nvpair name="standby" value="off" id="hack2.example.com-instance_attributes-standby"/>
>         </instance_attributes>
>       </node>
>     </nodes>
>     <resources/>
>     <constraints/>
>   </configuration>
> </cib>
>
> And finally some logs that might be relevant:
>
> Dec 10 13:38:50 [2227] hack1.example.com corosync notice [MAIN ] main.c:1227 Corosync Cluster Engine ('2.3.5'): started and ready to provide service.
> Dec 10 13:38:50 [2227] hack1.example.com corosync info [MAIN ] main.c:1228 Corosync built-in features: pie relro bindnow
> Dec 10 13:38:50 [2227] hack1.example.com corosync notice [TOTEM ] totemnet.c:248 Initializing transport (UDP/IP Multicast).
> Dec 10 13:38:50 [2227] hack1.example.com corosync notice [TOTEM ] totemcrypto.c:579 Initializing transmit/receive security (NSS) crypto: none hash: none
> Dec 10 13:38:50 [2227] hack1.example.com corosync notice [TOTEM ] totemudp.c:671 The network interface [172.20.20.184] is now up.
> Dec 10 13:38:50 [2227] hack1.example.com corosync notice [SERV ] service.c:174 Service engine loaded: corosync configuration map access [0]
> Dec 10 13:38:50 [2227] hack1.example.com corosync info [QB ] ipc_setup.c:377 server name: cmap
> Dec 10 13:38:50 [2227] hack1.example.com corosync notice [SERV ] service.c:174 Service engine loaded: corosync configuration service [1]
> Dec 10 13:38:50 [2227] hack1.example.com corosync info [QB ] ipc_setup.c:377 server name: cfg
> Dec 10 13:38:50 [2227] hack1.example.com corosync notice [SERV ] service.c:174 Service engine loaded: corosync cluster closed process group service v1.01 [2]
> Dec 10 13:38:50 [2227] hack1.example.com corosync info [QB ] ipc_setup.c:377 server name: cpg
> Dec 10 13:38:50 [2227] hack1.example.com corosync notice [SERV ] service.c:174 Service engine loaded: corosync profile loading service [4]
> Dec 10 13:38:50 [2227] hack1.example.com corosync notice [SERV ] service.c:174 Service engine loaded: corosync cluster quorum service v0.1 [3]
> Dec 10 13:38:50 [2227] hack1.example.com corosync info [QB ] ipc_setup.c:377 server name: quorum
> Dec 10 13:38:50 [2227] hack1.example.com corosync notice [TOTEM ] totemsrp.c:2095 A new membership (172.20.20.184:300) was formed. Members joined: 739513528
> Dec 10 13:38:50 [2227] hack1.example.com corosync notice [MAIN ] main.c:305 Completed service synchronization, ready to provide service.
> Dec 10 13:38:50 [2227] hack1.example.com corosync notice [TOTEM ] totemsrp.c:2095 A new membership (172.20.20.184:304) was formed. Members joined: 739513590
> Dec 10 13:38:50 [2227] hack1.example.com corosync notice [MAIN ] main.c:305 Completed service synchronization, ready to provide service.
> Dec 10 13:38:52 [2230] hack1.example.com pacemakerd: notice: mcp_read_config: Configured corosync to accept connections from group 500: OK (1)
> Dec 10 13:38:52 [2230] hack1.example.com pacemakerd: notice: main: Starting Pacemaker 1.1.13 (Build: 5b41ae1): generated-manpages agent-manpages ascii-docs ncurses libqb-logging libqb-ipc lha-fencing upstart nagios corosync-native atomic-attrd libesmtp acls
> Dec 10 13:38:52 [2230] hack1.example.com pacemakerd: info: main: Maximum core file size is: 18446744073709551615
> Dec 10 13:38:52 [2230] hack1.example.com pacemakerd: info: qb_ipcs_us_publish: server name: pacemakerd
> Dec 10 13:38:52 [2230] hack1.example.com pacemakerd: info: corosync_node_name: Unable to get node name for nodeid 739513528
> Dec 10 13:38:52 [2230] hack1.example.com pacemakerd: notice: get_node_name: Could not obtain a node name for corosync nodeid 739513528
> Dec 10 13:38:52 [2230] hack1.example.com pacemakerd: info: crm_get_peer: Created entry 212e3751-e79f-4a72-927b-6e0176a9b35c/0x1b68e50 for node (null)/739513528 (1 total)
> Dec 10 13:38:52 [2230] hack1.example.com pacemakerd: info: crm_get_peer: Node 739513528 has uuid 739513528
> Dec 10 13:38:52 [2230] hack1.example.com pacemakerd: info: crm_update_peer_proc: cluster_connect_cpg: Node (null)[739513528] - corosync-cpg is now online
> Dec 10 13:38:52 [2230] hack1.example.com pacemakerd: error: cluster_connect_quorum: Corosync quorum is not configured
> Dec 10 13:38:52 [2230] hack1.example.com pacemakerd: info: corosync_node_name: Unable to get node name for nodeid 739513528
> Dec 10 13:38:52 [2230] hack1.example.com pacemakerd: notice: get_node_name: Defaulting to uname -n for the local corosync node name
> Dec 10 13:38:52 [2230] hack1.example.com pacemakerd: info: crm_get_peer: Node 739513528 is now known as hack1.example.com
> Dec 10 13:38:52 [2230] hack1.example.com pacemakerd: info: start_child: Using uid=500 and group=500 for process cib
> Dec 10 13:38:52 [2230] hack1.example.com pacemakerd: info: start_child: Forked child 2231 for process cib
> Dec 10 13:38:52 [2230] hack1.example.com pacemakerd: info: start_child: Forked child 2232 for process stonith-ng
> Dec 10 13:38:52 [2230] hack1.example.com pacemakerd: info: start_child: Forked child 2233 for process lrmd
> Dec 10 13:38:52 [2230] hack1.example.com pacemakerd: info: start_child: Using uid=500 and group=500 for process attrd
> Dec 10 13:38:52 [2230] hack1.example.com pacemakerd: info: start_child: Forked child 2234 for process attrd
> Dec 10 13:38:52 [2230] hack1.example.com pacemakerd: info: start_child: Using uid=500 and group=500 for process pengine
> Dec 10 13:38:52 [2230] hack1.example.com pacemakerd: info: start_child: Forked child 2235 for process pengine
> Dec 10 13:38:52 [2230] hack1.example.com pacemakerd: info: start_child: Using uid=500 and group=500 for process crmd
> Dec 10 13:38:52 [2230] hack1.example.com pacemakerd: info: start_child: Forked child 2236 for process crmd
> Dec 10 13:38:52 [2230] hack1.example.com pacemakerd: info: main: Starting mainloop
> Dec 10 13:38:52 [2230] hack1.example.com pacemakerd: info: pcmk_cpg_membership: Node 739513528 joined group pacemakerd (counter=0.0)
> Dec 10 13:38:52 [2230] hack1.example.com pacemakerd: info: pcmk_cpg_membership: Node 739513528 still member of group pacemakerd (peer=hack1.example.com, counter=0.0)
> Dec 10 13:38:52 [2231] hack1.example.com cib: info: crm_log_init: Changed active directory to /var/lib/heartbeat/cores/hacluster
> Dec 10 13:38:52 [2231] hack1.example.com cib: warning: crm_is_writable: /var/lib/pacemaker/cib should be owned and r/w by group haclient
> Dec 10 13:38:52 [2231] hack1.example.com cib: info: get_cluster_type: Verifying cluster type: 'corosync'
> Dec 10 13:38:52 [2231] hack1.example.com cib: info: get_cluster_type: Assuming an active 'corosync' cluster
> Dec 10 13:38:52 [2231] hack1.example.com cib: info: retrieveCib: Reading cluster configuration file /var/lib/pacemaker/cib/cib.xml (digest: /var/lib/pacemaker/cib/cib.xml.sig)
> Dec 10 13:38:52 [2231] hack1.example.com cib: info: validate_with_relaxng: Creating RNG parser context
> Dec 10 13:38:52 [2234] hack1.example.com attrd: info: crm_log_init: Changed active directory to /var/lib/heartbeat/cores/hacluster
> Dec 10 13:38:52 [2234] hack1.example.com attrd: info: main: Starting up
> Dec 10 13:38:52 [2234] hack1.example.com attrd: info: get_cluster_type: Verifying cluster type: 'corosync'
> Dec 10 13:38:52 [2234] hack1.example.com attrd: info: get_cluster_type: Assuming an active 'corosync' cluster
> Dec 10 13:38:52 [2234] hack1.example.com attrd: notice: crm_cluster_connect: Connecting to cluster infrastructure: corosync
> Dec 10 13:38:52 [2236] hack1.example.com crmd: info: crm_log_init: Changed active directory to /var/lib/heartbeat/cores/hacluster
> Dec 10 13:38:52 [2236] hack1.example.com crmd: notice: main: CRM Git Version: 1.1.13 (5b41ae1)
> Dec 10 13:38:52 [2232] hack1.example.com stonith-ng: info: crm_log_init: Changed active directory to /var/lib/heartbeat/cores/root
> Dec 10 13:38:52 [2232] hack1.example.com stonith-ng: info: get_cluster_type: Verifying cluster type: 'corosync'
> Dec 10 13:38:52 [2232] hack1.example.com stonith-ng: info: get_cluster_type: Assuming an active 'corosync' cluster
> Dec 10 13:38:52 [2232] hack1.example.com stonith-ng: notice: crm_cluster_connect: Connecting to cluster infrastructure: corosync
> Dec 10 13:38:52 [2236] hack1.example.com crmd: warning: crm_is_writable: /var/lib/pacemaker/pengine should be owned and r/w by group haclient
> Dec 10 13:38:52 [2236] hack1.example.com crmd: warning: crm_is_writable: /var/lib/pacemaker/cib should be owned and r/w by group haclient

You definitely want to take care of these permissions issues.

> Dec 10 13:38:52 [2236] hack1.example.com crmd: info: do_log: FSA: Input I_STARTUP from crmd_init() received in state S_STARTING
> Dec 10 13:38:52 [2236] hack1.example.com crmd: info: get_cluster_type: Verifying cluster type: 'corosync'
> Dec 10 13:38:52 [2236] hack1.example.com crmd: info: get_cluster_type: Assuming an active 'corosync' cluster
> Dec 10 13:38:52 [2231] hack1.example.com cib: info: startCib: CIB Initialization completed successfully
> Dec 10 13:38:52 [2231] hack1.example.com cib: notice: crm_cluster_connect: Connecting to cluster infrastructure: corosync
> Dec 10 13:38:52 [2235] hack1.example.com pengine: info: crm_log_init: Changed active directory to /var/lib/heartbeat/cores/hacluster
> Dec 10 13:38:52 [2235] hack1.example.com pengine: warning: crm_is_writable: /var/lib/pacemaker/pengine should be owned and r/w by group haclient
> Dec 10 13:38:52 [2235] hack1.example.com pengine: info: qb_ipcs_us_publish: server name: pengine
> Dec 10 13:38:52 [2235] hack1.example.com pengine: info: main: Starting pengine
> Dec 10 13:38:52 [2233] hack1.example.com lrmd: info: crm_log_init: Changed active directory to /var/lib/heartbeat/cores/root
> Dec 10 13:38:52 [2233] hack1.example.com lrmd: info: qb_ipcs_us_publish: server name: lrmd
> Dec 10 13:38:52 [2233] hack1.example.com lrmd: info: main: Starting
> Dec 10 13:38:52 [2230] hack1.example.com pacemakerd: info: corosync_node_name: Unable to get node name for nodeid 739513590
> Dec 10 13:38:52 [2230] hack1.example.com pacemakerd: notice: get_node_name: Could not obtain a node name for corosync nodeid 739513590
> Dec 10 13:38:52 [2230] hack1.example.com pacemakerd: info: crm_get_peer: Created entry f34a4388-23fc-4f86-8932-43c814eb5ad7/0x1b6a920 for node (null)/739513590 (2 total)
> Dec 10 13:38:52 [2230] hack1.example.com pacemakerd: info: crm_get_peer: Node 739513590 has uuid 739513590
> Dec 10 13:38:52 [2230] hack1.example.com pacemakerd: info: pcmk_cpg_membership: Node 739513590 still member of group pacemakerd (peer=(null), counter=0.1)
> Dec 10 13:38:52 [2230] hack1.example.com pacemakerd: info: crm_update_peer_proc: pcmk_cpg_membership: Node (null)[739513590] - corosync-cpg is now online
> Dec 10 13:38:52 [2230] hack1.example.com pacemakerd: info: crm_get_peer: Node 739513590 is now known as hack2.example.com
> Dec 10 13:38:52 [2230] hack1.example.com pacemakerd: info: mcp_cpg_deliver: Ignoring process list sent by peer for local node
> Dec 10 13:38:52 [2230] hack1.example.com pacemakerd: info: mcp_cpg_deliver: Ignoring process list sent by peer for local node
> Dec 10 13:38:52 [2230] hack1.example.com pacemakerd: info: mcp_cpg_deliver: Ignoring process list sent by peer for local node
> Dec 10 13:38:52 [2230] hack1.example.com pacemakerd: info: mcp_cpg_deliver: Ignoring process list sent by peer for local node
> Dec 10 13:38:52 [2230] hack1.example.com pacemakerd: info: mcp_cpg_deliver: Ignoring process list sent by peer for local node
> Dec 10 13:38:52 [2230] hack1.example.com pacemakerd: info: mcp_cpg_deliver: Ignoring process list sent by peer for local node
> Dec 10 13:38:52 [2230] hack1.example.com pacemakerd: info: mcp_cpg_deliver: Ignoring process list sent by peer for local node
> Dec 10 13:38:53 [2234] hack1.example.com attrd: info: corosync_node_name: Unable to get node name for nodeid 739513528
> Dec 10 13:38:53 [2234] hack1.example.com attrd: notice: get_node_name: Could not obtain a node name for corosync nodeid 739513528
> Dec 10 13:38:53 [2234] hack1.example.com attrd:
> info: crm_get_peer: Created entry 5fbd08d1-fdbf-4f8c-85d0-bbbed1240969/0x180eb70 for node (null)/739513528 (1 total)
> Dec 10 13:38:53 [2234] hack1.example.com attrd: info: crm_get_peer: Node 739513528 has uuid 739513528
> Dec 10 13:38:53 [2234] hack1.example.com attrd: info: crm_update_peer_proc: cluster_connect_cpg: Node (null)[739513528] - corosync-cpg is now online
> Dec 10 13:38:53 [2234] hack1.example.com attrd: notice: crm_update_peer_state_iter: crm_update_peer_proc: Node (null)[739513528] - state is now member (was (null))
> Dec 10 13:38:53 [2234] hack1.example.com attrd: info: init_cs_connection_once: Connection to 'corosync': established
> Dec 10 13:38:53 [2232] hack1.example.com stonith-ng: info: corosync_node_name: Unable to get node name for nodeid 739513528
> Dec 10 13:38:53 [2232] hack1.example.com stonith-ng: notice: get_node_name: Could not obtain a node name for corosync nodeid 739513528
> Dec 10 13:38:53 [2232] hack1.example.com stonith-ng: info: crm_get_peer: Created entry c2d2ccef-c87c-4b3c-8004-659f09113048/0x11fa8e0 for node (null)/739513528 (1 total)
> Dec 10 13:38:53 [2232] hack1.example.com stonith-ng: info: crm_get_peer: Node 739513528 has uuid 739513528
> Dec 10 13:38:53 [2232] hack1.example.com stonith-ng: info: crm_update_peer_proc: cluster_connect_cpg: Node (null)[739513528] - corosync-cpg is now online
> Dec 10 13:38:53 [2232] hack1.example.com stonith-ng: notice: crm_update_peer_state_iter: crm_update_peer_proc: Node (null)[739513528] - state is now member (was (null))
> Dec 10 13:38:53 [2232] hack1.example.com stonith-ng: info: init_cs_connection_once: Connection to 'corosync': established
> Dec 10 13:38:53 [2234] hack1.example.com attrd: info: corosync_node_name: Unable to get node name for nodeid 739513528
> Dec 10 13:38:53 [2234] hack1.example.com attrd: notice: get_node_name: Defaulting to uname -n for the local corosync node name
> Dec 10 13:38:53 [2234] hack1.example.com attrd: info: crm_get_peer: Node 739513528 is now known as hack1.example.com
> Dec 10 13:38:53 [2232] hack1.example.com stonith-ng: info: corosync_node_name: Unable to get node name for nodeid 739513528
> Dec 10 13:38:53 [2232] hack1.example.com stonith-ng: notice: get_node_name: Defaulting to uname -n for the local corosync node name
> Dec 10 13:38:53 [2232] hack1.example.com stonith-ng: info: crm_get_peer: Node 739513528 is now known as hack1.example.com
> Dec 10 13:38:53 [2231] hack1.example.com cib: info: corosync_node_name: Unable to get node name for nodeid 739513528
> Dec 10 13:38:53 [2231] hack1.example.com cib: notice: get_node_name: Could not obtain a node name for corosync nodeid 739513528
> Dec 10 13:38:53 [2231] hack1.example.com cib: info: crm_get_peer: Created entry 97e43f18-db5c-4496-b8fc-bd7b4a6ea711/0x1deba80 for node (null)/739513528 (1 total)
> Dec 10 13:38:53 [2231] hack1.example.com cib: info: crm_get_peer: Node 739513528 has uuid 739513528
> Dec 10 13:38:53 [2231] hack1.example.com cib: info: crm_update_peer_proc: cluster_connect_cpg: Node (null)[739513528] - corosync-cpg is now online
> Dec 10 13:38:53 [2231] hack1.example.com cib: notice: crm_update_peer_state_iter: crm_update_peer_proc: Node (null)[739513528] - state is now member (was (null))
> Dec 10 13:38:53 [2231] hack1.example.com cib: info: init_cs_connection_once: Connection to 'corosync': established
> Dec 10 13:38:53 [2234] hack1.example.com attrd: info: main: Cluster connection active
> Dec 10 13:38:53 [2234] hack1.example.com attrd: info: qb_ipcs_us_publish: server name: attrd
> Dec 10 13:38:53 [2234] hack1.example.com attrd: info: main: Accepting attribute updates
> Dec 10 13:38:53 [2231] hack1.example.com cib: info: corosync_node_name: Unable to get node name for nodeid 739513528
> Dec 10 13:38:53 [2231] hack1.example.com cib: notice: get_node_name: Defaulting to uname -n for the local corosync node name
> Dec 10 13:38:53 [2231] hack1.example.com cib: info: crm_get_peer: Node 739513528 is now known as hack1.example.com
> Dec 10 13:38:53 [2231] hack1.example.com cib: info: qb_ipcs_us_publish: server name: cib_ro
> Dec 10 13:38:53 [2231] hack1.example.com cib: info: qb_ipcs_us_publish: server name: cib_rw
> Dec 10 13:38:53 [2231] hack1.example.com cib: info: qb_ipcs_us_publish: server name: cib_shm
> Dec 10 13:38:53 [2231] hack1.example.com cib: info: cib_init: Starting cib mainloop
> Dec 10 13:38:53 [2231] hack1.example.com cib: info: cib_file_backup: Archived previous version as /var/lib/pacemaker/cib/cib-26.raw
> Dec 10 13:38:53 [2231] hack1.example.com cib: info: cib_file_write_with_digest: Wrote version 0.13.0 of the CIB to disk (digest: d37e6c1873e854e52dd9b2caaa5e4beb)
> Dec 10 13:38:53 [2231] hack1.example.com cib: info: cib_file_write_with_digest: Reading cluster configuration file /var/lib/pacemaker/cib/cib.dqtKlA (digest: /var/lib/pacemaker/cib/cib.BkV3eX)
> Dec 10 13:38:53 [2231] hack1.example.com cib: info: pcmk_cpg_membership: Node 739513528 joined group cib (counter=0.0)
> Dec 10 13:38:53 [2231] hack1.example.com cib: info: pcmk_cpg_membership: Node 739513528 still member of group cib (peer=hack1.example.com, counter=0.0)
> Dec 10 13:38:53 [2231] hack1.example.com cib: info: corosync_node_name: Unable to get node name for nodeid 739513590
> Dec 10 13:38:53 [2231] hack1.example.com cib: notice: get_node_name: Could not obtain a node name for corosync nodeid 739513590
> Dec 10 13:38:53 [2231] hack1.example.com cib: info: crm_get_peer: Created entry 87459e91-7cff-4781-9c9a-5f78385e4b7c/0x1dec1d0 for node (null)/739513590 (2 total)
> Dec 10 13:38:53 [2231] hack1.example.com cib: info: crm_get_peer: Node 739513590 has uuid 739513590
> Dec 10 13:38:53 [2231] hack1.example.com cib: info: pcmk_cpg_membership: Node 739513590 still member of group cib (peer=(null), counter=0.1)
> Dec 10 13:38:53 [2231] hack1.example.com cib: info: crm_update_peer_proc: pcmk_cpg_membership: Node (null)[739513590] - corosync-cpg is now online
> Dec 10 13:38:53 [2231] hack1.example.com cib: notice: crm_update_peer_state_iter: crm_update_peer_proc: Node (null)[739513590] - state is now member (was (null))
> Dec 10 13:38:53 [2236] hack1.example.com crmd: info: do_cib_control: CIB connection established
> Dec 10 13:38:53 [2236] hack1.example.com crmd: notice: crm_cluster_connect: Connecting to cluster infrastructure: corosync
> Dec 10 13:38:53 [2236] hack1.example.com crmd: info: corosync_node_name: Unable to get node name for nodeid 739513528
> Dec 10 13:38:53 [2236] hack1.example.com crmd: notice: get_node_name: Could not obtain a node name for corosync nodeid 739513528
> Dec 10 13:38:53 [2236] hack1.example.com crmd: info: crm_get_peer: Created entry 09d2eabe-fe50-4b40-b614-4f550dcbc677/0x156d0c0 for node (null)/739513528 (1 total)
> Dec 10 13:38:53 [2236] hack1.example.com crmd: info: crm_get_peer: Node 739513528 has uuid 739513528
> Dec 10 13:38:53 [2236] hack1.example.com crmd: info: crm_update_peer_proc: cluster_connect_cpg: Node (null)[739513528] - corosync-cpg is now online
> Dec 10 13:38:53 [2236] hack1.example.com crmd: info: init_cs_connection_once: Connection to 'corosync': established
> Dec 10 13:38:53 [2236] hack1.example.com crmd: info: corosync_node_name: Unable to get node name for nodeid 739513528
> Dec 10 13:38:53 [2236] hack1.example.com crmd: notice: get_node_name: Defaulting to uname -n for the local corosync node name
> Dec 10 13:38:53 [2236] hack1.example.com crmd: info: crm_get_peer: Node 739513528 is now known as hack1.example.com
> Dec 10 13:38:53 [2236] hack1.example.com crmd: info: peer_update_callback: hack1.example.com is now in unknown state
> Dec 10 13:38:54 [2236] hack1.example.com crmd: error: cluster_connect_quorum: Corosync quorum is not configured
> Dec 10 13:38:54 [2234] hack1.example.com attrd: info: attrd_cib_connect: Connected to the CIB after 2 attempts
> Dec 10 13:38:54 [2234] hack1.example.com attrd: info: main: CIB connection active
> Dec 10 13:38:54 [2234] hack1.example.com attrd: info: pcmk_cpg_membership: Node 739513528 joined group attrd (counter=0.0)
> Dec 10 13:38:54 [2234] hack1.example.com attrd: info: pcmk_cpg_membership: Node 739513528 still member of group attrd (peer=hack1.example.com, counter=0.0)
> Dec 10 13:38:54 [2232] hack1.example.com stonith-ng: notice: setup_cib: Watching for stonith topology changes
> Dec 10 13:38:54 [2232] hack1.example.com stonith-ng: info: qb_ipcs_us_publish: server name: stonith-ng
> Dec 10 13:38:54 [2232] hack1.example.com stonith-ng: info: main: Starting stonith-ng mainloop
> Dec 10 13:38:54 [2232] hack1.example.com stonith-ng: info: pcmk_cpg_membership: Node 739513528 joined group stonith-ng (counter=0.0)
> Dec 10 13:38:54 [2232] hack1.example.com stonith-ng: info: pcmk_cpg_membership: Node 739513528 still member of group stonith-ng (peer=hack1.example.com, counter=0.0)
> Dec 10 13:38:54 [2236] hack1.example.com crmd: info: do_ha_control: Connected to the cluster
> Dec 10 13:38:54 [2236] hack1.example.com crmd: info: lrmd_ipc_connect: Connecting to lrmd
> Dec 10 13:38:54 [2234] hack1.example.com attrd: info: corosync_node_name: Unable to get node name for nodeid 739513590
> Dec 10 13:38:54 [2234] hack1.example.com attrd: notice: get_node_name: Could not obtain a node name for corosync nodeid 739513590
> Dec 10 13:38:54 [2234] hack1.example.com attrd: info: crm_get_peer: Created entry 1edc2c5f-1438-4f5c-b5aa-e95861c4205e/0x1814190 for node (null)/739513590 (2 total)
> Dec 10 13:38:54 [2234] hack1.example.com attrd: info: crm_get_peer: Node 739513590 has uuid 739513590
> Dec 10 13:38:54 [2234] hack1.example.com attrd: info: pcmk_cpg_membership: Node 739513590 still member of group attrd (peer=(null), counter=0.1)
> Dec 10 13:38:54 [2234] hack1.example.com attrd: info: crm_update_peer_proc: pcmk_cpg_membership: Node (null)[739513590] - corosync-cpg is now online
> Dec 10 13:38:54 [2234] hack1.example.com attrd: notice: crm_update_peer_state_iter: crm_update_peer_proc: Node (null)[739513590] - state is now member (was (null))
> Dec 10 13:38:54 [2232] hack1.example.com stonith-ng: info: corosync_node_name: Unable to get node name for nodeid 739513590
> Dec 10 13:38:54 [2232] hack1.example.com stonith-ng: notice: get_node_name: Could not obtain a node name for corosync nodeid 739513590
> Dec 10 13:38:54 [2232] hack1.example.com stonith-ng: info: crm_get_peer: Created entry 81f08c2d-6517-4b38-8fdc-51a4ada76af2/0x11fc5c0 for node (null)/739513590 (2 total)
> Dec 10 13:38:54 [2232] hack1.example.com stonith-ng: info: crm_get_peer: Node 739513590 has uuid 739513590
> Dec 10 13:38:54 [2232] hack1.example.com stonith-ng: info: pcmk_cpg_membership: Node 739513590 still member of group stonith-ng (peer=(null), counter=0.1)
> Dec 10 13:38:54 [2232] hack1.example.com stonith-ng: info: crm_update_peer_proc: pcmk_cpg_membership: Node (null)[739513590] - corosync-cpg is now online
> Dec 10 13:38:54 [2232] hack1.example.com stonith-ng: notice: crm_update_peer_state_iter: crm_update_peer_proc: Node (null)[739513590] - state is now member (was (null))
> Dec 10 13:38:54 [2231] hack1.example.com cib: info: cib_process_request: Forwarding cib_modify operation for section nodes to master (origin=local/crmd/3)
> Dec 10 13:38:54 [2236] hack1.example.com crmd: info: do_lrm_control: LRM connection established
> Dec 10 13:38:54 [2236] hack1.example.com crmd: info: do_started: Delaying start, no membership data (0000000000100000)
> Dec 10 13:38:54 [2236] hack1.example.com crmd: info: pcmk_cpg_membership: Node 739513528 joined group crmd (counter=0.0)
> Dec 10 13:38:54 [2236] hack1.example.com crmd: info: pcmk_cpg_membership: Node 739513528 still member of group crmd (peer=hack1.example.com, counter=0.0)
> Dec 10 13:38:54 [2232] hack1.example.com stonith-ng: info: corosync_node_name: Unable to get node name for nodeid 739513528
> Dec 10 13:38:54 [2232] hack1.example.com stonith-ng: notice: get_node_name: Defaulting to uname -n for the local corosync node name
> Dec 10 13:38:54 [2232] hack1.example.com stonith-ng: info: init_cib_cache_cb: Updating device list from the cib: init
> Dec 10 13:38:54 [2232] hack1.example.com stonith-ng: info: cib_devices_update: Updating devices to version 0.13.0
> Dec 10 13:38:54 [2232] hack1.example.com stonith-ng: notice: unpack_config: On loss of CCM Quorum: Ignore
> Dec 10 13:38:54 [2232] hack1.example.com stonith-ng: info: crm_get_peer: Node 739513590 is now known as hack2.example.com
> Dec 10 13:38:54 [2231] hack1.example.com cib: info: corosync_node_name: Unable to get node name for nodeid 739513528
> Dec 10 13:38:54 [2231] hack1.example.com cib: notice: get_node_name: Defaulting to uname -n for the local corosync node name
> Dec 10 13:38:54 [2236] hack1.example.com crmd: info: corosync_node_name: Unable to get node name for nodeid 739513590
> Dec 10 13:38:54 [2236] hack1.example.com crmd: notice: get_node_name: Could not obtain a node name for corosync nodeid 739513590
> Dec 10 13:38:54 [2236] hack1.example.com crmd: info: crm_get_peer: Created entry b34ca282-9c15-47de-8c98-cb59d1d8bed4/0x16b5130 for node (null)/739513590 (2 total)
> Dec 10 13:38:54 [2236] hack1.example.com crmd: info: crm_get_peer: Node 739513590 has uuid 739513590
> Dec 10 13:38:54 [2236] hack1.example.com crmd: info: pcmk_cpg_membership: Node 739513590 still member of group crmd (peer=(null), counter=0.1)
> Dec 10 13:38:54 [2236] hack1.example.com crmd: info: crm_update_peer_proc: pcmk_cpg_membership: Node (null)[739513590] - corosync-cpg is now online
> Dec 10 13:38:54 [2236] hack1.example.com crmd: notice: crmd_enable_notifications: Notifications disabled
> Dec 10 13:38:54 [2236] hack1.example.com crmd: info: do_started: Delaying start, no membership data (0000000000100000)
> Dec 10 13:38:54 [2231] hack1.example.com cib: info: cib_process_request: Completed cib_modify operation for section nodes: OK (rc=0, origin=hack1.example.com/crmd/3, version=0.13.0)
>
> Any help would be much appreciated.
>
> Best regards,
> --
> Louis Munro
>
> _______________________________________________
> Users mailing list: Users@clusterlabs.org
> http://clusterlabs.org/mailman/listinfo/users
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org
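One more thing worth flagging: the "cluster_connect_quorum: Corosync quorum is not configured" errors and the "Unable to get node name for nodeid" messages both trace back to corosync.conf. There is no quorum section (pacemaker with corosync 2 expects votequorum), and with no nodelist pacemaker has nothing to map nodeids to names with, hence the uname -n fallback. Something along these lines should address both; the addresses, names and nodeids below are taken from your corosync-cmapctl output, so adjust as needed:

```
quorum {
    provider: corosync_votequorum
    two_node: 1
}

nodelist {
    node {
        ring0_addr: 172.20.20.184
        name: hack1.example.com
        nodeid: 739513528
    }
    node {
        ring0_addr: 172.20.20.246
        name: hack2.example.com
        nodeid: 739513590
    }
}
```

With two_node: 1 votequorum treats the two-node pair specially, so you may later be able to drop no-quorum-policy=ignore from the CIB, but that part is optional. Restart corosync and pacemaker on both nodes after changing the file.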
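As for the crm_is_writable warnings: since you built from source, the pacemaker state directories probably ended up owned by whatever user ran make install. A rough sketch of the fix (the fix_pcmk_perms function name is mine, just for illustration; the paths and the haclient group come straight from the warnings in your logs):

```shell
# Illustrative helper: give a directory the group ownership and
# group read/write bits that pacemaker's crm_is_writable check expects.
fix_pcmk_perms() {
    dir="$1"    # directory to fix, e.g. /var/lib/pacemaker/cib
    group="$2"  # cluster group, normally haclient
    chgrp -R "$group" "$dir"
    chmod -R g+rw "$dir"
}

# On both nodes, as root, something like:
#   fix_pcmk_perms /var/lib/pacemaker/cib haclient
#   fix_pcmk_perms /var/lib/pacemaker/pengine haclient
```

Make sure the hacluster user and haclient group exist on both nodes first; the configure-time defaults can differ if you overrode them.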