[ClusterLabs] The crmd process exited: Generic Pacemaker error (201)

2018-09-29 Thread lkxjtu


Version information
[root@paas-controller-172-167-40-24:~]$ rpm -q corosync
corosync-2.4.0-9.el7_4.2.x86_64
[root@paas-controller-172-167-40-24:~]$ rpm -q pacemaker
pacemaker-1.1.16-12.el7_4.2.x86_64

The crmd process exited with error code 201. The pacemakerd process respawned
it 100 times, which exceeded the respawn threshold, and after that crmd was
never started again. Below is the log from the last attempt to fork the crmd
process.

I have two questions. First, why does the crmd process exit? Second, can I
configure the threshold for the number of retries? Thank you very much!



Sep 08 18:10:09 [28446] paas-controller-172-167-40-24 pacemakerd: error: pcmk_child_exit: The crmd process (83749) exited: Generic Pacemaker error (201)
Sep 08 18:10:09 [28446] paas-controller-172-167-40-24 pacemakerd: notice: pcmk_process_exit: Respawning failed child process: crmd
Sep 08 18:10:09 [28446] paas-controller-172-167-40-24 pacemakerd: info: start_child: Using uid=189 and group=189 for process crmd
Sep 08 18:10:09 [28446] paas-controller-172-167-40-24 pacemakerd: info: start_child: Forked child 88033 for process crmd
Sep 08 18:10:09 [28446] paas-controller-172-167-40-24 pacemakerd: info: mcp_cpg_deliver: Ignoring process list sent by peer for local node
Sep 08 18:10:09 [28446] paas-controller-172-167-40-24 pacemakerd: info: mcp_cpg_deliver: Ignoring process list sent by peer for local node
Sep 08 18:10:09 [88033] paas-controller-172-167-40-24 crmd: info: crm_log_init: Changed active directory to /var/lib/pacemaker/cores
Sep 08 18:10:09 [88033] paas-controller-172-167-40-24 crmd: info: main: CRM Git Version: 1.1.16-12.el7_4.2 (94ff4df)
Sep 08 18:10:09 [88033] paas-controller-172-167-40-24 crmd: info: do_log: Input I_STARTUP received in state S_STARTING from crmd_init
Sep 08 18:10:09 [88033] paas-controller-172-167-40-24 crmd: info: get_cluster_type: Verifying cluster type: 'corosync'
Sep 08 18:10:09 [88033] paas-controller-172-167-40-24 crmd: info: get_cluster_type: Assuming an active 'corosync' cluster
Sep 08 18:10:09 [88033] paas-controller-172-167-40-24 crmd: info: do_cib_control: CIB connection established
Sep 08 18:10:09 [88033] paas-controller-172-167-40-24 crmd: notice: crm_cluster_connect: Connecting to cluster infrastructure: corosync
Sep 08 18:10:09 [88033] paas-controller-172-167-40-24 crmd: info: crm_get_peer: Created entry ebd1fc7d-5c48-4c81-85ec-bad8a3f6fcb1/0x7fe04dec49a0 for node 172.167.40.24/167040024 (1 total)
Sep 08 18:10:09 [88033] paas-controller-172-167-40-24 crmd: info: crm_get_peer: Node 167040024 is now known as 172.167.40.24
Sep 08 18:10:09 [88033] paas-controller-172-167-40-24 crmd: info: peer_update_callback: 172.167.40.24 is now in unknown state
Sep 08 18:10:09 [88033] paas-controller-172-167-40-24 crmd: info: crm_get_peer: Node 167040024 has uuid 167040024
Sep 08 18:10:09 [88033] paas-controller-172-167-40-24 crmd: info: crm_update_peer_proc: cluster_connect_cpg: Node 172.167.40.24[167040024] - corosync-cpg is now online
Sep 08 18:10:09 [88033] paas-controller-172-167-40-24 crmd: info: peer_update_callback: Client 172.167.40.24/peer now has status [online] (DC=, changed=400)
Sep 08 18:10:09 [88033] paas-controller-172-167-40-24 crmd: info: init_cs_connection_once: Connection to 'corosync': established
Sep 08 18:10:09 [88033] paas-controller-172-167-40-24 crmd: notice: cluster_connect_quorum: Quorum acquired
Sep 08 18:10:09 [88033] paas-controller-172-167-40-24 crmd: info: do_ha_control: Connected to the cluster
Sep 08 18:10:09 [88033] paas-controller-172-167-40-24 crmd: info: lrmd_ipc_connect: Connecting to lrmd
Sep 08 18:10:09 [88033] paas-controller-172-167-40-24 crmd: info: do_lrm_control: LRM connection established
Sep 08 18:10:09 [88033] paas-controller-172-167-40-24 crmd: info: do_started: Delaying start, no membership data (0010)
Sep 08 18:10:09 [88033] paas-controller-172-167-40-24 crmd: info: do_started: Delaying start, no membership data (0010)
Sep 08 18:10:09 [88033] paas-controller-172-167-40-24 crmd: info: pcmk_quorum_notification: Quorum retained  membership=4 members=1
Sep 08 18:10:09 [88033] paas-controller-172-167-40-24 crmd: notice: crm_update_peer_state_iter: Node 172.167.40.24 state is now member  nodeid=167040024 previous=unknown source=pcmk_quorum_notification
Sep 08 18:10:09 [88033] paas-controller-172-167-40-24 crmd: info: peer_update_callback: Cluster node 172.167.40.24 is now member (was in unknown state)
Sep 08 18:10:09 [88033] paas-controller-172-167-40-24 crmd: info: do_started: Delaying start, Config not read (0040)

Re: [ClusterLabs] The crmd process exited: Generic Pacemaker error (201)

2018-10-01 Thread Ken Gaillot
On Sat, 2018-09-29 at 22:42 +0800, lkxjtu wrote:
> 
> Version information
> [root@paas-controller-172-167-40-24:~]$ rpm -q corosync
> corosync-2.4.0-9.el7_4.2.x86_64
> [root@paas-controller-172-167-40-24:~]$ rpm -q pacemaker
> pacemaker-1.1.16-12.el7_4.2.x86_64
> 
> The crmd process exited with error code 201. The pacemakerd process
> respawned it 100 times, which exceeded the respawn threshold, and
> after that crmd was never started again. Below is the log from the
> last attempt to fork the crmd process.
> 
> I have two questions. First, why does the crmd process exit? Second,
> can I configure the threshold for the number of retries? Thank you
> very much!

The cause is unclear from these logs. You'll have to look further back in
the logs for the first sign of any warning or error condition.
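
One way to hunt for that first warning or error is to scan the log with a
short script. The following is a minimal Python sketch, not a Pacemaker tool.
The regular expression is only inferred from the log lines quoted in this
thread, and the names LOG_LINE, SEVERE and first_problem are made up for
illustration. The two sample lines are copied from the excerpt in this thread,
in an order chosen for the example.

```python
import re

# Assumed line layout, inferred from the excerpt quoted in this thread:
#   "<Mon DD HH:MM:SS> [<pid>] <host> <daemon>: <level>: <function>: <message>"
# Optional spaces after each colon are tolerated.
LOG_LINE = re.compile(
    r"^(?P<stamp>\w{3} +\d+ [\d:]{8}) +\[(?P<pid>\d+)\] +(?P<host>\S+) +"
    r"(?P<daemon>\w+): *(?P<level>\w+): *(?P<func>\w+): *(?P<msg>.*)$"
)

# Levels treated as "warning or worse" (an assumption for this sketch).
SEVERE = {"warning", "error", "crit"}


def first_problem(lines, daemon=None):
    """Return the first parsed entry at warning level or worse, or None."""
    for line in lines:
        match = LOG_LINE.match(line.rstrip("\n"))
        if match is None:
            continue  # wrapped continuation line or a different log format
        entry = match.groupdict()
        if daemon is not None and entry["daemon"] != daemon:
            continue  # caller asked for one daemon only
        if entry["level"] in SEVERE:
            return entry
    return None


sample = [
    "Sep 08 18:10:09 [88033] paas-controller-172-167-40-24   crmd: info: "
    "do_started: Delaying start, Config not read (0040)",
    "Sep 08 18:10:09 [28446] paas-controller-172-167-40-24 pacemakerd:error: "
    "pcmk_child_exit:The crmd process (83749) exited: Generic Pacemaker error (201)",
]
hit = first_problem(sample)
print(hit["stamp"], hit["daemon"], hit["level"], hit["func"], hit["msg"])
```

On a real node the function would be given the whole log file (an open file
object works, since it iterates line by line) rather than two sample lines.
Passing daemon="crmd" restricts the search to messages from crmd itself.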

The limit is hardcoded.
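
To make the respawn behaviour concrete, here is a generic Python sketch of a
supervisor that restarts a failing child until a fixed budget is used up. It
is not pacemakerd's code. The name supervise and the constant MAX_RESPAWN are
hypothetical, and the value 100 is taken only from the count reported in the
original message.

```python
import subprocess
import sys

# Assumed value, based on the "100 times" reported in this thread.
MAX_RESPAWN = 100


def supervise(argv, max_respawn=MAX_RESPAWN):
    """Run argv, restarting it after each failure until the budget is spent.

    Returns (respawns, last_exit_code).
    """
    respawns = 0
    while True:
        code = subprocess.call(argv)
        if code == 0:
            return respawns, code  # clean exit, nothing to respawn
        if respawns >= max_respawn:
            return respawns, code  # budget exhausted, give up for good
        respawns += 1


# A child that always fails with exit status 201, mimicking the symptom.
failing_child = [sys.executable, "-c", "import sys; sys.exit(201)"]
respawns, code = supervise(failing_child, max_respawn=3)
print(f"gave up after {respawns} respawns, last exit code {code}")
```

Because the budget in this sketch is a constant in the source rather than a
setting read at run time, changing it means editing the code, which is what
a hardcoded limit implies.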
