[Linux-HA] resource unmanaged/failed

Aleksey V. Kashin Thu, 08 Dec 2011 02:31:39 -0800

Hello.

I have two servers (radius1, radius2) and have set up a cluster resource of
type IPaddr2, using the following commands:


# crm configure property stonith-enabled="false"
# crm configure property no-quorum-policy="ignore"
# crm configure primitive raddb_ip ocf:heartbeat:IPaddr2 \
    params ip="10.99.2.57" cidr_netmask="32" op monitor interval="15s"
# crm configure group raddb raddb_ip
# crm configure location raddb-prefers-radius1 raddb inf: radius1
# crm configure rsc_defaults resource-stickiness=1000001

Everything worked fine.

However, sometimes the load on radius1 increases, the server starts
swapping, and at that moment the resource becomes "(unmanaged) FAILED".
Here is an example of the "unmanaged" resource:

# crm_mon
============
Last updated: Wed Dec  7 14:56:20 2011
Stack: openais
Current DC: radius1 - partition with quorum
Version: 1.1.5-01e86afaaa6d4a8c4836f68df80ababd6ca3902f
2 Nodes configured, 2 expected votes
1 Resources configured.
============

Online: [ radius2 radius1 ]

  Resource Group: raddb
      raddb_ip   (ocf::heartbeat:IPaddr2):       Started radius1 (unmanaged) FAILED

Failed actions:
     raddb_ip_monitor_15000 (node=radius1, call=4, rc=-2, status=Timed Out): unknown exec error
     raddb_ip_stop_0 (node=radius1, call=5, rc=-2, status=Timed Out): unknown exec error
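The failed actions above show that both the monitor and the stop operation
of raddb_ip timed out (status=Timed Out). With stonith-enabled="false",
Pacemaker's default reaction to a failed stop is to block, i.e. to stop
managing the resource, which matches the "(unmanaged) FAILED" state. A
minimal sketch of giving the operations more headroom by declaring explicit
timeouts follows; the 60s/120s values are illustrative assumptions, not
tested recommendations:

```shell
# Sketch only: the same primitive as above, with explicit operation timeouts.
# The timeout values are assumptions; size them to how long radius1
# actually stalls while it is swapping.
crm configure primitive raddb_ip ocf:heartbeat:IPaddr2 \
    params ip="10.99.2.57" cidr_netmask="32" \
    op monitor interval="15s" timeout="60s" \
    op start interval="0" timeout="60s" \
    op stop interval="0" timeout="120s"
```

This form applies to a fresh configuration. Because raddb_ip already exists,
the same op lines would instead be added with `crm configure edit raddb_ip`.
Once the node has recovered, `crm resource cleanup raddb_ip` clears the
failed operations so the cluster can manage the resource again.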


Part of /var/log/syslog from radius1 is available here:
http://paste.org/41963


While the resource is in this state, the IP address 10.99.2.57 is still
alive and the server responds to requests sent to it. Sometimes, however,
the resource becomes completely unavailable and I have to restart corosync,
which is very bad.

I think the resource becomes unmanaged because the server is swapping and
parts of the corosync processes have been swapped out. I tested this
hypothesis: whenever the server uses a lot of swap, the resource becomes
"unmanaged".
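One way to check this hypothesis directly is to look at how much of each
cluster daemon is swapped out while radius1 is under load. The sketch below
assumes a kernel that exposes a VmSwap line in /proc/PID/status (added in
Linux 2.6.34, so Debian 5's stock 2.6.26 kernel will report "unknown"), and
the daemon names are assumptions for a corosync + Pacemaker 1.1 stack; adjust
them to whatever ps shows on the node.

```shell
#!/bin/sh
# Sketch: report how much of each cluster daemon is swapped out.
# Assumption: /proc/<pid>/status has a "VmSwap:" line (Linux 2.6.34+);
# otherwise the value is printed as "unknown".

# Print the swapped-out size (in kB) of one PID, or "unknown".
swap_kb_of_pid() {
    kb=$(awk '/^VmSwap:/ { print $2 }' "/proc/$1/status" 2>/dev/null)
    echo "${kb:-unknown}"
}

# Daemon names are assumptions; processes that are not running are skipped.
for name in corosync lrmd crmd cib pengine attrd stonithd; do
    for pid in $(pgrep -x "$name"); do
        echo "$name pid=$pid swap_kb=$(swap_kb_of_pid "$pid")"
    done
done
```

Running it while the server is swapping, and again when it is idle, shows
whether the daemons that run the monitor and stop operations have pages in
swap at the moment the operations time out.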

I use Debian GNU/Linux 5.x with these packages from
http://people.debian.org/~madkiss/ha/:

# dpkg -l |grep cluster
ii  cluster-glue     1.0.7+hg2618-2~bpo50+1  The reusable cluster components for Linux HA
ii  corosync         1.4.2-1~bpo50+1         Standards-based cluster framework (daemon an
ii  libcluster-glue  1.0.7+hg2618-2~bpo50+1  Reusable cluster libraries (transitional pac
ii  libcorosync4     1.4.2-1~bpo50+1         Standards-based cluster framework (libraries
ii  libcrmcluster1   1.1.5-3~bpo50+1         Pacemaker libraries - CRM
ii  liblrm2          1.0.7+hg2618-2~bpo50+1  Reusable cluster libraries -- liblrm2
ii  libpils2         1.0.7+hg2618-2~bpo50+1  Reusable cluster libraries -- libpils2
ii  libplumb2        1.0.7+hg2618-2~bpo50+1  Reusable cluster libraries -- libplumb2
ii  libplumbgpl2     1.0.7+hg2618-2~bpo50+1  Reusable cluster libraries -- libplumbgpl2
ii  libstonith1      1.0.7+hg2618-2~bpo50+1  Reusable cluster libraries -- libstonith1
ii  pacemaker        1.1.5-3~bpo50+1         HA cluster resource manager



I can't add more RAM to these servers. How can I keep the resource from
becoming "unmanaged/failed"?


With Best Regards.
Aleksey V. Kashin
_______________________________________________
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
