Hi Brett,

My question is this: would /var being full on the passive node have played a 
role in the cluster's inability to fail over during the soft lockup condition on 
the active node? Or perhaps we hit a condition that our Pacemaker configuration 
was unable to detect? I'm basically trying to figure out whether /var being full 
on the passive node played a role in the lack of failover, or whether our 
configuration is inadequate at detecting the type of failure we experienced.

I'd say absolutely yes. /var being full probably stopped cluster
traffic, or at least stopped changes to the CIB from being accepted (from
memory, CIB changes are written to temp files in /var/lib/heartbeat/crm/...).

Thanks for the feedback. This is what I suspected, but I wasn't sure my suspicions were correct. Too bad I don't have a test/dev Pacemaker environment to reproduce this situation in; otherwise I could be 100% sure instead of 99% sure.

It can certainly stop SSH sessions from being established.

That it did!


Thoughts?

Just for the list (since I'm sure you've done this or similar already)
I'd suggest you use SNMP monitoring and add an SNMP trap for /var
being 95% full.

Yep, it's something we're on top of.
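For anyone on the list who wants a stopgap before proper SNMP monitoring is in place, here is a minimal Python sketch of the 95% check itself. It is an illustration under stated assumptions, not Pacemaker or net-snmp functionality: the function names and the 95.0 default are hypothetical, and the code only reports usage. Wiring the result into an alert or SNMP trap (for example from a cron job) is left to your own monitoring setup.

```python
import shutil

# Hypothetical default, taken from the "95% full" suggestion above.
THRESHOLD_PERCENT = 95.0


def usage_percent(path: str) -> float:
    """Percentage of the total capacity of the filesystem holding `path` that is in use."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    return 100.0 * usage.used / usage.total


def is_over_threshold(path: str, threshold: float = THRESHOLD_PERCENT) -> bool:
    """True if the filesystem holding `path` is at or above `threshold` percent used."""
    return usage_percent(path) >= threshold
```

A cron job could call `is_over_threshold("/var")` on each cluster node and raise an alert when it returns `True`. Note that the percentage is relative to total capacity, so it may differ slightly from what `df` reports, since `df` excludes root-reserved blocks.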

A useful addition is to mount /var/log on a different
disk/partition/logical volume from /var, that way even if your logs
fill up, the system should still continue to function for a while.

We have /var mounted separately, but not /var/log. Interesting idea. Our /var problem was twofold. First, we had enabled debug logging and iptables logging to diagnose a previous problem and neglected to turn them off after the diagnosis session, which caused unusually high log volume. Second, we never enabled logrotate for the firewall log, so it just grew and grew without ever being rotated out. A tough way to be reminded of improper configuration...
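To confirm whether /var/log really sits on its own filesystem, as suggested above, a quick check is to compare the device IDs that `os.stat` reports for the two paths. This is a minimal sketch; the helper name is hypothetical. One caveat: on btrfs, subvolumes report distinct device IDs even though they share one pool of space, so a "separate" result there does not guarantee that the logs cannot fill /var.

```python
import os


def on_separate_filesystem(child: str, parent: str) -> bool:
    """True if `child` lives on a different filesystem (device ID) than `parent`.

    Hypothetical helper: compares st_dev only, so btrfs subvolumes sharing
    one pool will still appear to be separate filesystems.
    """
    return os.stat(child).st_dev != os.stat(parent).st_dev
```

For example, `on_separate_filesystem("/var/log", "/var")` would return `False` on a setup like ours, where only /var has its own mount.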

--Ryan

_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker
