Hi all, I'm trying to use Pacemaker on Amazon Web Services' EC2 to automatically reassign elastic IPs (Amazon's equivalent of floating or virtual IPs) when a node fails. The setup I'm testing with is two elastic IPs, which can be assigned to any two hosts in a three-node cluster. The full config is below for reference.
Things usually work very well: the IP of a failed machine is normally reassigned to another host automatically. Sometimes, however, the cluster seems to get stuck trying to "start" the IP resource on the other host. When this happens, crm_mon shows a line like this:

    elastic-ip-A (ocf::ec2:elastic-ip): Started test1 (unmanaged) FAILED

After a while, it changes to:

    elastic-ip-A (ocf::ec2:elastic-ip): Stopped

Once the system reaches this state, the failed resource stays stopped until I run "crm_resource -P", at which point the elastic IP is assigned to a node and started.

I suspect the start action in the elastic IP resource agent sometimes times out. Rather than arbitrarily increasing the timeout value, though, I'd rather have Pacemaker simply retry failed start requests. I've tried setting monitor operations and the "failure-timeout" meta attribute on the elastic IP resources, but this doesn't seem to fix the problem.

Any ideas? Has anyone had success getting Pacemaker to reliably switch elastic IPs on EC2?

Thanks,
Andrew

=========

> crm configure show
node $id="8780681b-f79f-47c5-9f92-ad3fc7a3584e" test3 \
    attributes class="www"
node $id="8be1d887-d333-4a15-b72d-ac1950973e2c" test2 \
    attributes class="www"
node $id="b3a4cc48-7707-4e47-aa10-3cd34230cebc" test1 \
    attributes class="www"
primitive elastic-ip-A ocf:ec2:elastic-ip \
    op monitor on-fail="restart" interval="30" \
    params ip="184.73.193.93" \
    meta is-managed="true" resource-stickiness="10" failure-timeout="90"
primitive elastic-ip-A-email ocf:heartbeat:MailTo \
    params email="ad...@pagerduty.com" subject="TEST Elastic IP A flip!"
primitive elastic-ip-B ocf:ec2:elastic-ip \
    op monitor on-fail="restart" interval="30" \
    params ip="184.72.236.247" \
    meta is-managed="true" resource-stickiness="10" failure-timeout="90"
primitive elastic-ip-B-email ocf:heartbeat:MailTo \
    params email="ad...@pagerduty.com" subject="TEST Elastic IP B flip!"
location location-elastic-ip-A elastic-ip-A \
    rule $id="elastic-ip-A-only-on-www" -inf: class ne www
location location-elastic-ip-B elastic-ip-B \
    rule $id="elastic-ip-B-only-on-www" -inf: class ne www
colocation elastic-ip-A-email-colo 10: elastic-ip-A-email elastic-ip-A
colocation elastic-ip-B-email-colo 10: elastic-ip-B-email elastic-ip-B
colocation elastic-ip-colo-1 -inf: elastic-ip-B elastic-ip-A
property $id="cib-bootstrap-options" \
    dc-version="1.0.8-042548a451fce8400660f6031f4da6f0223dd5dd" \
    cluster-infrastructure="Heartbeat" \
    stonith-enabled="false" \
    last-lrm-refresh="1290062815"

_______________________________________________
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
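For readers hitting the same symptom, here is a hedged sketch of the Pacemaker settings that govern whether a failed start is retried. It assumes Pacemaker 1.0 semantics, is not from the original post, and has not been tested on this cluster. By default a failed start sets the resource's fail count on that node to INFINITY (start-failure-is-fatal="true"), and an expired failure-timeout is only acted on at the next cluster recheck (cluster-recheck-interval), so failure-timeout="90" on its own may not produce a timely retry. Separately, a stop that fails while stonith-enabled="false" leaves the resource blocked and unmanaged, which may be what the "(unmanaged) FAILED" line reflects. The timeout values, recheck interval and migration-threshold below are illustrative assumptions, not recommendations.

```
# Sketch only: all values are illustrative assumptions.
# Let a failed start count toward migration-threshold instead of
# banning the node outright (the default is "true").
property start-failure-is-fatal="false" \
    cluster-recheck-interval="2min"
# Give start/stop explicit timeouts instead of the cluster default,
# and move the IP to another node after 3 failures on one node.
primitive elastic-ip-A ocf:ec2:elastic-ip \
    params ip="184.73.193.93" \
    op start interval="0" timeout="60s" \
    op stop interval="0" timeout="60s" \
    op monitor interval="30" on-fail="restart" \
    meta resource-stickiness="10" failure-timeout="90" migration-threshold="3"
```

Failures that are already recorded can be cleared by hand with "crm resource cleanup elastic-ip-A", which resets the fail count for that resource and re-probes it.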