Hi all,

I'm trying to use Pacemaker on Amazon Web Services' EC2 to
automatically reassign elastic IPs (Amazon's equivalent of floating or
virtual IPs) in the event of a node failure.  The setup I'm testing
with has two elastic IPs, which can be assigned to any two hosts in
a three-node cluster.  The full config is below for reference.

Things usually work very well: the IP of a failed machine is
automatically reassigned to another host.  However, sometimes the
cluster seems to get stuck trying to "start" the IP resource on the
other host.

When this happens, crm_mon will show a line something like this:
elastic-ip-A    (ocf::ec2:elastic-ip):  Started test1 (unmanaged) FAILED

After a bit of time, it will change to:
elastic-ip-A    (ocf::ec2:elastic-ip):  Stopped

Once the system hits this state, the failed resource will remain  
stopped until I do a "crm_resource -P", at which point the elastic IP  
will be assigned to a node and started.


I suspect what's happening here is that the start action in the
elastic IP resource agent sometimes times out.  Instead of
arbitrarily increasing the timeout value, though, I'd prefer that
Pacemaker simply retry failed start requests.
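
To be concrete, the timeout-raising approach I'd like to avoid would
look something like the sketch below.  The explicit "op start" line
and its 120-second value are placeholders for illustration only; they
are not in my current config and I haven't tested them.

```
# Sketch only: "op start ... timeout" is an assumed addition, and
# 120 is an arbitrary placeholder value, not a tested setting.
primitive elastic-ip-A ocf:ec2:elastic-ip \
        op start interval="0" timeout="120" \
        op monitor on-fail="restart" interval="30" \
        params ip="184.73.193.93" \
        meta is-managed="true" resource-stickiness="10" failure-timeout="90"
```

This only gives a slow start more room to finish; it does nothing to
retry a start that has already failed.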

I've tried adding monitor operations and setting the
"failure-timeout" meta attribute on the elastic IP resources (see the
config below), but this doesn't seem to fix the problem.

Any ideas?  Has anyone had success getting Pacemaker to reliably  
switch elastic IPs on EC2?

Thanks,
Andrew

=========
 > crm configure show
node $id="8780681b-f79f-47c5-9f92-ad3fc7a3584e" test3 \
        attributes class="www"
node $id="8be1d887-d333-4a15-b72d-ac1950973e2c" test2 \
        attributes class="www"
node $id="b3a4cc48-7707-4e47-aa10-3cd34230cebc" test1 \
        attributes class="www"
primitive elastic-ip-A ocf:ec2:elastic-ip \
        op monitor on-fail="restart" interval="30" \
        params ip="184.73.193.93" \
        meta is-managed="true" resource-stickiness="10" failure-timeout="90"
primitive elastic-ip-A-email ocf:heartbeat:MailTo \
        params email="ad...@pagerduty.com" subject="TEST Elastic IP A flip!"
primitive elastic-ip-B ocf:ec2:elastic-ip \
        op monitor on-fail="restart" interval="30" \
        params ip="184.72.236.247" \
        meta is-managed="true" resource-stickiness="10" failure-timeout="90"
primitive elastic-ip-B-email ocf:heartbeat:MailTo \
        params email="ad...@pagerduty.com" subject="TEST Elastic IP B flip!"
location location-elastic-ip-A elastic-ip-A \
        rule $id="elastic-ip-A-only-on-www" -inf: class ne www
location location-elastic-ip-B elastic-ip-B \
        rule $id="elastic-ip-B-only-on-www" -inf: class ne www
colocation elastic-ip-A-email-colo 10: elastic-ip-A-email elastic-ip-A
colocation elastic-ip-B-email-colo 10: elastic-ip-B-email elastic-ip-B
colocation elastic-ip-colo-1 -inf: elastic-ip-B elastic-ip-A
property $id="cib-bootstrap-options" \
        dc-version="1.0.8-042548a451fce8400660f6031f4da6f0223dd5dd" \
        cluster-infrastructure="Heartbeat" \
        stonith-enabled="false" \
        last-lrm-refresh="1290062815"
