On Sat, Nov 20, 2010 at 8:02 AM, Andrew Miklas <and...@pagerduty.com> wrote:
> Hi all,
> I'm trying to use Pacemaker on Amazon Web Services' EC2 to
> automatically reassign elastic IPs (Amazon's equivalent to floating or
> virtual IPs) in the event of a node failure.  The setup I'm testing
> with is two elastic IPs which will be assigned to any pair of hosts in
> a three node cluster.  The full config is below for reference.
> Things usually work very well -- the IP of a failed machine is
> automatically reassigned to another host.  However, sometimes the
> cluster seems to get stuck trying to "start" the IP resource on the
> other host.
> When this happens, crm_mon will show a line something like this:
> elastic-ip-A    (ocf::ec2:elastic-ip):  Started test1 (unmanaged) FAILED
> After a bit of time, it will change to:
> elastic-ip-A    (ocf::ec2:elastic-ip):  Stopped
> Once the system hits this state, the failed resource will remain
> stopped until I do a "crm_resource -P", at which point the elastic IP
> will be assigned to a node and started.
> I suspect what's happening here is that the start action in the
> elastic IP resource agent sometimes times out.  Rather than
> arbitrarily increasing the timeout value, though, I'd rather Pacemaker
> simply retry failed start requests.

What do you think you gain by not increasing the timeout?
We don't sit around doing nothing if it completes in only a fraction
of the allocated time.
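
For illustration, a longer start timeout could be declared on the
primitive roughly as follows. This is only a sketch based on the
configuration quoted below; the 120s value is an assumption chosen for
the example, not a measured figure, and the empty ip parameter is left
exactly as posted.

```
primitive elastic-ip-A ocf:ec2:elastic-ip \
        op start interval="0" timeout="120s" \
        op monitor on-fail="restart" interval="30" \
        params ip="" \
        meta is-managed="true" resource-stickiness="10" failure-timeout="90"
```

The timeout is an upper bound only: a start that completes in two
seconds still returns after two seconds.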

In any case, if you really want, check out the start-failure-is-fatal
option (man pengine).
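
A minimal sketch of that approach using the crm shell. With
start-failure-is-fatal set to false, a failed start counts towards the
resource's failcount instead of immediately banning the resource from
that node, and migration-threshold then caps how many failures are
tolerated there. The threshold of 3 is an illustrative value, not a
recommendation.

```
# Let failed starts be retried on the same node instead of being fatal
crm configure property start-failure-is-fatal="false"

# Assumed example value: move the resource away after 3 failures on a node
crm configure rsc_defaults migration-threshold="3"
```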

> I've tried setting monitor operations and the "failure-timeout"
> parameter on the elastic IP resources, but this doesn't seem to fix
> the problem.

Did you set cluster-recheck-interval appropriately? (man crmd)
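
For context: an expired failure-timeout is only noticed when the
cluster re-evaluates its state, and in the absence of other events
that happens once per cluster-recheck-interval (15 minutes by
default). A failure-timeout of 90 can therefore look as if it has no
effect. A sketch of lowering the interval, where 60s is an assumed
example value:

```
# Re-run the policy engine at least once a minute (example value)
crm configure property cluster-recheck-interval="60s"
```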

> Any ideas?  Has anyone had success getting Pacemaker to reliably
> switch elastic IPs on EC2?
> Thanks,
> Andrew
> =========
>  > crm configure show
> node $id="8780681b-f79f-47c5-9f92-ad3fc7a3584e" test3 \
>        attributes class="www"
> node $id="8be1d887-d333-4a15-b72d-ac1950973e2c" test2 \
>        attributes class="www"
> node $id="b3a4cc48-7707-4e47-aa10-3cd34230cebc" test1 \
>        attributes class="www"
> primitive elastic-ip-A ocf:ec2:elastic-ip \
>        op monitor on-fail="restart" interval="30" \
>        params ip="" \
>        meta is-managed="true" resource-stickiness="10" failure-timeout="90"
> primitive elastic-ip-A-email ocf:heartbeat:MailTo \
>        params email="ad...@pagerduty.com" subject="TEST Elastic IP A flip!"
> primitive elastic-ip-B ocf:ec2:elastic-ip \
>        op monitor on-fail="restart" interval="30" \
>        params ip="" \
>        meta is-managed="true" resource-stickiness="10" failure-timeout="90"
> primitive elastic-ip-B-email ocf:heartbeat:MailTo \
>        params email="ad...@pagerduty.com" subject="TEST Elastic IP B flip!"
> location location-elastic-ip-A elastic-ip-A \
>        rule $id="elastic-ip-A-only-on-www" -inf: class ne www
> location location-elastic-ip-B elastic-ip-B \
>        rule $id="elastic-ip-B-only-on-www" -inf: class ne www
> colocation elastic-ip-A-email-colo 10: elastic-ip-A-email elastic-ip-A
> colocation elastic-ip-B-email-colo 10: elastic-ip-B-email elastic-ip-B
> colocation elastic-ip-colo-1 -inf: elastic-ip-B elastic-ip-A
> property $id="cib-bootstrap-options" \
>        dc-version="1.0.8-042548a451fce8400660f6031f4da6f0223dd5dd" \
>        cluster-infrastructure="Heartbeat" \
>        stonith-enabled="false" \
>        last-lrm-refresh="1290062815"
> _______________________________________________
> Linux-HA mailing list
> Linux-HA@lists.linux-ha.org
> http://lists.linux-ha.org/mailman/listinfo/linux-ha
> See also: http://linux-ha.org/ReportingProblems