Re: [Pacemaker] Pacemaker and LDAP (389 Directory Service)
Serge Dubrouski writes:
> On Mon, Jun 27, 2011 at 3:33 PM, veghead <... studyblue.com> wrote:
> > If I remove the co-location, won't the elastic_ip resource just stay
> > where it is, regardless of what happens to LDAP?
>
> Right. That's why I think that you don't really want to do it. You have
> to make sure that your IP is up where your LDAP is up.

Okay. So I took a step back and revamped the configuration to test the elastic_ip less frequently and with a longer timeout. I committed the changes, but "crm status" doesn't reflect the resources in question. Here's the new config:

---snip---
# crm configure show
node $id="d2b294cf-328f-4481-aa2f-cc7b553e6cde" ldap1.example.ec2
node $id="e2a2e42e-1644-4f7d-8e54-71e1f7531e08" ldap2.example.ec2
primitive elastic_ip lsb:elastic-ip \
        op monitor interval="30" timeout="300" on-fail="ignore" requires="nothing"
primitive ldap lsb:dirsrv \
        op monitor interval="15s" on-fail="standby" requires="nothing"
clone ldap-clone ldap
colocation ldap-with-eip inf: elastic_ip ldap-clone
order ldap-after-eip inf: elastic_ip ldap-clone
property $id="cib-bootstrap-options" \
        dc-version="1.0.11-1554a83db0d3c3e546cfd3aaff6af1184f79ee87" \
        cluster-infrastructure="Heartbeat" \
        stonith-enabled="false" \
        no-quorum-policy="ignore" \
        stop-all-resources="true"
rsc_defaults $id="rsc-options" \
        resource-stickiness="100"
---snip---

And here's the output from "crm status":

---snip---
# crm status
Last updated: Mon Jun 27 18:50:14 2011
Stack: Heartbeat
Current DC: ldap2.example.ec2 (e2a2e42e-1644-4f7d-8e54-71e1f7531e08) - partition with quorum
Version: 1.0.11-1554a83db0d3c3e546cfd3aaff6af1184f79ee87
2 Nodes configured, unknown expected votes
2 Resources configured.

Online: [ ldap1.example.ec2 ldap2.example.ec2 ]
---snip---

I restarted the nodes one at a time - first I restarted ldap2, then I restarted ldap1. When ldap1 went down, ldap2 stopped the ldap resource and didn't make any attempt to start the elastic_ip resource:

---snip---
pengine: [12910]: notice: unpack_config: On loss of CCM Quorum: Ignore
pengine: [12910]: info: unpack_config: Node scores: 'red' = -INFINITY, 'yellow' = 0, 'green' = 0
pengine: [12910]: info: determine_online_status: Node ldap2.example.ec2 is online
pengine: [12910]: notice: native_print: elastic_ip (lsb:elastic-ip): Stopped
pengine: [12910]: notice: clone_print: Clone Set: ldap-clone
pengine: [12910]: notice: short_print: Stopped: [ ldap:0 ldap:1 ]
pengine: [12910]: notice: LogActions: Leave resource elastic_ip (Stopped)
pengine: [12910]: notice: LogActions: Leave resource ldap:0 (Stopped)
pengine: [12910]: notice: LogActions: Leave resource ldap:1 (Stopped)
---snip---

After heartbeat/pacemaker came back up on ldap1, it terminated the ldap service on ldap1. Now I'm just confused.
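For reference, a cluster-wide property such as the stop-all-resources="true" shown in the config above can be inspected and flipped from the crm shell; while it is set to true, the cluster is told to keep every resource stopped regardless of constraints. A minimal sketch, assuming the standard crm shell syntax for this 1.0.x release:

---snip---
# show the bootstrap options currently in the CIB
crm configure show cib-bootstrap-options
# allow resources to be started again, then re-check the live CIB
crm configure property stop-all-resources=false
crm_verify -L -V
---snip---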
Re: [Pacemaker] Pacemaker and LDAP (389 Directory Service)
Sorry for the questions. Some days my brain is just slow. :)

Serge Dubrouski writes:
> If you want to make your LDAP independent from IP just remove your
> colocation:
>
>   colocation ldap-with-eip inf: elastic_ip ldap-clone

Is that really what I want to do? I mean, I need the Elastic IP assigned to ~one~ of the machines... And if LDAP fails on that machine, I need Pacemaker to start the Elastic IP on the other machine. If I remove the co-location, won't the elastic_ip resource just stay where it is, regardless of what happens to LDAP?

> But I'd rather try to find out why monitoring for IP fails. Maybe
> it just needs an increased timeout on the monitor operation, though it
> looks like you've already increased it. What's in your log files
> when that monitor fails?

Originally, I had the monitor on the elastic_ip resource set to 10 seconds. The error in the logs was:

---snip---
pengine: [16980]: notice: unpack_rsc_op: Operation elastic_ip_monitor_0 found resource elastic_ip active on ldap1.example.ec2
pengine: [16980]: WARN: unpack_rsc_op: Processing failed op elastic_ip_monitor_1 on ldap1.example.ec2: unknown exec error (-2)
pengine: [16980]: WARN: unpack_rsc_op: Processing failed op elastic_ip_stop_0 on ldap1.example.ec2: unknown exec error (-2)
pengine: [16980]: info: native_add_running: resource elastic_ip isnt managed
pengine: [16980]: notice: unpack_rsc_op: Operation ldap:1_monitor_0 found resource ldap:1 active on ldap2.example.ec2
pengine: [16980]: WARN: unpack_rsc_op: Processing failed op elastic_ip_start_0 on ldap2.example.ec2: unknown exec error (-2)
pengine: [16980]: notice: native_print: elastic_ip (lsb:elastic-ip): Started ldap1.example.ec2 (unmanaged) FAILED
pengine: [16980]: notice: clone_print: Clone Set: ldap-clone
pengine: [16980]: notice: short_print: Stopped: [ ldap:0 ldap:1 ]
pengine: [16980]: info: get_failcount: elastic_ip has failed INFINITY times on ldap1.example.ec2
pengine: [16980]: WARN: common_apply_stickiness: Forcing elastic_ip away from ldap1.example.ec2 after 100 failures (max=100)
pengine: [16980]: info: get_failcount: elastic_ip has failed INFINITY times on ldap2.example.ec2
pengine: [16980]: WARN: common_apply_stickiness: Forcing elastic_ip away from ldap2.example.ec2 after 100 failures (max=100)
pengine: [16980]: info: native_color: Unmanaged resource elastic_ip allocated to 'nowhere': failed
pengine: [16980]: notice: RecurringOp: Start recurring monitor (15s) for ldap:0 on ldap1.example.ec2
pengine: [16980]: notice: RecurringOp: Start recurring monitor (15s) for ldap:1 on ldap2.example.ec2
pengine: [16980]: notice: LogActions: Leave resource elastic_ip (Started unmanaged)
pengine: [16980]: notice: LogActions: Start ldap:0 (ldap1.example.ec2)
pengine: [16980]: notice: LogActions: Start ldap:1 (ldap2.example.ec2)
---snip---

Now that I have set the monitor interval for the elastic_ip resource to "0", it keeps thinking everything is either stopped or should be stopped:

---snip---
pengine: [7287]: notice: unpack_rsc_op: Operation elastic_ip_monitor_0 found resource elastic_ip active on ldap1.example.ec2
pengine: [7287]: notice: unpack_rsc_op: Operation ldap:0_monitor_0 found resource ldap:0 active on ldap2.example.ec2
pengine: [7287]: notice: native_print: elastic_ip (lsb:elastic-ip): Stopped
pengine: [7287]: notice: clone_print: Clone Set: ldap-clone
pengine: [7287]: notice: short_print: Stopped: [ ldap:0 ldap:1 ]
pengine: [7287]: notice: LogActions: Leave resource elastic_ip (Stopped)
pengine: [7287]: notice: LogActions: Leave resource ldap:0 (Stopped)
pengine: [7287]: notice: LogActions: Leave resource ldap:1 (Stopped)
---snip---

Very strange.
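For what it's worth, once a monitor failure has pushed the fail count to INFINITY as in the first log above, the resource stays forced away from that node until the failure state is cleared. A minimal sketch of checking and clearing it with the crm shell (assuming the 1.0-era crm resource subcommands):

---snip---
# show the accumulated failure count for elastic_ip on one node
crm resource failcount elastic_ip show ldap1.example.ec2
# clear the failed operations and fail counts so the resource can run again
crm resource cleanup elastic_ip
---snip---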
Re: [Pacemaker] Pacemaker and LDAP (389 Directory Service)
veghead writes:
> Pair of LDAP servers running 389 (formerly Fedora DS) in
> high availability using Pacemaker with a floating IP.
> In addition, 389 supports multi-master replication,
> where all changes on one node are automatically
> replicated on one or more other nodes.

I'm so close, but I'm still having issues. I'm running these on EC2 using an Elastic IP as the "floating" IP. Unfortunately, I have found that requests for the status of the Elastic IP occasionally fail for no apparent reason, even though the Elastic IP is actually working fine. Once they fail, that triggers a failover and creates a mess.

What I'd like to do is:

* Run the LDAP service on both nodes
* Ignore the status of the Elastic IP resource and only trigger a failover when the LDAP service fails

I feel like my config is close, but the cluster keeps wanting to stop the resources. Here's my current config:

---snip---
primitive elastic_ip lsb:elastic-ip \
        op monitor interval="0" timeout="300" on-fail="ignore" requires="nothing"
primitive ldap lsb:dirsrv \
        op monitor interval="15s" on-fail="standby" requires="nothing"
clone ldap-clone ldap
colocation ldap-with-eip inf: elastic_ip ldap-clone
order ldap-after-eip inf: elastic_ip ldap-clone
property $id="cib-bootstrap-options" \
        dc-version="1.0.11-1554a83db0d3c3e546cfd3aaff6af1184f79ee87" \
        cluster-infrastructure="Heartbeat" \
        stonith-enabled="false" \
        no-quorum-policy="ignore" \
        stop-all-resources="true"
rsc_defaults $id="rsc-options" \
        resource-stickiness="100"
---snip---

Any suggestions as to what I'm doing wrong?
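If the goal is simply to stop Pacemaker from ever acting on the Elastic IP status check, one option is to leave the monitor operation defined but disabled, rather than setting interval="0". A minimal sketch, assuming the op-level enabled attribute available in this Pacemaker release (the one-off probe Pacemaker runs when it first sees the resource is separate and still happens):

---snip---
primitive elastic_ip lsb:elastic-ip \
        op monitor interval="30s" timeout="300s" enabled="false"
---snip---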
Re: [Pacemaker] Pacemaker and LDAP (389 Directory Service)
Dejan Muhamedagic writes:
> lsb:dirsrv doesn't understand master/slave. That's OK, none of
> the LSB agents do. You can only try to use clones (clone ldap-clone
> ldap ...).

That worked perfectly. I was getting master/slave and basic clone stuff mixed up. Thanks!
[Pacemaker] Pacemaker and LDAP (389 Directory Service)
I'm trying to set up a pair of LDAP servers running 389 (formerly Fedora DS) in high availability using Pacemaker with a floating IP. In addition, 389 supports multi-master replication, where all changes on one node are automatically replicated on one or more other nodes.

I'm fairly close to having everything working. Failover works just fine, and multi-master replication works fine. However, my current Pacemaker config stops the directory service on the non-active node, which means that the backup node is not receiving replication data from the other node.

What is the right way to set up Pacemaker so that:

1) LDAP directory services are always running on both nodes
2) The floating IP is assigned to one of the nodes
3) Failover occurs if the master node dies or the LDAP service stops running on the master

Initially, my Pacemaker config looked like the following:

---snip---
property stonith-enabled=false
property no-quorum-policy=ignore
rsc_defaults resource-stickiness=100
primitive elastic_ip lsb:elastic-ip op monitor interval="10s"
primitive dirsrv lsb:dirsrv op monitor interval="10s"
order dirsrv-after-eip inf: elastic_ip dirsrv
colocation dirsrv-with-eip inf: dirsrv elastic_ip
---snip---

I then explored using a Pacemaker multi-state (master/slave) clone:

---snip---
property stonith-enabled=false
property no-quorum-policy=ignore
rsc_defaults resource-stickiness=100
primitive elastic_ip lsb:elastic-ip op monitor interval="10s"
primitive ldap lsb:dirsrv \
        op monitor interval="15s" role="Slave" timeout="10s" \
        op monitor interval="16s" role="Master" timeout="10s"
ms ldap-clone ldap meta master-max=1 master-node-max=1 clone-max=3 clone-node-max=1 notify=true
colocation ldap-with-eip inf: elastic_ip ldap-clone:Master
order eip-after-promote inf: ldap-clone:promote elastic_ip:start
order ldap-after-eip inf: elastic_ip ldap-clone
---snip---

Unfortunately, that doesn't quite work. pengine complains that "ldap-clone: Promoted 0 instances of a possible 1 to master" and then stops the LDAP service. I'm sure I'm missing something simple... any suggestions would be greatly appreciated.
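For comparison, the approach the replies elsewhere in this thread converged on drops the ms (master/slave) layer entirely and runs the directory server as a plain anonymous clone, with the Elastic IP colocated with it. A minimal sketch assembled from the configs quoted in those replies:

---snip---
primitive elastic_ip lsb:elastic-ip op monitor interval="10s"
primitive ldap lsb:dirsrv op monitor interval="15s"
clone ldap-clone ldap
colocation ldap-with-eip inf: elastic_ip ldap-clone
order ldap-after-eip inf: elastic_ip ldap-clone
---snip---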
Re: [Pacemaker] Automating Pacemaker Setup
Dejan Muhamedagic writes:
> On Fri, May 27, 2011 at 08:21:08PM +0000, veghead wrote:
> > 1) Is there a way to force crm to accept my configuration request
> > ~before~ starting the second node?
>
> Not before the DC is elected. There are two settings, dc-deadtime
> and startup-fencing, which can reduce the time for DC election.
> Note that disabling startup fencing is not recommended. But I
> don't know what your use case is. YMMV.

Well, I'm probably not quite the typical use case. We're using Amazon EC2 to set up and tear down testing environments. I have automated the entire process except for setting up Pacemaker. Beyond testing environments, I'd like to automate Pacemaker setup to cover the scenario where all nodes in a Pacemaker cluster crash and the entire configuration is lost. Obviously, once one node is running, setting up additional nodes becomes easy. It's just the bootstrap phase that's a challenge to automate.

> > 2) Is there a way to tell Pacemaker to ignore quorum requirements
> > ~before~ starting additional nodes?
> >
> > 3) Is there an alternate way to configure Pacemaker?
>
> Yes, you can modify the CIB _before_ starting pacemaker. Something
> like:
>
>   CIB_file=/var/lib/heartbeat/crm/cib.xml crm configure ...
>
> But in that case you need to remove cib.xml.sig. Then you have to
> make sure that pacemaker starts first on this node. Consider this
> only if everything else fails.

I'll give that a shot. Thanks.

-S
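Spelled out as shell commands, the offline-CIB suggestion quoted above would look roughly like this on a Heartbeat-based install. A sketch only: the paths follow the /var/lib/heartbeat/crm location mentioned above, and the property being set is just an example:

---snip---
# with heartbeat/pacemaker stopped on this node:
CIB_file=/var/lib/heartbeat/crm/cib.xml crm configure property no-quorum-policy=ignore
rm -f /var/lib/heartbeat/crm/cib.xml.sig   # the signature no longer matches the edited CIB
# start heartbeat on this node before the others, per the advice above
/etc/init.d/heartbeat start
---snip---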
Re: [Pacemaker] Automating Pacemaker Setup
veghead writes:
> Todd Nine writes:
> Wow. The example pacemaker config and the trick of starting
> heartbeat before

Bah. So close. But I still don't have it completely automated.

If I start heartbeat on the first node and then run:

    crm configure < myconfigure.txt

that fails. If I start heartbeat on the second node and wait for the two nodes to connect to each other (so that we have quorum), then I can run "crm configure" and it works.

So that leaves me with a couple of questions:

1) Is there a way to force crm to accept my configuration request ~before~ starting the second node?

2) Is there a way to tell Pacemaker to ignore quorum requirements ~before~ starting additional nodes?

3) Is there an alternate way to configure Pacemaker?

-Sean
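For what it's worth, the wait for DC election can also be scripted rather than timed by hand. A rough sketch, not from this thread: it assumes crmadmin's -D option (which reports the current Designated Controller) behaves this way on this Pacemaker version, and it reuses the myconfigure.txt name from above:

---snip---
#!/bin/sh
# wait until a DC has been elected on the single running node, then load the config
until crmadmin -D 2>/dev/null | grep -qi "designated"; do
    sleep 2
done
crm configure < myconfigure.txt
---snip---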
Re: [Pacemaker] Automating Pacemaker Setup
Todd Nine writes:
> I have a setup nearly working. Would you be willing to share recipes?
> ...
> It's not quite working yet, but it's close. Since you've managed to get
> this working, you may be able to finish these off. I have everything
> working except the init start/stop hooks for pacemaker to set the
> Elastic IP automatically and then run chef-client to reconfigure
> everything on all the other nodes.

Wow. The example pacemaker config and the trick of starting heartbeat before using crm configure were the last steps I needed. Thanks!

So, here's how I got Elastic IP failover working. I can't claim credit for the idea... I found a basic example here: https://forums.aws.amazon.com/thread.jspa?messageID=195373. That didn't quite work for me, so I rewrote the LSB script in pure Ruby and leveraged the amazon-ec2 gem (https://github.com/grempe/amazon-ec2) to handle associating the EIP with the current instance. I have included my script below.

A couple of key things.

First, I found that when an instance loses its Elastic IP (whether through "disassociate" or because another instance grabbed the EIP), it loses public internet connectivity for 1-3 minutes. Apparently this is expected, according to AWS Support: https://forums.aws.amazon.com/message.jspa?messageID=250571#250571. As a result, I decided it didn't make any sense to have the "stop" method of the EIP LSB script do anything.

Second, my Pacemaker configuration is pretty close to yours. I set up the nodes with ucast in almost the exact same manner. The key differences are all in setting up the primitives with the correct order and colocation:

    primitive elastic_ip lsb:elastic-ip op monitor interval="10s"
    primitive haproxy lsb:haproxy op monitor interval="10s"
    order haproxy-after-eip inf: elastic_ip haproxy
    colocation haproxy-with-eip inf: haproxy elastic_ip

Third, here's my elastic-ip.rb LSB script that handles the Elastic IP. Since LSB scripts can't take any parameters other than the usual start/stop/status/etc., I treat the script as a Chef template and inject the desired EIP into the template. The other secret is that I created a special user using AWS IAM with a policy that only allows the user to associate/disassociate EIP addresses. I store the AccessKey and SecretAccessKey in a file in /etc/aws/pacemaker_keys.

Let me know if you have any questions. And thanks for the tip on using crm to initialize pacemaker.

#!/usr/bin/ruby
# Follows the LSB spec:
# http://refspecs.freestandards.org/LSB_3.1.0/LSB-Core-generic/LSB-Core-generic/iniscrptact.html

require 'rubygems'
require 'AWS'

# The Elastic IP is injected by Chef when this script is rendered as a template.
ELASTIC_IP = "<%= @elastic_ip %>"
EC2_INSTANCE_ID = `wget -T 5 -q -O - http://169.254.169.254/latest/meta-data/instance-id`

# Load the AWS access keys
properties = {}
File.open("/etc/aws/pacemaker_keys", 'r') do |file|
  file.read.each_line do |line|
    line.strip!
    if (line[0] != ?# and line[0] != ?=)
      i = line.index('=')
      if (i)
        properties[line[0..i - 1].strip] = line[i + 1..-1].strip
      else
        properties[line] = ''
      end
    end
  end
end

AWS_ACCESS_KEY = properties["AWS_ACCESS_KEY"].delete "\""
AWS_SECRET_ACCESS_KEY = properties["AWS_SECRET_ACCESS_KEY"].strip.delete "\""

[ ELASTIC_IP, EC2_INSTANCE_ID, AWS_ACCESS_KEY, AWS_SECRET_ACCESS_KEY ].each do |value|
  if value.nil? || value.length == 0
    exit case ARGV[0] when "status" then 4 else 1 end
  end
end

def status(ec2)
  # Typical responses look like the following:
  # {"requestId"=>"065d1661-31b1-455d-8f63-ba086b8104de", "addressesSet"=>{"item"=>[{"instanceId"=>"i-22e93a4d", "publicIp"=>"50.19.93.215"}]}, "xmlns"=>"http://ec2.amazonaws.com/doc/2010-08-31/"}
  # or
  # {"requestId"=>"9cd3ab7e-1c03-4821-9565-1791dd1bb0fc", "addressesSet"=>{"item"=>[{"instanceId"=>nil, "publicIp"=>"174.129.34.161"}]}, "xmlns"=>"http://ec2.amazonaws.com/doc/2010-08-31/"}
  response = ec2.describe_addresses({:public_ip => ELASTIC_IP})
  retval = 4
  if ! response.nil?
    if ! response["addressesSet"].nil?
      if ! response["addressesSet"]["item"].nil? && response["addressesSet"]["item"].length >= 1
        if response["addressesSet"]["item"][0]["instanceId"] == EC2_INSTANCE_ID
          retval = 0
        else
          retval = 3
        end
      end
    end
  end
  retval
end

def start(ec2)
  # Throws an exception if the instance does not exist or the address does not belong to us
  retval = 1
  begin
    response = ec2.associate_address({ :public_ip => ELASTIC_IP, :instance_id => EC2_INSTANCE_ID })
    retval = 0
  rescue => e
    puts "Error attempting to associate address: " + e.message
  end
  retval
end

def stop(ec2)
  # Deliberately a no-op: disassociating the EIP would cost the instance its
  # public connectivity for 1-3 minutes (see the AWS forum link above).
  0
end

def reload(ec2)
  start(ec2)
end

def force_reload(ec2)
  reload(ec2)
end

def restart(ec2)
  start(ec2)
end

def try_restart(ec2)
  start(ec2)
end

ec2 = AWS::EC2::Base.new(:access_key_id => AWS_ACCESS_KEY, :secret_access_key => AWS_SECRET_ACCESS_KEY)

retval = case ARGV[0]
         when "status" then status(ec2)
         # NOTE: the original post is cut off in the middle of this case statement;
         # the remaining branches are an editorial reconstruction based on the LSB
         # actions defined above.
         when "start" then start(ec2)
         when "stop" then stop(ec2)
         when "restart" then restart(ec2)
         when "try-restart" then try_restart(ec2)
         when "reload" then reload(ec2)
         when "force-reload" then force_reload(ec2)
         else 2  # LSB: invalid or excess argument(s)
         end

exit retval
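For context, the /etc/aws/pacemaker_keys file the script reads is a plain name=value file; a sketch of its contents, inferred from the parsing code above (the values shown are placeholders for the restricted IAM user's credentials):

---snip---
# /etc/aws/pacemaker_keys
AWS_ACCESS_KEY="<access key id of the IAM user limited to EIP association>"
AWS_SECRET_ACCESS_KEY="<matching secret access key>"
---snip---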
[Pacemaker] Automating Pacemaker Setup
Does anyone have any links to documentation on automating Pacemaker setup with tools like Chef or Puppet? I have a working two-node cluster for HAProxy on EC2, but I'd like to fully automate the setup process for future nodes, specifically so I can spin up a new instance without any manual intervention. Thanks.