http://git-wip-us.apache.org/repos/asf/qpid-site/blob/fb1899b6/content/releases/qpid-cpp-0.34/cpp-broker/book/chapter-ha.html ---------------------------------------------------------------------- diff --git a/content/releases/qpid-cpp-0.34/cpp-broker/book/chapter-ha.html b/content/releases/qpid-cpp-0.34/cpp-broker/book/chapter-ha.html deleted file mode 100644 index 0b44121..0000000 --- a/content/releases/qpid-cpp-0.34/cpp-broker/book/chapter-ha.html +++ /dev/null @@ -1,930 +0,0 @@ -<!DOCTYPE html> -<!-- - - - - Licensed to the Apache Software Foundation (ASF) under one - - or more contributor license agreements. See the NOTICE file - - distributed with this work for additional information - - regarding copyright ownership. The ASF licenses this file - - to you under the Apache License, Version 2.0 (the - - "License"); you may not use this file except in compliance - - with the License. You may obtain a copy of the License at - - - - http://www.apache.org/licenses/LICENSE-2.0 - - - - Unless required by applicable law or agreed to in writing, - - software distributed under the License is distributed on an - - "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY - - KIND, either express or implied. See the License for the - - specific language governing permissions and limitations - - under the License. - - ---> -<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en"> - <head> - <title>1.12. 
Active-Passive Messaging Clusters - Apache Qpid™</title> - <meta http-equiv="X-UA-Compatible" content="IE=edge"/> - <meta name="viewport" content="width=device-width, initial-scale=1.0"/> - <link rel="stylesheet" href="/site.css" type="text/css" async="async"/> - <link rel="stylesheet" href="/deferred.css" type="text/css" defer="defer"/> - <script type="text/javascript">var _deferredFunctions = [];</script> - <script type="text/javascript" src="/deferred.js" defer="defer"></script> - <!--[if lte IE 8]> - <link rel="stylesheet" href="/ie.css" type="text/css"/> - <script type="text/javascript" src="/html5shiv.js"></script> - <![endif]--> - - <!-- Redirects for `go get` and godoc.org --> - <meta name="go-import" - content="qpid.apache.org git https://git-wip-us.apache.org/repos/asf/qpid-proton.git"/> - <meta name="go-source" - content="qpid.apache.org -https://github.com/apache/qpid-proton/blob/go1/README.md -https://github.com/apache/qpid-proton/tree/go1{/dir} -https://github.com/apache/qpid-proton/blob/go1{/dir}/{file}#L{line}"/> - </head> - <body> - <div id="-content"> - <div id="-top" class="panel"> - <a id="-menu-link"><img width="16" height="16" src="" alt="Menu"/></a> - - <a id="-search-link"><img width="22" height="16" src="" alt="Search"/></a> - - <ul id="-global-navigation"> - <li><a id="-logotype" href="/index.html">Apache Qpid<sup>™</sup></a></li> - <li><a href="/documentation.html">Documentation</a></li> - <li><a href="/download.html">Download</a></li> - <li><a href="/discussion.html">Discussion</a></li> - </ul> - </div> - - <div id="-menu" class="panel" style="display: none;"> - <div class="flex"> - <section> - <h3>Project</h3> - - <ul> - <li><a href="/overview.html">Overview</a></li> - <li><a href="/components/index.html">Components</a></li> - <li><a href="/releases/index.html">Releases</a></li> - </ul> - </section> - - <section> - <h3>Messaging APIs</h3> - - <ul> - <li><a href="/proton/index.html">Qpid Proton</a></li> - <li><a 
href="/components/jms/index.html">Qpid JMS</a></li> - <li><a href="/components/messaging-api/index.html">Qpid Messaging API</a></li> - </ul> - </section> - - <section> - <h3>Servers and tools</h3> - - <ul> - <li><a href="/components/broker-j/index.html">Broker-J</a></li> - <li><a href="/components/cpp-broker/index.html">C++ broker</a></li> - <li><a href="/components/dispatch-router/index.html">Dispatch router</a></li> - </ul> - </section> - - <section> - <h3>Resources</h3> - - <ul> - <li><a href="/dashboard.html">Dashboard</a></li> - <li><a href="https://cwiki.apache.org/confluence/display/qpid/Index">Wiki</a></li> - <li><a href="/resources.html">More resources</a></li> - </ul> - </section> - </div> - </div> - - <div id="-search" class="panel" style="display: none;"> - <form action="http://www.google.com/search" method="get"> - <input type="hidden" name="sitesearch" value="qpid.apache.org"/> - <input type="text" name="q" maxlength="255" autofocus="autofocus" tabindex="1"/> - <button type="submit">Search</button> - <a href="/search.html">More ways to search</a> - </form> - </div> - - <div id="-middle" class="panel"> - <ul id="-path-navigation"><li><a href="/index.html">Home</a></li><li><a href="/releases/index.html">Releases</a></li><li><a href="/releases/qpid-cpp-0.34/index.html">Qpid C++ 0.34</a></li><li><a href="/releases/qpid-cpp-0.34/cpp-broker/book/index.html">AMQP Messaging Broker (Implemented in C++)</a></li><li>1.12. Active-Passive Messaging Clusters</li></ul> - - <div id="-middle-content"> - <div class="docbook"><div class="navheader"><table summary="Navigation header" width="100%"><tr><th align="center" colspan="3">1.12. Active-Passive Messaging Clusters</th></tr><tr><td align="left" width="20%"><a accesskey="p" href="Using-message-groups.html">Prev</a> </td><th align="center" width="60%">Chapter 1.  
- Running the AMQP Messaging Broker - </th><td align="right" width="20%"> <a accesskey="n" href="ha-queue-replication.html">Next</a></td></tr></table><hr /></div><div class="section"><div class="titlepage"><div><div><h2 class="title"><a id="chapter-ha"></a>1.12. Active-Passive Messaging Clusters</h2></div></div></div><div class="section"><div class="titlepage"><div><div><h3 class="title"><a id="ha-overview"></a>1.12.1. Overview</h3></div></div></div><p> - - The High Availability (HA) module provides - <em class="firstterm">active-passive</em>, <em class="firstterm">hot-standby</em> - messaging clusters for fault-tolerant message delivery. - </p><p> - In an active-passive cluster, only one broker, known as the - <em class="firstterm">primary</em>, is active and serving clients at a time. The other - brokers are standing by as <em class="firstterm">backups</em>. Changes on the primary - are replicated to all the backups so they are always up-to-date or "hot". Backup - brokers reject client connection attempts to enforce the requirement that clients - only connect to the primary. - </p><p> - If the primary fails, one of the backups is promoted to take over as the new - primary. Clients fail-over to the new primary automatically. If there are multiple - backups, the other backups also fail-over to become backups of the new primary. - </p><p> - This approach relies on an external <em class="firstterm">cluster resource manager</em> - to detect failures, choose the new primary and handle network partitions. <a class="ulink" href="https://fedorahosted.org/cluster/wiki/RGManager" target="_top">rgmanager</a> is supported - initially, but others may be supported in the future. - </p><div class="section"><div class="titlepage"><div><div><h4 class="title"><a id="ha-at-least-once"></a>1.12.1.1.
Avoiding message loss</h4></div></div></div><p> - In order to avoid message loss, the primary broker <span class="emphasis"><em>delays - acknowledgement</em></span> of messages received from clients until the - message has been replicated and acknowledged by all of the backup - brokers, or has been consumed from the primary queue. - </p><p> - This ensures that all acknowledged messages are safe: they have either - been consumed or backed up to all backup brokers. Messages that are - consumed <span class="emphasis"><em>before</em></span> they are replicated do not need to - be replicated. This reduces the workload when replicating a queue with - active consumers. - </p><p> - Clients keep <span class="emphasis"><em>unacknowledged</em></span> messages in a buffer - <a class="footnote" href="#ftn.idm221065919424" id="idm221065919424"><sup class="footnote">[1]</sup></a> - until they are acknowledged by the primary. If the primary fails, clients will - fail-over to the new primary and <span class="emphasis"><em>re-send</em></span> all their - unacknowledged messages. - <a class="footnote" href="#ftn.idm221066427664" id="idm221066427664"><sup class="footnote">[2]</sup></a> - </p><p> - If the primary crashes, all the <span class="emphasis"><em>acknowledged</em></span> - messages will be available on the backup that takes over as the new - primary. The <span class="emphasis"><em>unacknowledged</em></span> messages will be - re-sent by the clients. Thus no messages are lost. - </p><p> - Note that this means it is possible for messages to be - <span class="emphasis"><em>duplicated</em></span>. In the event of a failure, it is possible for a - message to be received by the backup that becomes the new primary - <span class="emphasis"><em>and</em></span> re-sent by the client. The application must take steps - to identify and eliminate duplicates. - </p><p> - When a new primary is promoted after a fail-over, it is initially in - "recovering" mode.
In this mode, it delays acknowledgement of messages - on behalf of all the backups that were connected to the previous - primary. This protects those messages against a failure of the new - primary until the backups have a chance to connect and catch up. - </p><p> - Not all messages need to be replicated to the backup brokers. If a - message is consumed and acknowledged by a regular client before it has - been replicated to a backup, then it doesn't need to be replicated. - </p><div class="variablelist"><a id="ha-broker-states"></a><p class="title"><strong>HA Broker States</strong></p><dl class="variablelist"><dt><span class="term">Stand-alone</span></dt><dd><p> - Broker is not part of a HA cluster. - </p></dd><dt><span class="term">Joining</span></dt><dd><p> - Newly started broker, not yet connected to any existing primary. - </p></dd><dt><span class="term">Catch-up</span></dt><dd><p> - A backup broker that is connected to the primary and downloading - existing state (queues, messages, etc.). - </p></dd><dt><span class="term">Ready</span></dt><dd><p> - A backup broker that is fully caught up and ready to take over as - primary. - </p></dd><dt><span class="term">Recovering</span></dt><dd><p> - Newly-promoted primary, waiting for backups to connect and catch up. - Clients can connect but they are stalled until the primary is active. - </p></dd><dt><span class="term">Active</span></dt><dd><p> - The active primary broker with all backups connected and caught up. - </p></dd></dl></div></div><div class="section"><div class="titlepage"><div><div><h4 class="title"><a id="limitations"></a>1.12.1.2. Limitations</h4></div></div></div><p> - There are some known limitations in the current implementation. These - will be fixed in future versions. - </p><div class="itemizedlist"><ul class="itemizedlist" type="disc"><li class="listitem"><p> - Transactional changes to queue state are not replicated atomically.
If - the primary crashes during a transaction, it is possible that the - backup could contain only part of the changes introduced by a - transaction. - </p></li><li class="listitem"><p> - Configuration changes (creating or deleting queues, exchanges and - bindings) are replicated asynchronously. Management tools used to - make changes will consider the change complete when it is complete - on the primary, even though it may not yet have been replicated to all the backups. - </p></li><li class="listitem"><p> - Federation links <span class="emphasis"><em>to</em></span> the primary will fail over - correctly. Federation links <span class="emphasis"><em>from</em></span> the primary - will be lost in fail-over; they will not be re-connected to the new - primary. It is possible to work around this by replacing the - <code class="literal">qpidd-primary</code> start-up script with a script that - re-creates federation links when the primary is promoted. - </p></li></ul></div></div></div><div class="section"><div class="titlepage"><div><div><h3 class="title"><a id="ha-virtual-ip"></a>1.12.2. Virtual IP Addresses</h3></div></div></div><p> - Some resource managers (including <span class="command"><strong>rgmanager</strong></span>) support - <em class="firstterm">virtual IP addresses</em>. A virtual IP address is an IP - address that can be relocated to any of the nodes in a cluster. The - resource manager associates this address with the primary node in the - cluster, and relocates it to the new primary when there is a failure. This - simplifies configuration, as you can publish a single IP address rather - than a list. - </p><p> - A virtual IP address can be used by clients to connect to the primary. The - following sections explain how to configure virtual IP addresses for - clients or brokers. - </p></div><div class="section"><div class="titlepage"><div><div><h3 class="title"><a id="ha-broker-config"></a>1.12.3.
Configuring the Brokers</h3></div></div></div><p> - The broker must load the <code class="filename">ha</code> module; it is loaded by - default. The following broker options are available for the HA module. - </p><div class="note" style="margin-left: 0.5in; margin-right: 0.5in;"><h3 class="title">Note</h3><p> - Broker management is required for HA to operate; it is enabled by - default. The option <code class="literal">mgmt-enable</code> must not be set to - "no". - </p></div><div class="note" style="margin-left: 0.5in; margin-right: 0.5in;"><h3 class="title">Note</h3><p> - Incorrect security settings are a common cause of problems when - getting started; see <a class="xref" href="chapter-ha.html#ha-security" title="1.12.9. Security and Access Control.">Section 1.12.9, “Security and Access Control.”</a>. - </p></div><div class="table"><a id="ha-broker-options"></a><p class="title"><strong>Table 1.28. Broker Options for High Availability Messaging Cluster</strong></p><div class="table-contents"><table border="1" summary="Broker Options for High Availability Messaging Cluster"><colgroup><col align="left" class="c1" /><col align="left" class="c2" /></colgroup><thead><tr><th align="center" colspan="2"> - Options for High Availability Messaging Cluster - </th></tr></thead><tbody><tr><td align="left"> - <code class="literal">ha-cluster <em class="replaceable"><code>yes|no</code></em></code> - </td><td align="left"> - Set to "yes" to have the broker join a cluster. - </td></tr><tr><td align="left"> - <code class="literal">ha-queue-replication <em class="replaceable"><code>yes|no</code></em></code> - </td><td align="left"> - Enable replication of specific queues without joining a cluster; see <a class="xref" href="ha-queue-replication.html" title="1.13. Replicating Queues with the HA module">Section 1.13, “Replicating Queues with the HA module”</a>.
- </td></tr><tr><td align="left"> - <code class="literal">ha-brokers-url <em class="replaceable"><code>URL</code></em></code> - </td><td align="left"> - <p> - The URL - <a class="footnote" href="#ftn.ha-url-grammar" id="ha-url-grammar"><sup class="footnote">[a]</sup></a> - used by cluster brokers to connect to each other. The URL should - contain a comma-separated list of the broker addresses, rather than a - virtual IP address. - </p> - </td></tr><tr><td align="left"><code class="literal">ha-public-url <em class="replaceable"><code>URL</code></em></code> </td><td align="left"> - <p> - This option is only needed for backwards compatibility if you - have been using the <code class="literal">amq.failover</code> exchange. - This exchange is now obsolete; it is recommended to use a - virtual IP address instead. - </p> - <p> - If set, this URL is advertised by the - <code class="literal">amq.failover</code> exchange and overrides the - broker option <code class="literal">known-hosts-url</code>. - </p> - </td></tr><tr><td align="left"><code class="literal">ha-replicate </code><em class="replaceable"><code>VALUE</code></em></td><td align="left"> - <p> - Specifies whether queues and exchanges are replicated by default. - <em class="replaceable"><code>VALUE</code></em> is one of: <code class="literal">none</code>, - <code class="literal">configuration</code>, <code class="literal">all</code>. - For details see <a class="xref" href="chapter-ha.html#ha-replicate-values" title="1.12.7. Controlling replication of queues and exchanges">Section 1.12.7, “Controlling replication of queues and exchanges”</a>.
- </p> - </td></tr><tr><td align="left"> - <p><code class="literal">ha-username <em class="replaceable"><code>USER</code></em></code></p> - <p><code class="literal">ha-password <em class="replaceable"><code>PASS</code></em></code></p> - <p><code class="literal">ha-mechanism <em class="replaceable"><code>MECHANISM</code></em></code></p> - </td><td align="left"> - Authentication settings used by HA brokers to connect to each other; - see <a class="xref" href="chapter-ha.html#ha-security" title="1.12.9. Security and Access Control.">Section 1.12.9, “Security and Access Control.”</a>. - </td></tr><tr><td align="left"><code class="literal">ha-backup-timeout <em class="replaceable"><code>SECONDS</code></em></code> - <a class="footnote" href="#ftn.ha-seconds-spec" id="ha-seconds-spec"><sup class="footnote">[b]</sup></a> - </td><td align="left"> - <p> - Maximum time that a recovering primary will wait for an expected - backup to connect and become ready. - </p> - </td></tr><tr><td align="left"> - <code class="literal">link-maintenance-interval <em class="replaceable"><code>SECONDS</code></em></code> - <a class="footnoteref" href="chapter-ha.html#ftn.ha-seconds-spec"><sup class="footnoteref">[b]</sup></a> - </td><td align="left"> - <p> - HA uses federation links to connect from backup to primary. - Backup brokers check the link to the primary on this interval - and re-connect if need be. The default is 2 seconds. Set it lower for - faster failover, e.g. 0.1 seconds. Setting it too low will result - in excessive link-checking on the backups. - </p> - </td></tr><tr><td align="left"> - <code class="literal">link-heartbeat-interval <em class="replaceable"><code>SECONDS</code></em></code> - <a class="footnoteref" href="chapter-ha.html#ftn.ha-seconds-spec"><sup class="footnoteref">[b]</sup></a> - </td><td align="left"> - <p> - HA uses federation links to connect from backup to primary. - If no heartbeat is received for twice this interval, the primary will consider that - backup dead (e.g.
if the backup is hung or partitioned). - This interval is also used as the time-out for broker status checks; - it may take up to this interval for rgmanager to detect a hung or partitioned broker. - Clients sending messages may be held up during this time. - The default is 120 seconds; you will probably want to set this to a lower value, e.g. 10. - If it is set too low, rgmanager may consider a slow broker to have failed and kill it. - </p> - </td></tr></tbody><tbody class="footnotes"><tr><td colspan="2"><div class="footnote" id="ftn.ha-url-grammar"><p><a class="para" href="#ha-url-grammar"><sup class="para">[a] </sup></a> - The full format of the URL is given by this grammar: - </p><pre class="programlisting"> -url = ["amqp:"][ user ["/" password] "@" ] addr ("," addr)* -addr = tcp_addr / rdma_addr / ssl_addr / ... -tcp_addr = ["tcp:"] host [":" port] -rdma_addr = "rdma:" host [":" port] -ssl_addr = "ssl:" host [":" port] - </pre><p> - </p></div><div class="footnote" id="ftn.ha-seconds-spec"><p><a class="para" href="#ha-seconds-spec"><sup class="para">[b] </sup></a> - Values specified as <em class="replaceable"><code>SECONDS</code></em> can be a - fraction of a second, e.g. "0.1" for a tenth of a second. - They can also have an explicit unit, - e.g. 10s (seconds), 10ms (milliseconds), 10us (microseconds), 10ns (nanoseconds). - </p></div></td></tr></tbody></table></div></div><br class="table-break" /><p> - To configure a HA cluster you must set at least <code class="literal">ha-cluster</code> and - <code class="literal">ha-brokers-url</code>. - </p></div><div class="section"><div class="titlepage"><div><div><h3 class="title"><a id="ha-rm"></a>1.12.4. The Cluster Resource Manager</h3></div></div></div><p> - Broker fail-over is managed by a <em class="firstterm">cluster resource - manager</em>. An integration with <a class="ulink" href="https://fedorahosted.org/cluster/wiki/RGManager" target="_top">rgmanager</a> is - provided, but it is possible to integrate with other resource managers.
- </p><p> - The resource manager is responsible for starting the <span class="command"><strong>qpidd</strong></span> broker - on each node in the cluster. The resource manager then <em class="firstterm">promotes</em> - one of the brokers to be the primary. The other brokers connect to the primary as - backups, using the URL provided in the <code class="literal">ha-brokers-url</code> configuration - option. - </p><p> - Once connected, the backup brokers synchronize their state with the - primary. When a backup is synchronized, or "hot", it is ready to take - over if the primary fails. Backup brokers continually receive updates - from the primary in order to stay synchronized. - </p><p> - If the primary fails, backup brokers go into fail-over mode. The resource - manager must detect the failure and promote one of the backups to be the - new primary. The other backups connect to the new primary and synchronize - their state with it. - </p><p> - The resource manager is also responsible for protecting the cluster from - <em class="firstterm">split-brain</em> conditions resulting from a network partition. A - network partition divides a cluster into two sub-groups that cannot see each other. - Usually a <em class="firstterm">quorum</em> voting algorithm is used to disable the nodes - in the inquorate sub-group. - </p></div><div class="section"><div class="titlepage"><div><div><h3 class="title"><a id="ha-rm-config"></a>1.12.5. Configuring with <span class="command"><strong>rgmanager</strong></span> as resource manager</h3></div></div></div><p> - This section assumes that you are already familiar with setting up and configuring - clustered services using <span class="command"><strong>cman</strong></span> and - <span class="command"><strong>rgmanager</strong></span>. It will show you how to configure an active-passive, - hot-standby <span class="command"><strong>qpidd</strong></span> HA cluster with <span class="command"><strong>rgmanager</strong></span>.
- </p><div class="note" style="margin-left: 0.5in; margin-right: 0.5in;"><h3 class="title">Note</h3><p> - Once all components are installed, it is important to take the following steps: - </p><pre class="programlisting"> -chkconfig rgmanager on -chkconfig cman on -chkconfig qpidd <span class="emphasis"><em>off</em></span> - </pre><p> - </p><p> - The qpidd service must be <span class="emphasis"><em>off</em></span> in - <code class="literal">chkconfig</code> because <code class="literal">rgmanager</code> will - start and stop <code class="literal">qpidd</code>. If the normal system init - process also attempts to start and stop qpidd, it can cause rgmanager to - lose track of qpidd processes. The symptom when this happens is that - <code class="literal">clustat</code> shows a <code class="literal">qpidd</code> service to - be stopped when in fact there is a <code class="literal">qpidd</code> process - running. The <code class="literal">qpidd</code> log will show errors like this: - </p><pre class="programlisting"> -critical Unexpected error: Daemon startup failed: Cannot lock /var/lib/qpidd/lock: Resource temporarily unavailable - </pre><p> - </p></div><p> - You must provide a <code class="literal">cluster.conf</code> file to configure - <span class="command"><strong>cman</strong></span> and <span class="command"><strong>rgmanager</strong></span>. Here is - an example <code class="literal">cluster.conf</code> file for a cluster of 3 nodes named - node1, node2 and node3. We will go through the configuration step by step. - </p><pre class="programlisting"> - -<?xml version="1.0"?> -<!-- -This is an example of a cluster.conf file to run qpidd HA under rgmanager. -This example assumes a 3-node cluster, with nodes named node1, node2 and node3. - -NOTE: fencing is not shown; you must configure fencing appropriately for your cluster. ---> - -<cluster name="qpid-test" config_version="18"> - <!-- The cluster has 3 nodes. Each has a unique nodeid and one vote - for quorum.
--> - <clusternodes> - <clusternode name="node1.example.com" nodeid="1"/> - <clusternode name="node2.example.com" nodeid="2"/> - <clusternode name="node3.example.com" nodeid="3"/> - </clusternodes> - - <!-- Resource Manager configuration. --> - - <!-- - status_poll_interval is the interval, in seconds, at which the resource manager checks the status - of managed services. This affects how quickly the manager will detect failed services. - --> - <rm status_poll_interval="1"> - <!-- - There is a failoverdomain for each node containing just that node. - This lets us stipulate that the qpidd service should always run on each node. - --> - <failoverdomains> - <failoverdomain name="node1-domain" restricted="1"> - <failoverdomainnode name="node1.example.com"/> - </failoverdomain> - <failoverdomain name="node2-domain" restricted="1"> - <failoverdomainnode name="node2.example.com"/> - </failoverdomain> - <failoverdomain name="node3-domain" restricted="1"> - <failoverdomainnode name="node3.example.com"/> - </failoverdomain> - </failoverdomains> - - <resources> - <!-- This script starts a qpidd broker acting as a backup. --> - <script file="/etc/init.d/qpidd" name="qpidd"/> - - <!-- This script promotes the qpidd broker on this node to primary. --> - <script file="/etc/init.d/qpidd-primary" name="qpidd-primary"/> - - <!-- - This is a virtual IP address for client traffic. - monitor_link="yes" means monitor the health of the NIC used for the VIP. - sleeptime="0" means don't delay when failing over the VIP to a new address. - --> - <ip address="20.0.20.200" monitor_link="yes" sleeptime="0"/> - </resources> - - <!-- There is a qpidd service on each node; it should be restarted if it fails.
--> - <service name="node1-qpidd-service" domain="node1-domain" recovery="restart"> - <script ref="qpidd"/> - </service> - <service name="node2-qpidd-service" domain="node2-domain" recovery="restart"> - <script ref="qpidd"/> - </service> - <service name="node3-qpidd-service" domain="node3-domain" recovery="restart"> - <script ref="qpidd"/> - </service> - - <!-- There should always be a single qpidd-primary service; it can run on any node. --> - <service name="qpidd-primary-service" autostart="1" exclusive="0" recovery="relocate"> - <script ref="qpidd-primary"/> - <!-- The primary holds the virtual IP address that clients connect to. --> - <ip ref="20.0.20.200"/> - </service> - </rm> -</cluster> - - </pre><p> - There is a <code class="literal">failoverdomain</code> for each node containing just that - one node. This lets us stipulate that the qpidd service should always run on all - nodes. - </p><p> - The <code class="literal">resources</code> section defines the <span class="command"><strong>qpidd</strong></span> - script used to start the <span class="command"><strong>qpidd</strong></span> service. It also defines the - <span class="command"><strong>qpidd-primary</strong></span> script, which does not - actually start a new service; rather, it promotes the existing - <span class="command"><strong>qpidd</strong></span> broker to primary status. - </p><p> - The <code class="literal">resources</code> section also defines a virtual IP - address for clients: <code class="literal">20.0.20.200</code>. - </p><p> - <code class="filename">qpidd.conf</code> should contain these lines: - </p><pre class="programlisting"> -ha-cluster=yes -ha-brokers-url=20.0.20.1,20.0.20.2,20.0.20.3 - </pre><p> - The brokers connect to each other directly via the addresses - listed in <span class="command"><strong>ha-brokers-url</strong></span>. Note that the client and broker - addresses are on separate sub-nets; this is recommended but not required.
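To illustrate how an <code class="literal">ha-brokers-url</code> value such as the one above breaks down under the URL grammar in footnote [a], here is a minimal C++17 sketch that splits it into transport, host and port. It is not part of Qpid: the names parseBrokersUrl, BrokerAddress and BrokersUrl are hypothetical, the default port of 5672 (the standard AMQP port) is an assumption, and IPv6 literal addresses and other transports covered by "..." in the grammar are not handled.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

// Hypothetical types for illustration only; they are not part of the Qpid API.
struct BrokerAddress {
    std::string transport;  // "tcp", "rdma" or "ssl"
    std::string host;
    std::uint16_t port;     // assumption: defaults to 5672, the standard AMQP port
};

struct BrokersUrl {
    std::string user;      // empty when the URL carries no credentials
    std::string password;  // empty when no password is given
    std::vector<BrokerAddress> addresses;
};

// Removes `prefix` from the front of `s` if present; returns true if it was removed.
static bool consumePrefix(std::string& s, const std::string& prefix) {
    if (s.compare(0, prefix.size(), prefix) != 0) return false;
    s.erase(0, prefix.size());
    return true;
}

// Parses one addr: ["tcp:" | "rdma:" | "ssl:"] host [":" port].
static std::optional<BrokerAddress> parseAddr(std::string addr) {
    BrokerAddress result{"tcp", "", 5672};
    if (consumePrefix(addr, "rdma:")) {
        result.transport = "rdma";
    } else if (consumePrefix(addr, "ssl:")) {
        result.transport = "ssl";
    } else {
        consumePrefix(addr, "tcp:");  // the "tcp:" prefix is optional
    }
    const std::size_t colon = addr.rfind(':');
    if (colon != std::string::npos) {
        const std::string digits = addr.substr(colon + 1);
        if (digits.empty() || digits.size() > 5 ||
            digits.find_first_not_of("0123456789") != std::string::npos) {
            return std::nullopt;  // the port must be 1 to 5 decimal digits
        }
        const unsigned long port = std::stoul(digits);
        if (port == 0 || port > 65535) return std::nullopt;
        result.port = static_cast<std::uint16_t>(port);
        addr.erase(colon);
    }
    if (addr.empty()) return std::nullopt;  // the host is mandatory
    result.host = addr;
    return result;
}

// Parses url = ["amqp:"] [user ["/" password] "@"] addr ("," addr)*.
std::optional<BrokersUrl> parseBrokersUrl(std::string url) {
    BrokersUrl result;
    consumePrefix(url, "amqp:");
    const std::size_t at = url.rfind('@');
    if (at != std::string::npos) {
        const std::string credentials = url.substr(0, at);
        url.erase(0, at + 1);
        const std::size_t slash = credentials.find('/');
        result.user = credentials.substr(0, slash);
        if (slash != std::string::npos) result.password = credentials.substr(slash + 1);
    }
    std::size_t start = 0;
    while (true) {
        const std::size_t comma = url.find(',', start);
        const std::size_t length = (comma == std::string::npos) ? std::string::npos : comma - start;
        auto addr = parseAddr(url.substr(start, length));
        if (!addr) return std::nullopt;  // reject empty or malformed entries
        result.addresses.push_back(*addr);
        if (comma == std::string::npos) break;
        start = comma + 1;
    }
    return result;
}
```

Under these assumptions, parsing the value 20.0.20.1,20.0.20.2,20.0.20.3 yields three tcp addresses, each on port 5672.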
- </p><p> - The <code class="literal">service</code> section defines 3 <code class="literal">qpidd</code> - services, one for each node. Each service is in a restricted fail-over - domain containing just that node, and has the <code class="literal">restart</code> - recovery policy. The effect of this is that rgmanager will run - <span class="command"><strong>qpidd</strong></span> on each node, restarting it if it fails. - </p><p> - There is a single <code class="literal">qpidd-primary-service</code>, using the - <span class="command"><strong>qpidd-primary</strong></span> script, that is not restricted to a - domain and has the <code class="literal">relocate</code> recovery policy. This means - rgmanager will start <span class="command"><strong>qpidd-primary</strong></span> on one of the nodes - when the cluster starts and will relocate it to another node if the - original node fails. Running the <code class="literal">qpidd-primary</code> script - does not start a new broker process; it promotes the existing broker to - become the primary. - </p><div class="section"><div class="titlepage"><div><div><h4 class="title"><a id="ha-rm-shutdown-node"></a>1.12.5.1. Shutting down qpidd on a HA node</h4></div></div></div><p> - As explained above, both the per-node <code class="literal">qpidd</code> service - and the re-locatable <code class="literal">qpidd-primary</code> service are - implemented by the same <code class="literal">qpidd</code> daemon. - </p><p> - As a result, stopping the <code class="literal">qpidd</code> service will not stop - a <code class="literal">qpidd</code> daemon that is acting as primary, and - stopping the <code class="literal">qpidd-primary</code> service will not stop a - <code class="literal">qpidd</code> process that is acting as backup.
- </p><p> - To shut down a node that is acting as primary, you need to shut down the - <code class="literal">qpidd</code> service <span class="emphasis"><em>and</em></span> relocate the - primary: - </p><p> - </p><pre class="programlisting"> -clusvcadm -d somenode-qpidd-service -clusvcadm -r qpidd-primary-service - </pre><p> - </p><p> - This will shut down the <code class="literal">qpidd</code> daemon on that node and - prevent the primary service from relocating back to the node - because the qpidd service is no longer running there. - </p></div></div><div class="section"><div class="titlepage"><div><div><h3 class="title"><a id="ha-broker-admin"></a>1.12.6. Broker Administration Tools</h3></div></div></div><p> - Normally, clients are not allowed to connect to a backup broker. However, - management tools are allowed to connect to backup brokers. If you use - these tools, you <span class="emphasis"><em>must not</em></span> add or remove messages from - replicated queues, nor create or delete replicated queues or exchanges, as - this will disrupt the replication process and may cause message loss. - </p><p> - <span class="command"><strong>qpid-ha</strong></span> allows you to view and change HA configuration settings. - </p><p> - The tools <span class="command"><strong>qpid-config</strong></span>, <span class="command"><strong>qpid-route</strong></span> and - <span class="command"><strong>qpid-stat</strong></span> will connect to a backup if you pass the flag <span class="command"><strong>--ha-admin</strong></span> on the - command line. - </p></div><div class="section"><div class="titlepage"><div><div><h3 class="title"><a id="ha-replicate-values"></a>1.12.7. Controlling replication of queues and exchanges</h3></div></div></div><p> - By default, queues and exchanges are not replicated automatically. You can change - the default behaviour by setting the <code class="literal">ha-replicate</code> configuration - option.
It has one of the following values: - </p><div class="itemizedlist"><ul class="itemizedlist" type="disc"><li class="listitem"><p> - <em class="firstterm">all</em>: Replicate everything automatically: queues, - exchanges, bindings and messages. - </p></li><li class="listitem"><p> - <em class="firstterm">configuration</em>: Replicate the existence of queues, - exchanges and bindings but don't replicate messages. - </p></li><li class="listitem"><p> - <em class="firstterm">none</em>: Don't replicate anything; this is the default. - </p></li></ul></div><p> - </p><p> - You can override the default for a particular queue or exchange by passing the - argument <code class="literal">qpid.replicate</code> when creating the queue or exchange. It - takes the same values as <code class="literal">ha-replicate</code>. - </p><p> - Bindings are automatically replicated if the queue and exchange being bound both - have replication <code class="literal">all</code> or <code class="literal">configuration</code>; otherwise they - are not replicated. - </p><p> - You can create replicated queues and exchanges with the - <span class="command"><strong>qpid-config</strong></span> management tool like this: - </p><pre class="programlisting"> -qpid-config add queue myqueue --replicate all - </pre><p> - To create replicated queues and exchanges via the client API, add a - <code class="literal">node</code> entry to the address like this: - </p><pre class="programlisting"> -"myqueue;{create:always,node:{x-declare:{arguments:{'qpid.replicate':all}}}}" - </pre><p> - There are some built-in exchanges created automatically by the broker; these - exchanges are never replicated.
The built-in exchanges are the default (nameless) - exchange, the AMQP standard exchanges (<code class="literal">amq.direct, amq.topic, amq.fanout</code> and - <code class="literal">amq.match</code>) and the management exchanges (<code class="literal">qpid.management, qmf.default.direct</code> and - <code class="literal">qmf.default.topic</code>). - </p><p> - Note that if you bind a replicated queue to one of these exchanges, the - binding will <span class="emphasis"><em>not</em></span> be replicated, so the queue will not - have the binding after a fail-over. - </p></div><div class="section"><div class="titlepage"><div><div><h3 class="title"><a id="ha-failover"></a>1.12.8. Client Connection and Fail-over</h3></div></div></div><p> - Clients can only connect to the primary broker. Backup brokers reject any - connection attempt by a client. Clients rejected by a backup broker will - automatically fail over until they connect to the primary. - </p><p> - Clients are configured with the URL for the cluster (details below for - each type of client). There are two possibilities: - </p><div class="itemizedlist"><ul class="itemizedlist" type="disc"><li class="listitem"><p> - The URL contains multiple addresses, one for each broker in the cluster. - </p></li><li class="listitem"><p> - The URL contains a single <em class="firstterm">virtual IP address</em> - that is assigned to the primary broker by the resource manager. - This is the recommended configuration. - </p></li></ul></div><p> - In the first case, clients will repeatedly re-try each address in the URL - until they successfully connect to the primary. In the second case, the - resource manager will assign the virtual IP address to the primary broker, - so clients only need to re-try on a single address. - </p><p> - When the primary broker fails, clients re-try all known cluster addresses - until they connect to the new primary.
The client re-sends any messages - that were previously sent but not acknowledged by the broker at the time - of the failure. Similarly, messages that have been sent by the broker, but - not acknowledged by the client, are re-queued. - </p><p> - TCP can be slow to detect connection failures. A client can configure a - connection to use a <em class="firstterm">heartbeat</em> to detect connection - failure, and can specify a time interval for the heartbeat. If heartbeats - are in use, failures will be detected no later than twice the heartbeat - interval. The following sections explain how to enable heartbeats in each - client. - </p><p> - Note: the following sections explain how to configure clients with - multiple addresses. If you are using a virtual IP address, you only need - to configure clients with that one address; you don't need to list all the - broker addresses. - </p><p> - Suppose your cluster has 3 nodes: <code class="literal">node1</code>, - <code class="literal">node2</code> and <code class="literal">node3</code>, all using the - default AMQP port, and you are not using a virtual IP address. To connect - a client, you need to specify the address(es) and set the - <code class="literal">reconnect</code> property to <code class="literal">true</code>. The - following sub-sections show how to connect each type of client. - </p><div class="section"><div class="titlepage"><div><div><h4 class="title"><a id="ha-clients"></a>1.12.8.1. C++ clients</h4></div></div></div><p> - With the C++ client, you specify multiple cluster addresses in a single URL - <a class="footnote" href="#ftn.idm221065028880" id="idm221065028880"><sup class="footnote">[3]</sup></a>. - You also need to specify the connection option - <code class="literal">reconnect</code> to be true. For example: - </p><pre class="programlisting"> -qpid::messaging::Connection c("node1,node2,node3","{reconnect:true}"); - </pre><p> - Heartbeats are disabled by default.
You can enable them by specifying a - heartbeat interval (in seconds) for the connection via the - <code class="literal">heartbeat</code> option. For example: - </p><pre class="programlisting"> -qpid::messaging::Connection c("node1,node2,node3","{reconnect:true,heartbeat:10}"); - </pre></div><div class="section"><div class="titlepage"><div><div><h4 class="title"><a id="ha-python-client"></a>1.12.8.2. Python clients</h4></div></div></div><p> - With the Python client, you specify <code class="literal">reconnect=True</code> - and a list of <em class="replaceable"><code>host:port</code></em> addresses as - <code class="literal">reconnect_urls</code> when calling - <code class="literal">Connection.establish</code> or - <code class="literal">Connection.open</code>: - </p><pre class="programlisting"> -connection = qpid.messaging.Connection.establish("node1", reconnect=True, reconnect_urls=["node1", "node2", "node3"]) - </pre><p> - Heartbeats are disabled by default. You can - enable them by specifying a heartbeat interval (in seconds) for the - connection via the <code class="literal">heartbeat</code> option. For example: - </p><pre class="programlisting"> -connection = qpid.messaging.Connection.establish("node1", reconnect=True, reconnect_urls=["node1", "node2", "node3"], heartbeat=10) - </pre></div><div class="section"><div class="titlepage"><div><div><h4 class="title"><a id="ha-jms-client"></a>1.12.8.3. Java JMS Clients</h4></div></div></div><p> - In Java JMS clients, client fail-over is handled automatically if it is - enabled in the connection.
You can configure a connection to use - fail-over with the <span class="command"><strong>failover</strong></span> property: - </p><pre class="screen"> - connectionfactory.qpidConnectionfactory = amqp://guest:guest@clientid/test?brokerlist='tcp://localhost:5672'&failover='failover_exchange' - </pre><p> - This property can take three values: - </p><div class="variablelist"><p class="title"><strong>Fail-over Modes</strong></p><dl class="variablelist"><dt><span class="term">failover_exchange</span></dt><dd><p> - If the connection fails, fail over to any other broker in the cluster. - </p></dd><dt><span class="term">roundrobin</span></dt><dd><p> - If the connection fails, fail over to one of the brokers specified in the <span class="command"><strong>brokerlist</strong></span>. - </p></dd><dt><span class="term">singlebroker</span></dt><dd><p> - Fail-over is not supported; the connection is to a single broker only. - </p></dd></dl></div><p> - In a Connection URL, heartbeat is set using the <span class="command"><strong>heartbeat</strong></span> property, which is an integer corresponding to the heartbeat period in seconds. For instance, the following line from a JNDI properties file sets the heartbeat period to 3 seconds: - </p><pre class="screen"> - connectionfactory.qpidConnectionfactory = amqp://guest:guest@clientid/test?brokerlist='tcp://localhost:5672'&heartbeat='3' - </pre></div></div><div class="section"><div class="titlepage"><div><div><h3 class="title"><a id="ha-security"></a>1.12.9. Security and Access Control.</h3></div></div></div><p> - This section outlines the HA-specific aspects of security configuration. - Please see <a class="xref" href="chap-Messaging_User_Guide-Security.html" title="1.5. Security">Section 1.5, “Security”</a> for - more details on enabling authentication and setting up Access Control Lists.
- </p><div class="note" style="margin-left: 0.5in; margin-right: 0.5in;"><h3 class="title">Note</h3><p> - Unless you disable authentication with <code class="literal">auth=no</code> in - your configuration, you <span class="emphasis"><em>must</em></span> set the options below - and you <span class="emphasis"><em>must</em></span> have an ACL file with at least the - entry described below. - </p><p> - Backups will be <span class="emphasis"><em>unable to connect to the primary</em></span> if - the security configuration is incorrect. See also <a class="xref" href="chapter-ha.html#ha-troubleshoot-security" title="1.12.12.2. Authentication and ACL failures">Section 1.12.12.2, “Authentication and ACL failures”</a>. - </p></div><p> - When authentication is enabled, you must set the credentials used by HA - brokers with the following options: - </p><div class="table"><a id="ha-security-options"></a><p class="title"><strong>Table 1.29. HA Security Options</strong></p><div class="table-contents"><table border="1" summary="HA Security Options"><colgroup><col align="left" class="c1" /><col align="left" class="c2" /></colgroup><thead><tr><th align="center" colspan="2"> - HA Security Options - </th></tr></thead><tbody><tr><td align="left"><p><code class="literal">ha-username</code> <em class="replaceable"><code>USER</code></em></p></td><td align="left"><p>User name for HA brokers. Note that this must <span class="emphasis"><em>not</em></span> include the <code class="literal">@QPID</code> suffix.</p></td></tr><tr><td align="left"><p><code class="literal">ha-password</code> <em class="replaceable"><code>PASS</code></em></p></td><td align="left"><p>Password for HA brokers.</p></td></tr><tr><td align="left"><p><code class="literal">ha-mechanism</code> <em class="replaceable"><code>MECHANISM</code></em></p></td><td align="left"> - <p> - Mechanism for HA brokers.
Any mechanism you enable for - broker-to-broker communication can also be used by a client, so - do not use <code class="literal">ha-mechanism=ANONYMOUS</code> in a secure environment. - </p> - </td></tr></tbody></table></div></div><br class="table-break" /><p> - This identity is used to authorize federation links from backup to - primary. It is also used to authorize actions on the backup to replicate - primary state, for example, creating queues and exchanges. - </p><p> - When authorization is enabled, you must have an Access Control List with the - following rule to allow HA replication to function. If - <code class="literal">ha-username</code>=<em class="replaceable"><code>USER</code></em>, the rule is: - </p><pre class="programlisting"> -acl allow <em class="replaceable"><code>USER</code></em>@QPID all all - </pre></div><div class="section"><div class="titlepage"><div><div><h3 class="title"><a id="ha-other-rm"></a>1.12.10. Integrating with other Cluster Resource Managers</h3></div></div></div><p> - To integrate with a different resource manager, you must configure it to: - </p><div class="itemizedlist"><ul class="itemizedlist" type="disc"><li class="listitem"><p>Start a qpidd process on each node of the cluster.</p></li><li class="listitem"><p>Restart qpidd if it crashes.</p></li><li class="listitem"><p>Promote exactly one of the brokers to primary.</p></li><li class="listitem"><p>Detect a failure and promote a new primary.</p></li></ul></div><p> - </p><p> - The <span class="command"><strong>qpid-ha</strong></span> command allows you to check whether a broker is - primary and to promote a backup to primary. - </p><p> - To test whether a broker is the primary: - </p><pre class="programlisting">qpid-ha -b <em class="replaceable"><code>broker-address</code></em> status --expect=primary</pre><p> - This returns 0 if the broker at <em class="replaceable"><code>broker-address</code></em> is the primary and - non-zero otherwise.
- </p><p> - To promote a broker to primary: - </p><pre class="programlisting">qpid-ha --cluster-manager -b <em class="replaceable"><code>broker-address</code></em> promote</pre><p> - </p><p> - Note that <code class="literal">promote</code> is considered a "cluster manager - only" command. Incorrect use of <code class="literal">promote</code> outside of the - cluster manager could create a cluster with multiple primaries. Such a - cluster will malfunction and lose data. "Cluster manager only" commands - are not accessible in <span class="command"><strong>qpid-ha</strong></span> without the - <code class="literal">--cluster-manager</code> option. - </p><p> - To list the full set of commands, use: - </p><pre class="programlisting"> -qpid-ha --cluster-manager --help - </pre></div><div class="section"><div class="titlepage"><div><div><h3 class="title"><a id="ha-store"></a>1.12.11. Using a message store in a cluster</h3></div></div></div><p> - If you use a persistent store for your messages, each broker in a - cluster will have its own store. If the entire cluster fails and is - restarted, the <span class="emphasis"><em>first</em></span> broker that becomes primary will recover from its - store. All the other brokers will clear their stores and get an update - from the primary to ensure consistency. - </p></div><div class="section"><div class="titlepage"><div><div><h3 class="title"><a id="ha-troubleshoot"></a>1.12.12. Troubleshooting a cluster</h3></div></div></div><p> - This section applies to clusters that are using rgmanager as the - cluster manager. - </p><div class="section"><div class="titlepage"><div><div><h4 class="title"><a id="ha-troubleshoot-no-primary"></a>1.12.12.1. No primary broker</h4></div></div></div><p> - When you initially start an HA cluster, all brokers are in - <code class="literal">joining</code> mode. The brokers do not automatically select - a primary; they rely on the cluster manager <code class="literal">rgmanager</code> - to do so.
If <code class="literal">rgmanager</code> is not running or is not - configured correctly, brokers will remain in the - <code class="literal">joining</code> state. See <a class="xref" href="chapter-ha.html#ha-rm-config" title="1.12.5. Configuring with rgmanager as resource manager">Section 1.12.5, “Configuring with <span class="command"><strong>rgmanager</strong></span> as resource manager”</a>. - </p></div><div class="section"><div class="titlepage"><div><div><h4 class="title"><a id="ha-troubleshoot-security"></a>1.12.12.2. Authentication and ACL failures</h4></div></div></div><p> - If a broker is unable to establish a connection to another broker in the - cluster due to authentication or ACL problems, the logs may contain - errors like the following: - </p><pre class="programlisting"> -info SASL: Authentication failed: SASL(-13): user not found: Password verification failed - </pre><p> - </p><pre class="programlisting"> -warning Client closed connection with 320: User anonymous@QPID federation connection denied. Systems with authentication enabled must specify ACL create link rules. - </pre><p> - </p><pre class="programlisting"> -warning Client closed connection with 320: ACL denied anonymous@QPID creating a federation link. - </pre><p> - </p><p> - Set the HA security configuration and ACL file as described in <a class="xref" href="chapter-ha.html#ha-security" title="1.12.9. Security and Access Control.">Section 1.12.9, “Security and Access Control.”</a>. Once the cluster is running and the primary is - promoted, run: - </p><pre class="programlisting">qpid-ha status --all</pre><p> - to make sure that the brokers are running as one cluster. - </p></div><div class="section"><div class="titlepage"><div><div><h4 class="title"><a id="ha-troubleshoot-slow-recovery"></a>1.12.12.3. Slow recovery times</h4></div></div></div><p> - The following configuration settings affect recovery time. The - values shown are examples that give fast recovery on a lightly - loaded system.
You should run tests to determine whether the values are - appropriate for your system and load conditions. - </p><div class="section"><div class="titlepage"><div><div><h5 class="title"><a id="ha-troubleshoot-cluster.conf"></a>cluster.conf:</h5></div></div></div><pre class="programlisting"> -<rm status_poll_interval="1"> - </pre><p> - status_poll_interval is the interval, in seconds, at which the - resource manager checks the status of managed services. This - affects how quickly the manager will detect failed services. - </p><pre class="programlisting"> -<ip address="20.0.20.200" monitor_link="yes" sleeptime="0"/> - </pre><p> - This is a virtual IP address for client traffic. - monitor_link="yes" means monitor the health of the network interface - used for the VIP. sleeptime="0" means don't delay when - failing over the VIP to a new address. - </p></div><div class="section"><div class="titlepage"><div><div><h5 class="title"><a id="ha-troubleshoot-qpidd.conf"></a>qpidd.conf</h5></div></div></div><pre class="programlisting"> -link-maintenance-interval=0.1 - </pre><p> - Interval for backup brokers to check the link to the primary and - re-connect if need be. The default is 2 seconds. It can be set lower for - faster fail-over, but setting it too low will result in excessive - link-checking activity on the broker. - </p><pre class="programlisting"> -link-heartbeat-interval=5 - </pre><p> - Heartbeat interval for federation links. The HA cluster uses - federation links between the primary and each backup. The - primary can take up to twice the heartbeat interval to detect a - failed backup. When a sender sends a message, the primary waits - for all backups to acknowledge before acknowledging to the - sender. A disconnected backup may cause the primary to block - senders until it is detected via heartbeat. - </p><p> - This interval is also used as the timeout for broker status - checks by rgmanager. It may take up to this interval for - rgmanager to detect a hung broker.
- </p><p> - The default of 120 seconds is very high; you will probably want - to set this to a lower value. If set too low, under network - congestion or heavy load, a slow-to-respond broker may be - re-started by rgmanager. - </p></div></div><div class="section"><div class="titlepage"><div><div><h4 class="title"><a id="ha-troubleshoot-total-cluster-failure"></a>1.12.12.4. Total cluster failure</h4></div></div></div><p> - Note: for definitions of the broker states <em class="firstterm">joining</em>, - <em class="firstterm">catch-up</em>, <em class="firstterm">ready</em>, - <em class="firstterm">recovering</em> and <em class="firstterm">active</em>, see - <a class="xref" href="chapter-ha.html#ha-broker-states" title="HA Broker States">HA Broker States</a>. - </p><p> - The cluster can only guarantee availability as long as there is at - least one active primary broker or ready backup broker left alive. - If all the brokers fail simultaneously, the cluster will fail and - non-persistent data will be lost. - </p><p> - While there is an active primary broker, clients can get service. - If the active primary fails, one of the "ready" backup - brokers will take over, recover and become active. Note that a backup - can only be promoted to primary if it is in the "ready" - state (with the exception of the first primary in a new cluster, - where all brokers are in the "joining" state). - </p><p> - Given a stable cluster of N brokers with one active primary and - N-1 ready backups, the system can sustain up to N-1 failures in - rapid succession. The surviving broker will be promoted to active - and continue to give service. - </p><p> - However, at this point the system <span class="emphasis"><em>cannot</em></span> - sustain a failure of the surviving broker until at least one of - the other brokers recovers, catches up and becomes a ready backup.
- If the surviving broker fails before that, the cluster will fail in - one of two modes (depending on the exact timing of failures): - </p><div class="section"><div class="titlepage"><div><div><h5 class="title"><a id="ha-troubleshoot-the-cluster-hangs"></a>1. The cluster hangs</h5></div></div></div><p> - All brokers are in joining or catch-up mode. rgmanager tries to - promote a new primary but cannot find any candidates and so - gives up. clustat will show that the qpidd services are running - but the qpidd-primary service has stopped, something like - this: - </p><pre class="programlisting"> -Service Name Owner (Last) State -------- ---- ----- ------ ----- -service:mrg33-qpidd-service 20.0.10.33 started -service:mrg34-qpidd-service 20.0.10.34 started -service:mrg35-qpidd-service 20.0.10.35 started -service:qpidd-primary-service (20.0.10.33) stopped - </pre><p> - Eventually all brokers become stuck in "joining" mode, - as shown by: <code class="literal">qpid-ha status --all</code> - </p><p> - At this point, you need to restart the cluster in one of the - following ways: - </p><div class="orderedlist"><ol class="orderedlist" type="1"><li class="listitem"><p> - Restart the entire cluster: - In <code class="literal">luci:<em class="replaceable"><code>your-cluster</code></em>:Nodes</code> - click reboot to restart the entire cluster. - </p></li><li class="listitem"><p> - Stop and restart the cluster with - <code class="literal">ccs --stopall; ccs --startall</code> - </p></li><li class="listitem"><p> - Restart just the Qpid services: In <code class="literal">luci:<em class="replaceable"><code>your-cluster</code></em>:Service Groups</code> - </p><div class="orderedlist"><ol class="orderedlist" type="a"><li class="listitem"><p>Select all the qpidd (not qpidd-primary) services and click restart.</p></li><li class="listitem"><p>Select the qpidd-primary service and click restart.</p></li></ol></div><p> - </p></li><li class="listitem"><p> - Stop the <code
class="literal">qpidd-primary</code> and - <code class="literal">qpidd</code> services with <code class="literal">clusvcadm</code>, - then restart them (qpidd-primary last). - </p></li></ol></div><p> - </p></div><div class="section"><div class="titlepage"><div><div><h5 class="title"><a id="ha-troubleshoot-the-cluster-reboots"></a>2. The cluster reboots</h5></div></div></div><p> - A new primary is promoted and the cluster is functional, but all - non-persistent data from before the failure is lost. - </p></div></div><div class="section"><div class="titlepage"><div><div><h4 class="title"><a id="ha-troubleshoot-fencing-and-network-partitions"></a>1.12.12.5. Fencing and network partitions</h4></div></div></div><p> - A network partition is a network failure that divides the - cluster into two or more sub-clusters, where each broker can - communicate with brokers in its own sub-cluster but not with - brokers in other sub-clusters. This condition is also referred to - as a "split brain". - </p><p> - Nodes in one sub-cluster can't tell whether nodes in other - sub-clusters are dead or are still running but disconnected. We - cannot allow each sub-cluster to independently declare its own - qpidd primary and start serving clients, as the cluster will - become inconsistent. We must ensure only one sub-cluster continues - to provide service. - </p><p> - A <span class="emphasis"><em>quorum</em></span> determines which sub-cluster - continues to operate, and <span class="emphasis"><em>power fencing</em></span> - ensures that nodes in non-quorate sub-clusters cannot attempt to - provide service inconsistently. For more information, see: - </p><p> - https://access.redhat.com/site/documentation/en-US/Red_Hat_Enterprise_Linux/6/html-single/High_Availability_Add-On_Overview/index.html, - chapters 2 (Quorum) and 4 (Fencing).
- </p></div></div><div class="footnotes"><br /><hr align="left" width="100" /><div class="footnote" id="ftn.idm221065919424"><p><a class="para" href="#idm221065919424"><sup class="para">[1] </sup></a> - You can control the maximum number of messages in the buffer by setting the - client's <code class="literal">capacity</code>. For details of how to set the capacity - in client code, see "Using the Qpid Messaging API" in - <em class="citetitle">Programming in Apache Qpid</em>. - </p></div><div class="footnote" id="ftn.idm221066427664"><p><a class="para" href="#idm221066427664"><sup class="para">[2] </sup></a> - Clients must use "at-least-once" reliability to enable re-send of unacknowledged - messages. This is the default behaviour; no options need to be set to enable it. For - details of client addressing options, see "Using the Qpid Messaging API" - in <em class="citetitle">Programming in Apache Qpid</em>. - </p></div><div class="footnote" id="ftn.idm221065028880"><p><a class="para" href="#idm221065028880"><sup class="para">[3] </sup></a> - The full grammar for the URL is: - </p><pre class="programlisting"> -url = ["amqp:"][ user ["/" password] "@" ] addr ("," addr)* -addr = tcp_addr / rdma_addr / ssl_addr / ... -tcp_addr = ["tcp:"] host [":" port] -rdma_addr = "rdma:" host [":" port] -ssl_addr = "ssl:" host [":" port] - </pre></div></div></div><div class="navfooter"><hr /><table summary="Navigation footer" width="100%"><tr><td align="left" width="40%"><a accesskey="p" href="Using-message-groups.html">Prev</a> </td><td align="center" width="20%"><a accesskey="u" href="ch01.html">Up</a></td><td align="right" width="40%"> <a accesskey="n" href="ha-queue-replication.html">Next</a></td></tr><tr><td align="left" valign="top" width="40%">1.11.  - Using Message Groups -  </td><td align="center" width="20%"><a accesskey="h" href="index.html">Home</a></td><td align="right" valign="top" width="40%"> 1.13.
Replicating Queues with the HA module</td></tr></table></div></div> - - <hr/> - - <ul id="-apache-navigation"> - <li><a href="http://www.apache.org/">Apache</a></li> - <li><a href="http://www.apache.org/licenses/">License</a></li> - <li><a href="http://www.apache.org/foundation/sponsorship.html">Sponsorship</a></li> - <li><a href="http://www.apache.org/foundation/thanks.html">Thanks!</a></li> - <li><a href="/security.html">Security</a></li> - <li><a href="http://www.apache.org/"><img id="-apache-feather" width="48" height="14" src="" alt="Apache"/></a></li> - </ul> - - <p id="-legal"> - Apache Qpid, Messaging built on AMQP; Copyright © 2015 - The Apache Software Foundation; Licensed under - the <a href="http://www.apache.org/licenses/LICENSE-2.0">Apache - License, Version 2.0</a>; Apache Qpid, Qpid, Qpid Proton, - Proton, Apache, the Apache feather logo, and the Apache Qpid - project logo are trademarks of The Apache Software - Foundation; All other marks mentioned may be trademarks or - registered trademarks of their respective owners - </p> - </div> - </div> - </div> - </body> -</html>
http://git-wip-us.apache.org/repos/asf/qpid-site/blob/fb1899b6/content/releases/qpid-cpp-0.34/cpp-broker/book/css/style.css ---------------------------------------------------------------------- diff --git a/content/releases/qpid-cpp-0.34/cpp-broker/book/css/style.css b/content/releases/qpid-cpp-0.34/cpp-broker/book/css/style.css deleted file mode 100644 index c681596..0000000 --- a/content/releases/qpid-cpp-0.34/cpp-broker/book/css/style.css +++ /dev/null @@ -1,279 +0,0 @@ -/* - * - * Licensed to the Apache Software Foundation (ASF) under one - * or more contributor license agreements. See the NOTICE file - * distributed with this work for additional information - * regarding copyright ownership. The ASF licenses this file - * to you under the Apache License, Version 2.0 (the - * "License"); you may not use this file except in compliance - * with the License. You may obtain a copy of the License at - * - * http://www.apache.org/licenses/LICENSE-2.0 - * - * Unless required by applicable law or agreed to in writing, - * software distributed under the License is distributed on an - * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY - * KIND, either express or implied. See the License for the - * specific language governing permissions and limitations - * under the License. 
- * - */ -ul { - list-style-type:square; -} - -th { - font-weight: bold; -} - -.navfooter td { - font-size:10pt; -} - -.navheader td { - font-size:10pt; -} - -body { - margin:0; - background:#FFFFFF; - font-family:"Verdana", sans-serif; - font-size:10pt; -} - -.container { - width:950px; - margin:0 auto; -} - -body a { - color:#000000; -} - - -div.book { - margin-left:10pt; - margin-right:10pt; -} - -div.preface { - margin-left:10pt; - margin-right:10pt; -} - -div.chapter { - margin-left:10pt; - margin-right:10pt; -} - -div.section { - margin-left:10pt; - margin-right:10pt; -} - -div.titlepage { - margin-left:-10pt; - margin-right:-10pt; -} - -.calloutlist td { - font-size:10pt; -} - -.table-contents table { - border-spacing: 0px; -} - -.table-contents td { - font-size:10pt; - padding-left:6px; - padding-right:6px; -} - -div.breadcrumbs { - font-size:9pt; - margin-right:10pt; - padding-bottom:16px; -} - -.chapter h2.title { - font-size:20pt; - color:#0c3b82; -} - -.chapter .section h2.title { - font-size:18pt; - color:#0c3b82; -} - -.section h2.title { - font-size:16pt; - color:#0c3b82; -} - -.section h3.title { - font-size:14pt; - color:#0c3b82; -} - -.section h4.title { - font-size:12pt; - color:#0c3b82; -} - -.section h5.title { - font-size:12pt; - color:#0c3b82; -} - -.section h6.title { - font-size:12pt; - color:#0c3b82; -} - -.toc a { - font-size:9pt; -} - -.header { - height:100px; - width:950px; - background:url(http://qpid.apache.org/images/header.png) -} - -.logo { - text-align:center; - font-weight:600; - padding:0 0 0 0; - font-size:14px; - font-family:"Verdana", cursive; -} - -.logo a { - color:#000000; - text-decoration:none; -} - -.main_text_area { - margin-left:200px; -} - -.main_text_area_top { - height:14px; - font-size:1px; -} - -.main_text_area_bottom { - display:none; -/* height:14px; - margin-bottom:4px;*/ -} - -.main_text_area_body { - padding:5px 24px; -} - -.main_text_area_body p { - text-align:justify; -} - -.main_text_area br { - 
line-height:10px; -} - -.main_text_area h1 { - font-size:28px; - font-weight:600; - margin:0 0 24px 0; - color:#0c3b82; - font-family:"Verdana", Times, serif; -} - -.main_text_area h2 { - font-size:24px; - font-weight:600; - margin:24px 0 8px 0; - color:#0c3b82; - font-family:"Verdana",Times, serif; -} - -.main_text_area ol, .main_text_area ul { - padding:0; - margin:10px 0; - margin-left:20px; -} - -.main_text_area li { -/* margin-left:40px; */ -} - -.main_text_area, .menu_box { - font-size:13px; - line-height:17px; - color:#000000; -} - -.main_text_area { - font-size:14px; -} - -.main_text_area a { - color:#000000; -} - -.main_text_area a:hover { - color:#000000; -} - -.menu_box { - width:196px; - float:left; - margin-left:4px; -} - -.menu_box_top { - background:url(http://qpid.apache.org/images/menu_top.png) no-repeat; - height:14px; - font-size:1px; -} - -.menu_box_body { - background:url(http://qpid.apache.org/images/menu_body.png) repeat-y; - padding:5px 24px 5px 24px; -} - -.menu_box_bottom { - background:url(http://qpid.apache.org/images/menu_bottom.png) no-repeat; - height:14px; - font-size:1px; - margin-bottom:1px; -} - -.menu_box h3 { - font-size:20px; - font-weight:500; - margin:0 0 8px 0; - color:#0c3b82; - font-family:"Verdana",Times, serif; -} - -.menu_box ul { - margin:12px; - padding:0px; -} - -.menu_box li { - list-style:square; -} - -.menu_box a { - color:#000000; - text-decoration:none; -} - -.menu_box a:hover { - color:#000000; - text-decoration:underline; -} - - http://git-wip-us.apache.org/repos/asf/qpid-site/blob/fb1899b6/content/releases/qpid-cpp-0.34/cpp-broker/book/ha-queue-replication.html ---------------------------------------------------------------------- diff --git a/content/releases/qpid-cpp-0.34/cpp-broker/book/ha-queue-replication.html b/content/releases/qpid-cpp-0.34/cpp-broker/book/ha-queue-replication.html deleted file mode 100644 index 783fb2e..0000000 --- 
a/content/releases/qpid-cpp-0.34/cpp-broker/book/ha-queue-replication.html +++ /dev/null @@ -1,221 +0,0 @@ -<!DOCTYPE html> -<!-- - - - - Licensed to the Apache Software Foundation (ASF) under one - - or more contributor license agreements. See the NOTICE file - - distributed with this work for additional information - - regarding copyright ownership. The ASF licenses this file - - to you under the Apache License, Version 2.0 (the - - "License"); you may not use this file except in compliance - - with the License. You may obtain a copy of the License at - - - - http://www.apache.org/licenses/LICENSE-2.0 - - - - Unless required by applicable law or agreed to in writing, - - software distributed under the License is distributed on an - - "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY - - KIND, either express or implied. See the License for the - - specific language governing permissions and limitations - - under the License. - - ---> -<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en"> - <head> - <title>1.13. 
Replicating Queues with the HA module - Apache Qpid™</title> - <meta http-equiv="X-UA-Compatible" content="IE=edge"/> - <meta name="viewport" content="width=device-width, initial-scale=1.0"/> - <link rel="stylesheet" href="/site.css" type="text/css" async="async"/> - <link rel="stylesheet" href="/deferred.css" type="text/css" defer="defer"/> - <script type="text/javascript">var _deferredFunctions = [];</script> - <script type="text/javascript" src="/deferred.js" defer="defer"></script> - <!--[if lte IE 8]> - <link rel="stylesheet" href="/ie.css" type="text/css"/> - <script type="text/javascript" src="/html5shiv.js"></script> - <![endif]--> - - <!-- Redirects for `go get` and godoc.org --> - <meta name="go-import" - content="qpid.apache.org git https://git-wip-us.apache.org/repos/asf/qpid-proton.git"/> - <meta name="go-source" - content="qpid.apache.org -https://github.com/apache/qpid-proton/blob/go1/README.md -https://github.com/apache/qpid-proton/tree/go1{/dir} -https://github.com/apache/qpid-proton/blob/go1{/dir}/{file}#L{line}"/> - </head> - <body> - <div id="-content"> - <div id="-top" class="panel"> - <a id="-menu-link"><img width="16" height="16" src="" alt="Menu"/></a> - - <a id="-search-link"><img width="22" height="16" src="" alt="Search"/></a> - - <ul id="-global-navigation"> - <li><a id="-logotype" href="/index.html">Apache Qpid<sup>™</sup></a></li> - <li><a href="/documentation.html">Documentation</a></li> - <li><a href="/download.html">Download</a></li> - <li><a href="/discussion.html">Discussion</a></li> - </ul> - </div> - - <div id="-menu" class="panel" style="display: none;"> - <div class="flex"> - <section> - <h3>Project</h3> - - <ul> - <li><a href="/overview.html">Overview</a></li> - <li><a href="/components/index.html">Components</a></li> - <li><a href="/releases/index.html">Releases</a></li> - </ul> - </section> - - <section> - <h3>Messaging APIs</h3> - - <ul> - <li><a href="/proton/index.html">Qpid Proton</a></li> - <li><a 
href="/components/jms/index.html">Qpid JMS</a></li> - <li><a href="/components/messaging-api/index.html">Qpid Messaging API</a></li> - </ul> - </section> - - <section> - <h3>Servers and tools</h3> - - <ul> - <li><a href="/components/broker-j/index.html">Broker-J</a></li> - <li><a href="/components/cpp-broker/index.html">C++ broker</a></li> - <li><a href="/components/dispatch-router/index.html">Dispatch router</a></li> - </ul> - </section> - - <section> - <h3>Resources</h3> - - <ul> - <li><a href="/dashboard.html">Dashboard</a></li> - <li><a href="https://cwiki.apache.org/confluence/display/qpid/Index">Wiki</a></li> - <li><a href="/resources.html">More resources</a></li> - </ul> - </section> - </div> - </div> - - <div id="-search" class="panel" style="display: none;"> - <form action="http://www.google.com/search" method="get"> - <input type="hidden" name="sitesearch" value="qpid.apache.org"/> - <input type="text" name="q" maxlength="255" autofocus="autofocus" tabindex="1"/> - <button type="submit">Search</button> - <a href="/search.html">More ways to search</a> - </form> - </div> - - <div id="-middle" class="panel"> - <ul id="-path-navigation"><li><a href="/index.html">Home</a></li><li><a href="/releases/index.html">Releases</a></li><li><a href="/releases/qpid-cpp-0.34/index.html">Qpid C++ 0.34</a></li><li><a href="/releases/qpid-cpp-0.34/cpp-broker/book/index.html">AMQP Messaging Broker (Implemented in C++)</a></li><li>1.13. Replicating Queues with the HA module</li></ul> - - <div id="-middle-content"> - <div class="docbook"><div class="navheader"><table summary="Navigation header" width="100%"><tr><th align="center" colspan="3">1.13. Replicating Queues with the HA module</th></tr><tr><td align="left" width="20%"><a accesskey="p" href="chapter-ha.html">Prev</a> </td><th align="center" width="60%">Chapter 1.  
- Running the AMQP Messaging Broker - </th><td align="right" width="20%"> <a accesskey="n" href="chapter-Managing-CPP-Broker.html">Next</a></td></tr></table><hr /></div><div class="section"><div class="titlepage"><div><div><h2 class="title"><a id="ha-queue-replication"></a>1.13. Replicating Queues with the HA module</h2></div></div></div><p> - In addition to supporting active-passive clusters, the - HA module allows you to replicate individual queues, - even if the brokers are not in a cluster. The <em class="firstterm">original</em> - queue is used as normal. The <em class="firstterm">replica</em> queue is - updated automatically as messages are added to or removed from the original - queue. - </p><div class="warning" style="margin-left: 0.5in; margin-right: 0.5in;"><h3 class="title">Warning</h3><p> - It is not safe to modify the replica queue - other than via the automatic updates from the original. Adding or removing - messages on the replica queue will make replication inconsistent and may - cause message loss. - The HA module does <span class="emphasis"><em>not</em></span> enforce - restricted access to the replica queue (as it does in the case of a cluster), - so it is up to the application to ensure the replica is not used until it has - been disconnected from the original. - </p></div><div class="section"><div class="titlepage"><div><div><h3 class="title"><a id="idm221066074928"></a>1.13.1. Replicating queues</h3></div></div></div><p> - To create a replica queue, the HA module must be - loaded on both the original and replica brokers (it is loaded by default). - You also need to set the configuration option: - </p><pre class="programlisting"> - ha-queue-replication=yes - </pre><p> - to enable this feature on a standalone broker. It is automatically - enabled for brokers that are part of a cluster.
- </p><p> - Suppose that <span class="command"><strong>myqueue</strong></span> is a queue on - <span class="command"><strong>node1</strong></span> and we want to create a replica of - <span class="command"><strong>myqueue</strong></span> on <span class="command"><strong>node2</strong></span> (where both brokers - use the default AMQP port). The following command does this: - </p><pre class="programlisting"> - qpid-config --broker=node2 add queue --start-replica node1 myqueue - </pre><p> - If <span class="command"><strong>myqueue</strong></span> already exists on the replica - broker, you can start replication from the original queue like this: - </p><pre class="programlisting"> - qpid-ha replicate -b node2 node1 myqueue - </pre></div><div class="section"><div class="titlepage"><div><div><h3 class="title"><a id="idm221064396992"></a>1.13.2. Replicating queues between clusters</h3></div></div></div><p> - You can replicate queues between two standalone brokers, between a - standalone broker and a cluster, or between two clusters (see <a class="xref" href="chapter-ha.html" title="1.12. Active-Passive Messaging Clusters">Section 1.12, “Active-Passive Messaging Clusters”</a>). For failover in a cluster, there are two cases to - consider. - </p><div class="orderedlist"><ol class="orderedlist" type="1"><li class="listitem"><p> - When the <span class="emphasis"><em>original</em></span> queue is on the active node - of a cluster, failover is automatic. If the active node - fails, the replication link will automatically reconnect and the - replica will continue to be updated from the new primary. - </p></li><li class="listitem"><p> - When the <span class="emphasis"><em>replica</em></span> queue is on the active node of a - cluster, there is no automatic failover. However, you can use the - following workaround. - </p></li></ol></div><div class="section"><div class="titlepage"><div><div><h4 class="title"><a id="idm221064086896"></a>1.13.2.1. 
Workaround for failover of a replica queue in a cluster</h4></div></div></div><p> - When a primary broker fails, the cluster resource manager calls a script - to promote a backup broker to be the new primary. By default, this script - is <code class="filename">/etc/init.d/qpidd-primary</code>, but you can change - that in your <code class="filename">cluster.conf</code> file (see <a class="xref" href="chapter-ha.html#ha-rm-config" title="1.12.5. Configuring with rgmanager as resource manager">Section 1.12.5, “Configuring with <span class="command"><strong>rgmanager</strong></span> as resource manager”</a>). - </p><p> - You can modify this script (on each host in your cluster) by adding - commands to create your replica queues just before the broker is - promoted, as indicated in the following excerpt from the script: - </p><pre class="programlisting"> -start() { - service qpidd start - echo -n $"Promoting qpid daemon to cluster primary: " - ################################ - #### Add your commands here #### - ################################ - $QPID_HA -b localhost:$QPID_PORT promote - [ "$?" -eq 0 ] && success || failure -} - </pre><p> - Your commands will be run, and your replicas created, whenever - the system fails over to a new primary. - </p></div></div></div><div class="navfooter"><hr /><table summary="Navigation footer" width="100%"><tr><td align="left" width="40%"><a accesskey="p" href="chapter-ha.html">Prev</a> </td><td align="center" width="20%"><a accesskey="u" href="ch01.html">Up</a></td><td align="right" width="40%"> <a accesskey="n" href="chapter-Managing-CPP-Broker.html">Next</a></td></tr><tr><td align="left" valign="top" width="40%">1.12. Active-Passive Messaging Clusters </td><td align="center" width="20%"><a accesskey="h" href="index.html">Home</a></td><td align="right" valign="top" width="40%"> Chapter 2.  
- Managing the AMQP Messaging Broker - </td></tr></table></div></div> - - <hr/> - - <ul id="-apache-navigation"> - <li><a href="http://www.apache.org/">Apache</a></li> - <li><a href="http://www.apache.org/licenses/">License</a></li> - <li><a href="http://www.apache.org/foundation/sponsorship.html">Sponsorship</a></li> - <li><a href="http://www.apache.org/foundation/thanks.html">Thanks!</a></li> - <li><a href="/security.html">Security</a></li> - <li><a href="http://www.apache.org/"><img id="-apache-feather" width="48" height="14" src="" alt="Apache"/></a></li> - </ul> - - <p id="-legal"> - Apache Qpid, Messaging built on AMQP; Copyright © 2015 - The Apache Software Foundation; Licensed under - the <a href="http://www.apache.org/licenses/LICENSE-2.0">Apache - License, Version 2.0</a>; Apache Qpid, Qpid, Qpid Proton, - Proton, Apache, the Apache feather logo, and the Apache Qpid - project logo are trademarks of The Apache Software - Foundation; All other marks mentioned may be trademarks or - registered trademarks of their respective owners - </p> - </div> - </div> - </div> - </body> -</html>
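When several queues need replicas, the two commands from section 1.13.1 (`qpid-config ... add queue --start-replica` for a queue that does not yet exist on the replica broker, `qpid-ha replicate` for one that does) can be wrapped in a small helper. The following is a minimal bash sketch, not part of Qpid. The broker names `node1`/`node2` follow the example in section 1.13.1. The `run` and `replicate_queue` helpers and the `DRY_RUN` variable are hypothetical illustrations. The sketch issues only the two invocations shown in the documentation. When `DRY_RUN` is set, it prints those invocations instead of executing them.

```shell
#!/usr/bin/env bash
# Hypothetical helper around the two replication commands from section 1.13.1.
# ORIGINAL_BROKER and REPLICA_BROKER are placeholder values taken from that example.
ORIGINAL_BROKER="node1"
REPLICA_BROKER="node2"

# Run a command, or only print it when DRY_RUN is non-empty (hypothetical switch).
run() {
  if [ -n "${DRY_RUN:-}" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# replicate_queue MODE QUEUE
#   MODE=new      -> QUEUE does not yet exist on the replica broker: create it as a replica
#   MODE=existing -> QUEUE already exists on the replica broker: start replicating into it
replicate_queue() {
  local mode="$1" queue="$2"
  case "$mode" in
    new)
      run qpid-config --broker="$REPLICA_BROKER" add queue \
        --start-replica "$ORIGINAL_BROKER" "$queue" ;;
    existing)
      run qpid-ha replicate -b "$REPLICA_BROKER" "$ORIGINAL_BROKER" "$queue" ;;
    *)
      echo "unknown mode: $mode (expected 'new' or 'existing')" >&2
      return 1 ;;
  esac
}
```

For example, with `DRY_RUN=1` set, `replicate_queue existing myqueue` prints `qpid-ha replicate -b node2 node1 myqueue`. If you adapt this for the "Add your commands here" block of the promotion script in section 1.13.2.1, you would presumably set `REPLICA_BROKER` to `localhost:$QPID_PORT`. That is an assumption; test it against your own cluster before relying on it.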