[devel] [PATCH 2/5] fmd: add configuration parameters [#2996]

2019-01-20 Thread Gary Lee
Add parameters FMS_TAKEOVER_PRIORITISE_PARTITION_SIZE and
FMS_RELAXED_NODE_PROMOTION.
---
 src/fm/fmd/fmd.conf | 17 +
 1 file changed, 17 insertions(+)

diff --git a/src/fm/fmd/fmd.conf b/src/fm/fmd/fmd.conf
index 9a106bf..209e484 100644
--- a/src/fm/fmd/fmd.conf
+++ b/src/fm/fmd/fmd.conf
@@ -30,6 +30,23 @@ export FMS_TAKEOVER_REQUEST_VALID_TIME=20
 # Full path to key-value store plugin
 #export FMS_KEYVALUE_STORE_PLUGIN_CMD=
 
+# In the event of SCs being split into network partitions, we can try to make
+# the active SC reside in the largest network partition. If it is preferable
+# to keep the current SC active, then set this to 0
+# Default is 1
+#export FMS_TAKEOVER_PRIORITISE_PARTITION_SIZE=1
+
+# Default behaviour is not to allow promotion of this node to Active
+# unless a lock can be obtained, if split brain prevention is enabled.
+# Uncomment the next line to allow promotion of this node at cluster startup,
+# if a peer SC can be seen and we have a lower node ID, in the event the
+# consensus service is not available.
+# Also if the consensus service is down, but a peer SC can be seen,
+# then an active SC may remain active.
+# This mode should not be used together with the roaming SC feature
+# Default is 0
+#export FMS_RELAXED_NODE_PROMOTION=0
+
 # FM will supervise transitions to the ACTIVE role when this variable is set to
 # a non-zero value. The value is the time in the unit of 10 ms to wait for a
 # role change to ACTIVE to take effect. If AMF has not give FM an active
-- 
2.7.4



___
Opensaf-devel mailing list
Opensaf-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/opensaf-devel
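
For reference, a minimal sketch of how the two new settings might be enabled in fmd.conf, which uses shell export syntax as the hunk above shows. The values are illustrative only. Setting FMS_SPLIT_BRAIN_PREVENTION=1 alongside them is an assumption: patch 3/5 only honours FMS_RELAXED_NODE_PROMOTION when the consensus service is in use, but the exact conditions for that are not part of this hunk.

```shell
# Sketch of an fmd.conf fragment (illustrative values, not recommendations).

# Assumption: split-brain prevention is enabled so that the consensus
# service is in use; the relaxed option below has no effect otherwise.
export FMS_SPLIT_BRAIN_PREVENTION=1

# Keep the current active SC during a network split instead of
# preferring the larger partition (the default is 1).
export FMS_TAKEOVER_PRIORITISE_PARTITION_SIZE=0

# Allow promotion, or remaining active, when the consensus service is
# unavailable but a peer SC can be seen (the default is 0).
# Per the patch comment, do not combine with the roaming SC feature.
export FMS_RELAXED_NODE_PROMOTION=1
```

Leaving both lines commented out, as the patch does, keeps the defaults (1 and 0 respectively).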


[devel] [PATCH 4/5] amfd: allow node to remain active if peer SC can be seen [#2996]

2019-01-20 Thread Gary Lee
If relaxed node promotion is enabled, allow a SC to remain
active if the peer SC can be seen, even if access to the
consensus service is lost.
---
 src/amf/amfd/ndfsm.cc  |  2 +-
 src/amf/amfd/ndproc.cc | 13 +++--
 src/amf/amfd/proc.h|  2 +-
 3 files changed, 13 insertions(+), 4 deletions(-)

diff --git a/src/amf/amfd/ndfsm.cc b/src/amf/amfd/ndfsm.cc
index 4146ddc..8c8f3c5 100644
--- a/src/amf/amfd/ndfsm.cc
+++ b/src/amf/amfd/ndfsm.cc
@@ -817,7 +817,7 @@ void avd_mds_avnd_down_evh(AVD_CL_CB *cb, AVD_EVT *evt) {
   if (cb->node_failover_delay == 0) {
     avd_node_failover(node);
   }
-  check_quorum();
+  check_quorum(cb);
   node->node_info.member = SA_FALSE;
   // Update standby out of sync if standby sc goes down
   if (avd_cb->node_id_avd_other == node->node_info.nodeId) {
diff --git a/src/amf/amfd/ndproc.cc b/src/amf/amfd/ndproc.cc
index c4eebb1..ec347fc 100644
--- a/src/amf/amfd/ndproc.cc
+++ b/src/amf/amfd/ndproc.cc
@@ -1245,15 +1245,24 @@ void avd_node_failover(AVD_AVND *node, const bool mw_only) {
   TRACE_LEAVE();
 }
 
-void check_quorum() {
+void check_quorum(AVD_CL_CB *cb) {
   TRACE_ENTER();
 
   Consensus consensus_service;
   if (consensus_service.IsRemoteFencingEnabled() == false &&
       consensus_service.IsWritable() == false) {
+    // if relaxed mode is enabled, ignore failure if peer SC is up
+    if (consensus_service.IsRelaxedNodePromotionEnabled() == true) {
+      AVD_AVND* peer = avd_node_find_nodeid(cb->node_id_avd_other);
+      if (peer != nullptr && peer->node_state == AVD_AVND_STATE_PRESENT) {
+        LOG_NO("Relaxed node promotion is enabled, peer SC is connected");
+        return;
+      }
+    }
+
     // remote fencing is disabled and we have lost write access
     // reboot this node to prevent split brain
     opensaf_reboot(0, nullptr,
                    "Quorum lost. Rebooting this node to prevent split-brain");
   }
-}
\ No newline at end of file
+}
diff --git a/src/amf/amfd/proc.h b/src/amf/amfd/proc.h
index a378218..f1dc7ba 100644
--- a/src/amf/amfd/proc.h
+++ b/src/amf/amfd/proc.h
@@ -96,7 +96,7 @@ void avd_process_hb_event(AVD_CL_CB *cb_now, struct AVD_EVT *evt);
 extern void avd_node_mark_absent(AVD_AVND *node);
 extern void avd_tmr_snd_hb_evh(AVD_CL_CB *cb, AVD_EVT *evt);
 extern void avd_node_failover(AVD_AVND *node, const bool mw_only = false);
-extern void check_quorum();
+extern void check_quorum(AVD_CL_CB *cb);
 extern AVD_SU *get_other_su_from_oper_list(AVD_SU *su);
 extern void su_complete_admin_op(AVD_SU *su, SaAisErrorT result);
 extern void comp_complete_admin_op(AVD_COMP *comp, SaAisErrorT result);
-- 
2.7.4
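
The branch structure that this patch gives check_quorum() can be summarised as a small truth table. The sketch below is a shell model of that decision only, not the amfd code; quorum_action and its 0/1 arguments are hypothetical names introduced for illustration.

```shell
# Hypothetical helper mirroring the branches of check_quorum().
# Arguments are 1 (true) or 0 (false):
#   $1 remote fencing enabled    $2 consensus store writable
#   $3 relaxed node promotion    $4 peer SC present
# Prints "reboot" or "stay".
quorum_action() {
  fencing="$1"; writable="$2"; relaxed="$3"; peer_present="$4"

  if [ "$fencing" -eq 0 ] && [ "$writable" -eq 0 ]; then
    # Write access is lost and remote fencing cannot protect us.
    if [ "$relaxed" -eq 1 ] && [ "$peer_present" -eq 1 ]; then
      echo "stay"      # relaxed mode: peer SC is connected, ignore failure
      return 0
    fi
    echo "reboot"      # self-fence to prevent split-brain
    return 0
  fi
  echo "stay"
}

quorum_action 0 0 0 1   # prints "reboot"
quorum_action 0 0 1 1   # prints "stay"
quorum_action 0 0 1 0   # prints "reboot"
quorum_action 1 0 0 0   # prints "stay"
```

The only new outcome is the second case: with relaxed promotion enabled and the peer SC present, losing write access no longer triggers a reboot.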





[devel] [PATCH 3/5] osaf: allow active SC to be preferred during network split [#2996]

2019-01-20 Thread Gary Lee
Add FMS_TAKEOVER_PRIORITISE_PARTITION_SIZE option to allow the
active SC to be preferred during a network split. By default the
larger partition is preferred, which maintains the existing
behaviour.

Add configuration support for FMS_RELAXED_NODE_PROMOTION.
---
 src/osaf/consensus/consensus.cc | 39 ---
 src/osaf/consensus/consensus.h  |  9 +++--
 src/osaf/consensus/key_value.cc |  8 ++--
 3 files changed, 49 insertions(+), 7 deletions(-)

diff --git a/src/osaf/consensus/consensus.cc b/src/osaf/consensus/consensus.cc
index 112af7d..5304c4f 100644
--- a/src/osaf/consensus/consensus.cc
+++ b/src/osaf/consensus/consensus.cc
@@ -64,6 +64,7 @@ SaAisErrorT Consensus::PromoteThisNode(const bool graceful_takeover,
cluster_size);
 if (rc != SA_AIS_OK) {
   LOG_WA("Takeover request failed (%d)", rc);
+  rc = SA_AIS_ERR_EXIST;
   return rc;
 }
 take_over_request_created = true;
@@ -99,7 +100,7 @@ SaAisErrorT Consensus::PromoteThisNode(const bool graceful_takeover,
   if (rc == SA_AIS_OK) {
 LOG_NO("Active controller set to %s", base::Conf::NodeName().c_str());
   } else {
-LOG_ER("Failed to promote this node (%u)", rc);
+LOG_WA("Failed to promote this node (%u)", rc);
   }
 
   return rc;
@@ -197,6 +198,10 @@ bool Consensus::IsWritable() const {
 
 bool Consensus::IsRemoteFencingEnabled() const { return use_remote_fencing_; }
 
+bool Consensus::IsRelaxedNodePromotionEnabled() const {
+  return relaxed_node_promotion_;
+}
+
 std::string Consensus::CurrentActive() const {
   TRACE_ENTER();
   if (use_consensus_ == false) {
@@ -228,6 +233,10 @@ Consensus::Consensus() {
   uint32_t split_brain_enable = base::GetEnv("FMS_SPLIT_BRAIN_PREVENTION", 0);
   std::string kv_store_cmd = base::GetEnv("FMS_KEYVALUE_STORE_PLUGIN_CMD", "");
   uint32_t use_remote_fencing = base::GetEnv("FMS_USE_REMOTE_FENCING", 0);
+  uint32_t prioritise_partition_size =
+base::GetEnv("FMS_TAKEOVER_PRIORITISE_PARTITION_SIZE", 1);
+  uint32_t relaxed_node_promotion =
+base::GetEnv("FMS_RELAXED_NODE_PROMOTION", 0);
 
   // if not specified in fmd.conf,
   // takeover requests are valid for 20 seconds
@@ -246,6 +255,14 @@ Consensus::Consensus() {
 use_remote_fencing_ = true;
   }
 
+  if (prioritise_partition_size == 1) {
+prioritise_partition_size_ = true;
+  }
+
+  if (use_consensus_ == true && relaxed_node_promotion == 1) {
+relaxed_node_promotion_ = true;
+  }
+
   // needed for base::Conf::NodeName() later
   base::Conf::InitNodeName();
 }
@@ -373,6 +390,10 @@ SaAisErrorT Consensus::CreateTakeoverRequest(const std::string& current_owner,
 return CreateTakeoverRequest(current_owner, proposed_owner, cluster_size);
   }
 
+  if (rc != SA_AIS_OK) {
+ return rc;
+  }
+
   // wait up to max_takeover_retry seconds for request to be answered
   retries = 0;
   while (retries < max_takeover_retry) {
@@ -546,9 +567,21 @@ Consensus::TakeoverState Consensus::HandleTakeoverRequest(
   LOG_NO("Other network size: %" PRIu64 ", our network size: %" PRIu64,
          proposed_cluster_size, cluster_size);
 
+  const std::string state_str =
+      tokens[static_cast(TakeoverElements::STATE)];
+
   TakeoverState result;
-  if (proposed_cluster_size > cluster_size) {
-    result = TakeoverState::ACCEPTED;
+  if (state_str !=
+      TakeoverStateStr[static_cast(TakeoverState::NEW)]) {
+    return TakeoverState::UNDEFINED;
+  }
+
+  if (prioritise_partition_size_ == true) {
+    if (proposed_cluster_size > cluster_size) {
+      result = TakeoverState::ACCEPTED;
+    } else {
+      result = TakeoverState::REJECTED;
+    }
   } else {
     result = TakeoverState::REJECTED;
   }
diff --git a/src/osaf/consensus/consensus.h b/src/osaf/consensus/consensus.h
index 6421c7c..2fbd3bd 100644
--- a/src/osaf/consensus/consensus.h
+++ b/src/osaf/consensus/consensus.h
@@ -57,6 +57,9 @@ class Consensus {
   // Is remote fencing enabled?
   bool IsRemoteFencingEnabled() const;
 
+  // Is relaxed node promotion enabled?
+  bool IsRelaxedNodePromotionEnabled() const;
+
   Consensus();
   virtual ~Consensus();
 
@@ -66,7 +69,7 @@ class Consensus {
 UNDEFINED = 0,
 NEW = 1,
 ACCEPTED = 2,
-REJECTED = 3,
+REJECTED = 3
   };
 
   enum class TakeoverElements : std::uint8_t {
@@ -85,13 +88,15 @@ class Consensus {
  private:
   bool use_consensus_ = false;
   bool use_remote_fencing_ = false;
+  bool prioritise_partition_size_ = false;
+  bool relaxed_node_promotion_ = false;
   uint32_t takeover_valid_time;
   uint32_t max_takeover_retry;
   const std::string kTestKeyname = "opensaf_write_test";
   const std::chrono::milliseconds kSleepInterval =
   std::chrono::milliseconds(1000);  // in ms
   static constexpr uint32_t kLockTimeout = 0;  // lock is persistent by default
-  static constexpr uint32_t kMaxRetry = 30;
+  static constexpr uint32_t kMaxRetry = 3;
 
   void CheckForExistingTakeoverRequest();
 

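The decision that HandleTakeoverRequest() makes after this patch can be modelled in a few lines. This is a shell sketch of the logic only, under the assumption that the state strings match the enum names (the etcd3 plugin in patch 1/5 emits the string UNDEFINED, which suggests this). takeover_decision is a hypothetical name.

```shell
# Hypothetical helper mirroring the decision in HandleTakeoverRequest().
#   $1 request state string (assumed to be e.g. NEW, ACCEPTED, ...)
#   $2 prioritise partition size: 1 or 0
#   $3 proposed (other) partition size
#   $4 our partition size
# Prints UNDEFINED, ACCEPTED or REJECTED.
takeover_decision() {
  state="$1"; prioritise="$2"; proposed="$3"; ours="$4"

  # Only requests still in the NEW state are evaluated.
  if [ "$state" != "NEW" ]; then
    echo "UNDEFINED"
    return 0
  fi

  # Accept only if partition size is prioritised and the other side is larger.
  if [ "$prioritise" -eq 1 ] && [ "$proposed" -gt "$ours" ]; then
    echo "ACCEPTED"
  else
    echo "REJECTED"
  fi
}

takeover_decision NEW 1 3 2   # prints "ACCEPTED"
takeover_decision NEW 0 3 2   # prints "REJECTED"
takeover_decision NEW 1 2 2   # prints "REJECTED"
```

With FMS_TAKEOVER_PRIORITISE_PARTITION_SIZE=0 every NEW request is rejected, which is how the current active SC is kept regardless of partition size.
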
[devel] [PATCH 5/5] rded: add relaxed node promotion feature [#2996]

2019-01-20 Thread Gary Lee
Allow promotion of node to active at cluster startup, even if the
consensus service is unavailable, if the peer SC can be seen.

During normal cluster operation, if the consensus service becomes
unavailable but the peer SC can still be seen, allow the existing
active SC to remain active.

A new NCSMDS_SVC_ID_RDE_DISCOVERY service ID is exported by rded.
This is installed as soon as rded is started, unlike
NCSMDS_SVC_ID_RDE which is only installed when it becomes
a candidate for election.
---
 src/mds/mds_papi.h   |  1 +
 src/rde/rded/rde_cb.h| 12 +-
 src/rde/rded/rde_main.cc | 71 +++
 src/rde/rded/rde_mds.cc  | 94 --
 src/rde/rded/role.cc | 97 +++-
 src/rde/rded/role.h  |  4 +-
 6 files changed, 256 insertions(+), 23 deletions(-)

diff --git a/src/mds/mds_papi.h b/src/mds/mds_papi.h
index 03d755d..7cd543c 100644
--- a/src/mds/mds_papi.h
+++ b/src/mds/mds_papi.h
@@ -191,6 +191,7 @@ typedef enum ncsmds_svc_id {
   NCSMDS_SVC_ID_PLMS = 37,
   NCSMDS_SVC_ID_PLMS_HRB = 38,
   NCSMDS_SVC_ID_PLMA = 39,
+  NCSMDS_SVC_ID_RDE_DISCOVERY = 40,
   NCSMDS_SVC_ID_NCSMAX, /* This mnemonic always last */
 
   /* The range below is for OpenSAF internal use */
diff --git a/src/rde/rded/rde_cb.h b/src/rde/rded/rde_cb.h
index d3f5a24..9a0919c 100644
--- a/src/rde/rded/rde_cb.h
+++ b/src/rde/rded/rde_cb.h
@@ -34,6 +34,9 @@
  **
  */
 
+enum class State {kNotActive = 0, kNotActiveSeenPeer, kActiveElected,
+  kActiveElectedSeenPeer, kActiveFailover};
+
 struct RDE_CONTROL_BLOCK {
   SYSF_MBX mbx;
   NCSCONTEXT task_handle;
@@ -43,6 +46,9 @@ struct RDE_CONTROL_BLOCK {
   bool monitor_lock_thread_running{false};
   bool monitor_takeover_req_thread_running{false};
   std::set cluster_members{};
+  // used for discovering peer controllers, regardless of their role
+  std::set peer_controllers{};
+  State state{State::kNotActive};
 };
 
 enum RDE_MSG_TYPE {
@@ -54,7 +60,9 @@ enum RDE_MSG_TYPE {
   RDE_MSG_NODE_UP = 6,
   RDE_MSG_NODE_DOWN = 7,
   RDE_MSG_TAKEOVER_REQUEST_CALLBACK = 8,
-  RDE_MSG_ACTIVE_PROMOTION_SUCCESS = 9
+  RDE_MSG_ACTIVE_PROMOTION_SUCCESS = 9,
+  RDE_MSG_CONTROLLER_UP = 10,
+  RDE_MSG_CONTROLLER_DOWN = 11
 };
 
 struct rde_peer_info {
@@ -82,7 +90,9 @@ extern const char *rde_msg_name[];
 
 extern RDE_CONTROL_BLOCK *rde_get_control_block();
 extern uint32_t rde_mds_register();
+extern uint32_t rde_discovery_mds_register();
 extern uint32_t rde_mds_unregister();
+extern uint32_t rde_discovery_mds_unregister();
 extern uint32_t rde_mds_send(rde_msg *msg, MDS_DEST to_dest);
 extern uint32_t rde_set_role(PCS_RDA_ROLE role);
 
diff --git a/src/rde/rded/rde_main.cc b/src/rde/rded/rde_main.cc
index e5813e4..2d9aa51 100644
--- a/src/rde/rded/rde_main.cc
+++ b/src/rde/rded/rde_main.cc
@@ -39,6 +39,7 @@
 #include "osaf/consensus/consensus.h"
 #include "rde/rded/rde_cb.h"
 #include "rde/rded/role.h"
+#include "rde_cb.h"
 
 #define RDA_MAX_CLIENTS 32
 
@@ -56,7 +57,9 @@ const char *rde_msg_name[] = {"-",
   "RDE_MSG_NODE_UP(6)",
   "RDE_MSG_NODE_DOWN(7)",
   "RDE_MSG_TAKEOVER_REQUEST_CALLBACK(8)",
-  "RDE_MSG_ACTIVE_PROMOTION_SUCCESS(9)"};
+  "RDE_MSG_ACTIVE_PROMOTION_SUCCESS(9)",
+  "RDE_MSG_CONTROLLER_UP(10)",
+  "RDE_MSG_CONTROLLER_DOWN(11)"};
 
 static RDE_CONTROL_BLOCK _rde_cb;
 static RDE_CONTROL_BLOCK *rde_cb = &_rde_cb;
@@ -157,6 +160,23 @@ static void handle_mbx_event() {
       rde_cb->cluster_members.erase(msg->fr_node_id);
       TRACE("cluster_size %zu", rde_cb->cluster_members.size());
       break;
+    case RDE_MSG_CONTROLLER_UP:
+      if (msg->fr_node_id != own_node_id) {
+        rde_cb->peer_controllers.insert(msg->fr_node_id);
+        TRACE("peer_controllers: size %zu", rde_cb->peer_controllers.size());
+        if (rde_cb->state == State::kNotActive) {
+          TRACE("Set state to kNotActiveSeenPeer");
+          rde_cb->state = State::kNotActiveSeenPeer;
+        } else if (rde_cb->state == State::kActiveElected) {
+          TRACE("Set state to kActiveElectedSeenPeer");
+          rde_cb->state = State::kActiveElectedSeenPeer;
+        }
+      }
+      break;
+    case RDE_MSG_CONTROLLER_DOWN:
+      rde_cb->peer_controllers.erase(msg->fr_node_id);
+      TRACE("peer_controllers: size %zu", rde_cb->peer_controllers.size());
+      break;
     case RDE_MSG_TAKEOVER_REQUEST_CALLBACK: {
       rde_cb->monitor_takeover_req_thread_running = false;
 
@@ -179,13 +199,44 @@ static void handle_mbx_event() {
 "Another controller is taking over the active role. "
"Rebooting this node");
   }
-} else {
-  LOG_NO("Rejected takeover request");
-
-  

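The state update performed on RDE_MSG_CONTROLLER_UP in the rde_main.cc hunk above can be modelled as a simple transition function. This is a shell sketch of the visible part of the patch only; on_controller_up is a hypothetical name, and the states are passed as plain strings.

```shell
# Hypothetical model of the State update on RDE_MSG_CONTROLLER_UP.
#   $1 current state   $2 sender node id   $3 own node id
# Prints the resulting state.
on_controller_up() {
  state="$1"; from="$2"; own="$3"

  # Messages from our own node do not change the state.
  if [ "$from" = "$own" ]; then
    echo "$state"
    return 0
  fi

  case "$state" in
    kNotActive)     echo "kNotActiveSeenPeer" ;;
    kActiveElected) echo "kActiveElectedSeenPeer" ;;
    *)              echo "$state" ;;   # all other states are left unchanged
  esac
}

on_controller_up kNotActive 2 1        # prints "kNotActiveSeenPeer"
on_controller_up kActiveElected 2 1    # prints "kActiveElectedSeenPeer"
on_controller_up kActiveFailover 2 1   # prints "kActiveFailover"
```

In the portion of the patch shown here, RDE_MSG_CONTROLLER_DOWN only removes the peer from peer_controllers and does not move the state back.
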
[devel] [PATCH 1/5] osaf: update etcd3 to poll instead of watch [#2996]

2019-01-20 Thread Gary Lee
The 'watch' command does not return if the etcd server goes down.
We need to poll the etcd server to properly check we still have
connectivity to the etcd server.
---
 src/osaf/consensus/plugins/etcd3.plugin | 50 ++---
 1 file changed, 40 insertions(+), 10 deletions(-)

diff --git a/src/osaf/consensus/plugins/etcd3.plugin b/src/osaf/consensus/plugins/etcd3.plugin
index b3814c9..4998df0 100644
--- a/src/osaf/consensus/plugins/etcd3.plugin
+++ b/src/osaf/consensus/plugins/etcd3.plugin
@@ -17,9 +17,12 @@
 # backward compatible. This plugin may need to be adapted.
 
 readonly keyname="opensaf_consensus_lock"
+readonly takeover_request="takeover_request"
+readonly node_name_file="/etc/opensaf/node_name"
 readonly directory="/opensaf/"
 readonly etcd_options=""
-readonly etcd_timeout="10s"
+readonly etcd_timeout="3s"
+readonly heartbeat_interval=2
 
 export ETCDCTL_API=3
 
@@ -29,7 +32,8 @@ export ETCDCTL_API=3
 #   $1 - 
 # returns:
 #   0 - success,  is echoed to stdout
-#   non-zero - failure
+#   1 - invalid param
+#   other - failure
 get() {
   readonly key="$1"
 
@@ -51,7 +55,7 @@ get() {
   return 1
 fi
   else
-return 1
+return 2
   fi
 }
 
@@ -101,7 +105,8 @@ setkey() {
 # returns:
 #   0 - success
 #   1 - already exists
-#   2 or above - other failure
+#   2 - invalid param
+#   3 or above - other failure
 create_key() {
   readonly key="$1"
   readonly value="$2"
@@ -114,7 +119,7 @@ create_key() {
   lease_id=$(echo $output | awk '{print $2}')
   lease_param="--lease="$lease_id""
 else
-  return 2
+  return 3
 fi
   else
 lease_param=""
@@ -135,7 +140,7 @@ create_key() {
   then
 return 1
   else
-return 2
+return 3
   fi
 }
 
@@ -149,6 +154,7 @@ create_key() {
 #   $4 - 
 # returns:
 #   0 - success
+#   1 - invalid param
 #   non-zero - failure
 setkey_match_prev() {
   readonly key="$1"
@@ -326,10 +332,34 @@ unlock() {
 #   non-zero - failure
 watch() {
   readonly watch_key="$1"
-  etcdctl $etcd_options --dial-timeout $etcd_timeout \
-watch "$directory$watch_key" | grep -m0 \"\" 2>&1
-  get "$watch_key"
-  return 0
+
+  # get baseline
+  orig_value=$(get "$watch_key")
+  result=$?
+
+  if [ "$result" -le "1" ]; then
+    while true
+    do
+      sleep $heartbeat_interval
+      current_value=$(get "$watch_key")
+      result=$?
+      if [ "$result" -gt "1" ]; then
+        # etcd down?
+        if [ "$watch_key" == "$takeover_request" ]; then
+          hostname=`cat $node_name_file`
+          echo "$hostname SC-0 1000 UNDEFINED"
+          return 0
+        else
+          return 1
+        fi
+      elif [ "$orig_value" != "$current_value" ]; then
+        echo $current_value
+        return 0
+      fi
+    done
+  fi
+
+  return 1
 }
 
 # argument parsing
-- 
2.7.4
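
The poll-instead-of-watch idea can be tried without etcd. The sketch below is a simplified stand-alone model, not the plugin code: get() is a hypothetical stand-in that reads a local file and returns 2 when the "store" is unreachable, and poll_watch() omits the plugin's special handling of the takeover_request key and of a missing key.

```shell
# Stand-in for the plugin's get(): the "key" lives in a local file.
# Echoes the value and returns 0, or returns 2 if the store is
# unreachable (here: the file is missing). For illustration only.
store_file="${TMPDIR:-/tmp}/poll_watch_demo.$$"
get() {
  [ -r "$store_file" ] || return 2
  cat "$store_file"
}

# Poll until the value changes (echo the new value, return 0)
# or the store becomes unreachable (return 1).
#   $1 poll interval in seconds
poll_watch() {
  interval="$1"
  orig_value=$(get) || return 1        # baseline read
  while true; do
    sleep "$interval"
    current_value=$(get) || return 1   # store down: stop instead of hanging
    if [ "$orig_value" != "$current_value" ]; then
      echo "$current_value"
      return 0
    fi
  done
}

# Demo: change the value in the background, then wait for the change.
echo "SC-1" > "$store_file"
( sleep 1; echo "SC-2" > "$store_file.new"; mv "$store_file.new" "$store_file" ) &
poll_watch 1    # prints "SC-2"
wait
rm -f "$store_file"
```

Because each iteration performs a fresh read, loss of the store shows up as a failed get within one interval, which is the property the commit message asks for. The trade-off is up to one interval of latency in noticing a change.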





[devel] [PATCH 0/5] Review Request for rded: add relaxed node promotion feature [#2996]

2019-01-20 Thread Gary Lee
Summary: rded: add relaxed node promotion feature [#2996]
Review request for Ticket(s): 2996
Peer Reviewer(s): Hans, Minh
Pull request to: *** LIST THE PERSON WITH PUSH ACCESS HERE ***
Affected branch(es): develop
Development branch: ticket-2996
Base revision: 35035599567d1add6975a89f1286f20738d67bf1
Personal repository: git://git.code.sf.net/u/userid-2226215/review


Impacted area         Impact y/n
--------------------------------
 Docs                     n
 Build system             n
 RPM/packaging            n
 Configuration files      n
 Startup scripts          n
 SAF services             n
 OpenSAF services         y
 Core libraries           y
 Samples                  n
 Tests                    n
 Other                    n


Comments (indicate scope for each "y" above):
---------------------------------------------

revision 9a681198810be2e2ad3f512ff966fe1d9eceb1ab
Author: Gary Lee 
Date:   Mon, 21 Jan 2019 14:35:49 +1100

rded: add relaxed node promotion feature [#2996]

Allow promotion of node to active at cluster startup, even if the
consensus service is unavailable, if the peer SC can be seen.

During normal cluster operation, if the consensus service becomes
unavailable but the peer SC can still be seen, allow the existing
active SC to remain active.

A new NCSMDS_SVC_ID_RDE_DISCOVERY service ID is exported by rded.
This is installed as soon as rded is started, unlike
NCSMDS_SVC_ID_RDE which is only installed when it becomes
a candidate for election.



revision d2fad05f5ab3b502403493763f5f2bb31608444f
Author: Gary Lee 
Date:   Mon, 21 Jan 2019 14:35:49 +1100

amfd: allow node to remain active if peer SC can be seen [#2996]

If relaxed node promotion is enabled, allow a SC to remain
active if the peer SC can be seen, even if access to the
consensus service is lost.



revision 4e1bbbd4997a6ea8307695e81a64dd9c53da15aa
Author: Gary Lee 
Date:   Mon, 21 Jan 2019 14:35:42 +1100

osaf: allow active SC to be preferred during network split [#2996]

Add FMS_TAKEOVER_PRIORITISE_PARTITION_SIZE option to allow the
active SC to be preferred during a network split. By default the
larger partition is preferred, which maintains the existing
behaviour.

Add configuration support for FMS_RELAXED_NODE_PROMOTION.



revision 7b50ffd37aafb82e71c726781824f8d6883c5aa5
Author: Gary Lee 
Date:   Mon, 21 Jan 2019 14:27:38 +1100

fmd: add configuration parameters [#2996]

Add parameters FMS_TAKEOVER_PRIORITISE_PARTITION_SIZE and
FMS_RELAXED_NODE_PROMOTION.



revision 1bb52d591e6014e013c8335f7f1a1f516ecc8566
Author: Gary Lee 
Date:   Mon, 21 Jan 2019 14:01:08 +1100

osaf: update etcd3 to poll instead of watch [#2996]

The 'watch' command does not return if the etcd server goes down.
We need to poll the etcd server to properly check we still have
connectivity to the etcd server.



Complete diffstat:
------------------
 src/amf/amfd/ndfsm.cc   |  2 +-
 src/amf/amfd/ndproc.cc  | 13 -
 src/amf/amfd/proc.h |  2 +-
 src/fm/fmd/fmd.conf | 17 ++
 src/mds/mds_papi.h  |  1 +
 src/osaf/consensus/consensus.cc | 39 -
 src/osaf/consensus/consensus.h  |  9 ++-
 src/osaf/consensus/key_value.cc |  8 ++-
 src/osaf/consensus/plugins/etcd3.plugin | 50 +
 src/rde/rded/rde_cb.h   | 12 +++-
 src/rde/rded/rde_main.cc| 71 +---
 src/rde/rded/rde_mds.cc | 94 ++--
 src/rde/rded/role.cc| 97 +
 src/rde/rded/role.h |  4 +-
 14 files changed, 375 insertions(+), 44 deletions(-)


Testing Commands:
-----------------
*** LIST THE COMMAND LINE TOOLS/STEPS TO TEST YOUR CHANGES ***


Testing, Expected Results:
--------------------------
*** PASTE COMMAND OUTPUTS / TEST RESULTS ***


Conditions of Submission:
-------------------------
Ack from any reviewer, or in 1 week

Arch        Built   Started   Linux distro
------------------------------------------
mips          n        n
mips64        n        n
x86           n        n
x86_64        y        y
powerpc       n        n
powerpc64     n        n


Reviewer Checklist:
-------------------
[Submitters: make sure that your review doesn't trigger any checkmarks!]


Your checkin has not passed review because (see checked entries):

___ Your RR template is generally incomplete; it has too many blank entries
that need proper data filled in.

___ You have failed to nominate the proper persons for review and push.

___ Your patches do not have proper short+long header

___ You have grammar/spelling in your header that is unacceptable.

___ You have exceeded a sensible line length in your headers/comments/text.

___ You have failed to put in a proper Trac Ticket # into your commits.

___ You have incorrectly put/left internal data in your comments/files
(i.e.