Hi!

See my replies below.

regards,
Anders Widell

On 03/29/2016 02:55 PM, ramesh betham wrote:
> My initial comments..
>
>      1. Can avoid peer discovery waiting state to certain fixed period
>         by defining the spare-count in xml. What do you say..
>
I am not sure what you mean here. Do you propose RDE should read and 
parse imm.xml to figure out how many system controllers have been 
configured? Remember that if PBE is disabled, then the configuration 
stored in the imm.xml file might be different from the current IMM 
configuration (which is stored in RAM on existing nodes in the cluster). 
And if PBE is enabled, we should read the sqlite imm.db file instead of 
imm.xml.

In any case, the peer discovery time is not really depending on the 
number of system controller nodes. The only advantage of knowing the 
number of system controller nodes is that we can stop waiting for more 
nodes when we have discovered them all, instead of always waiting for 
the timer to expire. But this is not a big deal, since the peer 
discovery will only happen in the following three cases:

1. Initial cluster start
2. Cluster restart
3. Simultaneous loss of both system controller nodes (only applicable 
when IMMSV_SC_ABSENCE_ALLOWED is non-zero)

So yes, these three cases will be a couple of seconds slower than 
before, but they should happen very rarely and I don't think it is worth 
optimising the peer discovery in this way to shave off a few seconds. As 
I pointed out, reading the IMM configuration will not be easy. If we 
would like to do this, then I think the only way is to add an option in 
/etc/opensaf/rde.conf where you can configure the maximum possible 
number of system controller nodes. That should be fairly easy, but then 
there is a risk for misconfiguration - i.e. someone actually configures 
more nodes than RDE expects. Such a misconfiguration will be difficult 
to detect: it will probably work most of the time but there is a 
significantly higher risk for split-brain.

>     1.
>
>
>      2. What is the idea behind introducing QUIESCING state when it is
>         not used?
>
No you are right, it is not used. I only added it for completeness, but 
it is not necessary really.
>
>     1.
>
> Regards,
> Ramesh.
>
> On 2/29/2016 9:09 PM, Anders Widell wrote:
>>   Makefile.am                                          |    3 +-
>>   opensaf.spec.in                                      |    1 +
>>   osaf/services/infrastructure/rde/Makefile.am         |    3 +-
>>   osaf/services/infrastructure/rde/config/rde.conf     |    3 +
>>   osaf/services/infrastructure/rde/include/Makefile.am |    3 +-
>>   osaf/services/infrastructure/rde/include/rde_amf.h   |    5 +
>>   osaf/services/infrastructure/rde/include/rde_cb.h    |   12 +-
>>   osaf/services/infrastructure/rde/include/rde_rda.h   |    3 +
>>   osaf/services/infrastructure/rde/include/role.h      |   60 +++
>>   osaf/services/infrastructure/rde/rde_amf.cc          |   42 +-
>>   osaf/services/infrastructure/rde/rde_main.cc         |  323 
>> +++++-------------
>>   osaf/services/infrastructure/rde/rde_mds.cc          |   76 ++-
>>   osaf/services/infrastructure/rde/rde_rda.cc          |    7 +-
>>   osaf/services/infrastructure/rde/role.cc             |  144 ++++++++
>>   scripts/opensaf_sc_active                            |   30 +
>>   15 files changed, 439 insertions(+), 276 deletions(-)
>>
>>
>> Without changing the network protocol, generalize the role selection 
>> algorithm
>> used by RDE so that it works with more than two nodes. Nodes are now 
>> initially
>> given the QUIESCED role. When there are no ACTIVE or STANDBY nodes, the CLM 
>> node
>> agent will initiate an election of a new ACTIVE node. RDE on the winning node
>> will set its role to ACTIVE.
>>
>> There is now also a new customizable script which is executed on a node 
>> before
>> it becomes ACTIVE. The purpose of this script is to perform necessary
>> preparations (if any) needed before a node can take on the ACTIVE role. For
>> example, shared files system may have to be mounted.
>>
>> diff --git a/Makefile.am b/Makefile.am
>> --- a/Makefile.am
>> +++ b/Makefile.am
>> @@ -187,7 +187,8 @@ nodist_pkgsysconf_DATA = \
>>   osaf_execbindir = $(pkglibdir)
>>   dist_osaf_execbin_SCRIPTS = \
>>      $(top_srcdir)/scripts/opensaf_reboot \
>> -    $(top_srcdir)/scripts/opensaf_scale_out
>> +    $(top_srcdir)/scripts/opensaf_scale_out \
>> +    $(top_srcdir)/scripts/opensaf_sc_active
>>   
>>   if ENABLE_RPM_TARGET
>>   
>> diff --git a/opensaf.spec.in b/opensaf.spec.in
>> --- a/opensaf.spec.in
>> +++ b/opensaf.spec.in
>> @@ -948,6 +948,7 @@ fi
>>   %{_pkglibdir}/opensafd
>>   %{_pkglibdir}/opensaf_reboot
>>   %{_pkglibdir}/opensaf_scale_out
>> +%{_pkglibdir}/opensaf_sc_active
>>   %{_pkglibdir}/configure_tipc
>>   
>>   
>> diff --git a/osaf/services/infrastructure/rde/Makefile.am 
>> b/osaf/services/infrastructure/rde/Makefile.am
>> --- a/osaf/services/infrastructure/rde/Makefile.am
>> +++ b/osaf/services/infrastructure/rde/Makefile.am
>> @@ -31,7 +31,8 @@ osafrded_SOURCES = \
>>      rde_amf.cc \
>>      rde_main.cc \
>>      rde_mds.cc \
>> -    rde_rda.cc
>> +    rde_rda.cc \
>> +    role.cc
>>   
>>   osafrded_LDADD = \
>>      $(top_builddir)/osaf/libs/core/libopensaf_core.la \
>> diff --git a/osaf/services/infrastructure/rde/config/rde.conf 
>> b/osaf/services/infrastructure/rde/config/rde.conf
>> --- a/osaf/services/infrastructure/rde/config/rde.conf
>> +++ b/osaf/services/infrastructure/rde/config/rde.conf
>> @@ -10,6 +10,9 @@
>>   # waiting to discover a peer. Default is 2s.
>>   #export RDE_DISCOVER_PEER_TIMEOUT=5000
>>   
>> +# Maximum execution time in milliseconds for the opensaf_sc_active script.
>> +#export RDE_PRE_ACTIVE_SCRIPT_TIMEOUT=5000
>> +
>>   # Healthcheck keys
>>   export RDE_HA_ENV_HEALTHCHECK_KEY="Default"
>>   
>> diff --git a/osaf/services/infrastructure/rde/include/Makefile.am 
>> b/osaf/services/infrastructure/rde/include/Makefile.am
>> --- a/osaf/services/infrastructure/rde/include/Makefile.am
>> +++ b/osaf/services/infrastructure/rde/include/Makefile.am
>> @@ -22,4 +22,5 @@ noinst_HEADERS = \
>>      rde_amf.h \
>>      rde_cb.h \
>>      rde_rda.h \
>> -    rde_rda_common.h
>> +    rde_rda_common.h \
>> +    role.h
>> diff --git a/osaf/services/infrastructure/rde/include/rde_amf.h 
>> b/osaf/services/infrastructure/rde/include/rde_amf.h
>> --- a/osaf/services/infrastructure/rde/include/rde_amf.h
>> +++ b/osaf/services/infrastructure/rde/include/rde_amf.h
>> @@ -44,6 +44,8 @@
>>   #include "saAis.h"
>>   #include "saAmf.h"
>>   
>> +class Role;
>> +
>>   /*
>>    * Macro used to get the AMF version used
>>    */
>> @@ -56,11 +58,14 @@ struct RDE_AMF_CB {
>>      char comp_name[256];
>>      SaAmfHandleT amf_hdl;   /* AMF handle */
>>      SaSelectionObjectT amf_fd;      /* AMF selection fd */
>> +    Role* role;
>>      bool is_amf_up; /* For amf_fd and pipe_fd */
>>      bool nid_started;       /**< true if started by NID */
>>   
>>   };
>>   
>>   extern uint32_t rde_amf_init(RDE_AMF_CB *rde_amf_cb);
>> +extern SaAisErrorT internal_csi_set_callback(SaInvocationT invocation,
>> +    SaAmfHAStateT new_haState);
>>   
>>   #endif
>> diff --git a/osaf/services/infrastructure/rde/include/rde_cb.h 
>> b/osaf/services/infrastructure/rde/include/rde_cb.h
>> --- a/osaf/services/infrastructure/rde/include/rde_cb.h
>> +++ b/osaf/services/infrastructure/rde/include/rde_cb.h
>> @@ -25,6 +25,12 @@
>>   #include "rde_amf.h"
>>   #include "rde_rda.h"
>>   #include "rda_papi.h"
>> +#include "osaf_utility.h"
>> +
>> +/* The value to put in the PATH environment variable when calling the
>> +   SC active script */
>> +#define SC_ACTIVE_PATH_ENV "/usr/local/sbin:/usr/local/bin:/usr/sbin:" \
>> +    "/usr/bin:/sbin:/bin"
>>   
>>   /*
>>   **  RDE_CONTROL_BLOCK
>> @@ -41,8 +47,6 @@ struct RDE_CONTROL_BLOCK {
>>      bool fabric_interface;
>>      uint32_t select_timeout;
>>   
>> -    PCS_RDA_ROLE ha_role;
>> -
>>      RDE_RDA_CB rde_rda_cb;
>>      RDE_AMF_CB rde_amf_cb;
>>   
>> @@ -70,7 +74,6 @@ struct rde_msg {
>>   };
>>   
>>   extern const char *rde_msg_name[];
>> -extern NCS_NODE_ID rde_my_node_id;
>>   
>>   
>> /*****************************************************************************\
>>    *                                                                         
>>     *
>> @@ -79,7 +82,8 @@ extern NCS_NODE_ID rde_my_node_id;
>>   
>> \*****************************************************************************/
>>   
>>   extern RDE_CONTROL_BLOCK *rde_get_control_block();
>> -extern uint32_t rde_mds_register(RDE_CONTROL_BLOCK *cb);
>> +extern uint32_t rde_mds_register();
>> +extern uint32_t rde_mds_unregister();
>>   extern uint32_t rde_mds_send(struct rde_msg *msg, MDS_DEST to_dest);
>>   extern uint32_t rde_set_role(PCS_RDA_ROLE role);
>>   
>> diff --git a/osaf/services/infrastructure/rde/include/rde_rda.h 
>> b/osaf/services/infrastructure/rde/include/rde_rda.h
>> --- a/osaf/services/infrastructure/rde/include/rde_rda.h
>> +++ b/osaf/services/infrastructure/rde/include/rde_rda.h
>> @@ -38,6 +38,8 @@
>>   #include <sys/socket.h>
>>   #include <sys/un.h>
>>   
>> +class Role;
>> +
>>   
>> /*****************************************************************************\
>>    *                                                                         
>>     *
>>    *   Constants and Enumerated Values                                       
>>     *
>> @@ -85,6 +87,7 @@ struct RDE_RDA_CB {
>>      int fd;                 /* File descriptor          */
>>      int flags;              /* Flags specified for open */
>>      int client_count;
>> +    Role* role;
>>      RDE_RDA_CLIENT clients[MAX_RDA_CLIENTS];
>>   
>>   } ;
>> diff --git a/osaf/services/infrastructure/rde/include/role.h 
>> b/osaf/services/infrastructure/rde/include/role.h
>> new file mode 100644
>> --- /dev/null
>> +++ b/osaf/services/infrastructure/rde/include/role.h
>> @@ -0,0 +1,60 @@
>> +/*      -*- OpenSAF  -*-
>> + *
>> + * (C) Copyright 2016 The OpenSAF Foundation
>> + *
>> + * This program is distributed in the hope that it will be useful, but
>> + * WITHOUT ANY WARRANTY; without even the implied warranty of 
>> MERCHANTABILITY
>> + * or FITNESS FOR A PARTICULAR PURPOSE. This file and program are licensed
>> + * under the GNU Lesser General Public License Version 2.1, February 1999.
>> + * The complete license can be accessed from the following location:
>> + *http://opensource.org/licenses/lgpl-license.php
>> + * See the Copying file included with the OpenSAF distribution for full
>> + * licensing terms.
>> + *
>> + * Author(s): Ericsson AB
>> + *
>> + */
>> +
>> +#ifndef OPENSAF_OSAF_SERVICES_INFRASTRUCTURE_RDE_INCLUDE_ROLE_H_
>> +#define OPENSAF_OSAF_SERVICES_INFRASTRUCTURE_RDE_INCLUDE_ROLE_H_
>> +
>> +#include <time.h>
>> +#include <stdint.h>
>> +#include "rda_papi.h"
>> +#include "mds_papi.h"
>> +#include "base/macros.h"
>> +
>> +namespace base {
>> +class Process;
>> +}
>> +
>> +class Role {
>> +  DELETE_COPY_AND_MOVE_OPERATORS(Role);
>> + public:
>> +  Role(NODE_ID own_node_id);
>> +  void SetPeerState(PCS_RDA_ROLE node_role, NODE_ID node_id);
>> +  timespec* Poll(timespec* ts);
>> +  uint32_t SetRole(PCS_RDA_ROLE new_role);
>> +  PCS_RDA_ROLE role() const;
>> +  static const char* to_string(PCS_RDA_ROLE role);
>> +
>> + private:
>> +  static const uint64_t kDefaultDiscoverPeerTimeout = 2000;
>> +  static const uint64_t kDefaultPreActiveScriptTimeout = 5000;
>> +  void ExecutePreActiveScript();
>> +  void ResetElectionTimer();
>> +  uint32_t UpdateMdsRegistration(PCS_RDA_ROLE new_role,
>> +                                 PCS_RDA_ROLE old_role);
>> +
>> +  PCS_RDA_ROLE role_;
>> +  NODE_ID own_node_id_;
>> +  base::Process* proc_;
>> +  timespec election_end_time_;
>> +  uint64_t discover_peer_timeout_;
>> +  uint64_t pre_active_script_timeout_;
>> +  static const char *const role_names_[];
>> +  static const char *const pre_active_script_;
>> +};
>> +
>> +#endif /* OPENSAF_OSAF_SERVICES_INFRASTRUCTURE_RDE_INCLUDE_ROLE_H_ */
>> +
>> diff --git a/osaf/services/infrastructure/rde/rde_amf.cc 
>> b/osaf/services/infrastructure/rde/rde_amf.cc
>> --- a/osaf/services/infrastructure/rde/rde_amf.cc
>> +++ b/osaf/services/infrastructure/rde/rde_amf.cc
>> @@ -15,10 +15,13 @@
>>    *
>>    */
>>   
>> +#include "rde_amf.h"
>> +
>>   #include <logtrace.h>
>>   #include <nid_start_util.h>
>>   
>>   #include "rde_cb.h"
>> +#include "role.h"
>>   
>>   static RDE_AMF_CB *rde_amf_get_cb()
>>   {
>> @@ -27,13 +30,24 @@ static RDE_AMF_CB *rde_amf_get_cb()
>>   }
>>   
>>   static void rde_saf_CSI_set_callback(SaInvocationT invocation,
>> -    const SaNameT *compName, SaAmfHAStateT haState, SaAmfCSIDescriptorT 
>> csiDescriptor)
>> +    const SaNameT *compName, SaAmfHAStateT new_haState,
>> +    SaAmfCSIDescriptorT csiDescriptor)
>>   {
>>      RDE_AMF_CB *rde_amf_cb = rde_amf_get_cb();
>> -
>> -    TRACE_ENTER();
>> -
>> -    (void) saAmfResponse(rde_amf_cb->amf_hdl, invocation, SA_AIS_OK);
>> +    TRACE_ENTER2("new_haState = %d, current = %d",
>> +                     static_cast<int>(new_haState),
>> +                     static_cast<int>(rde_amf_cb->role->role()));
>> +    uint32_t rc = NCSCC_RC_SUCCESS;
>> +    SaAisErrorT error = SA_AIS_OK;
>> +    if (new_haState != SA_AMF_HA_ACTIVE) {
>> +            if ((rc = rde_amf_cb->role->SetRole((PCS_RDA_ROLE) 
>> new_haState)) !=
>> +                    NCSCC_RC_SUCCESS) {
>> +                    LOG_ER("SetRole failed %u", (unsigned) rc);
>> +                    error = SA_AIS_ERR_FAILED_OPERATION;
>> +            }
>> +    }
>> +    error = saAmfResponse(rde_amf_cb->amf_hdl, invocation, error);
>> +    TRACE_LEAVE2("rc = %u, error = %d", (unsigned) rc, (int) error);
>>   }
>>   
>>   static void rde_saf_health_chk_callback(SaInvocationT invocation,
>> @@ -46,14 +60,19 @@ static void rde_saf_health_chk_callback(
>>      (void) saAmfResponse(rde_amf_cb->amf_hdl, invocation, SA_AIS_OK);
>>   }
>>   
>> -void rde_saf_CSI_rem_callback(SaInvocationT invocation,
>> -                          const SaNameT *compName, const SaNameT *csiName, 
>> const SaAmfCSIFlagsT csiFlags)
>> +void rde_saf_CSI_rem_callback(SaInvocationT invocation, const SaNameT 
>> *compName,
>> +    const SaNameT *csiName, const SaAmfCSIFlagsT csiFlags)
>>   {
>>      RDE_AMF_CB *rde_amf_cb = rde_amf_get_cb();
>> -
>> -    TRACE_ENTER();
>> -
>> -    (void) saAmfResponse(rde_amf_cb->amf_hdl, invocation, SA_AIS_OK);
>> +    TRACE_ENTER2("current role: %d", 
>> static_cast<int>(rde_amf_cb->role->role()));
>> +    uint32_t rc = NCSCC_RC_SUCCESS;
>> +    SaAisErrorT error = SA_AIS_OK;
>> +    if ((rc = rde_amf_cb->role->SetRole(PCS_RDA_QUIESCED)) != 
>> NCSCC_RC_SUCCESS) {
>> +            LOG_ER("SetRole failed %u", (unsigned) rc);
>> +            error = SA_AIS_ERR_FAILED_OPERATION;
>> +    }
>> +    error = saAmfResponse(rde_amf_cb->amf_hdl, invocation, error);
>> +    TRACE_LEAVE2("rc = %u, error = %d", (unsigned) rc, (int) error);
>>   }
>>   
>>   void rde_saf_comp_terminate_callback(SaInvocationT invocation, const 
>> SaNameT *compName)
>> @@ -176,4 +195,3 @@ uint32_t rde_amf_init(RDE_AMF_CB *rde_am
>>      TRACE_LEAVE2("AMF Initialization SUCCESS......");
>>      return(rc);
>>   }
>> -
>> diff --git a/osaf/services/infrastructure/rde/rde_main.cc 
>> b/osaf/services/infrastructure/rde/rde_main.cc
>> --- a/osaf/services/infrastructure/rde/rde_main.cc
>> +++ b/osaf/services/infrastructure/rde/rde_main.cc
>> @@ -20,6 +20,14 @@
>>   #include <cstdlib>
>>   #include <poll.h>
>>   #include <libgen.h>
>> +#include <cstring>
>> +#include <cerrno>
>> +#include <unistd.h>
>> +#include <limits.h>
>> +#include <sys/types.h>
>> +#include <sys/wait.h>
>> +#include <sys/time.h>
>> +#include <sys/resource.h>
>>   
>>   #include <logtrace.h>
>>   #include <mds_papi.h>
>> @@ -28,19 +36,12 @@
>>   #include <nid_api.h>
>>   
>>   #include "rde_cb.h"
>> +#include "osaf_time.h"
>> +#include "osaf_poll.h"
>> +#include "role.h"
>>   
>>   #define RDA_MAX_CLIENTS 32
>>   
>> -static const char *role_string[] =
>> -{
>> -    "Undefined role",
>> -    "ACTIVE",
>> -    "STANDBY",
>> -    "QUIESCED",
>> -    "QUIESCING",
>> -    "Invalid"
>> -};
>> -
>>   enum {
>>      FD_TERM = 0,
>>      FD_AMF = 1,
>> @@ -49,8 +50,9 @@ enum {
>>      FD_CLIENT_START
>>   };
>>   
>> -NCS_NODE_ID rde_my_node_id;
>> -static NCS_NODE_ID peer_node_id;
>> +static void SendPeerInfoReq(MDS_DEST mds_dest);
>> +static void SendPeerInfoResp(MDS_DEST mds_dest);
>> +static void CheckForSplitBrain(const rde_msg* msg);
>>   
>>   const char *rde_msg_name[] = {
>>      "-",
>> @@ -60,12 +62,11 @@ const char *rde_msg_name[] = {
>>      "RDE_MSG_PEER_INFO_RESP(4)",
>>   };
>>   
>> -/* note: default value mentioned in $pkgsysconfdir/rde.conf, change in both 
>> places */
>> -static int discover_peer_timeout = 2000;
>>   static RDE_CONTROL_BLOCK _rde_cb;
>>   static RDE_CONTROL_BLOCK *rde_cb = &_rde_cb;
>>   static NCS_SEL_OBJ usr1_sel_obj;
>> -
>> +static NODE_ID own_node_id;
>> +static Role* role;
>>   
>>   RDE_CONTROL_BLOCK *rde_get_control_block()
>>   {
>> @@ -86,18 +87,6 @@ static void sigusr1_handler(int sig)
>>      ncs_sel_obj_ind(&usr1_sel_obj);
>>   }
>>   
>> -uint32_t rde_set_role(PCS_RDA_ROLE role)
>> -{
>> -    LOG_NO("RDE role set to %s", role_string[role]);
>> -
>> -    rde_cb->ha_role = role;
>> -
>> -    /* Send new role to all RDA client */
>> -    rde_rda_send_role(rde_cb->ha_role);
>> -
>> -    return NCSCC_RC_SUCCESS;
>> -}
>> -
>>   static int fd_to_client_ixd(int fd)
>>   {
>>      int i;
>> @@ -114,28 +103,42 @@ static int fd_to_client_ixd(int fd)
>>   
>>   static void handle_mbx_event()
>>   {
>> -    struct rde_msg *msg;
>> +    rde_msg *msg;
>>   
>>      TRACE_ENTER();
>>   
>>      msg = (struct rde_msg*)ncs_ipc_non_blk_recv(&rde_cb->mbx);
>> +    TRACE("Received %s from node 0x%x with state %s. My state is %s",
>> +          rde_msg_name[msg->type], msg->fr_node_id,
>> +          Role::to_string(msg->info.peer_info.ha_role),
>> +          Role::to_string(role->role()));
>>   
>>      switch (msg->type) {
>>      case RDE_MSG_PEER_INFO_REQ: {
>> -            struct rde_msg peer_info_req;
>> -            TRACE("Received %s", rde_msg_name[msg->type]);
>> -            peer_info_req.type = RDE_MSG_PEER_INFO_RESP;
>> -            peer_info_req.info.peer_info.ha_role = rde_cb->ha_role;
>> -            rde_mds_send(&peer_info_req, msg->fr_dest);
>> +            LOG_NO("Got peer info request from node 0x%x with role %s",
>> +                    msg->fr_node_id,
>> +                   Role::to_string(msg->info.peer_info.ha_role));
>> +                CheckForSplitBrain(msg);
>> +                SendPeerInfoResp(msg->fr_dest);
>>              break;
>>      }
>> -    case RDE_MSG_PEER_UP:
>> -            TRACE("Received %s", rde_msg_name[msg->type]);
>> -            peer_node_id = msg->fr_node_id;
>> +    case RDE_MSG_PEER_INFO_RESP: {
>> +            LOG_NO("Got peer info response from node 0x%x with role %s",
>> +                    msg->fr_node_id,
>> +                    Role::to_string(msg->info.peer_info.ha_role));
>> +                CheckForSplitBrain(msg);
>> +            role->SetPeerState(msg->info.peer_info.ha_role, 
>> msg->fr_node_id);
>>              break;
>> +    }
>> +    case RDE_MSG_PEER_UP: {
>> +            if (msg->fr_node_id != own_node_id) {
>> +                    LOG_NO("Peer up on node 0x%x", msg->fr_node_id);
>> +                    SendPeerInfoReq(msg->fr_dest);
>> +            }
>> +            break;
>> +    }
>>      case RDE_MSG_PEER_DOWN:
>> -            TRACE("Received %s", rde_msg_name[msg->type]);
>> -            peer_node_id = 0;
>> +            LOG_NO("Peer down on node 0x%x", msg->fr_node_id);
>>              break;
>>      default:
>>              LOG_ER("%s: discarding unknown message type %u", __FUNCTION__, 
>> msg->type);
>> @@ -147,159 +150,26 @@ static void handle_mbx_event()
>>      TRACE_LEAVE();
>>   }
>>   
>> -static uint32_t discover_peer(int mbx_fd)
>> -{
>> -    struct pollfd fds[1];
>> -    struct rde_msg *msg;
>> -    int ret;
>> -    uint32_t rc = NCSCC_RC_SUCCESS;
>> -
>> -    TRACE_ENTER();
>> -
>> -    fds[0].fd = mbx_fd;
>> -    fds[0].events = POLLIN;
>> -
>> -    while (1) {
>> -            ret = poll(fds, 1, discover_peer_timeout);
>> -
>> -            if (ret == -1) {
>> -                    if (errno == EINTR)
>> -                            continue;
>> -                    
>> -                    LOG_ER("poll failed - %s", strerror(errno));
>> -                    rc = NCSCC_RC_FAILURE;
>> -                    goto done;
>> -            }
>> -
>> -            if (ret == 0) {
>> -                    TRACE("Peer discovery timeout");
>> -                    goto done;
>> -            }
>> -
>> -            if (ret == 1) {
>> -                    msg = (struct 
>> rde_msg*)ncs_ipc_non_blk_recv(&rde_cb->mbx);
>> -
>> -                    switch (msg->type) {
>> -                    case RDE_MSG_PEER_UP: {
>> -                            struct rde_msg peer_info_req;
>> -
>> -                            peer_node_id = msg->fr_node_id;
>> -                            TRACE("Received %s", rde_msg_name[msg->type]);
>> -
>> -                            /* Send request for peer information */
>> -                            peer_info_req.type = RDE_MSG_PEER_INFO_REQ;
>> -                            peer_info_req.info.peer_info.ha_role = 
>> rde_cb->ha_role;
>> -                            rde_mds_send(&peer_info_req, msg->fr_dest);
>> -                            goto done;
>> -                    }
>> -                    default:
>> -                            LOG_ER("%s: discarding unknown message type 
>> %u", __FUNCTION__, msg->type);
>> -                            break;
>> -                    }
>> -
>> -            } else
>> -                    assert(0);
>> -    }
>> -done:
>> -    TRACE_LEAVE();
>> -    return rc;
>> +static void CheckForSplitBrain(const rde_msg* msg) {
>> +  PCS_RDA_ROLE own_role = role->role();
>> +  PCS_RDA_ROLE other_role = msg->info.peer_info.ha_role;
>> +  if (own_role == PCS_RDA_ACTIVE && other_role == PCS_RDA_ACTIVE) {
>> +    opensaf_reboot(0, NULL, "Split-brain detected");
>> +  }
>>   }
>>   
>> -static uint32_t determine_role(int mbx_fd)
>> -{
>> -    struct pollfd fds[1];
>> -    struct rde_msg *msg;
>> -    int ret;
>> -    uint32_t rc = NCSCC_RC_SUCCESS;
>> +static void SendPeerInfoReq(MDS_DEST mds_dest) {
>> +    rde_msg peer_info_req;
>> +    peer_info_req.type = RDE_MSG_PEER_INFO_REQ;
>> +    peer_info_req.info.peer_info.ha_role = role->role();
>> +    rde_mds_send(&peer_info_req, mds_dest);
>> +}
>>   
>> -    TRACE_ENTER();
>> -
>> -    if (peer_node_id == 0) {
>> -            LOG_NO("No peer available => Setting Active role for this 
>> node");
>> -            rde_cb->ha_role = PCS_RDA_ACTIVE;
>> -            goto done;
>> -    }
>> -
>> -    fds[0].fd = mbx_fd;
>> -    fds[0].events = POLLIN;
>> -
>> -    while (1) {
>> -            ret = poll(fds, 1, -1);
>> -
>> -            if (ret == -1) {
>> -                    if (errno == EINTR)
>> -                            continue;
>> -                    
>> -                    LOG_ER("poll failed - %s", strerror(errno));
>> -                    rc = NCSCC_RC_FAILURE;
>> -                    goto done;
>> -            }
>> -
>> -            assert(ret == 1);
>> -
>> -            msg = (struct rde_msg*)ncs_ipc_non_blk_recv(&rde_cb->mbx);
>> -
>> -            switch (msg->type) {
>> -            case RDE_MSG_PEER_UP:
>> -                    TRACE("Received straggler up msg, ignoring");
>> -                    assert(peer_node_id);
>> -                    break;
>> -            case RDE_MSG_PEER_DOWN:
>> -                    TRACE("Received %s", rde_msg_name[msg->type]);
>> -                    LOG_NO("peer rde@%x down waiting for response => 
>> Setting Active role", peer_node_id);
>> -                    rde_cb->ha_role = PCS_RDA_ACTIVE;
>> -                    peer_node_id = 0;
>> -                    goto done;
>> -            case RDE_MSG_PEER_INFO_REQ: {
>> -                    struct rde_msg peer_info_req;
>> -                    TRACE("Received %s", rde_msg_name[msg->type]);
>> -                    peer_info_req.type = RDE_MSG_PEER_INFO_RESP;
>> -                    peer_info_req.info.peer_info.ha_role = rde_cb->ha_role;
>> -                    rde_mds_send(&peer_info_req, msg->fr_dest);
>> -                    break;
>> -            }
>> -            case RDE_MSG_PEER_INFO_RESP:
>> -                    TRACE("Received %s", rde_msg_name[msg->type]);
>> -                    switch (msg->info.peer_info.ha_role) {
>> -                    case PCS_RDA_UNDEFINED:
>> -                            TRACE("my=%x, peer=%x", rde_my_node_id, 
>> msg->fr_node_id);
>> -                            if (rde_my_node_id < msg->fr_node_id) {
>> -                                    rde_cb->ha_role = PCS_RDA_ACTIVE;
>> -                                    LOG_NO("Peer rde@%x has no state, my 
>> nodeid is less => Setting Active role", msg->fr_node_id);
>> -                            } else if (rde_my_node_id > msg->fr_node_id) {
>> -                                    rde_cb->ha_role = PCS_RDA_STANDBY;
>> -                                    LOG_NO("Peer rde@%x has no state, my 
>> nodeid is greater => Setting Standby role", msg->fr_node_id);
>> -                            } else
>> -                                    assert(0);
>> -                            goto done;
>> -                    case PCS_RDA_ACTIVE:
>> -                            rde_cb->ha_role = PCS_RDA_STANDBY;
>> -                            LOG_NO("Peer rde@%x has active state => 
>> Assigning Standby role to this node", msg->fr_node_id);
>> -                            goto done;
>> -                    case PCS_RDA_STANDBY:
>> -                            LOG_NO("Peer rde@%x has standby state => 
>> possible fail over, waiting...", msg->fr_node_id);
>> -                            sleep(1);
>> -                            
>> -                            /* Send request for peer information */
>> -                            struct rde_msg peer_info_req;
>> -                            peer_info_req.type = RDE_MSG_PEER_INFO_REQ;
>> -                            peer_info_req.info.peer_info.ha_role = 
>> rde_cb->ha_role;
>> -                            rde_mds_send(&peer_info_req, msg->fr_dest);
>> -                            break;
>> -                    default:
>> -                            LOG_NO("rde@%x has unsupported state, panic!", 
>> msg->fr_node_id);
>> -                            assert(0);
>> -                    }
>> -                    break;
>> -            default:
>> -                    LOG_ER("%s: discarding unknown message type %u", 
>> __FUNCTION__, msg->type);
>> -                    break;
>> -            }
>> -    } /* while (1) */
>> -
>> -done:
>> -    TRACE_LEAVE();
>> -    return rc;
>> +static void SendPeerInfoResp(MDS_DEST mds_dest) {
>> +    rde_msg peer_info_req;
>> +    peer_info_req.type = RDE_MSG_PEER_INFO_RESP;
>> +    peer_info_req.info.peer_info.ha_role = role->role();
>> +    rde_mds_send(&peer_info_req, mds_dest);
>>   }
>>   
>>   /**
>> @@ -311,22 +181,26 @@ static int initialize_rde()
>>   {
>>      RDE_RDA_CB *rde_rda_cb = &rde_cb->rde_rda_cb;
>>      int rc = NCSCC_RC_FAILURE;
>> -    char *val;
>> +
>> +    if ((rc = rde_rda_open(RDE_RDA_SOCK_NAME, rde_rda_cb)) !=
>> +            NCSCC_RC_SUCCESS) {
>> +            goto init_failed;
>> +    }
>>   
>>      /* Determine how this process was started, by NID or AMF */
>>      if (getenv("SA_AMF_COMPONENT_NAME") == nullptr)
>>              rde_cb->rde_amf_cb.nid_started = true;
>>   
>> -    if ((val = getenv("RDE_DISCOVER_PEER_TIMEOUT")) != nullptr)
>> -            discover_peer_timeout = strtoul(val, nullptr, 0);
>> -
>> -    TRACE("discover_peer_timeout=%d", discover_peer_timeout);
>> -
>>      if ((rc = ncs_core_agents_startup()) != NCSCC_RC_SUCCESS) {
>>              LOG_ER("ncs_core_agents_startup FAILED");
>>              goto init_failed;
>>      }
>>   
>> +    own_node_id = ncs_get_node_id();
>> +        role = new Role(own_node_id);
>> +    rde_rda_cb->role = role;
>> +    rde_cb->rde_amf_cb.role = role;
>> +
>>      if (rde_cb->rde_amf_cb.nid_started &&
>>              (rc = ncs_sel_obj_create(&usr1_sel_obj)) != NCSCC_RC_SUCCESS) {
>>              LOG_ER("ncs_sel_obj_create FAILED");
>> @@ -343,20 +217,12 @@ static int initialize_rde()
>>              goto init_failed;
>>      }
>>   
>> -    rde_my_node_id = ncs_get_node_id();
>> -
>> -    if ((rc = rde_rda_open(RDE_RDA_SOCK_NAME, rde_rda_cb)) != 
>> NCSCC_RC_SUCCESS)
>> -            goto init_failed;
>> -
>>      if (rde_cb->rde_amf_cb.nid_started &&
>>              signal(SIGUSR1, sigusr1_handler) == SIG_ERR) {
>>              LOG_ER("signal USR1 FAILED: %s", strerror(errno));
>>              goto init_failed;
>>      }
>>   
>> -    if (rde_mds_register(rde_cb) != NCSCC_RC_SUCCESS)
>> -            goto init_failed;
>> -
>>      rc = NCSCC_RC_SUCCESS;
>>   
>>    init_failed:
>> @@ -365,14 +231,15 @@ static int initialize_rde()
>>   
>>   int main(int argc, char *argv[])
>>   {
>> -    uint32_t rc;
>> -    nfds_t nfds = 4;
>> +    nfds_t nfds = FD_CLIENT_START;
>>      struct pollfd fds[nfds + RDA_MAX_CLIENTS];
>> -    int i, ret;
>> +    int ret;
>>      NCS_SEL_OBJ mbx_sel_obj;
>>      RDE_RDA_CB *rde_rda_cb = &rde_cb->rde_rda_cb;
>>      int term_fd;
>>   
>> +    opensaf_reboot_prepare();
>> +
>>      daemonize(argc, argv);
>>   
>>      if (initialize_rde() != NCSCC_RC_SUCCESS)
>> @@ -380,24 +247,12 @@ int main(int argc, char *argv[])
>>   
>>      mbx_sel_obj = ncs_ipc_get_sel_obj(&rde_cb->mbx);
>>   
>> -    if ((rc = discover_peer(mbx_sel_obj.rmv_obj)) == NCSCC_RC_FAILURE)
>> -            goto init_failed;
>> -
>> -    if ((rc = determine_role(mbx_sel_obj.rmv_obj)) == NCSCC_RC_FAILURE)
>> -            goto init_failed;
>> -
>>      /* If AMF started register immediately */
>>      if (!rde_cb->rde_amf_cb.nid_started &&
>> -            (rc = rde_amf_init(&rde_cb->rde_amf_cb)) != NCSCC_RC_SUCCESS) {
>> +            rde_amf_init(&rde_cb->rde_amf_cb) != NCSCC_RC_SUCCESS) {
>>              goto init_failed;
>>      }
>>   
>> -    if (rde_cb->rde_amf_cb.nid_started &&
>> -            nid_notify("RDE", rc, nullptr) != NCSCC_RC_SUCCESS) {
>> -            LOG_ER("nid_notify failed");
>> -            goto done;
>> -    }
>> -
>>      daemon_sigterm_install(&term_fd);
>>   
>>      fds[FD_TERM].fd = term_fd;
>> @@ -416,8 +271,27 @@ int main(int argc, char *argv[])
>>      fds[FD_RDA_SERVER].fd = rde_cb->rde_rda_cb.fd;
>>      fds[FD_RDA_SERVER].events = POLLIN;
>>   
>> +    if (rde_cb->rde_amf_cb.nid_started) {
>> +            TRACE("NID started");
>> +            if (nid_notify("RDE", NCSCC_RC_SUCCESS, nullptr) != 
>> NCSCC_RC_SUCCESS) {
>> +                    LOG_ER("nid_notify failed");
>> +                    goto init_failed;
>> +            }
>> +    } else {
>> +            TRACE("Not NID started");
>> +    }
>> +
>>      while (1) {
>> -            ret = poll(fds, nfds, -1);
>> +            nfds_t fds_to_poll = role->role() != PCS_RDA_UNDEFINED ?
>> +                    nfds : FD_CLIENT_START;
>> +            ret = osaf_poll(fds, fds_to_poll, 0);
>> +            if (ret == 0) {
>> +                    timespec ts;
>> +                    timespec* timeout = role->Poll(&ts);
>> +                    fds_to_poll = role->role() != PCS_RDA_UNDEFINED ?
>> +                            nfds : FD_CLIENT_START;
>> +                    ret = osaf_ppoll(fds, fds_to_poll, timeout, nullptr);
>> +            }
>>   
>>              if (ret == -1) {
>>                      if (errno == EINTR)
>> @@ -475,7 +349,9 @@ int main(int argc, char *argv[])
>>                      TRACE("accepted new client, fd=%d, idx=%d, nfds=%lu", 
>> newsockfd, rde_rda_cb->client_count, nfds);
>>              }
>>   
>> -            for (i = FD_CLIENT_START; static_cast<nfds_t>(i) < nfds; i++) {
>> +            for (nfds_t i = FD_CLIENT_START;
>> +                    role->role() != PCS_RDA_UNDEFINED && i < fds_to_poll;
>> +                    i++) {
>>                      if (fds[i].revents & POLLIN) {
>>                              int client_disconnected = 0;
>>                              TRACE("received msg on fd %u", fds[i].fd);
>> @@ -483,9 +359,9 @@ int main(int argc, char *argv[])
>>                              if (client_disconnected) {
>>                                      /* reinitialize the fd array & nfds */
>>                                      nfds = FD_CLIENT_START;
>> -                                    for (i = 0; i < 
>> rde_rda_cb->client_count; i++, nfds++) {
>> -                                            fds[i + FD_CLIENT_START].fd = 
>> rde_rda_cb->clients[i].fd;
>> -                                            fds[i + FD_CLIENT_START].events 
>> = POLLIN;
>> +                                    for (int j = 0; j < 
>> rde_rda_cb->client_count; j++, nfds++) {
>> +                                            fds[j + FD_CLIENT_START].fd = 
>> rde_rda_cb->clients[j].fd;
>> +                                            fds[j + FD_CLIENT_START].events 
>> = POLLIN;
>>                                      }
>>                                      TRACE("client disconnected, fd array 
>> reinitialized, nfds=%lu", nfds);
>>                                      break;
>> @@ -498,7 +374,6 @@ int main(int argc, char *argv[])
>>      if (rde_cb->rde_amf_cb.nid_started &&
>>              nid_notify("RDE", NCSCC_RC_FAILURE, nullptr) != 
>> NCSCC_RC_SUCCESS) {
>>              LOG_ER("nid_notify failed");
>> -            rc = NCSCC_RC_FAILURE;
>>      }
>>   
>>    done:
>> diff --git a/osaf/services/infrastructure/rde/rde_mds.cc 
>> b/osaf/services/infrastructure/rde/rde_mds.cc
>> --- a/osaf/services/infrastructure/rde/rde_mds.cc
>> +++ b/osaf/services/infrastructure/rde/rde_mds.cc
>> @@ -15,6 +15,8 @@
>>    *
>>    */
>>   
>> +#include <thread>
>> +#include <chrono>
>>   #include <logtrace.h>
>>   #include <mds_papi.h>
>>   #include <ncsencdec_pub.h>
>> @@ -23,7 +25,6 @@
>>   
>>   #define RDE_MDS_PVT_SUBPART_VERSION 1
>>   
>> -static MDS_DEST peer_dest;
>>   static MDS_HDL mds_hdl;
>>   
>>   static uint32_t msg_encode(MDS_CALLBACK_ENC_INFO *enc_info)
>> @@ -145,12 +146,6 @@ static uint32_t mds_callback(struct ncsm
>>      case MDS_CALLBACK_DEC_FLAT:
>>              break;
>>      case MDS_CALLBACK_RECEIVE:
>> -            if (!peer_dest) {
>> -                    /* Sometimes a message from peer is received before 
>> MDS_UP, simulate MDS_UP */
>> -                    TRACE("generating up msg from rec event!");
>> -                    rc = mbx_send(RDE_MSG_PEER_UP, 
>> info->info.receive.i_fr_dest, info->info.receive.i_node_id);
>> -            }
>> -
>>              msg = (struct rde_msg*)info->info.receive.i_msg;
>>              msg->fr_dest = info->info.receive.i_fr_dest;
>>              msg->fr_node_id = info->info.receive.i_node_id;
>> @@ -162,18 +157,13 @@ static uint32_t mds_callback(struct ncsm
>>              }
>>              break;
>>      case MDS_CALLBACK_SVC_EVENT:
>> -            if (rde_my_node_id == info->info.svc_evt.i_node_id)
>> -                    goto done;
>> -
>>              if (info->info.svc_evt.i_change == NCSMDS_DOWN) {
>>                      TRACE("MDS DOWN dest: %" PRIx64 ", node ID: %x, svc_id: 
>> %d",
>>                              info->info.svc_evt.i_dest, 
>> info->info.svc_evt.i_node_id, info->info.svc_evt.i_svc_id);
>> -                    peer_dest = 0;
>>                      rc = mbx_send(RDE_MSG_PEER_DOWN, 
>> info->info.svc_evt.i_dest, info->info.svc_evt.i_node_id);
>>              } else if (info->info.svc_evt.i_change == NCSMDS_UP) {
>>                      TRACE("MDS UP dest: %" PRIx64 ", node ID: %x, svc_id: 
>> %d",
>>                              info->info.svc_evt.i_dest, 
>> info->info.svc_evt.i_node_id, info->info.svc_evt.i_svc_id);
>> -                    peer_dest = info->info.svc_evt.i_dest;
>>                      rc = mbx_send(RDE_MSG_PEER_UP, 
>> info->info.svc_evt.i_dest, info->info.svc_evt.i_node_id);
>>              } else {
>>                      TRACE("MDS %u dest: %" PRIx64 ", node ID: %x, svc_id: 
>> %d", info->info.svc_evt.i_change,
>> @@ -192,7 +182,7 @@ done:
>>      return rc;
>>   }
>>   
>> -uint32_t rde_mds_register(RDE_CONTROL_BLOCK *cb)
>> +uint32_t rde_mds_register()
>>   {
>>      NCSADA_INFO ada_info;
>>      NCSMDS_INFO svc_info;
>> @@ -243,28 +233,56 @@ uint32_t rde_mds_register(RDE_CONTROL_BL
>>      return NCSCC_RC_SUCCESS;
>>   }
>>   
>> +uint32_t rde_mds_unregister()
>> +{
>> +    NCSMDS_INFO mds_info;
>> +    uint32_t rc = NCSCC_RC_SUCCESS;
>> +    TRACE_ENTER();
>> +
>> +    /* Un-install your service into MDS.
>> +       No need to cancel the services that are subscribed */
>> +    memset(&mds_info, 0, sizeof(NCSMDS_INFO));
>> +
>> +    mds_info.i_mds_hdl = mds_hdl;
>> +    mds_info.i_svc_id = NCSMDS_SVC_ID_RDE;
>> +    mds_info.i_op = MDS_UNINSTALL;
>> +
>> +    rc = ncsmds_api(&mds_info);
>> +    if (rc != NCSCC_RC_SUCCESS) {
>> +            LOG_WA("MDS Unregister Failed");
>> +    }
>> +
>> +    TRACE_LEAVE2("retval = %u", rc);
>> +    return rc;
>> +}
>> +
>>   uint32_t rde_mds_send(struct rde_msg *msg, MDS_DEST to_dest)
>>   {
>>      NCSMDS_INFO info;
>>      uint32_t rc;
>>   
>> -    TRACE("Sending %s to %" PRIx64, rde_msg_name[msg->type], to_dest);
>> -    memset(&info, 0, sizeof(info));
>> +        for (int i = 0; i != 3; ++i) {
>> +            TRACE("Sending %s to %" PRIx64, rde_msg_name[msg->type], 
>> to_dest);
>> +            memset(&info, 0, sizeof(info));
>>   
>> -    info.i_mds_hdl = mds_hdl;
>> -    info.i_op = MDS_SEND;
>> -    info.i_svc_id = NCSMDS_SVC_ID_RDE;
>> -    
>> -    info.info.svc_send.i_msg = msg;
>> -    info.info.svc_send.i_priority = MDS_SEND_PRIORITY_MEDIUM;
>> -    info.info.svc_send.i_sendtype = MDS_SENDTYPE_SND;
>> -    info.info.svc_send.i_to_svc = NCSMDS_SVC_ID_RDE;
>> -    info.info.svc_send.info.snd.i_to_dest = to_dest;
>> -    
>> -    rc = ncsmds_api(&info);
>> -    if (NCSCC_RC_FAILURE == rc)
>> -            LOG_ER("rde async MDS send FAILED");
>> +            info.i_mds_hdl = mds_hdl;
>> +            info.i_op = MDS_SEND;
>> +            info.i_svc_id = NCSMDS_SVC_ID_RDE;
>> +
>> +            info.info.svc_send.i_msg = msg;
>> +            info.info.svc_send.i_priority = MDS_SEND_PRIORITY_MEDIUM;
>> +            info.info.svc_send.i_sendtype = MDS_SENDTYPE_SND;
>> +            info.info.svc_send.i_to_svc = NCSMDS_SVC_ID_RDE;
>> +            info.info.svc_send.info.snd.i_to_dest = to_dest;
>> +
>> +            rc = ncsmds_api(&info);
>> +            if (NCSCC_RC_FAILURE == rc) {
>> +                    LOG_ER("Failed to send %s to %" PRIx64, 
>> rde_msg_name[msg->type], to_dest);
>> +                    
>> std::this_thread::sleep_for(std::chrono::milliseconds(100));
>> +            } else {
>> +                    break;
>> +            }
>> +    }
>>   
>>      return rc;
>>   }
>> -
>> diff --git a/osaf/services/infrastructure/rde/rde_rda.cc 
>> b/osaf/services/infrastructure/rde/rde_rda.cc
>> --- a/osaf/services/infrastructure/rde/rde_rda.cc
>> +++ b/osaf/services/infrastructure/rde/rde_rda.cc
>> @@ -38,6 +38,7 @@
>>   #include <logtrace.h>
>>   
>>   #include "rde_cb.h"
>> +#include "role.h"
>>   
>>   
>> /*****************************************************************************
>>   
>> @@ -261,11 +262,9 @@ static uint32_t rde_rda_read_msg(int fd,
>>   static uint32_t rde_rda_process_get_role(RDE_RDA_CB *rde_rda_cb, int index)
>>   {
>>      char msg[64] = { 0 };
>> -    RDE_CONTROL_BLOCK *rde_cb = rde_get_control_block();
>> -
>>      TRACE_ENTER();
>>   
>> -    sprintf(msg, "%d %d", RDE_RDA_GET_ROLE_RES, rde_cb->ha_role);
>> +    sprintf(msg, "%d %d", RDE_RDA_GET_ROLE_RES, 
>> static_cast<int>(rde_rda_cb->role->role()));
>>      if (rde_rda_write_msg(rde_rda_cb->clients[index].fd, msg) != 
>> NCSCC_RC_SUCCESS) {
>>              return NCSCC_RC_FAILURE;
>>      }
>> @@ -299,7 +298,7 @@ static uint32_t rde_rda_process_set_role
>>   
>>      TRACE_ENTER();
>>   
>> -    if (rde_set_role(static_cast<PCS_RDA_ROLE>(role)) != NCSCC_RC_SUCCESS)
>> +    if (rde_rda_cb->role->SetRole(static_cast<PCS_RDA_ROLE>(role)) != 
>> NCSCC_RC_SUCCESS)
>>              sprintf(msg, "%d", RDE_RDA_SET_ROLE_NACK);
>>      else
>>              sprintf(msg, "%d", RDE_RDA_SET_ROLE_ACK);
>> diff --git a/osaf/services/infrastructure/rde/role.cc 
>> b/osaf/services/infrastructure/rde/role.cc
>> new file mode 100644
>> --- /dev/null
>> +++ b/osaf/services/infrastructure/rde/role.cc
>> @@ -0,0 +1,144 @@
>> +/*      -*- OpenSAF  -*-
>> + *
>> + * (C) Copyright 2016 The OpenSAF Foundation
>> + *
>> + * This program is distributed in the hope that it will be useful, but
>> + * WITHOUT ANY WARRANTY; without even the implied warranty of 
>> MERCHANTABILITY
>> + * or FITNESS FOR A PARTICULAR PURPOSE. This file and program are licensed
>> + * under the GNU Lesser General Public License Version 2.1, February 1999.
>> + * The complete license can be accessed from the following location:
>> + *http://opensource.org/licenses/lgpl-license.php
>> + * See the Copying file included with the OpenSAF distribution for full
>> + * licensing terms.
>> + *
>> + * Author(s): Ericsson AB
>> + *
>> + */
>> +
>> +#ifndef _GNU_SOURCE
>> +#define _GNU_SOURCE
>> +#endif
>> +#include "role.h"
>> +#include <stdint.h>
>> +#include <chrono>
>> +#include "configmake.h"
>> +#include "logtrace.h"
>> +#include "ncs_main_papi.h"
>> +#include "base/getenv.h"
>> +#include "base/process.h"
>> +#include "base/time.h"
>> +#include "rde_cb.h"
>> +
>> +const char *const Role::role_names_[] = {
>> +  "Undefined",
>> +  "ACTIVE",
>> +  "STANDBY",
>> +  "QUIESCED",
>> +  "QUIESCING",
>> +  "Invalid"
>> +};
>> +
>> +const char *const Role::pre_active_script_ = PKGLIBDIR "/opensaf_sc_active";
>> +
>> +PCS_RDA_ROLE Role::role() const {
>> +  return role_;
>> +}
>> +
>> +const char* Role::to_string(PCS_RDA_ROLE role) {
>> +  return role >= 0 && role < sizeof(role_names_) / sizeof(role_names_[0]) ?
>> +                             role_names_[role] : role_names_[0];
>> +}
>> +
>> +Role::Role(NODE_ID own_node_id)
>> +    : role_{PCS_RDA_QUIESCED},
>> +      own_node_id_{own_node_id},
>> +      proc_{new base::Process()},
>> +      election_end_time_{},
>> +      discover_peer_timeout_{base::GetEnv("RDE_DISCOVER_PEER_TIMEOUT",
>> +                                          kDefaultDiscoverPeerTimeout)},
>> +      
>> pre_active_script_timeout_{base::GetEnv("RDE_PRE_ACTIVE_SCRIPT_TIMEOUT",
>> +                                          kDefaultPreActiveScriptTimeout)} {
>> +}
>> +
>> +timespec* Role::Poll(timespec* ts) {
>> +  timespec* timeout = nullptr;
>> +  if (role_ == PCS_RDA_UNDEFINED) {
>> +    timespec now = base::ReadMonotonicClock();
>> +    if (election_end_time_ >= now) {
>> +      *ts = election_end_time_ - now;
>> +      timeout = ts;
>> +    } else {
>> +      ExecutePreActiveScript();
>> +      LOG_NO("Switched to ACTIVE from %s", to_string(role()));
>> +      role_ = PCS_RDA_ACTIVE;
>> +      rde_rda_send_role(role_);
>> +    }
>> +  }
>> +  return timeout;
>> +}
>> +
>> +void Role::ExecutePreActiveScript() {
>> +  int argc = 1;
>> +  char* argv[] = {
>> +    const_cast<char*>(pre_active_script_),
>> +    nullptr
>> +  };
>> +  proc_->Execute(argc, argv,
>> +                 std::chrono::milliseconds(pre_active_script_timeout_));
>> +}
>> +
>> +uint32_t Role::SetRole(PCS_RDA_ROLE new_role) {
>> +  PCS_RDA_ROLE old_role = role_;
>> +  if (new_role == PCS_RDA_ACTIVE &&
>> +      (old_role == PCS_RDA_UNDEFINED || old_role == PCS_RDA_QUIESCED)) {
>> +    LOG_NO("Requesting ACTIVE role");
>> +    new_role = PCS_RDA_UNDEFINED;
>> +  }
>> +  if (new_role != old_role) {
>> +    LOG_NO("RDE role set to %s", to_string(new_role));
>> +    if (new_role == PCS_RDA_ACTIVE) ExecutePreActiveScript();
>> +    role_ = new_role;
>> +    if (new_role == PCS_RDA_UNDEFINED) ResetElectionTimer();
>> +    else rde_rda_send_role(new_role);
>> +  }
>> +  return UpdateMdsRegistration(new_role, old_role);
>> +}
>> +
>> +void Role::ResetElectionTimer() {
>> +  election_end_time_ = base::ReadMonotonicClock() +
>> +      base::MillisToTimespec(discover_peer_timeout_);
>> +}
>> +
>> +uint32_t Role::UpdateMdsRegistration(PCS_RDA_ROLE new_role,
>> +                                     PCS_RDA_ROLE old_role) {
>> +  uint32_t rc = NCSCC_RC_SUCCESS;
>> +  bool mds_registered_before = old_role != PCS_RDA_QUIESCED;
>> +  bool mds_registered_after = new_role != PCS_RDA_QUIESCED;
>> +  if (mds_registered_after != mds_registered_before) {
>> +    if (mds_registered_after) {
>> +      if (rde_mds_register() != NCSCC_RC_SUCCESS) {
>> +        LOG_ER("rde_mds_register() failed");
>> +        rc = NCSCC_RC_FAILURE;
>> +      }
>> +    } else {
>> +      if (rde_mds_unregister() != NCSCC_RC_SUCCESS) {
>> +        LOG_ER("rde_mds_unregister() failed");
>> +        rc = NCSCC_RC_FAILURE;
>> +      }
>> +    }
>> +  }
>> +  return rc;
>> +}
>> +
>> +void Role::SetPeerState(PCS_RDA_ROLE node_role, NODE_ID node_id) {
>> +  if (role() == PCS_RDA_UNDEFINED) {
>> +    if (node_role == PCS_RDA_ACTIVE ||
>> +        node_role == PCS_RDA_STANDBY ||
>> +        (node_role == PCS_RDA_UNDEFINED && node_id < own_node_id_)) {
>> +    SetRole(PCS_RDA_QUIESCED);
>> +    LOG_NO("Giving up election against 0x%" PRIx32 " with role %s. "
>> +           "My role is now %s", node_id, to_string(node_role),
>> +           to_string(role()));
>> +    }
>> +  }
>> +}
>> diff --git a/scripts/opensaf_sc_active b/scripts/opensaf_sc_active
>> new file mode 100644
>> --- /dev/null
>> +++ b/scripts/opensaf_sc_active
>> @@ -0,0 +1,30 @@
>> +#!/bin/sh
>> +#
>> +#      -*- OpenSAF  -*-
>> +#
>> +# (C) Copyright 2015 The OpenSAF Foundation
>> +#
>> +# This program is distributed in the hope that it will be useful, but
>> +# WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
>> +# or FITNESS FOR A PARTICULAR PURPOSE. This file and program are licensed
>> +# under the GNU Lesser General Public License Version 2.1, February 1999.
>> +# The complete license can be accessed from the following location:
>> +#http://opensource.org/licenses/lgpl-license.php
>> +# See the Copying file included with the OpenSAF distribution for full
>> +# licensing terms.
>> +#
>> +# Author(s): Ericsson AB
>> +#
>> +
>> +. /etc/opensaf/osafdir.conf
>> +
>> +export PATH=$sbindir:$bindir:$PATH
>> +export LD_LIBRARY_PATH=$libdir:$LD_LIBRARY_PATH
>> +
>> +# This script will be executed before a node becomes the active SC.
>> +#
>> +# NOTE: The SC will wait for this script to exit before taking on the active
>> +# role, so therefore this script must exit quickly. The execution time is
>> +# supervised and if it exceeds the RDE_PRE_ACTIVE_SCRIPT_TIMEOUT time 
>> configured
>> +# in rde.conf, the script will be killed.
>> +exit 0
>

------------------------------------------------------------------------------
Transform Data into Opportunity.
Accelerate data analysis in your applications with
Intel Data Analytics Acceleration Library.
Click to learn more.
http://pubads.g.doubleclick.net/gampad/clk?id=278785471&iu=/4140
_______________________________________________
Opensaf-devel mailing list
Opensaf-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/opensaf-devel

Reply via email to