from:"Ravi Sekhar Reddy Konda"

Re: [devel] [PATCH 1/1] plmd: fix crash when saPlmReadinessTrack is called in error [#2919]

2018-08-28 Thread Ravi Sekhar Reddy Konda

Hi Alex,

Ack for the patch, review only

Regards,
Ravi
- Original Message -
From: ajo...@rbbn.com
To: mathi.np@gmail.com, ravisekhar.ko...@oracle.com
Cc: opensaf-devel@lists.sourceforge.net, ajo...@rbbn.com
Sent: Tuesday, August 28, 2018 2:09:19 AM GMT +05:30 Chennai, Kolkata, Mumbai, 
New Delhi
Subject: [PATCH 1/1] plmd: fix crash when saPlmReadinessTrack is called in 
error [#2919]

plmd crashes when saPlmReadinessTrack is called with entities pointer set,
but smaller than what plmd would return.

In this case plmd is returning ERR_NO_SPACE, which is correct, but it is
setting numberOfEntities without setting the entities pointer. This causes
the edu routines to crash.

It is not necessary to set numberOfEntities since we are returning an error
code.
---
 src/plm/plmd/plms_proc.c | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/src/plm/plmd/plms_proc.c b/src/plm/plmd/plms_proc.c
index aa93e5942..2b4445394 100644
--- a/src/plm/plmd/plms_proc.c
+++ b/src/plm/plmd/plms_proc.c
@@ -879,6 +879,12 @@ void plms_process_trk_start_evt(PLMS_EVT *plm_evt)
no_of_ent_recd = no_of_ent_in_grp;
}
 
+   if (no_of_ent_in_grp != no_of_ent_recd) {
+   LOG_ER("PLMS: no of entities sent is != entities in grp");
+   rc = SA_AIS_ERR_NO_SPACE;
+   goto send_resp;
+   }
+
plm_resp.res_evt.entities = (SaPlmReadinessTrackedEntitiesT *)malloc(
sizeof(SaPlmReadinessTrackedEntitiesT));
 
@@ -889,12 +895,6 @@ void plms_process_trk_start_evt(PLMS_EVT *plm_evt)
strerror(errno));
goto send_resp;
}
-   if (no_of_ent_in_grp != no_of_ent_recd) {
-   LOG_ER("PLMS: no of entities sent is != entities in grp");
-   plm_resp.res_evt.entities->numberOfEntities = no_of_ent_in_grp;
-   rc = SA_AIS_ERR_NO_SPACE;
-   goto send_resp;
-   }
 
if (m_PLM_IS_SA_TRACK_CHANGES_SET(track_flags) ||
m_PLM_IS_SA_TRACK_CHANGES_ONLY_SET(track_flags)) {
-- 
2.14.4
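
For illustration, the ordering this patch establishes (check the caller's buffer size first, allocate the response second) can be sketched in isolation. This is a minimal sketch, not the plmd code: `Status`, `TrackedEntities`, `Response`, `BuildTrackResponse` and `EncodedEntityCount` are hypothetical names, and the size check is simplified to "caller capacity smaller than the group".

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>

// Hypothetical stand-ins for the plmd response types.
enum Status { kOk, kErrNoSpace, kErrNoMemory };

struct TrackedEntities {
  uint32_t number_of_entities;
};

struct Response {
  Status status;
  TrackedEntities *entities;  // remains nullptr on every error path
};

// Validate the caller-supplied capacity first, allocate second.
Response BuildTrackResponse(uint32_t entities_in_group,
                            uint32_t caller_capacity) {
  Response resp{kOk, nullptr};

  if (caller_capacity < entities_in_group) {
    // Early exit: nothing allocated, nothing half-filled.
    resp.status = kErrNoSpace;
    return resp;
  }

  resp.entities =
      static_cast<TrackedEntities *>(std::malloc(sizeof(TrackedEntities)));
  if (resp.entities == nullptr) {
    resp.status = kErrNoMemory;
    return resp;
  }
  resp.entities->number_of_entities = entities_in_group;
  return resp;
}

// An encoder may only read the count when the pointer is valid.
uint32_t EncodedEntityCount(const Response &resp) {
  // Invariant of this sketch: an error status never carries a payload.
  assert(resp.status == kOk || resp.entities == nullptr);
  return resp.entities != nullptr ? resp.entities->number_of_entities : 0;
}
```

With this ordering the error path never carries a count alongside a missing payload, which appears to be the condition the commit message blames for the crash in the edu routines.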


--
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
___
Opensaf-devel mailing list
Opensaf-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/opensaf-devel

Re: [devel] [PATCH 1/1] imm: correct data size mismatches in pbe code [#2770]

2018-05-31 Thread Ravi Sekhar Reddy Konda

Hi VuMinh,

Ack for the patch

Regards,
Ravi

-Original Message-
From: Vu Minh Nguyen [mailto:vu.m.ngu...@dektech.com.au] 
Sent: Thursday, January 25, 2018 5:22 PM
To: zoran.milinko...@ericsson.com; ravisekhar.ko...@oracle.com
Cc: opensaf-devel@lists.sourceforge.net; Vu Minh Nguyen 

Subject: [PATCH 1/1] imm: correct data size mismatches in pbe code [#2770]

Object ID and Class ID are of type `unsigned int`, but they are not used 
consistently throughout the code; some places use `int`.

This patch corrects these places.
---
 src/imm/common/immpbe_dump.cc| 49 
 src/imm/common/immpbe_dump.h |  6 ++---
 src/imm/immpbed/immpbe_daemon.cc |  5 
 src/imm/tools/imm_dumper.cc  |  5 ++--
 4 files changed, 31 insertions(+), 34 deletions(-)

diff --git a/src/imm/common/immpbe_dump.cc b/src/imm/common/immpbe_dump.cc 
index 11af674..9f21428 100644
--- a/src/imm/common/immpbe_dump.cc
+++ b/src/imm/common/immpbe_dump.cc
@@ -350,7 +350,7 @@ static void typeToPBE(SaImmAttrDefinitionT_2 *p, void *dbHandle,
 }
 
 static void valuesToPBE(const SaImmAttrValuesT_2 *p, SaImmAttrFlagsT attrFlags,
-int objId, void *db_handle) {
+unsigned objId, void *db_handle) {
   int rc = 0;
   bool badfile = false;
   sqlite3 *dbHandle = (sqlite3 *)db_handle;
@@ -1369,9 +1369,9 @@ void objectModifyDiscardAllValuesOfAttrToPBE(
 
   int rc = 0;
   char *zErr = NULL;
-  int object_id;
+  unsigned object_id;
   std::string object_id_str;
-  int class_id;
+  unsigned class_id;
   SaImmValueTypeT attr_type;
   SaImmAttrFlagsT attr_flags;
   unsigned int rowsModified = 0;
@@ -1591,8 +1591,8 @@ void objectModifyDiscardMatchingValuesOfAttrToPBE(
 
   int rc = 0;
   std::string object_id_str;
-  int object_id;
-  int class_id;
+  unsigned object_id;
+  unsigned class_id;
   SaImmValueTypeT attr_type;
   SaImmAttrFlagsT attr_flags;
   unsigned int rowsModified = 0;
@@ -1847,8 +1847,8 @@ void objectModifyAddValuesOfAttrToPBE(void *db_handle, std::string objName,
 
   int rc = 0;
   std::string object_id_str;
-  int object_id;
-  int class_id;
+  unsigned object_id;
+  unsigned class_id;
   SaImmValueTypeT attr_type;
   SaImmAttrFlagsT attr_flags;
   unsigned int rowsModified = 0;
@@ -2105,7 +2105,7 @@ bailout:
   exit(1);
 }
 
-int dumpInstancesOfClassToPBE(SaImmHandleT immHandle, ClassMap *classIdMap,
+unsigned dumpInstancesOfClassToPBE(SaImmHandleT immHandle, ClassMap *classIdMap,
   std::string className, unsigned int *objIdCount,
   void *db_handle) {
   unsigned int obj_count = 0;
@@ -2175,7 +2175,8 @@ int dumpInstancesOfClassToPBE(SaImmHandleT immHandle, ClassMap *classIdMap,
   return obj_count;
 bailout:
   sqlite3_close(dbHandle);
-  return (-1);
+  LOG_ER("dumpInstncesOfClassesToPBE failed. dbHandle is closed - exiting");
+  exit(1);
 }
 
 void objectDeleteToPBE(std::string objectNameString, void *db_handle) {
@@ -2186,8 +2187,8 @@ void objectDeleteToPBE(std::string objectNameString, void *db_handle) {
   int rc = 0;
   char *zErr = NULL;
   std::string object_id_str;
-  int object_id;
-  int class_id;
+  unsigned object_id;
+  unsigned class_id;
   std::string class_name;
   bool badfile = false;
   TRACE_ENTER();
@@ -2561,7 +2562,7 @@ int verifyPbeState(SaImmHandleT immHandle, ClassMap *classIdMap,
   char *execErr = NULL;
   sqlite3 *dbHandle = (sqlite3 *)db_handle;
   std::string sqlQ("SELECT MAX(obj_id) FROM objects");
-  int obj_count = 0;
+  unsigned obj_count = 0;
   char **result = NULL;
   char *qErr = NULL;
   int nrows = 0;
@@ -2632,8 +2633,8 @@ bailout:
   return 0;
 }
 
-int dumpObjectsToPbe(SaImmHandleT immHandle, ClassMap *classIdMap,
- void *db_handle) {
+unsigned dumpObjectsToPbe(SaImmHandleT immHandle, ClassMap *classIdMap,
+  void *db_handle) {
   int rc = 0;
   SaNameT root;
   SaImmSearchHandleT searchHandle;
@@ -2727,12 +2728,12 @@ int dumpObjectsToPbe(SaImmHandleT immHandle, ClassMap *classIdMap,
   return object_id; /* == number of dumped objects */
 bailout:
   sqlite3_close(dbHandle);
-  return (-1);
+  return (0);
 }
 
-int dumpObjectsToPbe(SaImmHandleT immHandle, ClassMap *classIdMap,
- void *db_handle,
- std::list ) {
+unsigned dumpObjectsToPbe(SaImmHandleT immHandle, ClassMap *classIdMap,
+  void *db_handle,
+  std::list ) {
   int rc = 0;
   SaNameT root;
   SaImmSearchHandleT searchHandle;
@@ -2837,7 +2838,7 @@ int dumpObjectsToPbe(SaImmHandleT immHandle, ClassMap *classIdMap,
 bailout:
   sqlite3_close(dbHandle);
   TRACE_LEAVE();
-  return (-1);
+  return (0);
 }
 
 SaAisErrorT pbeBeginTrans(void *db_handle) {
@@ -3815,9 +3816,9 @@ unsigned int purgeInstancesOfClassToPBE(SaImmHandleT immHandle,
   return 0;
 }
 
-int dumpInstancesOfClassToPBE(SaImmHandleT immHandle, ClassMap* classIdMap,
-
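
The hunks above also change the `bailout` paths from `return (-1)` to `return (0)` or `exit(1)` once the return type becomes `unsigned`. A minimal sketch of why that matters, assuming a caller that treats any non-zero count as success; the function names below are hypothetical and not taken from the IMM code.

```cpp
#include <cassert>
#include <climits>

// Hypothetical counter with an unsigned return type that still uses -1 as
// its failure value; the conversion turns -1 into UINT_MAX.
unsigned CountWithLegacyError(bool fail) {
  if (fail) return static_cast<unsigned>(-1);
  return 3u;
}

// The same counter using 0 as the failure value instead.
unsigned CountWithZeroError(bool fail) { return fail ? 0u : 3u; }

// A caller that treats any non-zero count as success.
bool LooksSuccessful(unsigned count) {
  assert(count <= UINT_MAX);  // always true; documents the value range
  return count > 0;
}
```

With the legacy error value, the failure case is indistinguishable from a very large object count, so a `count > 0` style check accepts it.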

Re: [devel] [PATCH 1/1] base: Improve backtrace print in daemon.c [#2853]

2018-05-30 Thread Ravi Sekhar Reddy Konda

Hi Hans,

Ack, code review only

Regards,
Ravi

-Original Message-
From: Hans Nordeback [mailto:hans.nordeb...@ericsson.com] 
Sent: Wednesday, May 16, 2018 12:54 PM
To: anders.wid...@ericsson.com; Ravi Sekhar Reddy Konda 
; gary@dektech.com.au
Cc: opensaf-devel@lists.sourceforge.net; Hans Nordeback 

Subject: [PATCH 1/1] base: Improve backtrace print in daemon.c [#2853]

---
 src/base/daemon.c   | 52 ++---
 tools/cluster_sim_uml/build_uml |  1 +
 2 files changed, 49 insertions(+), 4 deletions(-)

diff --git a/src/base/daemon.c b/src/base/daemon.c
index 361dd8dd6..2ad0dcd2d 100644
--- a/src/base/daemon.c
+++ b/src/base/daemon.c
@@ -608,14 +608,18 @@ static void fatal_signal_handler(int sig, siginfo_t *siginfo, void *ctx)
const int BT_ARRAY_SIZE = 20;
void *bt_array[BT_ARRAY_SIZE];
size_t bt_size;
-   int fd;
char bt_header[40];
+   char cmd_buf[200];
+   char addr2line_buf[120];
+   Dl_info dl_info;
+   FILE *fp;
 
-   if ((fd = open(bt_filename, O_RDWR | O_CREAT, 0644)) < 0) {
+   int fd = open(bt_filename, O_RDWR | O_CREAT, 0644);
+
+   if (fd < 0)
goto done;
-   }
 
-   snprintf(bt_header, sizeof(bt_header), "signal: %d pid: %u uid: %u\n",
+   snprintf(bt_header, sizeof(bt_header), "signal: %d pid: %u uid: %u\n\n",
 sig, siginfo->si_pid, siginfo->si_uid);
 
if (write(fd, bt_header, strlen(bt_header)) < 0) {
@@ -624,6 +628,45 @@ static void fatal_signal_handler(int sig, siginfo_t *siginfo, void *ctx)
}
 
bt_size = plibc_backtrace(bt_array, BT_ARRAY_SIZE);
+
+   if (system("which addr2line") == 0) {
+   for (int i = 0; i < bt_size; ++i) {
+   memset(&dl_info, 0, sizeof(dl_info));
+   dladdr(bt_array[i], &dl_info);
+   ptrdiff_t offset = bt_array[i] - dl_info.dli_fbase;
+
+   snprintf(cmd_buf, sizeof(cmd_buf),
+"addr2line %tx -p -f -e %s",
+offset, dl_info.dli_fname);
+
+   fp = popen(cmd_buf, "r");
+   if (fp == NULL) {
+   syslog(LOG_ERR,
+  "popen failed: %s", strerror(errno));
+   } else {
+   if (fgets(addr2line_buf,
+ sizeof(addr2line_buf),
+ fp) != NULL) {
+   snprintf(cmd_buf, sizeof(cmd_buf),
+"# %d %s",
+i, addr2line_buf);
+   if (write(fd, cmd_buf,
+ strlen(cmd_buf)) < 0) {
+   syslog(LOG_ERR,
+  "write failed: %s",
+  strerror(errno));
+   }
+   }
+   pclose(fp);
+   }
+   }
+   }
+
+   if (write(fd, "\n", 1) < 0) {
+   syslog(LOG_ERR,
+  "write failed: %s", strerror(errno));
+   }
+
plibc_backtrace_symbols_fd(bt_array, bt_size, fd);
 
close(fd);
@@ -677,6 +720,7 @@ static void install_fatal_signal_handlers(void)
 time_string, getpid());
 
struct sigaction action;
+
memset(&action, 0, sizeof(action));
action.sa_sigaction = fatal_signal_handler;
sigfillset(&action.sa_mask);
diff --git a/tools/cluster_sim_uml/build_uml b/tools/cluster_sim_uml/build_uml 
index b9f224360..16d49d03e 100755
--- a/tools/cluster_sim_uml/build_uml
+++ b/tools/cluster_sim_uml/build_uml
@@ -176,6 +176,7 @@ cmd_create_rootfs()
 test -e /usr/bin/lsof && install /usr/bin/lsof usr/bin
 test -e /bin/pidof && install /bin/pidof usr/bin
 test -e /usr/sbin/tcpdump && install /usr/sbin/tcpdump usr/sbin
+test -e /usr/bin/addr2line && install /usr/bin/addr2line usr/bin
 if test -e /usr/bin/gdb; then
install /usr/bin/gdb usr/bin
if test -d /usr/share/gdb; then
--
2.17.0
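
The address translation in the patch above (subtract the object's load address from the backtrace address, then hand the offset to `addr2line`) can be sketched without the signal-handler context. This is a minimal sketch: `OffsetInObject` and `Addr2LineCommand` are hypothetical helper names, and the pointer subtraction is done through `char *` because arithmetic on `void *` is a compiler extension.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string>

// Offset of `addr` from the load address `base` of the object containing it.
std::ptrdiff_t OffsetInObject(const void *addr, const void *base) {
  return static_cast<const char *>(addr) - static_cast<const char *>(base);
}

// Builds a command line of the same shape as the one the patch passes to
// popen(): the offset in hex, then -p -f -e and the object file path.
std::string Addr2LineCommand(std::ptrdiff_t offset, const char *object_path) {
  assert(offset >= 0 && object_path != nullptr);
  char buf[200];
  std::snprintf(buf, sizeof(buf), "addr2line %zx -p -f -e %s",
                static_cast<std::size_t>(offset), object_path);
  return std::string(buf);
}
```

In the patch the base address and file name come from `dladdr()` (`dli_fbase` and `dli_fname`); here they are plain parameters so the sketch stays independent of the dynamic loader.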


Re: [devel] [PATCH 1/1] rded: run controller promotion code in new thread [#2857]

2018-05-23 Thread Ravi Sekhar Reddy Konda

Hi Gary,

Ack, code review only 

Regards,
Ravi
- Original Message -
From: gary@dektech.com.au
To: hans.nordeb...@ericsson.com, ravisekhar.ko...@oracle.com, 
anders.wid...@ericsson.com
Cc: opensaf-devel@lists.sourceforge.net, gary@dektech.com.au
Sent: Friday, May 18, 2018 11:20:34 AM GMT +05:30 Chennai, Kolkata, Mumbai, New 
Delhi
Subject: [PATCH 1/1] rded: run controller promotion code in new thread [#2857]

Currently, the consensus code relating to node promotion
is run from the main thread. We can improve rded's
responsiveness by moving this code into another thread.
---
 src/rde/rded/rde_cb.h|  3 +-
 src/rde/rded/rde_main.cc |  6 +++-
 src/rde/rded/role.cc | 82 ++--
 src/rde/rded/role.h  |  2 ++
 4 files changed, 61 insertions(+), 32 deletions(-)

diff --git a/src/rde/rded/rde_cb.h b/src/rde/rded/rde_cb.h
index f5ad689c3..877687341 100644
--- a/src/rde/rded/rde_cb.h
+++ b/src/rde/rded/rde_cb.h
@@ -53,7 +53,8 @@ enum RDE_MSG_TYPE {
   RDE_MSG_NEW_ACTIVE_CALLBACK = 5,
   RDE_MSG_NODE_UP = 6,
   RDE_MSG_NODE_DOWN = 7,
-  RDE_MSG_TAKEOVER_REQUEST_CALLBACK = 8
+  RDE_MSG_TAKEOVER_REQUEST_CALLBACK = 8,
+  RDE_MSG_ACTIVE_PROMOTION_SUCCESS = 9
 };
 
 struct rde_peer_info {
diff --git a/src/rde/rded/rde_main.cc b/src/rde/rded/rde_main.cc
index c5b4b8283..c59aa4536 100644
--- a/src/rde/rded/rde_main.cc
+++ b/src/rde/rded/rde_main.cc
@@ -55,7 +55,8 @@ const char *rde_msg_name[] = {"-",
   "RDE_MSG_NEW_ACTIVE_CALLBACK(5)"
   "RDE_MSG_NODE_UP(6)",
   "RDE_MSG_NODE_DOWN(7)",
-  "RDE_MSG_TAKEOVER_REQUEST_CALLBACK(8)"};
+  "RDE_MSG_TAKEOVER_REQUEST_CALLBACK(8)",
+  "RDE_MSG_ACTIVE_PROMOTION_SUCCESS(9)"};
 
 static RDE_CONTROL_BLOCK _rde_cb;
 static RDE_CONTROL_BLOCK *rde_cb = &_rde_cb;
@@ -186,6 +187,9 @@ static void handle_mbx_event() {
 LOG_WA("Received takeover request when not active");
   }
 } break;
+case RDE_MSG_ACTIVE_PROMOTION_SUCCESS:
+  role->NodePromoted();
+  break;
 default:
   LOG_ER("%s: discarding unknown message type %u", __FUNCTION__, msg->type);
   break;
diff --git a/src/rde/rded/role.cc b/src/rde/rded/role.cc
index 1b5a6ae89..b6a5df51a 100644
--- a/src/rde/rded/role.cc
+++ b/src/rde/rded/role.cc
@@ -22,6 +22,7 @@
 #include "rde/rded/role.h"
 #include 
 #include 
+#include 
 #include "base/getenv.h"
 #include "base/logtrace.h"
 #include "base/ncs_main_papi.h"
@@ -63,6 +64,55 @@ void Role::MonitorCallback(const std::string& key, const std::string& new_value,
   osafassert(status == NCSCC_RC_SUCCESS);
 }
 
+void Role::PromoteNode(const uint64_t cluster_size) {
+  TRACE_ENTER();
+  SaAisErrorT rc;
+
+  Consensus consensus_service;
+
+  rc = consensus_service.PromoteThisNode(true, cluster_size);
+  if (rc != SA_AIS_OK && rc != SA_AIS_ERR_EXIST) {
+LOG_ER("Unable to set active controller in consensus service");
+opensaf_reboot(0, nullptr,
+   "Unable to set active controller in consensus service");
+  }
+
+  if (rc == SA_AIS_ERR_EXIST) {
+LOG_WA("Another controller is already active");
+return;
+  }
+
+  RDE_CONTROL_BLOCK* cb = rde_get_control_block();
+
+  // send msg to main thread
+  rde_msg* msg = static_cast<rde_msg*>(malloc(sizeof(rde_msg)));
+  msg->type = RDE_MSG_ACTIVE_PROMOTION_SUCCESS;
+  uint32_t status;
+  status = m_NCS_IPC_SEND(&cb->mbx, msg, NCS_IPC_PRIORITY_HIGH);
+  osafassert(status == NCSCC_RC_SUCCESS);
+}
+
+void Role::NodePromoted() {
+  ExecutePreActiveScript();
+  LOG_NO("Switched to ACTIVE from %s", to_string(role()));
+  role_ = PCS_RDA_ACTIVE;
+  rde_rda_send_role(role_);
+
+  Consensus consensus_service;
+  RDE_CONTROL_BLOCK* cb = rde_get_control_block();
+
+  // register for callback if active controller is changed
+  // in consensus service
+  if (cb->monitor_lock_thread_running == false) {
+cb->monitor_lock_thread_running = true;
+consensus_service.MonitorLock(MonitorCallback, cb->mbx);
+  }
+  if (cb->monitor_takeover_req_thread_running == false) {
+cb->monitor_takeover_req_thread_running = true;
+consensus_service.MonitorTakeoverRequest(MonitorCallback, cb->mbx);
+  }
+}
+
 Role::Role(NODE_ID own_node_id)
 : known_nodes_{},
   role_{PCS_RDA_QUIESCED},
@@ -83,36 +133,8 @@ timespec* Role::Poll(timespec* ts) {
   timeout = ts;
 } else {
   RDE_CONTROL_BLOCK* cb = rde_get_control_block();
-  SaAisErrorT rc;
-  Consensus consensus_service;
-
-  rc = consensus_service.PromoteThisNode(true, cb->cluster_members.size());
-  if (rc != SA_AIS_OK && rc != SA_AIS_ERR_EXIST) {
-LOG_ER("Unable to set active controller in consensus service");
-opensaf_reboot(0, nullptr,
-   "Unable to set active controller in consensus service");
-  }
-
-  if (rc == SA_AIS_ERR_EXIST) {
-

Re: [devel] [PATCH 1/1] lck: fix errors when displaying SaLckResource class [#2070]

2018-05-15 Thread Ravi Sekhar Reddy Konda

Hi Alex,

Ack, code review only

Regards,
Ravi

-Original Message-
From: Alex Jones [mailto:ajo...@rbbn.com] 
Sent: Monday, May 07, 2018 8:13 PM
To: Ravi Sekhar Reddy Konda <ravisekhar.ko...@oracle.com>
Cc: opensaf-devel@lists.sourceforge.net; Alex Jones <ajo...@rbbn.com>
Subject: [PATCH 1/1] lck: fix errors when displaying SaLckResource class [#2070]

When getting IMM info for a lock resource, SaLckResource, the information is 
often not correct.

Both lckd and lcknd are not updating IMM correctly when SaLckResource 
information changes at runtime.

Add test cases that verify these attributes are updated correctly, and fix the 
issues.
---
 src/lck/Makefile.am   |  5 ++-
 src/lck/apitest/test_saLckLimitGet.cc |  4 ++
 src/lck/lckd/gld_evt.c| 17 +---
 src/lck/lckd/gld_rsc.c| 26 ++--
 src/lck/lckd/gld_standby.c|  2 +-
 src/lck/lcknd/glnd_client.c   | 76 +--
 src/lck/lcknd/glnd_client.h   |  4 --
 src/lck/lcknd/glnd_evt.c  | 48 ++
 src/lck/lcknd/glnd_res.c  | 24 +++
 9 files changed, 121 insertions(+), 85 deletions(-)

diff --git a/src/lck/Makefile.am b/src/lck/Makefile.am
index db3e043e1..2aa64b4a5 100644
--- a/src/lck/Makefile.am
+++ b/src/lck/Makefile.am
@@ -200,7 +200,10 @@ bin_lcktest_SOURCES = \
src/lck/apitest/tet_glsv_util.c \
src/lck/apitest/tet_gla.c \
src/lck/apitest/tet_gla_conf.c \
-   src/lck/apitest/tet_gld.c
+   src/lck/apitest/tet_gld.c \
+   src/lck/apitest/test_ErrUnavailable.cc \
+   src/lck/apitest/test_saLckLimitGet.cc \
+   src/lck/apitest/test_saLckResourceClass.cc
 
 bin_lcktest_LDADD = \
   lib/libSaLck.la \
diff --git a/src/lck/apitest/test_saLckLimitGet.cc b/src/lck/apitest/test_saLckLimitGet.cc
index 74c9194d4..dbf804ac1 100644
--- a/src/lck/apitest/test_saLckLimitGet.cc
+++ b/src/lck/apitest/test_saLckLimitGet.cc
@@ -3,6 +3,7 @@
 #include 
 #include 
 #include 
+#include 
 #include "ais/include/saLck.h"
 #include "lck/apitest/lcktest.h"
 
@@ -153,6 +154,9 @@ static void saLckLimitGet_08(void)
 
   rc = saLckFinalize(lckHandle);
   assert(rc == SA_AIS_OK);
+
+  // wait for resources to clean up
+  sleep(2);
 }
 
 static void saLckLimitGet_09(void)
diff --git a/src/lck/lckd/gld_evt.c b/src/lck/lckd/gld_evt.c
index 6134093f1..c6a33282e 100644
--- a/src/lck/lckd/gld_evt.c
+++ b/src/lck/lckd/gld_evt.c
@@ -144,7 +144,7 @@ static uint32_t gld_rsc_open(GLSV_GLD_EVT *evt)
NCSMDS_INFO snd_mds;
uint32_t res = NCSCC_RC_FAILURE;
;
-   SaAisErrorT error;
+   SaAisErrorT error = SA_AIS_OK;
uint32_t node_id;
bool node_first_rsc_open = false;
GLSV_GLD_GLND_RSC_REF *glnd_rsc = NULL;
@@ -347,14 +347,14 @@ static uint32_t gld_rsc_close(GLSV_GLD_EVT *evt)
glnd_rsc->rsc_info->saf_rsc_no_of_users =
glnd_rsc->rsc_info->saf_rsc_no_of_users - 1;
 
+   if (evt->info.rsc_details.lcl_ref_cnt == 0)
+   gld_rsc_rmv_node_ref(gld_cb, glnd_rsc->rsc_info, glnd_rsc,
+node_details, orphan_flag);
+
/*Checkkpoint resource close event */
glsv_gld_a2s_ckpt_rsc_details(
gld_cb, evt->evt_type, evt->info.rsc_details, node_details->dest_id,
evt->info.rsc_details.lcl_ref_cnt);
-
-   if (evt->info.rsc_details.lcl_ref_cnt == 0)
-   gld_rsc_rmv_node_ref(gld_cb, glnd_rsc->rsc_info, glnd_rsc,
-node_details, orphan_flag);
 end:
TRACE_LEAVE2("Return value %u", rc);
return rc;
@@ -426,19 +426,24 @@ uint32_t gld_rsc_ref_set_orphan(GLSV_GLD_GLND_DETAILS *node_details,
 {
GLSV_GLD_GLND_RSC_REF *glnd_rsc_ref;
 
+   TRACE_ENTER2("rsc_id: %i orphan: %i lck_mode: %i", rsc_id, orphan,
+   lck_mode);
+
/* Find the rsc_info based on resource id */
glnd_rsc_ref = (GLSV_GLD_GLND_RSC_REF *)ncs_patricia_tree_get(
&node_details->rsc_info_tree, (uint8_t *)&rsc_id);
if ((glnd_rsc_ref == NULL) || (glnd_rsc_ref->rsc_info == NULL)) {
LOG_ER("Patricia tree get failed");
+   TRACE_LEAVE();
return NCSCC_RC_FAILURE;
}
 
glnd_rsc_ref->rsc_info->can_orphan = orphan;
glnd_rsc_ref->rsc_info->orphan_lck_mode = lck_mode;
-   if (orphan == true)
+   if (orphan == false)
glnd_rsc_ref->rsc_info->saf_rsc_stripped_cnt++;
 
+   TRACE_LEAVE();
return NCSCC_RC_SUCCESS;
 }
 
diff --git a/src/lck/lckd/gld_rsc.c b/src/lck/lckd/gld_rsc.c
index ed2bd5a71..7a45cd716 100644
--- a/src/lck/lckd/gld_rsc.c
+++ b/src/lck/lckd/gld_rsc.c
@@ -297,12 +297,16 @@ void gld_free_rsc_info(GLSV_GLD_CB *gld_cb, 
GLSV_GLD_RSC_I

Re: [devel] [PATCH 1/1] plm: don't instantiate child EEs twice when unlocking parent EE [#2846]

2018-05-09 Thread Ravi Sekhar Reddy Konda

Hi Alex,

Ack for the patch, code review only

Regards,
Ravi

-Original Message-
From: Alex Jones [mailto:ajo...@rbbn.com] 
Sent: Thursday, May 03, 2018 9:08 PM
To: mathi.np@gmail.com; Ravi Sekhar Reddy Konda 
<ravisekhar.ko...@oracle.com>
Cc: opensaf-devel@lists.sourceforge.net; Alex Jones <ajo...@rbbn.com>
Subject: [PATCH 1/1] plm: don't instantiate child EEs twice when unlocking 
parent EE [#2846]

Child EEs (VMs) can fail to boot up when unlocking the parent EE.

The current code resets the VM when unlocking the parent EE. This is done in 
plms_move_chld_ent_to_insvc(). Later in the unlock function, the child EEs are 
reset again. libvirt does not like these resets being done in less than 1 
second, and often will not boot the VM.

Don't reset the child EEs twice when unlocking the parent EE.
---
 src/plm/plmd/plms_adm_fsm.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/src/plm/plmd/plms_adm_fsm.c b/src/plm/plmd/plms_adm_fsm.c
index 8f5725cd8..a29dc28e0 100644
--- a/src/plm/plmd/plms_adm_fsm.c
+++ b/src/plm/plmd/plms_adm_fsm.c
@@ -4520,7 +4520,10 @@ static SaUint32T plms_ent_unlock(PLMS_ENTITY *ent, PLMS_TRACK_INFO *trk_info,
 
if ((PLMS_EE_ENTITY == head->plm_entity->entity_type) &&
(!plms_rdness_flag_is_set(head->plm_entity,
- SA_PLM_RF_DEPENDENCY))) {
+ SA_PLM_RF_DEPENDENCY)) &&
+   /* child EEs have already been instantiated above */
+   head->plm_entity->parent->entity_type !=
+   PLMS_EE_ENTITY) {
ret_err = plms_ee_instantiate(head->plm_entity,
  false, true);
if (NCSCC_RC_SUCCESS != ret_err) {
--
2.13.6
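
The extra condition added by the patch above can be read as a small predicate: an EE is instantiated by the unlock loop only if its dependency flag is clear and its parent is not itself an EE. A minimal sketch, with hypothetical types and names; the null check on `parent` is an assumption added here for the sketch and is not part of the patch.

```cpp
#include <cassert>
#include <vector>

enum EntityType { kHardwareEntity, kExecutionEnvironment };

struct Entity {
  EntityType type;
  const Entity *parent;      // nullptr for a root entity
  bool dependency_flag_set;  // stand-in for SA_PLM_RF_DEPENDENCY
};

// True when the unlock loop should instantiate `ent` itself.
bool ShouldInstantiateOnUnlock(const Entity &ent) {
  if (ent.type != kExecutionEnvironment) return false;
  if (ent.dependency_flag_set) return false;
  // A child EE is assumed to have been instantiated together with its
  // parent EE already, so it is skipped here to avoid a second reset.
  if (ent.parent != nullptr && ent.parent->type == kExecutionEnvironment) {
    return false;
  }
  return true;
}

// Counts how many entities in an affected-entity list would be instantiated.
int CountInstantiations(const std::vector<const Entity *> &affected) {
  int count = 0;
  for (const Entity *ent : affected) {
    assert(ent != nullptr);
    if (ShouldInstantiateOnUnlock(*ent)) ++count;
  }
  return count;
}
```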


Re: [devel] [PATCH 1/1] clmd: Increase message priority of CLMSV_CLMS_MDS_NODE_EVT to be sent to main thread [#2842]

2018-04-25 Thread Ravi Sekhar Reddy Konda

Hi Minh,

Ack for the patch, code review only

Regards,
Ravi

-Original Message-
From: Minh Chau [mailto:minh.c...@dektech.com.au] 
Sent: Thursday, April 26, 2018 4:52 AM
To: anders.wid...@ericsson.com; hans.nordeb...@ericsson.com; 
ravisekhar.ko...@oracle.com
Cc: opensaf-devel@lists.sourceforge.net; Minh Chau 
Subject: [PATCH 1/1] clmd: Increase message priority of CLMSV_CLMS_MDS_NODE_EVT 
to be sent to main thread [#2842]

When a standby controller is stopped and restarted, stopping the node 
generates the MDS event CLMSV_CLMS_MDS_NODE_EVT, which is sent to the main 
thread with NORMAL priority. When the node starts again, other events such as 
CLMSV_CLUSTER_JOIN_REQ are sent with HIGH priority.

A race occurs because CLMSV_CLMS_MDS_NODE_EVT is processed after 
CLMSV_CLUSTER_JOIN_REQ, possibly because of the difference in priority.

This patch raises the priority of CLMSV_CLMS_MDS_NODE_EVT to match the others, 
so that the order in which the main thread processes messages depends on the 
order in which the events occurred.
---
 src/clm/clmd/clms_mds.cc | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/clm/clmd/clms_mds.cc b/src/clm/clmd/clms_mds.cc
index a1f5348..58552cc 100644
--- a/src/clm/clmd/clms_mds.cc
+++ b/src/clm/clmd/clms_mds.cc
@@ -1097,7 +1097,7 @@ static uint32_t clms_mds_node_event(struct ncsmds_callback_info *mds_info) {
 clmsv_evt->info.node_mds_info.node_id = mds_info->info.node_evt.node_id;
 clmsv_evt->info.node_mds_info.nodeup = SA_TRUE;
 
-rc = m_NCS_IPC_SEND(_cb->mbx, clmsv_evt, NCS_IPC_PRIORITY_NORMAL);
+rc = m_NCS_IPC_SEND(_cb->mbx, clmsv_evt, NCS_IPC_PRIORITY_HIGH);
 if (rc != NCSCC_RC_SUCCESS) {
   TRACE("IPC send failed %d", rc);
   free(clmsv_evt);
--
2.7.4
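
The reordering described in the commit message above can be reproduced with a small priority mailbox. This is a minimal sketch that assumes the mailbox always delivers the highest-priority pending message first and is FIFO within one priority; `Mailbox`, `Priority` and the message strings are hypothetical, not the NCS IPC API.

```cpp
#include <array>
#include <cassert>
#include <deque>
#include <string>

enum Priority { kNormal = 0, kHigh = 1, kPriorityCount = 2 };

// Hypothetical mailbox: one FIFO queue per priority level.
class Mailbox {
 public:
  void Send(const std::string &msg, Priority prio) {
    assert(prio >= 0 && prio < kPriorityCount);
    queues_[prio].push_back(msg);
  }

  // Returns the next message, highest priority first and FIFO within a
  // priority. Returns an empty string when the mailbox is empty.
  std::string Receive() {
    for (int p = kPriorityCount - 1; p >= 0; --p) {
      if (!queues_[p].empty()) {
        std::string msg = queues_[p].front();
        queues_[p].pop_front();
        return msg;
      }
    }
    return std::string();
  }

 private:
  std::array<std::deque<std::string>, kPriorityCount> queues_;
};
```

Sending the node event at NORMAL and the join request at HIGH lets the join request overtake the node event; sending both at HIGH keeps them in send order, which is the effect the patch is after.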


Re: [devel] [PATCH 1/1] dtm: Add --delete option to osaflog command for deleting log streams [#2837]

2018-04-18 Thread Ravi Sekhar Reddy Konda

Hi Anders,

Ack. Please update the PR doc while pushing to the repo.

Thanks,
Ravi

-Original Message-
From: Anders Widell [mailto:anders.wid...@ericsson.com] 
Sent: Friday, April 13, 2018 9:02 PM
To: Ravi Sekhar Reddy Konda <ravisekhar.ko...@oracle.com>
Cc: opensaf-devel@lists.sourceforge.net; Anders Widell 
<anders.wid...@ericsson.com>
Subject: [PATCH 1/1] dtm: Add --delete option to osaflog command for deleting 
log streams [#2837]

Make it possible to delete log streams in the internal OpenSAF log server.  
This will free up resources for log streams that are no longer used, as well as 
make it possible to create a new log stream with the same name but different 
configuration options (max file size and number of backups).
---
 src/dtm/README  |  7 --
 src/dtm/tools/osaflog.cc| 49 +
 src/dtm/transport/log_server.cc | 19 ++--
 src/dtm/transport/log_server.h  |  2 ++
 4 files changed, 59 insertions(+), 18 deletions(-)

diff --git a/src/dtm/README b/src/dtm/README
index c671cfd39..430ff1950 100644
--- a/src/dtm/README
+++ b/src/dtm/README
@@ -185,8 +185,11 @@ Options:
   server to disk even when no LOGSTREAM
   is specified.
 --print   print the messages stored on disk for the
-  specified LOGSTREAM. This option is default
-  when no option is specified.
+  specified LOGSTREAM(s). This option is the
+  default when no option is specified.
+--delete  Delete the specified LOGSTREAM(s) by
+  removing allocated resources in the log
+  server. Does not delete log files from disk.
 --max-file-size=SIZE  Set the maximum size of the log file to
   SIZE bytes. The log file will be rotated
   when it exceeds this size. Suffixes k, M and
diff --git a/src/dtm/tools/osaflog.cc b/src/dtm/tools/osaflog.cc
index 572c68ad1..c5946eea7 100644
--- a/src/dtm/tools/osaflog.cc
+++ b/src/dtm/tools/osaflog.cc
@@ -47,6 +47,7 @@ bool Flush();
 base::UnixServerSocket* CreateSocket();
 uint64_t Random64Bits(uint64_t seed);
 bool PrettyPrint(const std::string& log_stream);
+bool Delete(const std::string& log_stream);
 std::list OpenLogFiles(const std::string& log_stream);
 std::string PathName(const std::string& log_stream, int suffix);
 uint64_t GetInode(int fd);
@@ -61,21 +62,23 @@ int main(int argc, char** argv) {
   struct option long_options[] = {{"max-file-size", required_argument, 0, 'm'},
   {"max-backups", required_argument, 0, 'b'},
   {"flush", no_argument, 0, 'f'},
-  {"print", required_argument, nullptr, 'p'},
+  {"print", no_argument, nullptr, 'p'},
+  {"delete", no_argument, nullptr, 'd'},
   {0, 0, 0, 0}};
 
   uint64_t max_file_size = 0;
   uint64_t max_backups = 0;
-  char *pretty_print_argument = NULL;
   int option = 0;
 
   int long_index = 0;
   bool flush_result =  true;
   bool print_result =  true;
+  bool delete_result =  true;
   bool max_file_size_result = true;
   bool number_of_backups_result = true;
   bool flush_set = false;
   bool pretty_print_set = false;
+  bool delete_set = false;
   bool max_file_size_set = false;
   bool max_backups_set = false;
 
@@ -88,10 +91,12 @@ int main(int argc, char** argv) {
long_options, &long_index)) != -1) {
 switch (option) {
  case 'p':
-   pretty_print_argument = optarg;
pretty_print_set = true;
flush_set = true;
  break;
+ case 'd':
+   delete_set = true;
+ break;
  case 'f':
flush_set = true;
  break;
@@ -115,12 +120,13 @@ int main(int argc, char** argv) {
 }
   }
 
-  if (argc - optind == 1) {
- flush_result = Flush();
- flush_set = false;
- print_result = PrettyPrint(argv[optind]);
- pretty_print_set = false;
-  } else if (argc - optind > 1) {
+  if (argc > optind && !pretty_print_set && !delete_set) {
+pretty_print_set = true;
+flush_set = true;
+  }
+
+  if ((argc <= optind && (pretty_print_set || delete_set)) ||
+  (pretty_print_set && delete_set)) {
  PrintUsage(argv[0]);
  exit(EXIT_FAILURE);
   }
@@ -129,7 +135,14 @@ int main(int argc, char** argv) {
  flush_result = Flush();
   }
   if (pretty_print_set == true) {
- print_result = PrettyPrint(pretty_print_argument);
+while (print_result && optind < argc) {
+  print_result = PrettyPrint(argv[
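
The option handling introduced above (`--print` and `--delete` become flags without arguments, and the log stream names are taken from the remaining positional arguments) can be sketched with `getopt_long`. This is a minimal sketch and not the osaflog code: `Parsed` and `ParseArgs` are hypothetical names, only two of the options are modelled, and the validity rules are copied from the conditions visible in the diff.

```cpp
#include <cassert>
#include <getopt.h>
#include <string>
#include <vector>

struct Parsed {
  bool print = false;
  bool del = false;
  bool valid = true;
  std::vector<std::string> streams;
};

Parsed ParseArgs(int argc, char **argv) {
  assert(argc >= 1 && argv != nullptr);
  static const struct option long_options[] = {
      {"print", no_argument, nullptr, 'p'},
      {"delete", no_argument, nullptr, 'd'},
      {nullptr, 0, nullptr, 0}};
  Parsed result;
  optind = 0;  // restart scanning so the function can be called repeatedly
  int option;
  while ((option = getopt_long(argc, argv, "pd", long_options, nullptr)) !=
         -1) {
    switch (option) {
      case 'p': result.print = true; break;
      case 'd': result.del = true; break;
      default: result.valid = false; break;
    }
  }
  // Stream names without an explicit action default to printing.
  if (argc > optind && !result.print && !result.del) result.print = true;
  // An action needs at least one stream, and the two actions are exclusive.
  if ((argc <= optind && (result.print || result.del)) ||
      (result.print && result.del)) {
    result.valid = false;
  }
  for (int i = optind; i < argc; ++i) result.streams.push_back(argv[i]);
  return result;
}
```

Because the flags no longer consume an argument, any number of stream names can follow a single `--print` or `--delete`.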

Re: [devel] [PATCH 0/1] Review Request for base: Re-factor the timer implementation [#2440]

2018-04-17 Thread Ravi Sekhar Reddy Konda

Hi Anders,

I see you have provided an alternative implementation based on the AIS TMR API 
instead of the legacy NCS Timer API; this is good.
I hope backward compatibility is fully taken care of. For example, the current 
OpenSAF services use the NCS Timer API: can I use the new AIS TMR API for new 
timers? That is, can a SAF service use different timer implementations for 
different timers at the same time?

Syam and I are reviewing the code, and we need a couple more days to complete 
the review. We will send you our review comments once done.

This enhancement also needs a PR doc update. Could you provide it while we 
complete the review?
  
Thanks,
Ravi


-Original Message-
From: Anders Widell [mailto:anders.wid...@ericsson.com] 
Sent: Monday, April 09, 2018 10:00 PM
To: Ravi Sekhar Reddy Konda <ravisekhar.ko...@oracle.com>
Cc: opensaf-devel@lists.sourceforge.net; Anders Widell 
<anders.wid...@ericsson.com>
Subject: [PATCH 0/1] Review Request for base: Re-factor the timer 
implementation [#2440]

Summary: base: Re-factor the timer implementation [#2440]
Review request for Ticket(s): 2440
Peer Reviewer(s): Ravi
Pull request to: 
Affected branch(es): develop
Development branch: ticket-2440
Base revision: b83be452a25a37c7f5b568b436d1af544afb7350
Personal repository: git://git.code.sf.net/u/anders-w/review


Impacted area           Impact y/n

 Docs                    n
 Build system            n
 RPM/packaging           n
 Configuration files     n
 Startup scripts         n
 SAF services            n
 OpenSAF services        n
 Core libraries          y
 Samples                 n
 Tests                   n
 Other                   n


Comments (indicate scope for each "y" above):
-

revision 49be7d5f610fb7e23d8df575007fa3b21a5e0946
Author: Anders Widell <anders.wid...@ericsson.com>
Date:   Mon, 9 Apr 2018 16:10:41 +0200

base: Re-factor the timer implementation [#2440]

Re-factor the NCS timer implementation and provide the AIS TMR API as an 
alternative way to use the timer implementation, alongside the old NCS 
API. The AIS TMR API is intended for internal use in OpenSAF services for now, 
but it could be exported as an official OpenSAF API later if it turns out to be 
working well. OpenSAF services using the TMR API instead of the NCS API will no 
longer need a dedicated timer thread.

The new timer implementation is using a C++ STL multiset for the timer queue, 
and a Linux timerfd for the selection object.



Added Files:

 src/ais/include/saTmr.h
 src/base/handle/external_mutex.h
 src/base/handle/handle.cc
 src/base/handle/handle.h
 src/base/handle/object_db.cc
 src/base/handle/object_db.h
 src/base/handle/object.h
 src/base/ncssysf_tmr.cc
 src/base/tests/sa_tmr_test.cc
 src/base/timer/saTmr.cc
 src/base/timer/timer.h
 src/base/timer/timer_handle.cc
 src/base/timer/timer_handle.h


Removed Files:
--
 src/base/sysf_tmr.c


Complete diffstat:
--
 cppcheck_append.cc   |   12 +
 src/ais/Makefile.am  |3 +
 src/ais/include/saTmr.h  |  148 ++
 src/base/Makefile.am |   13 +-
 src/base/handle/external_mutex.h |   66 +++
 src/base/handle/handle.cc|   63 +++
 src/base/handle/handle.h |  107 
 src/base/handle/object.h |   52 ++
 src/base/handle/object_db.cc |   91 
 src/base/handle/object_db.h  |  109 
 src/base/ncssysf_tmr.cc  |  254 +
 src/base/ncssysf_tmr.h   |  135 +++--
 src/base/ncssysf_tsk.h   |4 +-
 src/base/sysf_tmr.c  | 1085 --
 src/base/tests/sa_tmr_test.cc| 1079 +
 src/base/tests/sysf_tmr_test.cc  |  205 +--
 src/base/timer/saTmr.cc  |  569 
 src/base/timer/timer.h   |   55 ++
 src/base/timer/timer_handle.cc   |  190 +++
 src/base/timer/timer_handle.h|  101 
 20 files changed, 3149 insertions(+), 1192 deletions(-)


Testing Commands:
-

make check


Testing, Expected Results:
--

Unit tests shall pass


Conditions of Submission:
-

Ack from reviewer(s), or on 2018-04-16 if no comments have been received.


Arch        Built   Started   Linux distro
-------------------------------------------
mips        n       n
mips64      n       n
x86         n       n
x86_64      y       y
powerpc     n       n
powerpc64   n       n


Reviewer Checklist:
---
[Submitters: make sure that your review doesn't trigger any checkmarks!]


Your checkin has not passed review because (see checked entries):

___ Your RR template is generally incomplete; it has too many blank entries
that need proper data filled in.

___ You have failed to nominate the proper

Re: [devel] [PATCH 1/1] imm: make version parameter in immutil_xxx non-const [#2830]

2018-04-16 Thread Ravi Sekhar Reddy Konda

Hi Vu,
Ack with one comment: make sure to reset local_version wherever the API is
called in a loop.
Thanks,
Ravi

-Original Message-
From: Vu Minh Nguyen [mailto:vu.m.ngu...@dektech.com.au] 
Sent: Thursday, April 05, 2018 4:09 PM
To: ravisekhar.ko...@oracle.com; hans.nordeb...@ericsson.com; 
anders.wid...@ericsson.com; lennart.l...@ericsson.com
Cc: opensaf-devel@lists.sourceforge.net; Vu Minh Nguyen 

Subject: [PATCH 1/1] imm: make version parameter in immutil_xxx non-const 
[#2830]

The version in saImmO{m,i}Initialize is an input/output parameter and is
declared non-const for both the IMM OM and OI APIs, according to the SAF spec.
In the immutil wrapper library, however, some wrappers declare it const and
do not update the in/out version before returning.

This patch makes that parameter non-const and updates the version before
returning from the wrapper APIs. It also fixes the incorrect usage of these
wrappers, which were passed a const version, in some services and applications.
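
The review comment about resetting local_version in a loop follows from the
in/out semantics: once a call has overwritten the version, a retry would
request whatever the library wrote back. The sketch below illustrates this
with a stand-in function; Version, FakeInitialize and InitializeWithRetry are
hypothetical names, and the real code would use SaVersionT and the immutil
wrappers instead.

```cpp
#include <cassert>
#include <cstdint>

// Stand-in for SaVersionT; the real type comes from the SAF headers.
struct Version {
  uint8_t releaseCode;
  uint8_t majorVersion;
  uint8_t minorVersion;
};

// Hypothetical stub that mimics an in/out version parameter: it overwrites
// *version with the version the "library" supports, and fails the first
// *failures calls the way a real call might return a retryable error.
inline bool FakeInitialize(Version* version, int* failures) {
  assert(version != nullptr && failures != nullptr);
  *version = Version{'A', 2, 99};
  if (*failures > 0) {
    --*failures;
    return false;
  }
  return true;
}

// Retry loop that resets the local copy before every attempt, so each call
// requests the version the caller actually wants. Returns the number of
// attempts used, or -1 if all attempts failed.
inline int InitializeWithRetry(const Version& wanted, int failures,
                               Version* requested_on_last_try) {
  int attempts = 0;
  bool ok = false;
  while (!ok && attempts < 10) {
    Version local_version = wanted;  // reset inside the loop
    *requested_on_last_try = local_version;
    ok = FakeInitialize(&local_version, &failures);
    ++attempts;
  }
  return ok ? attempts : -1;
}
```

If local_version were initialised once before the loop, the second and later
attempts in this sketch would request minor version 99 instead of the value
the caller asked for.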
---
 src/amf/amfd/imm.cc  | 11 +++
 src/amf/amfnd/util.cc|  3 ++-
 src/log/apitest/imm_tstutil.c|  5 -
 src/log/apitest/logtest.c|  9 ++---
 src/log/apitest/logtestfr.c  |  6 --
 src/log/apitest/tet_log_runtime_cfgobj.c |  3 ++-
 src/log/logd/lgs_config.cc   |  3 ++-
 src/log/logd/lgs_imm.cc  | 15 ++-
 src/log/logd/lgs_imm_gcfg.cc |  7 +--
 src/osaf/immutil/immutil.c   | 20 +++-
 src/osaf/immutil/immutil.h   |  6 +++---
 src/smf/smfd/SmfAdminState.cc|  4 ++--
 src/smf/smfd/SmfExecControlHdl.cc|  3 ++-
 13 files changed, 64 insertions(+), 31 deletions(-)

diff --git a/src/amf/amfd/imm.cc b/src/amf/amfd/imm.cc
index 47c0e5a..8c70325 100644
--- a/src/amf/amfd/imm.cc
+++ b/src/amf/amfd/imm.cc
@@ -1461,6 +1461,7 @@ done:
 SaAisErrorT avd_imm_init(void *avd_cb) {
   SaAisErrorT error = SA_AIS_OK;
   AVD_CL_CB *cb = (AVD_CL_CB *)avd_cb;
+  SaVersionT local_version = immVersion;
 
   TRACE_ENTER();
 
@@ -1471,13 +1472,13 @@ SaAisErrorT avd_imm_init(void *avd_cb) {
 
   cb->avd_imm_status = AVD_IMM_INIT_ONGOING;
   if ((error = immutil_saImmOiInitialize_2(>immOiHandle, _callbacks,
-   )) != SA_AIS_OK) {
+   _version)) != 
+ SA_AIS_OK) {
 LOG_ER("saImmOiInitialize failed %u", error);
 goto done;
   }
 
   if ((error = immutil_saImmOmInitialize(>immOmHandle, nullptr,
- )) != SA_AIS_OK) {
+ _version)) != SA_AIS_OK) 
+ {
 LOG_ER("saImmOmInitialize failed %u", error);
 goto done;
   }
@@ -2075,6 +2076,7 @@ void avd_imm_update_runtime_attrs(void) {
  */
 static void *avd_imm_reinit_bg_thread(void *_cb) {
   SaAisErrorT rc = SA_AIS_OK;
+  SaVersionT local_version;
   AVD_CL_CB *cb = (AVD_CL_CB *)_cb;
   AVD_EVT *evt;
   uint32_t status;
@@ -2098,9 +2100,10 @@ static void *avd_imm_reinit_bg_thread(void *_cb) {
 
 avd_cb->immOiHandle = 0;
 avd_cb->is_implementer = false;
+local_version = immVersion;
 
 if ((rc = immutil_saImmOiInitialize_2(>immOiHandle, _callbacks,
-  )) != SA_AIS_OK) {
+  _version)) != 
+ SA_AIS_OK) {
   LOG_ER("saImmOiInitialize failed %u", rc);
   osaf_mutex_unlock_ordie(_reinit_mutex);
   exit(EXIT_FAILURE);
@@ -2141,7 +2144,7 @@ static void *avd_imm_reinit_bg_thread(void *_cb) {
   /* Lets re-initialize Om interface also. */
   (void)immutil_saImmOmFinalize(cb->immOmHandle);
   if ((rc = immutil_saImmOmInitialize(>immOmHandle, nullptr,
-  )) != SA_AIS_OK) {
+  _version)) != 
+ SA_AIS_OK) {
 LOG_ER("saImmOmInitialize failed %u", rc);
 continue;
   }
diff --git a/src/amf/amfnd/util.cc b/src/amf/amfnd/util.cc
index f6dbb49..38bf426 100644
--- a/src/amf/amfnd/util.cc
+++ b/src/amf/amfnd/util.cc
@@ -250,8 +250,9 @@ const char *avnd_failed_state_file_location(void) {
 SaAisErrorT saImmOmInitialize_cond(SaImmHandleT *immHandle,
const SaImmCallbacksT *immCallbacks,
const SaVersionT *version) {
+  SaVersionT local_version = *version;
   if (avnd_cb->scs_absence_max_duration == 0) {
-return immutil_saImmOmInitialize(immHandle, immCallbacks, version);
+return immutil_saImmOmInitialize(immHandle, immCallbacks, &local_version);
   }
 
   SaVersionT localVer = *version;
diff --git a/src/log/apitest/imm_tstutil.c b/src/log/apitest/imm_tstutil.c 
index 194dffa..2143f83 100644
--- a/src/log/apitest/imm_tstutil.c
+++ b/src/log/apitest/imm_tstutil.c
@@ -32,6 +32,7 @@ bool get_multivalue_type_string_from_imm(SaImmHandleT 
*omHandle,  {
SaAisErrorT

Re: [devel] [PATCH 1/1] osaf: Isolate the node in the opensaf_reboot [#2833]

2018-04-16 Thread Ravi Sekhar Reddy Konda

Hi Hans,

 

As I said earlier, this patch is for cases where remote fencing is not
enabled. Stonith is valid only when remote fencing is enabled.

The ideal solution in this scenario is for CLM to take complete responsibility
for fencing the node, and for AMF to depend on the CLM notification when doing
role failover. In that case we won't see two active SUs at the same time.

The patch is only a temporary solution, in which we try to isolate the faulted
node immediately.

 

Thanks,

Ravi

 

 

From: Hans Nordebäck [mailto:hans.nordeb...@ericsson.com] 
Sent: Friday, April 13, 2018 5:08 PM
To: Ravi Sekhar Reddy Konda <ravisekhar.ko...@oracle.com>; Anders Widell 
<anders.wid...@ericsson.com>
Cc: opensaf-devel@lists.sourceforge.net
Subject: Re: SV: [PATCH 1/1] osaf: Isolate the node in the opensaf_reboot 
[#2833]

 

Hi Ravi,

 

stonith is not only valid for virtualized environments; I assume stonith
supports others, e.g. IPMI in a legacy environment. The probability of
"flickering" may be higher in a virtualized environment,

but for redundancy there should be two interfaces configured, which is the
normal configuration in legacy. If the problem in this ticket is solved by
using stonith, I don't see a need for adding this patch.

BTW, does this patch work when stonith is enabled?

/Regards HansN

 

On 04/13/2018 10:59 AM, Ravi Sekhar Reddy Konda wrote:

HI Hans,

 

The use case we are addressing here is link flickering when remote fencing is
not enabled. Also, remote fencing using Stonith is valid only in virtualized
environments. I have not tested with Stonith enabled, as the use case is the
one where remote fencing is disabled.

 

Thanks,

Ravi 

 

 

From: Hans Nordebäck [mailto:hans.nordeb...@ericsson.com] 
Sent: Friday, April 13, 2018 1:10 AM
To: ravi-sekhar <ravisekhar.ko...@oracle.com>; Anders Widell <anders.wid...@ericsson.com>
Cc: opensaf-devel@lists.sourceforge.net
Subject: SV: [PATCH 1/1] osaf: Isolate the node in the opensaf_reboot [#2833]

 

Hi Ravi,

 

I think stonith, implemented in ticket #1859, handles this case. This
"flickering" was one of the (manual) tests verifying the added stonith support.

It is important to have a separate interface for stonith, to be able to perform
the remote fencing, similar to using a backplane.

Have you tested with stonith enabled? 

 

/Regards HansN 

  _  

From: ravi-sekhar <ravisekhar.ko...@oracle.com>
Sent: 12 April 2018 15:29:13
To: Hans Nordebäck; Anders Widell
Cc: opensaf-devel@lists.sourceforge.net; ravi-sekhar
Subject: [PATCH 1/1] osaf: Isolate the node in the opensaf_reboot [#2833]

 

---
 scripts/opensaf_reboot | 19 +--
 1 file changed, 13 insertions(+), 6 deletions(-)

diff --git a/scripts/opensaf_reboot b/scripts/opensaf_reboot
index df65c26..b219c39 100644
--- a/scripts/opensaf_reboot
+++ b/scripts/opensaf_reboot
@@ -37,6 +37,9 @@ export LD_LIBRARY_PATH=$libdir:$LD_LIBRARY_PATH
 if [ -f "$pkgsysconfdir/fmd.conf" ]; then
   . "$pkgsysconfdir/fmd.conf"
 fi
+if [ -f "$pkgsysconfdir/nid.conf" ]; then
+  . "$pkgsysconfdir/nid.conf"
+fi
 
 NODE_ID_FILE=$pkglocalstatedir/node_id
 
@@ -118,7 +121,17 @@ else
 # uncomment the following line if debugging errors that keep restarting the node
 # exit 0
 
+    # If the application is using different interface for cluster communication, please
+    # add your application specific isolation commands here
+
 logger -t "opensaf_reboot" "Rebooting local node; timeout=$OPENSAF_REBOOT_TIMEOUT"
+
+    # Isolate the node
+    if [ "$MDS_TRANSPORT" = "TIPC" ]; then
+   tipc-config -bd eth:$TIPC_ETH_IF
+    else
+   $icmd pkill -STOP osafdtmd
+    fi
 
 # Start a reboot supervision background process. Note that a similar
 # supervision is also done in the opensaf_reboot() function in LEAP.
@@ -128,12 +141,6 @@ else
 (sleep "$OPENSAF_REBOOT_TIMEOUT"; echo -n "b" > "/proc/sysrq-trigger") &
 fi
 
-   # Stop some important opensaf processes to prevent bad things from happening
-   $icmd pkill -STOP osafamfwd
-   $icmd pkill -STOP osafamfnd
-   $icmd pkill -STOP osafamfd
-   $icmd pkill -STOP osaffmd
-
 # Flush OpenSAF internal log server messages to disk.
 $bindir/osaflog --flush
 
-- 
1.9.1

 
---

Re: [devel] [PATCH 1/1] osaf: Isolate the node in the opensaf_reboot [#2833]

2018-04-16 Thread Ravi Sekhar Reddy Konda

HI Anders,

In the opensaf_reboot script, these commands were added as a safety mechanism
so that we won't see issues such as AMF taking the Active role and starting
assignments. As part of this patch, however, we bring down the communication
mechanism before these core SAF services would be stopped, so I thought there
was no need to stop them as well. Also, if applications use a different
communication mechanism, we recommend that the user isolate the applications
before bringing down TIPC or DTM in opensaf_reboot.

Still, I don't see any issue in keeping those commands in opensaf_reboot as
well; I can retain them as a safety mechanism.

Thanks,
Ravi



-Original Message-
From: Anders Widell [mailto:anders.wid...@ericsson.com] 
Sent: Friday, April 13, 2018 5:18 PM
To: ravi-sekhar ; hans.nordeb...@ericsson.com
Cc: opensaf-devel@lists.sourceforge.net
Subject: Re: [PATCH 1/1] osaf: Isolate the node in the opensaf_reboot [#2833]

A question: why did you remove the "pkill -STOP osafamfwd" etc commands?

regards,

Anders Widell


On 04/12/2018 03:29 PM, ravi-sekhar wrote:
> ---
>   scripts/opensaf_reboot | 19 +--
>   1 file changed, 13 insertions(+), 6 deletions(-)
>
> diff --git a/scripts/opensaf_reboot b/scripts/opensaf_reboot
> index df65c26..b219c39 100644
> --- a/scripts/opensaf_reboot
> +++ b/scripts/opensaf_reboot
> @@ -37,6 +37,9 @@ export LD_LIBRARY_PATH=$libdir:$LD_LIBRARY_PATH
>   if [ -f "$pkgsysconfdir/fmd.conf" ]; then
> . "$pkgsysconfdir/fmd.conf"
>   fi
> +if [ -f "$pkgsysconfdir/nid.conf" ]; then
> +  . "$pkgsysconfdir/nid.conf"
> +fi
>   
>   NODE_ID_FILE=$pkglocalstatedir/node_id
>   
> @@ -118,7 +121,17 @@ else
>   # uncomment the following line if debugging errors that keep restarting the node
>   # exit 0
>   
> +# If the application is using different interface for cluster communication, please
> +# add your application specific isolation commands here
> +
>   logger -t "opensaf_reboot" "Rebooting local node; timeout=$OPENSAF_REBOOT_TIMEOUT"
> +
> +# Isolate the node
> +if [ "$MDS_TRANSPORT" = "TIPC" ]; then
> +   tipc-config -bd eth:$TIPC_ETH_IF
> +else
> +   $icmd pkill -STOP osafdtmd
> +fi
>   
>   # Start a reboot supervision background process. Note that a similar
>   # supervision is also done in the opensaf_reboot() function in LEAP.
> @@ -128,12 +141,6 @@ else
>   (sleep "$OPENSAF_REBOOT_TIMEOUT"; echo -n "b" > "/proc/sysrq-trigger") &
>   fi
>   
> - # Stop some important opensaf processes to prevent bad things from happening
> - $icmd pkill -STOP osafamfwd
> - $icmd pkill -STOP osafamfnd
> - $icmd pkill -STOP osafamfd
> - $icmd pkill -STOP osaffmd
> -
>   # Flush OpenSAF internal log server messages to disk.
>   $bindir/osaflog --flush
>   


--
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
___
Opensaf-devel mailing list
Opensaf-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/opensaf-devel

Re: [devel] [PATCH 1/1] osaf: Isolate the node in the opensaf_reboot [#2833]

2018-04-13 Thread Ravi Sekhar Reddy Konda

HI Hans,

 

The use case we are addressing here is link flickering when remote fencing is
not enabled. Also, remote fencing using Stonith is valid only in virtualized
environments. I have not tested with Stonith enabled, as the use case is the
one where remote fencing is disabled.

 

Thanks,

Ravi 

 

 

From: Hans Nordebäck [mailto:hans.nordeb...@ericsson.com] 
Sent: Friday, April 13, 2018 1:10 AM
To: ravi-sekhar ; Anders Widell 

Cc: opensaf-devel@lists.sourceforge.net
Subject: SV: [PATCH 1/1] osaf: Isolate the node in the opensaf_reboot [#2833]

 

Hi Ravi,

 

I think stonith, implemented in ticket #1859, handles this case. This
"flickering" was one of the (manual) tests verifying the added stonith support.

It is important to have a separate interface for stonith, to be able to perform
the remote fencing, similar to using a backplane.

Have you tested with stonith enabled? 

 

/Regards HansN 

  _  

From: ravi-sekhar <ravisekhar.ko...@oracle.com>
Sent: 12 April 2018 15:29:13
To: Hans Nordebäck; Anders Widell
Cc: opensaf-devel@lists.sourceforge.net; ravi-sekhar
Subject: [PATCH 1/1] osaf: Isolate the node in the opensaf_reboot [#2833]

 

---
 scripts/opensaf_reboot | 19 +--
 1 file changed, 13 insertions(+), 6 deletions(-)

diff --git a/scripts/opensaf_reboot b/scripts/opensaf_reboot
index df65c26..b219c39 100644
--- a/scripts/opensaf_reboot
+++ b/scripts/opensaf_reboot
@@ -37,6 +37,9 @@ export LD_LIBRARY_PATH=$libdir:$LD_LIBRARY_PATH
 if [ -f "$pkgsysconfdir/fmd.conf" ]; then
   . "$pkgsysconfdir/fmd.conf"
 fi
+if [ -f "$pkgsysconfdir/nid.conf" ]; then
+  . "$pkgsysconfdir/nid.conf"
+fi
 
 NODE_ID_FILE=$pkglocalstatedir/node_id
 
@@ -118,7 +121,17 @@ else
 # uncomment the following line if debugging errors that keep restarting the node
 # exit 0
 
+    # If the application is using different interface for cluster communication, please
+    # add your application specific isolation commands here
+
 logger -t "opensaf_reboot" "Rebooting local node; timeout=$OPENSAF_REBOOT_TIMEOUT"
+
+    # Isolate the node
+    if [ "$MDS_TRANSPORT" = "TIPC" ]; then
+   tipc-config -bd eth:$TIPC_ETH_IF
+    else
+   $icmd pkill -STOP osafdtmd
+    fi
 
 # Start a reboot supervision background process. Note that a similar
 # supervision is also done in the opensaf_reboot() function in LEAP.
@@ -128,12 +141,6 @@ else
 (sleep "$OPENSAF_REBOOT_TIMEOUT"; echo -n "b" > "/proc/sysrq-trigger") &
 fi
 
-   # Stop some important opensaf processes to prevent bad things from happening
-   $icmd pkill -STOP osafamfwd
-   $icmd pkill -STOP osafamfnd
-   $icmd pkill -STOP osafamfd
-   $icmd pkill -STOP osaffmd
-
 # Flush OpenSAF internal log server messages to disk.
 $bindir/osaflog --flush
 
-- 
1.9.1

Re: [devel] [PATCH 1/1] amfnd: unlock before releasing the monitoring thread to avoid deadlock [#2818]

2018-04-04 Thread Ravi Sekhar Reddy Konda

Hi Minh,

Did you get time to look at this patch? Please consider reviewing it with
priority.

Thanks,
Ravi

-Original Message-
From: ravi-sekhar [mailto:ravisekhar.ko...@oracle.com] 
Sent: Thursday, March 29, 2018 11:30 AM
To: hans.nordeb...@ericsson.com; minh.c...@dektech.com.au; 
gary@dektech.com.au
Cc: opensaf-devel@lists.sourceforge.net; ravi-sekhar 

Subject: [PATCH 1/1] amfnd: unlock before releasing the monitoring thread to 
avoid deadlock [#2818]

---
 src/amf/amfnd/mon.cc | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/src/amf/amfnd/mon.cc b/src/amf/amfnd/mon.cc
index 9cdfc37..4932d50 100644
--- a/src/amf/amfnd/mon.cc
+++ b/src/amf/amfnd/mon.cc
@@ -161,6 +161,8 @@ uint32_t avnd_mon_req_del(AVND_CB *cb, SaUint64T pid) {
 
   mon_rec = (AVND_MON_REQ *)m_NCS_DBLIST_FIND_FIRST(pid_mon_list);
 
+  m_NCS_UNLOCK(&cb->mon_lock, NCS_LOCK_WRITE);
+
   /* No more PIDs exists in the pid_mon_list for monitoring */
   if (!mon_rec) {
 /* destroy the task */
@@ -173,8 +175,6 @@ uint32_t avnd_mon_req_del(AVND_CB *cb, SaUint64T pid) {
 }
   }
 
-  m_NCS_UNLOCK(&cb->mon_lock, NCS_LOCK_WRITE);
-
   return rc;
 }
 
--
1.9.1


Re: [devel] [PATCH 1/1] imm: improve cascade delete [#2667]

2018-03-29 Thread Ravi Sekhar Reddy Konda

Hi Vu,
 
 Ack, tested the functionality.
 Code-wise, I don't have any new comments other than those Hans has given;
 please address them before pushing.
 One generic comment: you added a lot of new routines, please add function
 headers for them.
 
 Thanks,
 Ravi

-Original Message-
From: Vu Minh Nguyen [mailto:vu.m.ngu...@dektech.com.au] 
Sent: Monday, March 12, 2018 8:26 AM
To: ravisekhar.ko...@oracle.com; hans.nordeb...@ericsson.com; 
zoran.milinko...@ericsson.com; anders.wid...@ericsson.com; 
lennart.l...@ericsson.com
Cc: opensaf-devel@lists.sourceforge.net; Vu Minh Nguyen 

Subject: [PATCH 1/1] imm: improve cascade delete [#2667]

When an object is deleted, and the object has children, the delete message is 
sent for each deleted object to PBE.  Since there are a lot of messages in the 
cascade delete from IMMND to PBE at once, there is a limitation that the 
cascade delete should not be done on object that contains more than 1 
object.  More than 1 object may cause buffer overload (e.g. 
TIPC_ERR_OVERLOAD), and messages might be lost.

With this improvement, only one message, containing only the root object, is
sent to PBE. The rest of the cascade delete is done on the PBE side.
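
One way the PBE side could find every descendant of the root object is to
key objects by their reversed DN, so that a subtree becomes a contiguous key
range in an ordered map. The sketch below is only an illustration of that
idea: CollectSubtree and the map layout are hypothetical, and this ReverseDn
ignores escaped commas ("\,"), which the reverseDn() in the patch does handle.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Reverses the RDN order: "a=1,b=2,c=3" -> "c=3,b=2,a=1".
// Simplification: escaped commas ("\,") are not handled here.
inline std::string ReverseDn(const std::string& dn) {
  std::string result;
  size_t start = 0;
  while (true) {
    size_t comma = dn.find(',', start);
    size_t len =
        (comma == std::string::npos) ? std::string::npos : comma - start;
    std::string rdn = dn.substr(start, len);
    if (!result.empty()) result.insert(0, ",");
    result.insert(0, rdn);
    if (comma == std::string::npos) break;
    start = comma + 1;
  }
  return result;
}

// Maps reversed DN -> object id (hypothetical layout).
using ReverseDnMap = std::map<std::string, unsigned>;

// Returns the ids of the root object and of every object below it.
inline std::vector<unsigned> CollectSubtree(const ReverseDnMap& objects,
                                            const std::string& root_dn) {
  std::vector<unsigned> ids;
  const std::string root = ReverseDn(root_dn);
  auto it = objects.find(root);
  if (it != objects.end()) ids.push_back(it->second);
  // All descendants share the prefix "<reversed root>," and are adjacent
  // in the ordered map, so one range scan finds them.
  const std::string prefix = root + ",";
  for (it = objects.lower_bound(prefix);
       it != objects.end() &&
       it->first.compare(0, prefix.size(), prefix) == 0;
       ++it) {
    ids.push_back(it->second);
  }
  return ids;
}
```

With this layout, deleting "b=2,c=3" touches only the keys starting with
"c=3,b=2", and a sibling such as "c=30" is not matched because the prefix
includes the trailing comma.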
---
 src/imm/common/immpbe_dump.cc | 548 ++
 src/imm/immnd/ImmModel.cc |  18 +-
 src/imm/immnd/ImmModel.h  |   3 +-
 src/imm/immnd/immnd_evt.c | 210 
 4 files changed, 508 insertions(+), 271 deletions(-)

diff --git a/src/imm/common/immpbe_dump.cc b/src/imm/common/immpbe_dump.cc 
index 11af674..17d8eb9 100644
--- a/src/imm/common/immpbe_dump.cc
+++ b/src/imm/common/immpbe_dump.cc
@@ -33,6 +33,8 @@
 #include 
 #include 
 #include 
+#include 
+#include 
 
 #include 
 #include "base/osaf_extended_name.h"
@@ -146,7 +148,24 @@ static const char *preparedSql[] = {
 
 static sqlite3_stmt *preparedStmt[SQL_STMT_SIZE] = {NULL};
 
-static int prepareSqlStatements(sqlite3 *dbHandle) {
+typedef struct {
+  unsigned obj_id;
+  unsigned class_id;
+  char *dn;
+} ObjectInfo;
+
+typedef std::set ObjectSet;
+typedef std::map ReverseDnMap;
+typedef std::map ClassNameMap;
+
+// Used for collecting objects and classes
+typedef std::map ClassInstanceMap;
+
+static ObjectSet sObjectSet;
+static ReverseDnMap sReverseDnMap;
+static ClassNameMap sClassNameMap;
+
+static bool prepareSqlStatements(sqlite3 *dbHandle) {
   int i;
   int rc;
 
@@ -155,11 +174,11 @@ static int prepareSqlStatements(sqlite3 *dbHandle) {
 NULL);
 if (rc != SQLITE_OK) {
   LOG_ER("Failed to prepare SQL statement for: %s", preparedSql[i]);
-  return -1;
+  return false;
 }
   }
 
-  return 0;
+  return true;
 }
 
 int finalizeSqlStatement(void *stmt) {
@@ -546,6 +565,327 @@ void pbeAtomicSwitchFile(const char *filePath, 
std::string localTmpFilename) {
   }
 }
 
+static std::string reverseDn(std::string& input) {
+/* reverseDn() has been copied from ReverseDn() in imm_xmlw_dump.cc */
+  std::string result = "";
+  size_t start_cut = 0;
+  size_t comma_pos = 0;
+
+  do {
+size_t start_search = start_cut;
+while ((comma_pos = input.find(",", start_search)) ==
+   input.find("\\,", start_search) + 1)
+  start_search = input.find(",", start_search) +
+ 1; /* Skip the "\," by shifting start position*/
+
+/* Insert RDN to the begin of the result */
+if (!result.empty()) result.insert(0, ",");
+result.insert(0, input, start_cut, comma_pos - start_cut);
+
+/* Next RDN */
+start_cut = comma_pos + 1;
+  } while (comma_pos != std::string::npos);
+
+  return result;
+}
+
+static void reverseAndInsertDn(std::string &dn,
+   unsigned obj_id,
+   unsigned class_id) {
+  std::string revdn;
+  ObjectInfo *info;
+
+  revdn = reverseDn(dn);
+
+  info = (ObjectInfo *)malloc(sizeof(ObjectInfo));
+  info->obj_id = obj_id;
+  info->class_id = class_id;
+  info->dn = strdup(dn.c_str());
+
+  sObjectSet.insert(info);
+  sReverseDnMap[revdn] = info;
+}
+
+static ObjectInfo *findObjectInfo(std::string &dn) {
+  std::string revDn = reverseDn(dn);
+  auto obj = sReverseDnMap.find(revDn);
+  return (obj != sReverseDnMap.end()) ? obj->second : NULL;
+}
+
+static bool prepareLocalData(sqlite3 *dbHandle) {
+  const char *classSql = "SELECT class_id, class_name FROM classes";
+  const char *objSql = "SELECT dn, obj_id, class_id FROM objects";
+  sqlite3_stmt *stmt = NULL;
+  int rc;
+  bool ret = false;
+  unsigned obj_id;
+  unsigned class_id;
+  char *class_name;
+  std::string dn;
+  int count = 0;
+
+  TRACE_ENTER();
+
+  rc = sqlite3_prepare_v2(dbHandle, classSql, -1, &stmt, NULL);
+  if (rc != SQLITE_OK) {
+LOG_ER("Failed to prepare SQL statement for: %s", classSql);
+goto failed;
+  }
+
+  while((rc = sqlite3_step(stmt)) == SQLITE_ROW) {
+class_id = (unsigned int)sqlite3_column_int(stmt, 0);

Re: [devel] [PATCH 1/1] dtm: Fix the osaflog --flush command, and revert osaflog protocol [#2812]

2018-03-26 Thread Ravi Sekhar Reddy Konda

Hi Anders,

Ack, code review only

Regards,
Ravi

-Original Message-
From: Anders Widell [mailto:anders.wid...@ericsson.com] 
Sent: Monday, March 19, 2018 8:38 PM
To: Ravi Sekhar Reddy Konda <ravisekhar.ko...@oracle.com>
Cc: opensaf-devel@lists.sourceforge.net; Anders Widell 
<anders.wid...@ericsson.com>
Subject: [PATCH 1/1] dtm: Fix the osaflog --flush command, and revert osaflog 
protocol [#2812]

Fix the remaining review comment for ticket [#2731]: revert back to a 
text-based protocol between osaflog command and osaftransportd. Also fix the 
osaflog --flush command, that stopped working after ticket [#2731].
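
Judging from the diff below, the text-based protocol prefixes each request
with '?' and expects the reply to echo the command prefixed with '!'. A
minimal sketch of that framing, with the socket handling left out, might look
like this. MakeRequest, IsExpectedReply and MaxFileSizeCommand are
hypothetical helper names; the command strings are the ones that appear in
the patch.

```cpp
#include <cstddef>
#include <string>

// Builds a request as in the patch: '?' followed by the command text.
inline std::string MakeRequest(const std::string& command) {
  return "?" + command;
}

// Checks a received buffer against the expected reply, which is the command
// text with a leading '!'. The buffer need not be NUL-terminated.
inline bool IsExpectedReply(const std::string& command, const char* buf,
                            size_t len) {
  const std::string expected = "!" + command;
  return len == expected.size() && expected.compare(0, len, buf, len) == 0;
}

// Formats the command used to set the maximum trace file size.
inline std::string MaxFileSizeCommand(size_t max_file_size) {
  return "max-file-size " + std::to_string(max_file_size);
}
```

For example, MakeRequest("flush") yields "?flush", and the only reply
accepted for it is the six bytes "!flush".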
---
 src/dtm/common/osaflog_protocol.h |  7 -
 src/dtm/tools/osaflog.cc  | 55 ++--
 src/dtm/transport/log_server.cc   | 59 ---
 src/dtm/transport/log_server.h|  3 +-
 4 files changed, 47 insertions(+), 77 deletions(-)

diff --git a/src/dtm/common/osaflog_protocol.h b/src/dtm/common/osaflog_protocol.h
index db914e00a..61e9f6f39 100644
--- a/src/dtm/common/osaflog_protocol.h
+++ b/src/dtm/common/osaflog_protocol.h
@@ -24,13 +24,6 @@
 
 namespace Osaflog {
 
-enum Command { kFlush, kMaxbackups, kMaxfilesize, kFailure };
-struct Message {
-char marker[4];
-Command  command; // Command Enum
-size_t   value;   // Value based on the command
-};
-
 static constexpr const char* kServerSocketPath =
 PKGLOCALSTATEDIR "/osaf_log.sock";
 
diff --git a/src/dtm/tools/osaflog.cc b/src/dtm/tools/osaflog.cc
index cf1e6b43c..1de0a85d6 100644
--- a/src/dtm/tools/osaflog.cc
+++ b/src/dtm/tools/osaflog.cc
@@ -158,9 +158,9 @@ void PrintUsage(const char* program_name) {
   program_name);
 }
 
-
-bool SendCommand(Osaflog::Message message,
- Osaflog::Command command) {
+bool SendCommand(const std::string& command) {
+  std::string request{std::string{"?"} + command};
+  std::string expected_reply{std::string{"!"} + command};
   auto sock = std::unique_ptr(CreateSocket());
 
   if (!sock) {
@@ -172,13 +172,12 @@ bool SendCommand(Osaflog::Message message,
   socklen_t addrlen = base::UnixSocket::SetAddress(Osaflog::kServerSocketPath,
_addr);
 
-  ssize_t result = sock->SendTo(, sizeof(message),
+  ssize_t result = sock->SendTo(request.data(), request.size(),
 _addr, addrlen);
   if (result < 0) {
 perror("Failed to send message to osaftransportd");
 return false;
-  } else if (static_cast(result) !=
- (sizeof(Osaflog::Message))) {
+  } else if (static_cast(result) != request.size()) {
 fprintf(stderr, "Failed to send message to osaftransportd\n");
 return false;
   }
@@ -214,50 +213,26 @@ bool SendCommand(Osaflog::Message message,
   if (result < 0) {
 perror("Failed to receive reply from osaftransportd");
 return false;
-  } else if (static_cast(result) !=
-   (sizeof(Osaflog::Message) )) {
-Osaflog::Message result_message;
-memset(_message, 0, sizeof(result_message));
-memcpy(_message, buf, result);
-if (result_message.command != command) {
-   fprintf(stderr, "Received unexpected reply from osaftransportd\n");
-   return false;
-}
+  } else if (static_cast(result) != expected_reply.size() ||
+ memcmp(buf, expected_reply.data(), result) != 0) {
+fprintf(stderr, "ERROR: osaftransportf replied '%s'\n",
+std::string{buf, static_cast(result)}.c_str());
+return false;
   }
   return true;
 }
 
 bool MaxTraceFileSize(size_t max_file_size) {
-  Osaflog::Message message;
-
-  memset(, 0, sizeof(message));
-  message.marker[0] = '?';
-  message.command = Osaflog::kMaxfilesize;
-  message.value = max_file_size;
-
-  return SendCommand(message, Osaflog::kMaxfilesize);
+  return SendCommand(std::string{"max-file-size "} +
+ std::to_string(max_file_size));
 }
 
-bool NoOfBackupFiles(size_t number_of_files) {
-  Osaflog::Message message;
-
-  memset(, 0, sizeof(message));
-  message.marker[0] = '?';
-  message.command = Osaflog::kMaxbackups;
-  message.value = number_of_files;
-
-  return SendCommand(message, Osaflog::kMaxbackups);
+bool NoOfBackupFiles(size_t max_backups) {
+  return SendCommand(std::string{"max-backups "} + 
+std::to_string(max_backups));
 }
 
 bool Flush() {
-  Osaflog::Message message;
-
-  memset(, 0, sizeof(message));
-  message.marker[0] = '?';
-  message.command = Osaflog::kFlush;
-  message.value = 0;
-
-  return SendCommand(message, Osaflog::kFlush);
+  return SendCommand(std::string{"flush"});
 }
 
 base::UnixServerSocket* CreateSocket() {
diff --git a/src/dtm/transport/log_server.cc b/src/dtm/transport/log_server.cc
index 44fbe140a..76519cf35 100644
--- a/src/dtm/tra

Re: [devel] [PATCH 1/1] imm: fix race-condition in imm agent [#2810]

2018-03-20 Thread Ravi Sekhar Reddy Konda

Hi Vu,

Ack, code review only not tested

Thanks,
Ravi

-Original Message-
From: Vu Minh Nguyen [mailto:vu.m.ngu...@dektech.com.au] 
Sent: Friday, March 16, 2018 2:51 PM
To: ravisekhar.ko...@oracle.com; hans.nordeb...@ericsson.com; 
zoran.milinko...@ericsson.com; anders.wid...@ericsson.com; 
lennart.l...@ericsson.com
Cc: opensaf-devel@lists.sourceforge.net; Vu Minh Nguyen 

Subject: [PATCH 1/1] imm: fix race-condition in imm agent [#2810]

An IMM application gets a coredump during upgrade due to a failed assertion in
the IMMA library.

There was a race condition between the IMMA internal thread (MDS thread) and
the IMM application thread (IMM dispatching). When a CCB was aborted from the
IMMND side, the IMMA internal thread (MDS) made some changes on the event
IMMA_EVT_ND2A_OI_CCB_ABORT_UC. If the IMM application was dispatching on any
IMM event at that time, a race condition on the CCB record database could
occur.

This patch adds lock/unlock to ensure that no race happens, and removes the
assertion on `isAborted` near the end of saImmOiAugmentCcbInitialize's work.
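
The general pattern is sketched below with std::mutex rather than the
m_NCS_LOCK macros used in the patch: every access to the shared record takes
the same lock, and the application-side path tolerates a record that was
aborted in the meantime instead of asserting on it. CcbRecord, CcbDb and
their members are hypothetical names for illustration only.

```cpp
#include <mutex>

// Hypothetical stand-in for a CCB record shared between two threads.
struct CcbRecord {
  bool is_aborted = false;
  int private_handle = 0;
};

class CcbDb {
 public:
  // Would be called from the internal (MDS) thread when an abort arrives.
  void MarkAborted() {
    std::lock_guard<std::mutex> guard(lock_);
    record_.is_aborted = true;
  }

  // Would be called from the application thread. Stores the handle under
  // the lock and reports whether the record is still live; an aborted
  // record is tolerated here and left for later abort handling.
  bool Augment(int handle) {
    std::lock_guard<std::mutex> guard(lock_);
    record_.private_handle = handle;
    return !record_.is_aborted;
  }

  // Returns a consistent copy of the record.
  CcbRecord Snapshot() {
    std::lock_guard<std::mutex> guard(lock_);
    return record_;
  }

 private:
  std::mutex lock_;
  CcbRecord record_;
};
```

Because both MarkAborted() and Augment() hold the same mutex, the record is
never observed half-updated, whichever thread gets there first.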
---
 src/imm/agent/imma_db.cc | 14 +++---
 src/imm/agent/imma_oi_api.cc | 12 ++--
 2 files changed, 21 insertions(+), 5 deletions(-)

diff --git a/src/imm/agent/imma_db.cc b/src/imm/agent/imma_db.cc
index d0e3682..071edbe 100644
--- a/src/imm/agent/imma_db.cc
+++ b/src/imm/agent/imma_db.cc
@@ -533,11 +533,19 @@ void imma_oi_ccb_record_augment(IMMA_CLIENT_NODE *cl_node, SaImmOiCcbIdT ccbId,
 SaImmHandleT privateOmHandle,
 SaImmAdminOwnerHandleT privateAoHandle) {
   TRACE_ENTER();
-  struct imma_oi_ccb_record *tmp = imma_oi_ccb_record_find(cl_node, ccbId);
 
-  osafassert(tmp && tmp->isCcbAugOk);
+  struct imma_oi_ccb_record *tmp = imma_oi_ccb_record_find(cl_node, ccbId);
+  osafassert(tmp);
+  // Perform saImmOiAugmentCcbInitialize() on just-aborted CCB,
+  // but since most of OiAugmentCcbInitialize's work have been finished,
+  // we don't interrupt the on-going work such as change error code, finalize
+  // private om handle, etc, they will be handled when IMM app dispatches
+  // CCB abort callback event.
+  if (tmp->isAborted) {
+TRACE_1("Abort upcall received by mds thread on this CCB 0x%llx", ccbId);
+  }
 
-  osafassert(!(tmp->isAborted));
+  osafassert(tmp->isCcbAugOk);
 
   osafassert(!(tmp->isCritical));
 
diff --git a/src/imm/agent/imma_oi_api.cc b/src/imm/agent/imma_oi_api.cc
index 28cea8a..29fb39d 100644
--- a/src/imm/agent/imma_oi_api.cc
+++ b/src/imm/agent/imma_oi_api.cc
@@ -4077,12 +4077,20 @@ done:
   }
 
   if (rc == SA_AIS_OK || rc == SA_AIS_ERR_TRY_AGAIN) {
-/* mark oi_ccb_record with privateOmHandle to avoid repeated open/close
+/* Mark oi_ccb_record with privateOmHandle to avoid repeated open/close
of private-om-handle for each try again or each ccb op. The handle
is closed when the ccb is terminated (apply-uc or abort-uc).
- */
+
+   And the CCB record could be changed in MDS thread if CCB is aborted.
+   Lock/unlock is here to ensure no race-condition b/w the MDS thread
+   and IMM application thread. */
+m_NCS_LOCK(>cb_lock, NCS_LOCK_WRITE);
+
 imma_oi_ccb_record_augment(cl_node, ccbId, privateOmHandle,
privateAoHandle);
+
+m_NCS_UNLOCK(>cb_lock, NCS_LOCK_WRITE);
+
 if (privateAoHandle) {
   *ownerHandle = privateAoHandle;
 }
--
1.9.1


Re: [devel] [PATCH 1/1] amfd: Trigger dependent SI assignment if currActiveAssignment is less than preferred active assignment [#2803]

2018-03-18 Thread Ravi Sekhar Reddy Konda

Hi Minh,

Ack, reviewed & tested

Thanks,
Ravi

-Original Message-
From: Minh Chau [mailto:minh.c...@dektech.com.au] 
Sent: Wednesday, March 14, 2018 5:23 AM
To: hans.nordeb...@ericsson.com; ravisekhar.ko...@oracle.com; 
gary@dektech.com.au
Cc: opensaf-devel@lists.sourceforge.net; Minh Chau 
Subject: [PATCH 1/1] amfd: Trigger dependent SI assignment if 
currActiveAssignment is less than preferred active assignment [#2803]

In an SI dependency configuration where an NwayActive SI is set as the
dependent SI and is assigned to all SUs hosted on all nodes, the NwayActive SI
becomes PARTIALLY_ASSIGNED after the SCs are stopped and restarted.

The reason for the PARTIALLY_ASSIGNED SI is that the SI is currently not
assigned on the SC nodes. This patch triggers assignment for the dependent SI
if the SI does not yet have the preferred number of active assignments.

Please note that the additional case in this patch is only hit if the SC
absence feature is enabled. In a normal cluster, the dependency state should
first go through READY_TO_ASSIGN, and the SG procedure will create active
assignments up to the preferred number.
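
The extended condition in avd_sidep_assign_evh() can be read as a small
predicate. The sketch below mirrors it with simplified, hypothetical types
(SiDepState, Si and ShouldTriggerAssignment are not OpenSAF names); the real
code works on AVD_SI and its saAmfSINumCurrActiveAssignments and
pref_active_assignments() members.

```cpp
// Hypothetical, simplified mirror of the states and counters in the diff.
enum class SiDepState { kReadyToAssign, kAssigned, kOther };

struct Si {
  SiDepState dep_state;
  unsigned curr_active_assignments;
  unsigned pref_active_assignments;
};

// True when assignment of a dependent SI should be triggered: either it is
// ready to be assigned, or it is already assigned but still has fewer
// active assignments than preferred.
inline bool ShouldTriggerAssignment(const Si& si) {
  return si.dep_state == SiDepState::kReadyToAssign ||
         (si.dep_state == SiDepState::kAssigned &&
          si.curr_active_assignments < si.pref_active_assignments);
}
```

The second clause is the new case: before the patch, an SI in the assigned
state was skipped even when it was only partially assigned.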
---
 src/amf/amfd/si_dep.cc | 9 -
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/src/amf/amfd/si_dep.cc b/src/amf/amfd/si_dep.cc
index a4ccbe7..f63b1b0 100644
--- a/src/amf/amfd/si_dep.cc
+++ b/src/amf/amfd/si_dep.cc
@@ -799,7 +799,10 @@ void avd_sidep_assign_evh(AVD_CL_CB *cb, AVD_EVT *evt) {
   } else {
 /*Check sponsors state once agian then take action*/
 sidep_update_si_self_dep_state(dep_si);
-if (dep_si->si_dep_state == AVD_SI_READY_TO_ASSIGN) {
+if (dep_si->si_dep_state == AVD_SI_READY_TO_ASSIGN ||
+(dep_si->si_dep_state == AVD_SI_ASSIGNED &&
+dep_si->saAmfSINumCurrActiveAssignments <
+dep_si->pref_active_assignments())) {
   if ((sidep_sg_red_si_process_assignment(avd_cb, dep_si) ==
NCSCC_RC_FAILURE) &&
   (dep_si->num_dependents != 0)) {
@@ -980,6 +983,10 @@ void sidep_take_action_on_dependents(AVD_SI *si) {
   sidep_process_ready_to_unassign_depstate(dep_si);
 } else if (dep_si->si_dep_state == AVD_SI_READY_TO_ASSIGN) {
   sidep_si_dep_state_evt_send(avd_cb, dep_si, AVD_EVT_ASSIGN_SI_DEP_STATE);
+} else if (dep_si->si_dep_state == AVD_SI_ASSIGNED &&
+si->sg_of_si->sg_fsm_state == AVD_SG_FSM_STABLE &&
+si->saAmfSINumCurrActiveAssignments < si->pref_active_assignments()) {
+  sidep_si_dep_state_evt_send(avd_cb, dep_si, 
+ AVD_EVT_ASSIGN_SI_DEP_STATE);
 }
   }
 
--
2.7.4
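
The widened condition in avd_sidep_assign_evh() can be read as one predicate. The sketch below is only an illustration of that reading, not code from amfd: the enum, the struct and the function name are hypothetical stand-ins for si_dep_state, saAmfSINumCurrActiveAssignments and pref_active_assignments().

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the amfd types; not the real definitions. */
typedef enum { DEP_READY_TO_ASSIGN, DEP_ASSIGNED, DEP_OTHER } dep_state_t;

typedef struct {
  dep_state_t dep_state; /* models si_dep_state */
  uint32_t curr_active;  /* models saAmfSINumCurrActiveAssignments */
  uint32_t pref_active;  /* models pref_active_assignments() */
} si_model_t;

/* True when the dependent SI should be handed to the SG for assignment:
 * either it is ready to assign, or it is already assigned but still has
 * fewer active assignments than preferred. */
static bool dep_si_needs_assignment(const si_model_t *si) {
  if (si->dep_state == DEP_READY_TO_ASSIGN) return true;
  return si->dep_state == DEP_ASSIGNED && si->curr_active < si->pref_active;
}
```

The second hunk of the patch additionally requires the SG to be in AVD_SG_FSM_STABLE before sending the event; that extra check is left out of this sketch.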


Re: [devel] [PATCH 1/1] amf: do not dereference null pointer [#2791]

2018-03-18 Thread Ravi Sekhar Reddy Konda

Fine Gary, my only worry is that this is a crucial MDS issue which might go 
unnoticed.
Please raise an MDS ticket referring to this ticket, along with the MDS logs, 
and we will try to look into it.

From the AMF side, I am fine with having the sanity check, so ACK for the patch.

Thanks,
Ravi

-Original Message-
From: Gary Lee [mailto:gary@dektech.com.au] 
Sent: Friday, March 16, 2018 6:24 PM
To: Ravi Sekhar Reddy Konda <ravisekhar.ko...@oracle.com>
Cc: hans.nordeb...@ericsson.com; minh.c...@dektech.com.au; 
opensaf-devel@lists.sourceforge.net
Subject: Re: [PATCH 1/1] amf: do not dereference null pointer [#2791]

Hi Ravi

Yes, I'm not very familiar with the mds code so I haven't fixed it there.
Should we have this sanity check in AMF anyway?

Thanks
Gary

> On 16 Mar 2018, at 11:43 pm, Ravi Sekhar Reddy Konda 
> <ravisekhar.ko...@oracle.com> wrote:
> 
> Hi Gary,
> 
> The only case I see where MDS can return NCSCC_RC_SUCCESS while 
> sndrsp.o_rsp is still NULL is the timeout case.
> In this case the fix might avoid the core dump, but the underlying problem 
> will still be there, and it might also affect other flows or services. I 
> think the better solution is to return NCSCC_RC_REQ_TIMOUT from 
> MDS and let the application handle it.
> 
> Thanks,
> Ravi
> 
> -Original Message-
> From: Gary Lee [mailto:gary@dektech.com.au]
> Sent: Thursday, March 01, 2018 11:02 AM
> To: hans.nordeb...@ericsson.com; ravisekhar.ko...@oracle.com; 
> minh.c...@dektech.com.au
> Cc: opensaf-devel@lists.sourceforge.net; Gary Lee 
> <gary@dektech.com.au>
> Subject: [PATCH 1/1] amf: do not dereference null pointer [#2791]
> 
> Callers of ava_mds_send() assume *o_msg is not null, if the return code is 
> NCSCC_RC_SUCCESS.
> ---
> src/amf/agent/ava_mds.cc | 4 
> 1 file changed, 4 insertions(+)
> 
> diff --git a/src/amf/agent/ava_mds.cc b/src/amf/agent/ava_mds.cc
> index 440885332..cd139365d 100644
> --- a/src/amf/agent/ava_mds.cc
> +++ b/src/amf/agent/ava_mds.cc
> @@ -378,6 +378,10 @@ uint32_t ava_mds_send(AVA_CB *cb, AVSV_NDA_AVA_MSG *i_msg,
>   /* retrieve the response */
>   *o_msg = (AVSV_NDA_AVA_MSG *)mds_info.info.svc_send.info.sndrsp.o_rsp;
>   mds_info.info.svc_send.info.sndrsp.o_rsp = 0;
> +  if (*o_msg == nullptr) {
> +LOG_ER("No response received");
> +rc = NCSCC_RC_FAILURE;
> +  }
> }
>   } else
> /* just a 'normal' send */
> --
> 2.14.1
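
The sanity check discussed in this thread boils down to "never report success together with a null response". A minimal standalone sketch of that contract follows; the return codes, the send_fn type and the two fake senders are hypothetical and exist only for the example, they are not the MDS or AMF agent API.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical return codes for illustration; the real ones are NCSCC_RC_*. */
enum { RC_SUCCESS = 1, RC_FAILURE = 2 };

/* Models a transport send that may report success yet leave the response empty. */
typedef uint32_t (*send_fn)(void **o_rsp);

/* Treat "success with no response" as a failure, so callers never see
 * RC_SUCCESS together with a null *o_msg. */
static uint32_t send_and_check(send_fn send, void **o_msg) {
  void *rsp = NULL;
  uint32_t rc = send(&rsp);
  *o_msg = NULL;
  if (rc != RC_SUCCESS) return rc;
  if (rsp == NULL) return RC_FAILURE;
  *o_msg = rsp;
  return RC_SUCCESS;
}

/* Fake senders used only to exercise the wrapper. */
static int dummy_response;
static uint32_t fake_send_empty(void **o_rsp) { *o_rsp = NULL; return RC_SUCCESS; }
static uint32_t fake_send_ok(void **o_rsp) { *o_rsp = &dummy_response; return RC_SUCCESS; }
```

As Ravi points out, this only protects the caller; if the empty response is caused by a timeout inside the transport, that layer would still need to return a timeout code of its own.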



Re: [devel] [PATCH 1/1] imm: coredump during scale-in on large configuration [#2794]

2018-03-15 Thread Ravi Sekhar Reddy Konda

Hi Vu,

Ack for the patch, code review only

Thanks,
Ravi
-Original Message-
From: Vu Minh Nguyen [mailto:vu.m.ngu...@dektech.com.au] 
Sent: Monday, March 12, 2018 12:43 PM
To: ravisekhar.ko...@oracle.com; hans.nordeb...@ericsson.com; 
zoran.milinko...@ericsson.com; anders.wid...@ericsson.com; 
lennart.l...@ericsson.com
Cc: opensaf-devel@lists.sourceforge.net; Vu Minh Nguyen 

Subject: [PATCH 1/1] imm: coredump during scale-in on large configuration 
[#2794]

When IMMND restarts (e.g. after OUT OF ORDER detection), it may receive a message 
from the active IMMD that originated from the IMMND process that just died.
In that case the situation is confusing: the message comes from the local IMMND, 
but not from this instance (reply_dest != cb->immnd_mdest_id)!

This patch discards such messages and reports the case to syslog instead of 
aborting the IMMND process.
---
 src/imm/immnd/immnd_evt.c | 12 
 1 file changed, 12 insertions(+)

diff --git a/src/imm/immnd/immnd_evt.c b/src/imm/immnd/immnd_evt.c
index 228b7dd..43611a3 100644
--- a/src/imm/immnd/immnd_evt.c
+++ b/src/imm/immnd/immnd_evt.c
@@ -10766,6 +10766,18 @@ static uint32_t immnd_evt_proc_fevs_rcv(IMMND_CB *cb, IMMND_EVT *evt,
(m_IMMSV_UNPACK_HANDLE_LOW(clnt_hdl) == cb->node_id);
 
if (originatedAtThisNd) {
+   /* Get the message comes from local IMMND but not me
+  (cb->immnd_mdest_id). Probably IMMND just restarts
+  (e.g: OUT OF ORDER detection), and this message belongs
+  to previous (dead) IMMND. So, discard this message.
+*/
+   if (reply_dest && reply_dest != cb->immnd_mdest_id) {
+   LOG_WA("DISCARD FEVS message sent by previous dead IMMND");
+   dequeue_outgoing(cb);
+   TRACE_LEAVE();
+   return NCSCC_RC_SUCCESS;
+   }
+
osafassert(!reply_dest || (reply_dest == cb->immnd_mdest_id) ||
   isObjSync);
if (cb->fevs_replies_pending) {
--
1.9.1
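
The discard rule added by the patch can be stated as a small predicate. This is a sketch under assumptions, not IMMND code: dest_t and the function name are hypothetical, and a destination of 0 is taken to mean "no reply destination", mirroring the `reply_dest &&` test in the hunk.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of an MDS destination; 0 means "no reply destination". */
typedef uint64_t dest_t;

/* True when a message that originated at this node carries a reply
 * destination that belongs to some other (earlier) process instance. */
static bool is_stale_local_message(bool originated_here, dest_t reply_dest,
                                   dest_t my_dest) {
  return originated_here && reply_dest != 0 && reply_dest != my_dest;
}
```

In the patch a message matching this rule is dropped with a warning, where previously the osafassert below it would have aborted the process.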


Re: [devel] [PATCH 1/1] imm: fix unknown event type in imma_proc_free_pointers [#2779]

2018-03-13 Thread Ravi Sekhar Reddy Konda

Hi Vu,

Ack, code review only

Regards,
Ravi

-Original Message-
From: Vu Minh Nguyen [mailto:vu.m.ngu...@dektech.com.au] 
Sent: Tuesday, March 13, 2018 12:50 PM
To: ravisekhar.ko...@oracle.com; hans.nordeb...@ericsson.com; 
zoran.milinko...@ericsson.com; anders.wid...@ericsson.com; 
lennart.l...@ericsson.com
Cc: opensaf-devel@lists.sourceforge.net; Vu Minh Nguyen 

Subject: [PATCH 1/1] imm: fix unknown event type in imma_proc_free_pointers 
[#2779]

The message type IMMA_EVT_ND2A_PROC_STALE_CLIENTS was introduced in IMM, but 
it was not added to `void imma_proc_free_pointers()`.
---
 src/imm/agent/imma_proc.cc | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/src/imm/agent/imma_proc.cc b/src/imm/agent/imma_proc.cc
index 886b50c..ec62b98 100644
--- a/src/imm/agent/imma_proc.cc
+++ b/src/imm/agent/imma_proc.cc
@@ -1401,6 +1401,9 @@ void imma_proc_free_pointers(IMMA_CB *cb, IMMA_EVT *evt) {
 case IMMA_EVT_ND2A_OI_CCB_ABORT_UC:
   break;
 
+case IMMA_EVT_ND2A_PROC_STALE_CLIENTS:
+  break;
+
 default:
   TRACE_4("Unknown event type %u", evt->type);
   break;
--
1.9.1


Re: [devel] [PATCH 1/1] osaf: add example config for etcd [#2784]

2018-03-11 Thread Ravi Sekhar Reddy Konda

Hi Gary,

Ack, with some comments

1) The current example that you have given is for a static etcd cluster 
configuration.
   In a cloud environment, the cluster will grow and shrink dynamically, so we 
might need to add nodes to or remove nodes from the etcd cluster.
   It would be better to document how to add nodes to or remove nodes from an 
already existing etcd cluster.

2) In general, the etcd cluster should use a different interface than the 
OpenSAF cluster; this has to be mentioned.

3) etcd has to be started before the OpenSAF cluster. Please document this.

Comments 2 & 3 are not specific to etcd; they are generic for any plugin, so 
you can update them in the PR doc.

Thanks,
Ravi

-Original Message-
From: Gary Lee [mailto:gary@dektech.com.au] 
Sent: Wednesday, March 07, 2018 9:13 AM
To: Ravi Sekhar Reddy Konda <ravisekhar.ko...@oracle.com>; 
anders.wid...@ericsson.com
Cc: opensaf-devel@lists.sourceforge.net; Gary Lee <gary@dektech.com.au>
Subject: [PATCH 1/1] osaf: add example config for etcd [#2784]

---
 src/osaf/consensus/plugins/etcd.readme | 32 
 1 file changed, 32 insertions(+)
 create mode 100644 src/osaf/consensus/plugins/etcd.readme

diff --git a/src/osaf/consensus/plugins/etcd.readme b/src/osaf/consensus/plugins/etcd.readme
new file mode 100644
index 0..bdfa3fa19
--- /dev/null
+++ b/src/osaf/consensus/plugins/etcd.readme
@@ -0,0 +1,32 @@
+Example etcd configuration
+==
+
+This document describes how to install and configure etcd on each node 
+of an OpenSAF cluster. Note: it is also possible to run etcd outside 
+the OpenSAF cluster, or on a subset of the cluster.
+
+etcd is generally available as a binary package in Linux distributions.
+
+For example, on Ubuntu:
+
+sudo apt-get install etcd
+
+Locate etcd.conf for your distribution. For example, it is located at 
+/etc/default/etcd.conf on Ubuntu 17.10.
+
+The configuration below should help you get an initial etcd cluster 
+running on a five node cluster.
+
+ETCD_LISTEN_PEER_URLS="http://0.0.0.0:2380"
+ETCD_LISTEN_CLIENT_URLS="http://localhost:2379"
+ETCD_INITIAL_ADVERTISE_PEER_URLS="http://0.0.0.0:2380"
+ETCD_INITIAL_CLUSTER="SC-1=http://x.x.x.x:2380,SC-2=http://x.x.x.x:2380,PL-3=http://x.x.x.x:2380,PL-4=http://x.x.x.x:2380,PL-5=http://x.x.x.x:2380"
+ETCD_INITIAL_CLUSTER_TOKEN="etcd-cluster"
+ETCD_ADVERTISE_CLIENT_URLS="http://localhost:2379"
+
+Replace x.x.x.x with appropriate IP addresses.
+
+A sample etcd v2 plugin is provided. It assumes etcd is running locally.
+
+If you have configured etcd to run elsewhere, please add the 
+'--endpoints' option to etcdctl in the plugin.
--
2.14.1
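
Since the readme asks the reader to replace x.x.x.x by hand, a small helper that assembles the ETCD_INITIAL_CLUSTER value from a node list may be useful when generating the file. This is only a sketch: the function name is made up, and the node names and addresses used with it are placeholders, not values from the patch.

```c
#include <stdio.h>

/* Builds "name=http://ip:port,name=http://ip:port,..." into out.
 * Returns 0 on success, -1 if out is too small. */
static int build_initial_cluster(const char *const names[],
                                 const char *const ips[], size_t n, int port,
                                 char *out, size_t out_len) {
  size_t used = 0;
  if (out_len == 0) return -1;
  out[0] = '\0';
  for (size_t i = 0; i < n; i++) {
    /* Prefix every entry except the first with a comma. */
    int w = snprintf(out + used, out_len - used, "%s%s=http://%s:%d",
                     i ? "," : "", names[i], ips[i], port);
    if (w < 0 || (size_t)w >= out_len - used) return -1; /* truncated */
    used += (size_t)w;
  }
  return 0;
}
```

For example, the names {"SC-1", "SC-2"} with the placeholder addresses {"10.0.0.1", "10.0.0.2"} and port 2380 produce a string in the same shape as the ETCD_INITIAL_CLUSTER line above.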



Re: [devel] [PATCH 1/1] plmd: enable dynamic tracing [#2796]

2018-03-08 Thread Ravi Sekhar Reddy Konda

Hi Alex,

I think that is enough. I am fine as long as the user has the option to use the 
routine for debugging.

Thanks,
Ravi

From: Jones, Alex [mailto:ajo...@rbbn.com] 
Sent: Thursday, March 08, 2018 6:00 PM
To: Ravi Sekhar Reddy Konda <ravisekhar.ko...@oracle.com>
Cc: opensaf-devel@lists.sourceforge.net
Subject: Re: [PATCH 1/1] plmd: enable dynamic tracing [#2796]

Hi Ravi,

Some of the other subsystems like MSG and LCK have the same issue, and there 
is no private USR2 code in them. I have had good success attaching gdb to the 
process and calling the dump routine from there. Then the user does not have to 
recompile.

Do you feel that is enough?

Alex

From: Ravi Sekhar Reddy Konda <ravisekhar.ko...@oracle.com>
Sent: Thursday, March 8, 2018 12:07:14 AM
To: Jones, Alex
Cc: opensaf-devel@lists.sourceforge.net
Subject: RE: [PATCH 1/1] plmd: enable dynamic tracing [#2796]

Hi Alex,

Ack for the patch, but have one comment

I understand that using SIGUSR2 for control block dumping is not the generic 
way, as we use the USR2 signal for dynamically enabling traces. We added this 
because at times traces are not enough for debugging, and the dump routine 
dumps the complete control block information, state and handlers, which is very 
useful in debugging.

My suggestion is, instead of deleting the usr2_sig_handler and signal 
initialization code, to comment the code out and add a comment saying that 
users who want to dump the PLM CB for debugging can uncomment it.

Thanks,
Ravi 
-Original Message-
From: Alex Jones [mailto:ajo...@rbbn.com] 
Sent: Thursday, March 08, 2018 2:09 AM
To: Ravi Sekhar Reddy Konda <ravisekhar.ko...@oracle.com>
Cc: opensaf-devel@lists.sourceforge.net; Alex Jones <ajo...@rbbn.com>
Subject: [PATCH 1/1] plmd: enable dynamic tracing [#2796]

Dynamic tracing does not work with plmd.

plmd overrides the USR2 signal with its own dump routine.

Remove the signal handler code for USR2 in plmd.
---
src/plm/plmd/plms_main.c | 20 
1 file changed, 20 deletions(-)

diff --git a/src/plm/plmd/plms_main.c b/src/plm/plmd/plms_main.c
index 23b019444..5de1f461e 100644
--- a/src/plm/plmd/plms_main.c
+++ b/src/plm/plmd/plms_main.c
@@ -70,20 +70,6 @@ static void sigusr1_handler(int sig)
ncs_sel_obj_ind(&plms_cb->usr1_sel_obj);
}

-static void usr2_sig_handler(int sig)
-{
- PLMS_CB *cb = plms_cb;
- PLMS_EVT *evt;
- evt = (PLMS_EVT *)malloc(sizeof(PLMS_EVT));
- memset(evt, 0, sizeof(PLMS_EVT));
- evt->req_res = PLMS_REQ;
- evt->req_evt.req_type = PLMS_DUMP_CB_EVT_T;
- (void)sig;
- /* Put it in PLMS's Event Queue */
- m_NCS_IPC_SEND(&cb->mbx, (NCSCONTEXT)evt, NCS_IPC_PRIORITY_HIGH);
- signal(SIGUSR2, usr2_sig_handler);
-}
-
/
* Name : plms_db_init
*
@@ -327,12 +313,6 @@ static uint32_t plms_init()
rc = NCSCC_RC_FAILURE;
goto done;
}
- /* Initialize a signal handler for debugging purpose */
- if ((signal(SIGUSR2, usr2_sig_handler)) == SIG_ERR) {
- LOG_ER("signal USR2 failed: %s", strerror(errno));
- rc = NCSCC_RC_FAILURE;
- goto done;
- }

if (!cb->nid_started && plms_amf_register() != NCSCC_RC_SUCCESS) {
LOG_ER("AMF Initialization failed");
--
2.13.6

Re: [devel] [PATCH 1/1] plmd: enable dynamic tracing [#2796]

2018-03-07 Thread Ravi Sekhar Reddy Konda

Hi Alex,

Ack for the patch, but have one comment

I understand that using SIGUSR2 for control block dumping is not the generic 
way, as we use the USR2 signal for dynamically enabling traces. We added this 
because at times traces are not enough for debugging, and the dump routine 
dumps the complete control block information, state and handlers, which is very 
useful in debugging.

My suggestion is, instead of deleting the usr2_sig_handler and signal 
initialization code, to comment the code out and add a comment saying that 
users who want to dump the PLM CB for debugging can uncomment it.

Thanks,
Ravi 
-Original Message-
From: Alex Jones [mailto:ajo...@rbbn.com] 
Sent: Thursday, March 08, 2018 2:09 AM
To: Ravi Sekhar Reddy Konda <ravisekhar.ko...@oracle.com>
Cc: opensaf-devel@lists.sourceforge.net; Alex Jones <ajo...@rbbn.com>
Subject: [PATCH 1/1] plmd: enable dynamic tracing [#2796]

Dynamic tracing does not work with plmd.

plmd overrides the USR2 signal with its own dump routine.

Remove the signal handler code for USR2 in plmd.
---
 src/plm/plmd/plms_main.c | 20 
 1 file changed, 20 deletions(-)

diff --git a/src/plm/plmd/plms_main.c b/src/plm/plmd/plms_main.c
index 23b019444..5de1f461e 100644
--- a/src/plm/plmd/plms_main.c
+++ b/src/plm/plmd/plms_main.c
@@ -70,20 +70,6 @@ static void sigusr1_handler(int sig)
ncs_sel_obj_ind(&plms_cb->usr1_sel_obj);
 }
 
-static void usr2_sig_handler(int sig)
-{
-   PLMS_CB *cb = plms_cb;
-   PLMS_EVT *evt;
-   evt = (PLMS_EVT *)malloc(sizeof(PLMS_EVT));
-   memset(evt, 0, sizeof(PLMS_EVT));
-   evt->req_res = PLMS_REQ;
-   evt->req_evt.req_type = PLMS_DUMP_CB_EVT_T;
-   (void)sig;
-   /* Put it in PLMS's Event Queue */
-   m_NCS_IPC_SEND(&cb->mbx, (NCSCONTEXT)evt, NCS_IPC_PRIORITY_HIGH);
-   signal(SIGUSR2, usr2_sig_handler);
-}
-
 /
  * Name  : plms_db_init
  *
@@ -327,12 +313,6 @@ static uint32_t plms_init()
rc = NCSCC_RC_FAILURE;
goto done;
}
-   /* Initialize a signal handler for debugging purpose */
-   if ((signal(SIGUSR2, usr2_sig_handler)) == SIG_ERR) {
-   LOG_ER("signal USR2 failed: %s", strerror(errno));
-   rc = NCSCC_RC_FAILURE;
-   goto done;
-   }
 
if (!cb->nid_started && plms_amf_register() != NCSCC_RC_SUCCESS) {
LOG_ER("AMF Initialization failed");
--
2.13.6
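
The conflict behind this patch is that a process has only one disposition per signal, so a daemon-private SIGUSR2 handler installed later replaces whatever was installed before it. The POSIX sketch below illustrates just that point; the handler and counter names are hypothetical and do not correspond to OpenSAF functions.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>

static volatile sig_atomic_t trace_toggles; /* models the generic trace toggle */
static volatile sig_atomic_t dump_requests; /* models a daemon-private dump */

static void trace_handler(int sig) { (void)sig; trace_toggles++; }
static void dump_handler(int sig) { (void)sig; dump_requests++; }

/* Installs handler for SIGUSR2; returns 0 on success, -1 on error.
 * Any previously installed SIGUSR2 handler is replaced. */
static int install_usr2(void (*handler)(int)) {
  struct sigaction sa;
  memset(&sa, 0, sizeof sa);
  sa.sa_handler = handler;
  sigemptyset(&sa.sa_mask);
  return sigaction(SIGUSR2, &sa, NULL);
}
```

If install_usr2(trace_handler) is followed by install_usr2(dump_handler), a later SIGUSR2 only reaches dump_handler, which matches the behaviour described in the commit message: plmd's own routine shadowed the trace toggle.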



Re: [devel] [PATCH 1/1] amfd: Handle su_cnt_adm_opr properly in Nodegroup adm resp procesing [#2588]

2018-02-22 Thread Ravi Sekhar Reddy Konda

Hi Minh,

Can you review this patch? It's only a simple fix.

Thanks,
Ravi

-Original Message-
From: ravi-sekhar [mailto:ravisekhar.ko...@oracle.com] 
Sent: Thursday, February 22, 2018 4:07 PM
To: minh.c...@dektech.com.au; hans.nordeb...@ericsson.com; 
gary@dektech.com.au
Cc: opensaf-devel@lists.sourceforge.net; ravi-sekhar 

Subject: [PATCH 1/1] amfd: Handle su_cnt_adm_opr properly in Nodegroup adm resp 
procesing [#2588]

---
 src/amf/amfd/sgproc.cc | 8 +---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/src/amf/amfd/sgproc.cc b/src/amf/amfd/sgproc.cc
index 610c205..2bee875 100644
--- a/src/amf/amfd/sgproc.cc
+++ b/src/amf/amfd/sgproc.cc
@@ -1669,7 +1669,8 @@ void avd_su_si_assign_evh(AVD_CL_CB *cb, AVD_EVT *evt) {
 (su->sg_of_su->ng_using_saAmfSGAdminState == true))) {
   AVD_AMF_NG *ng = su->su_on_node->admin_ng;
   // Got response from AMFND for assignments decrement su_cnt_admin_oper.
-  if ((ng != nullptr) &&
+  if (su->su_on_node->su_cnt_admin_oper >=1 ) {
+if ((ng != nullptr) &&
   (ng->admin_ng_pend_cbk.admin_oper == SA_AMF_ADMIN_SHUTDOWN) ||
   (ng->admin_ng_pend_cbk.admin_oper == SA_AMF_ADMIN_LOCK)) &&
  (su->saAmfSUNumCurrActiveSIs == 0) &&
@@ -1677,9 +1678,10 @@ void avd_su_si_assign_evh(AVD_CL_CB *cb, AVD_EVT *evt) {
  (AVSV_SUSI_ACT_DEL ==
   n2d_msg->msg_info.n2d_su_si_assign.msg_act))) ||
(ng->admin_ng_pend_cbk.admin_oper == SA_AMF_ADMIN_UNLOCK))) {
-su->su_on_node->su_cnt_admin_oper--;
-TRACE("node:'%s', su_cnt_admin_oper:%u", su->su_on_node->name.c_str(),
+  su->su_on_node->su_cnt_admin_oper--;
+  TRACE("node:'%s', su_cnt_admin_oper:%u", 
+ su->su_on_node->name.c_str(),
   su->su_on_node->su_cnt_admin_oper);
+}
   }
   process_su_si_response_for_ng(su, SA_AIS_OK);
 } else if (su->su_any_comp_undergoing_restart_admin_op() == true) {
--
1.9.1
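
The patch carries no description, but the added `su_cnt_admin_oper >= 1` guard reads as protection against decrementing a counter that is already zero. Assuming the counter is unsigned (the `%u` in the TRACE suggests so), the idea can be sketched as follows; the helper name is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Decrements *counter only when it is non-zero.
 * Returns true if a decrement took place. */
static bool dec_if_positive(uint32_t *counter) {
  if (*counter == 0) return false; /* would otherwise wrap to UINT32_MAX */
  (*counter)--;
  return true;
}
```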


Re: [devel] [PATCH 1/1] amfnd: remove duplicate log entry [#2783]

2018-02-19 Thread Ravi Sekhar Reddy Konda

Ack, code review only

Regards,
Ravi

- Original Message -
From: gary@dektech.com.au
To: hans.nordeb...@ericsson.com, minh.c...@dektech.com.au, 
ravisekhar.ko...@oracle.com
Cc: opensaf-devel@lists.sourceforge.net, gary@dektech.com.au
Sent: Monday, February 19, 2018 7:42:15 AM GMT +05:30 Chennai, Kolkata, Mumbai, 
New Delhi
Subject: [PATCH 1/1] amfnd: remove duplicate log entry [#2783]

---
 src/amf/amfnd/err.cc | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/src/amf/amfnd/err.cc b/src/amf/amfnd/err.cc
index a0529b96c..1d6eb3757 100644
--- a/src/amf/amfnd/err.cc
+++ b/src/amf/amfnd/err.cc
@@ -458,9 +458,6 @@ uint32_t avnd_err_process(AVND_CB *cb, AVND_COMP *comp,
   // add an entry to syslog if recovery method has changed
   log_recovery_escalation(*comp, previous_esc_rcvr, esc_rcvr);
 
-  LOG_NO("'%s' faulted due to '%s' : Recovery is '%s'", comp->name.c_str(),
- g_comp_err[err_info->src], g_comp_rcvr[esc_rcvr - 1]);
-
   if (((comp->su->is_ncs == true) && (esc_rcvr != SA_AMF_COMPONENT_RESTART)) ||
   esc_rcvr == SA_AMF_NODE_FAILFAST) {
 LOG_ER("%s Faulted due to:%s Recovery is:%s", comp->name.c_str(),
@@ -478,6 +475,9 @@ uint32_t avnd_err_process(AVND_CB *cb, AVND_COMP *comp,
 }
   }
 
+  LOG_NO("'%s' faulted due to '%s' : Recovery is '%s'", comp->name.c_str(),
+ g_comp_err[err_info->src], g_comp_rcvr[esc_rcvr - 1]);
+
   /* execute the recovery */
   rc = avnd_err_recover(cb, comp->su, comp, esc_rcvr);
 
-- 
2.14.1



Re: [devel] Review Request for doc: update overview PR for split brain prevention with consensus service [#64]

2018-02-15 Thread Ravi Sekhar Reddy Konda


Thanks Gary,

Also, I asked for a sample configuration of a Raft cluster when the Raft 
servers are part of the OpenSAF cluster.

A Raft (etcd) cluster should use a different interface than the one OpenSAF is 
using, so it would be better to document this.
In general it would be good to have a sample configuration showing how to form 
a Raft cluster in the README as well as in the PR doc.

Thanks,
Ravi
 


- Original Message -
From: gary@dektech.com.au
To: anders.wid...@ericsson.com, ravisekhar.ko...@oracle.com, 
hans.nordeb...@ericsson.com
Cc: opensaf-devel@lists.sourceforge.net
Sent: Monday, February 12, 2018 2:48:15 PM GMT +05:30 Chennai, Kolkata, Mumbai, 
New Delhi
Subject: Re: Review Request for doc: update overview PR for split brain 
prevention with consensus service [#64]

Hi Ravi/Anders

AndersW> This is slightly out of scope since there are many RAFT 
implementations, but I agree it could be a good idea to provide a sample 
configuration for etcd along with the sample etcd plugin.

 I will try to provide a sample plugin for an external etcd server, and maybe a 
sample plugin for another RAFT based key-value store.

Thanks
Gary




Re: [devel] Review Request for doc: update overview PR for split brain prevention with consensus service [#64]

2018-02-15 Thread Ravi Sekhar Reddy Konda

Hi Anders,

In the case where the Raft cluster is outside the OpenSAF cluster, how can the 
quorum concept apply to the OpenSAF cluster? Because the Raft servers are 
outside the OpenSAF cluster, they won't be able to determine which OpenSAF 
partition has the larger number of nodes.

Thanks,
Ravi

- Original Message -
From: anders.wid...@ericsson.com
To: ravisekhar.ko...@oracle.com, gary@dektech.com.au, 
hans.nordeb...@ericsson.com
Cc: opensaf-devel@lists.sourceforge.net
Sent: Thursday, February 8, 2018 3:21:15 PM GMT +05:30 Chennai, Kolkata, 
Mumbai, New Delhi
Subject: Re: Review Request for doc: update overview PR for split brain 
prevention with consensus service [#64]

See my comments inline, marked AndersW>

regards,

Anders Widell

On 02/08/2018 10:36 AM, Ravi Sekhar Reddy Konda wrote:
> Hi Gary,
>
> Have query regarding quorum selection when raft servers are external to the 
> OpenSAF Cluster
>
> In the document we are saying  "The consensus service uses quorum to prevent 
> state changes in network partitions that don't include more than half of the 
> nodes in the cluster"
>
> => This is possible if the raft server is installed on the OpenSAF Cluster 
> Nodes, as Raft decides which partition has more no of nodes.
> but in the case where raft servers run on external nodes outside of the 
> OpenSAF Cluster, how the quorum is decided

AndersW> If the consensus service is running on external servers then 
you need to have an appropriate number of them (probably three or five). 
Quorum is determined as the majority of these external servers, and is 
not in any way related to majority of the OpenSAF nodes. The consensus 
service will prevent split-brain within the OpenSAF cluster, but in case 
of a network partition it will not guarantee that the active system 
controller will be located in the largest partition. This situation is 
actually similar to the situation when you use TIPC for internal OpenSAF 
communication. You can have a split-brain in the TIPC network (for 
example due to misconfiguration or a bug in TIPC), but at the same time 
have full connectivity on the IP network which is used by RAFT. I think 
there were some review comments about this for ticket [#64] and I will 
write a follow-up ticket where we can address the possibility of moving 
the active system controller to a node in the largest network partition.
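
The quorum rule described above, a strict majority of the consensus servers rather than of the OpenSAF nodes, is simple arithmetic. A sketch with hypothetical function names:

```c
#include <stdbool.h>

/* Smallest number of servers that forms a strict majority of n. */
static unsigned quorum_size(unsigned n) { return n / 2 + 1; }

/* True when a partition that can reach `reachable` of the `n` consensus
 * servers holds quorum. */
static bool has_quorum(unsigned reachable, unsigned n) {
  return n > 0 && reachable >= quorum_size(n);
}
```

With three external servers a partition needs to reach two of them, and with five it needs three. An even count such as four also needs three, so it tolerates no more failures than three servers do, which fits the suggestion of three or five.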

>
>
> => If the Raft Servers are external to OpenSAF Cluster, do we need to make 
> any configuration so that etcd client on the OpenSAF nodes
> communicates with Raft Leader
> Also it will be good if we give some details about how to install and 
> configure raft(raft servers within and external to the opensaf cluster)

AndersW> This is slightly out of scope since there are many RAFT 
implementations, but I agree it could be a good idea to provide a sample 
configuration for etcd along with the sample etcd plugin.

>
> Thanks,
> Ravi
>
> -Original Message-
> From: Gary Lee [mailto:gary@dektech.com.au]
> Sent: Friday, January 26, 2018 11:28 AM
> To: Hans Nordebäck <hans.nordeb...@ericsson.com>; Anders Widell 
> <anders.wid...@ericsson.com>; Ravi Sekhar Reddy Konda 
> <ravisekhar.ko...@oracle.com>
> Cc: opensaf-devel@lists.sourceforge.net
> Subject: Review Request for doc: update overview PR for split brain 
> prevention with consensus service [#64]
>
> Hi
>
> I have updated the OpenSAF Overview PR document for ticket #64.
>
> Please have a look.
>
> https://sourceforge.net/p/opensaf/tickets/_discuss/thread/0d47d4b9/5489/attachment/OpenSAF_Overview_PR.odt
>
> Thanks
> Gary
>


Re: [devel] Review Request for doc: update overview PR for split brain prevention with consensus service [#64]

2018-02-08 Thread Ravi Sekhar Reddy Konda

Hi Gary,

I have a query regarding quorum selection when the Raft servers are external to 
the OpenSAF cluster.

In the document we are saying "The consensus service uses quorum to prevent 
state changes in network partitions that don't include more than half of the 
nodes in the cluster".

=> This is possible if the Raft server is installed on the OpenSAF cluster 
nodes, as Raft decides which partition has the larger number of nodes.
But in the case where the Raft servers run on external nodes outside of the 
OpenSAF cluster, how is the quorum decided?


=> If the Raft servers are external to the OpenSAF cluster, do we need to make 
any configuration so that the etcd client on the OpenSAF nodes
communicates with the Raft leader?
Also, it would be good if we gave some details about how to install and 
configure Raft (Raft servers within and external to the OpenSAF cluster).

Thanks,
Ravi

-Original Message-
From: Gary Lee [mailto:gary@dektech.com.au] 
Sent: Friday, January 26, 2018 11:28 AM
To: Hans Nordebäck <hans.nordeb...@ericsson.com>; Anders Widell 
<anders.wid...@ericsson.com>; Ravi Sekhar Reddy Konda 
<ravisekhar.ko...@oracle.com>
Cc: opensaf-devel@lists.sourceforge.net
Subject: Review Request for doc: update overview PR for split brain prevention 
with consensus service [#64]

Hi

I have updated the OpenSAF Overview PR document for ticket #64.

Please have a look.

https://sourceforge.net/p/opensaf/tickets/_discuss/thread/0d47d4b9/5489/attachment/OpenSAF_Overview_PR.odt

Thanks
Gary



Re: [devel] [PATCH 1/1] amfnd: Discard new assignment while su is under failover [#2773]

2018-02-07 Thread Ravi Sekhar Reddy Konda

Hi Minh,

Ack, code review only

Thanks,
Ravi

-Original Message-
From: Minh Chau [mailto:minh.c...@dektech.com.au] 
Sent: Tuesday, February 06, 2018 11:45 AM
To: hans.nordeb...@ericsson.com; gary@dektech.com.au; 
ravisekhar.ko...@oracle.com
Cc: opensaf-devel@lists.sourceforge.net; Minh Chau 
Subject: [PATCH 1/1] amfnd: Discard new assignment while su is under failover 
[#2773]

Two errors can happen to a component, with both escalating to an su failover. 
For the first su failover recovery, amfd will send a new assignment to the su. 
However, the second error happens just before the su receives the new 
assignment. Currently, amfnd creates a susi record for the new assignment of 
the first recovery, but amfnd will not issue a csi callback to the component 
since the component is being failed. When the new assignment of the second 
recovery comes, amfnd finds that the susi record has already been created, and 
does not process this new assignment. In the end, the component does not get 
any csi callback in either recovery.

This patch makes amfnd not process the first new assignment while the su is 
being failed over. This case is similar to how assignment removal is 
discarded while the su is under failover.
---
 src/amf/amfnd/su.cc | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/src/amf/amfnd/su.cc b/src/amf/amfnd/su.cc
index 21be12b..caa5356 100644
--- a/src/amf/amfnd/su.cc
+++ b/src/amf/amfnd/su.cc
@@ -401,6 +401,12 @@ uint32_t avnd_evt_avd_info_su_si_assign_evh(AVND_CB *cb, AVND_EVT *evt) {
   cb->rcv_msg_id = info->msg_id;
 
   if (info->msg_act == AVSV_SUSI_ACT_ASGN) {
+if (sufailover_in_progress(su) || sufailover_during_nodeswitchover(su) ||
+ cb->term_state == AVND_TERM_STATE_NODE_FAILOVER_TERMINATING){
+  TRACE_2("Discarding new assignment for '%s', flag:%x",
+  su->name.c_str(), su->flag);
+  goto done;
+}
 /* SI rank and CSI capability (originally from SaAmfCtCsType)
  * was introduced in version 5 of the node director supported protocol.
  * If the protocol is older, take action */
--
2.7.4
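
The new early exit in the ASGN branch can be summarised as a predicate over the su and node state. The sketch below only models the three conditions visible in the hunk; the enum, struct and function names are hypothetical and are not the amfnd definitions.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the message action and the state checks. */
typedef enum { ACT_ASSIGN, ACT_MODIFY, ACT_DELETE } susi_action_t;

typedef struct {
  bool su_failover_in_progress;            /* models sufailover_in_progress(su) */
  bool su_failover_during_node_switchover; /* models sufailover_during_nodeswitchover(su) */
  bool node_failover_terminating;          /* models the cb->term_state check */
} su_state_t;

/* True when a new assignment should be dropped because the su (or the
 * whole node) is in the middle of failing over. Only the ASSIGN action is
 * modelled here; removal is handled elsewhere in amfnd. */
static bool should_discard_new_assignment(susi_action_t act,
                                          const su_state_t *s) {
  if (act != ACT_ASSIGN) return false;
  return s->su_failover_in_progress ||
         s->su_failover_during_node_switchover ||
         s->node_failover_terminating;
}
```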


Re: [devel] [PATCH 0/5] Review Request for Add support for split brain prevention V2 [#64]

2018-01-26 Thread Ravi Sekhar Reddy Konda

Hi Gary,

Thanks for sharing the test cases.
I am going through the Raft algorithm and the etcd implementation, so I need
some time to review.
I will get back to you once done.

Thanks,
Ravi

-Original Message-
From: Gary Lee [mailto:gary@dektech.com.au] 
Sent: Thursday, January 25, 2018 12:26 PM
To: Ravi Sekhar Reddy Konda <ravisekhar.ko...@oracle.com>
Cc: opensaf-devel@lists.sourceforge.net
Subject: Re: [devel] [PATCH 0/5] Review Request for Add support for split brain 
prevention V2 [#64]

Hi Ravi

The test cases are basically from Anders' design proposal. Eg.

- election clashes
- 2N si-swap
- isolating the network of the active controller, restore network before the 
node is up
- isolating the network of the standby controller, restore network before the 
node is up
- isolating the network of the active controller for longer durations
- isolating the network of the standby controller for longer durations
- above repeated with fencing enabled/disabled
- normal failover

The main constraint is we currently don't ensure the active controller comes 
from the larger network partition.
If the key-value store is available during a split brain, there could be 
improvements we could do in this regard.
We will address this in a future release.

Gary

On 25/1/18, 4:04 pm, "Ravi Sekhar Reddy Konda" <ravisekhar.ko...@oracle.com> 
wrote:

Hi Gary,

I have started reviewing this patch series.
Can you please list the test cases that you have considered for testing
this (both with and without remote fencing enabled)?
Also, please let us know if you see any design constraints with the
approach.

Thanks,
Ravi

-Original Message-
From: Gary Lee [mailto:gary@dektech.com.au] 
Sent: Friday, January 19, 2018 5:09 PM
To: hans.nordeb...@ericsson.com; anders.wid...@ericsson.com; 
quyen@dektech.com.au
Cc: opensaf-devel@lists.sourceforge.net
Subject: [devel] [PATCH 0/5] Review Request for Add support for split brain 
prevention V2 [#64]

Summary: Add support for split brain prevention V2 [#64]
Review request for Ticket(s): 64
Peer Reviewer(s): Anders, Hans
Pull request to: *** LIST THE PERSON WITH PUSH ACCESS HERE ***
Affected branch(es): develop
Development branch: ticket-64
Base revision: e1e0d2c0dc45a5ca7789f19d58dde0a41ed19354
Personal repository: git://git.code.sf.net/u/userid-2226215/review


Impacted area           Impact y/n
----------------------------------
 Docs                        y
 Build system                y
 RPM/packaging               n
 Configuration files         n
 Startup scripts             n
 SAF services                y
 OpenSAF services            y
 Core libraries              n
 Samples                     n
 Tests                       n
 Other                       n


Comments (indicate scope for each "y" above):
-

Changes from V1:

* fixed most cppcheck/cpplint errors in osaf/consensus
* disable self-fencing if remote-fencing is enabled
* reboot active controller if it loses quorum (write access)
* better error handling

revision 7ab1280243311058a6848c4da2b9738ab73dc861
Author: Gary Lee <gary@dektech.com.au>
Date:   Fri, 19 Jan 2018 22:29:42 +1100

doc: update README and makefiles [#64]



revision 625928304450399548c353473dea631a44aeecbe
Author: Gary Lee <gary@dektech.com.au>
Date:   Fri, 19 Jan 2018 22:28:59 +1100

fmd: update consensus service during controller failover [#64]



revision 42539da74893d5ce246242cf8d33c7875ea50fe8
Author: Gary Lee <gary@dektech.com.au>
Date:   Fri, 19 Jan 2018 22:26:19 +1100

amfd: update consensus service when performing SI swap [#64]

When a node goes down and split-brain prevention is enabled, check that we 
still have write access to the consensus service.
If not and fencing is disabled, reboot the node to prevent split brain.



revision b6fcd4bede291ba5996b838c3fb784842648581e
Author: Gary Lee <gary@dektech.com.au>
Date:   Fri, 19 Jan 2018 22:23:14 +1100

rded: add split brain prevention support [#64]

* consult with consensus service before promoting node to active
* add watch thread and self-fence if it detects active controller
  has been changed (if remote fencing is disabled)
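
A minimal sketch of the two rded steps listed above, assuming a key-value store that offers an atomic create-if-absent operation. `KeyValueStore`, `kActiveKey`, `TryPromoteToActive`, and `MustSelfFence` are hypothetical and do not reflect the osaf/consensus API; the in-memory map only stands in for an external store such as etcd.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical in-memory stand-in for an external key-value store.
class KeyValueStore {
 public:
  // Create `key` only if it does not exist yet; returns true on success.
  bool CreateIfAbsent(const std::string& key, const std::string& value) {
    return data_.emplace(key, value).second;
  }
  // Overwrite unconditionally (models a takeover by another controller).
  void Set(const std::string& key, const std::string& value) {
    data_[key] = value;
  }
  // Returns the stored value, or an empty string if the key is absent.
  std::string Get(const std::string& key) const {
    auto it = data_.find(key);
    return it == data_.end() ? std::string() : it->second;
  }

 private:
  std::map<std::string, std::string> data_;
};

const char kActiveKey[] = "active_controller";  // hypothetical key name

// Step 1: consult the store before promoting this node to active.
bool TryPromoteToActive(KeyValueStore& kv, const std::string& node) {
  return kv.CreateIfAbsent(kActiveKey, node) || kv.Get(kActiveKey) == node;
}

// Step 2: what a watch thread would check. A node that believes it is
// active must fence itself once the store names a different controller.
bool MustSelfFence(const KeyValueStore& kv, const std::string& node,
                   bool believes_active) {
  return believes_active && kv.Get(kActiveKey) != node;
}
```

In this model only one of two competing controllers can win the promotion, and the loser of a later takeover detects the change through the stored value rather than through the (possibly partitioned) cluster network.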



revision 656b670a91a10e385604c98366239a28cde925f7
Author: Gary Lee <gary@dektech.com.au>
Date:   Fri, 19 Jan 2018 22:22:53 +1100

osaf: add consensus API [#64]



Added Files:

 src/osaf/consensus/Makefile
 src/osaf/consensus/keyvalue.cc
 src/osaf/consensus/keyvalue.h
 src/osaf/consensus/plugins/etcd.plug

Re: [devel] [PATCH 1/1] imm: fix wrong printouts and incorrect behavior of immadm/immcfg [#2751]

2018-01-24 Thread Ravi Sekhar Reddy Konda

Hi Vu,

Ack for the patch

Regards,
Ravi

-Original Message-
From: Vu Minh Nguyen [mailto:vu.m.ngu...@dektech.com.au] 
Sent: Wednesday, January 24, 2018 7:32 PM
To: zoran.milinko...@ericsson.com; ravisekhar.ko...@oracle.com
Cc: opensaf-devel@lists.sourceforge.net; Vu Minh Nguyen 

Subject: [PATCH 1/1] imm: fix wrong printouts and incorrect behavior of 
immadm/immcfg [#2751]

Fix wrong printouts and incorrect behavior of immadm/immcfg.
Refer to the ticket #2751 for more info.
---
 src/imm/tools/imm_admin.c | 4 ++--
 src/imm/tools/imm_cfg.c   | 7 ---
 2 files changed, 6 insertions(+), 5 deletions(-)

diff --git a/src/imm/tools/imm_admin.c b/src/imm/tools/imm_admin.c
index 040df28..db026f9 100644
--- a/src/imm/tools/imm_admin.c
+++ b/src/imm/tools/imm_admin.c
@@ -318,7 +318,7 @@ int main(int argc, char *argv[])
if (operationId != -1) {
fprintf(
stderr,
-   "Cannot set admin operation more then once");
+   "Cannot set admin operation more then once\n");
exit(EXIT_FAILURE);
}
operationId = strtoll(optarg, (char **)NULL, 10);
@@ -332,7 +332,7 @@ int main(int argc, char *argv[])
if (operationId != -1) {
fprintf(
stderr,
-   "Cannot set admin operation more then once");
+   "Cannot set admin operation more then once\n");
exit(EXIT_FAILURE);
}
operationId = SA_IMM_PARAM_ADMOP_ID_ESC;
diff --git a/src/imm/tools/imm_cfg.c b/src/imm/tools/imm_cfg.c
index 4573063..73c5c2e 100644
--- a/src/imm/tools/imm_cfg.c
+++ b/src/imm/tools/imm_cfg.c
@@ -403,7 +403,7 @@ new_attr_mod(const SaNameT *objectName, char *nameval, SaImmAttrFlagsT *flags)
error = get_attrValueType(attrDefinitions, name,
  >modAttr.attrValueType, flags);
if (error == SA_AIS_ERR_NOT_EXIST) {
-   fprintf(stderr, "Class '%s' does not exist\n", className);
+   fprintf(stderr, "Attribute '%s' does not exist\n", name);
res = -1;
goto done;
}
@@ -661,8 +661,8 @@ int object_create(const SaNameT **objectNames, const SaImmClassNameT className,
stderr,
"error - saImmOmAdminOwnerSet FAILED: %s\n",
saf_error(error));
-   goto done;
}
+   goto done;
}
}
}
@@ -2069,7 +2069,8 @@ static int imm_operation(int argc, char *argv[])
}
 
if (!transaction_mode) {
-   if (ccbHandle != -1) {
+   /* Don't apply the CCB if there is any error during CCB preparation */
+   if (ccbHandle != -1 && rc == 0) {
if ((error = immutil_saImmOmCcbApply(ccbHandle)) !=
SA_AIS_OK) {
if (error == SA_AIS_ERR_TIMEOUT)
--
1.9.1


Re: [devel] [PATCH 0/6] Review Request for dtm: Derive Node ID from IPv4 address [#2758]

2018-01-19 Thread Ravi Sekhar Reddy Konda

Hi Anders,

Ack, reviewed & tested (combinations also).
Please update the README (if you are pushing #2759 immediately, you can do it
after #2759).
One minor comment: in generate_nodeid you hardcoded chassis_id & subslot_id:
+   CHASSIS_ID=2
+   SUBSLOT_ID=15

I understand we are not taking them as config values, but it is better to define
them as #defines.

Thanks,
Ravi

-Original Message-
From: Anders Widell [mailto:anders.wid...@ericsson.com] 
Sent: Friday, January 12, 2018 5:54 PM
To: Ravi Sekhar Reddy Konda <ravisekhar.ko...@oracle.com>
Cc: opensaf-devel@lists.sourceforge.net; Anders Widell 
<anders.wid...@ericsson.com>
Subject: [PATCH 0/6] Review Request for dtm: Derive Node ID from IPv4 address 
[#2758]

Summary: dtm: Derive Node ID from IPv4 address [#2758]
Review request for Ticket(s): 2758
Peer Reviewer(s): Ravi
Pull request to:
Affected branch(es): develop
Development branch: ticket-2758
Base revision: 34a070372ff7cfe3caae7ec4e11a6681e19cdf31
Personal repository: git://git.code.sf.net/u/anders-w/review


Impacted area           Impact y/n
----------------------------------
 Docs                        n
 Build system                n
 RPM/packaging               n
 Configuration files         n
 Startup scripts             y
 SAF services                y
 OpenSAF services            y
 Core libraries              y
 Samples                     n
 Tests                       y
 Other                       n


Comments (indicate scope for each "y" above):
-

revision c9161898793a57ce1db18be5343b9ea2a372271e
Author: Anders Widell <anders.wid...@ericsson.com>
Date:   Fri, 12 Jan 2018 13:08:48 +0100

nid: Make chassis_id, slot_id and subslot_id in /etc/opensaf optional [#2758]

The files chassis_id, slot_id and subslot_id in /etc/opensaf no longer have to 
be present. When they are missing, OpenSAF will derive the Node ID from the 
TIPC address or the IPv4 address of the node.



revision 2500da3505f42aec23cc51eb0ec58091dadb14ea
Author: Anders Widell <anders.wid...@ericsson.com>
Date:   Fri, 12 Jan 2018 13:08:48 +0100

msg: Allow any unsigned 32-bit value to be used as Node ID [#2758]



revision bc8ef895e4eb480416ab841d8c4ce1bf152fad82
Author: Anders Widell <anders.wid...@ericsson.com>
Date:   Fri, 12 Jan 2018 13:08:48 +0100

lck: Allow any unsigned 32-bit value to be used as Node ID [#2758]



revision 629672485cba6356c73ca31d3eea5d475e1b4cf8
Author: Anders Widell <anders.wid...@ericsson.com>
Date:   Fri, 12 Jan 2018 13:08:48 +0100

clm: Allow any unsigned 32-bit value to be used as Node ID [#2758]

Also fix the CLM API tests so that they no longer assume that there is a node
with node ID 0x2010f in the cluster.



revision c10a688016d950ab16d09e08c63519de1c56e449
Author: Anders Widell <anders.wid...@ericsson.com>
Date:   Fri, 12 Jan 2018 13:08:48 +0100

ckpt: Allow any unsigned 32-bit value to be used as Node ID [#2758]



revision 8b56c8eec0da20b95c46772b682b23413e666dfd
Author: Anders Widell <anders.wid...@ericsson.com>
Date:   Fri, 12 Jan 2018 13:08:48 +0100

dtm: Derive Node ID from IPv4 address [#2758]

If the /var/lib/opensaf/node_id file doesn't exist when DTM starts, DTM will 
create the file and use the IPv4 address as node ID. When using IPv6, the file 
must still be configured manually. IPv6 support may be added in a future ticket.
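
A minimal sketch of both ways of obtaining a 32-bit node ID, under stated assumptions. The packing in `NodeIdFromSlot` is an assumed layout (chassis, slot, subslot in the three low bytes) that is consistent with the values quoted in this thread (CHASSIS_ID=2, SUBSLOT_ID=15 and node ID 0x2010f, with slot 1 assumed). `NodeIdFromIpv4` assumes the address is taken in host byte order. Both function names are hypothetical and are not taken from the patch.

```cpp
#include <arpa/inet.h>
#include <netinet/in.h>

#include <cassert>
#include <cstdint>

// Pack the legacy identifiers into one node ID.
// Assumed layout: 0x00CCSSss (CC = chassis, SS = slot, ss = subslot).
inline std::uint32_t NodeIdFromSlot(std::uint8_t chassis, std::uint8_t slot,
                                    std::uint8_t subslot) {
  return (static_cast<std::uint32_t>(chassis) << 16) |
         (static_cast<std::uint32_t>(slot) << 8) |
         static_cast<std::uint32_t>(subslot);
}

// Interpret a dotted-quad IPv4 address as a 32-bit node ID.
// Returns false (and leaves *node_id untouched) if `text` is not valid IPv4.
inline bool NodeIdFromIpv4(const char* text, std::uint32_t* node_id) {
  in_addr addr;
  if (inet_pton(AF_INET, text, &addr) != 1) return false;
  *node_id = ntohl(addr.s_addr);  // network order -> host order (assumption)
  return true;
}
```

Because an IPv4 address can have any bit set, the services touched by this series have to accept any unsigned 32-bit value as a node ID instead of assuming the narrower slot-based range.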



Complete diffstat:
--
 src/ckpt/apitest/test_cpsv.h|  2 +-
 src/ckpt/ckptnd/cpnd_res.c  | 18 ++---
 src/ckpt/common/cpsv_evt.c  |  2 +-
 src/clm/apitest/clmtest.cc  |  6 ++
 src/clm/apitest/tet_saClmClusterNodeGet.cc  | 23 +++---
 src/clm/apitest/tet_saClmClusterNodeGetAsync.cc | 15 ++--
 src/clm/apitest/tet_saClmClusterTrack.cc|  2 +-
 src/clm/apitest/tet_saClmClusterTrackStop.cc|  4 +-
 src/clm/apitest/tet_saClmDispatch.cc|  6 +-
 src/clm/apitest/tet_saClmResponse.cc|  2 +-
 src/dtm/dtmnd/dtm.h |  1 +
 src/dtm/dtmnd/dtm_main.cc   | 98 -
 src/dtm/dtmnd/dtm_node_db.cc|  7 ++
 src/dtm/dtmnd/dtm_read_config.cc|  6 --
 src/lck/apitest/tet_gla.c   |  4 +-
 src/lck/apitest/tet_gld.c   |  4 +-
 src/lck/apitest/tet_glnd.c  |  2 +-
 src/lck/apitest/tet_glsv.h  |  2 +-
 src/msg/apitest/tet_mqa.c   |  4 +-
 src/msg/apitest/tet_mqsv.h  |  2 +-
 src/nid/opensafd.in | 49 ++---
 21 files changed, 149 insertions(+), 110 deletions(-)


Testing Commands:
-

Start OpenSAF with IPv4 TCP transport, without configuring /etc/opensaf/slot_id 
or /var/lib/opensaf/node_id


Testing, Expected Results:
--

OpenSAF shall use the IPv4 address as Node ID.


Condit

Re: [devel] [PATCH 1/1] imm: immnd asserts at veterans due to mismatched data during sync [#2748]

2018-01-15 Thread Ravi Sekhar Reddy Konda

Hi Vu,

Ack, code review only

Thanks,
Ravi
-Original Message-
From: Vu Minh Nguyen [mailto:vu.m.ngu...@dektech.com.au] 
Sent: Tuesday, January 09, 2018 6:54 PM
To: ravisekhar.ko...@oracle.com; zoran.milinko...@ericsson.com
Cc: opensaf-devel@lists.sourceforge.net; Vu Minh Nguyen 

Subject: [PATCH 1/1] imm: immnd asserts at veterans due to mismatched data 
during sync [#2748]

Do not allow any changes to the IMM data model after the SYNC abort has been
sent to the active IMMD but before the abort response has arrived at the coord.
Otherwise, all veteran nodes will be restarted at sync finalize due to a data
mismatch.
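
The idea can be sketched as a small guard with a timeout, assuming a monotonic clock. `SyncAbortGuard` and `NotWritable` are hypothetical names and not the ImmModel API; the 6-second limit used in the note below is the value mentioned in the patch comments.

```cpp
#include <cassert>
#include <chrono>

// Hypothetical helper: tracks a sync abort that still awaits its response.
class SyncAbortGuard {
 public:
  using Clock = std::chrono::steady_clock;

  explicit SyncAbortGuard(std::chrono::seconds timeout) : timeout_(timeout) {}

  // Record the time at which the abort was sent.
  void OnAbortSent(Clock::time_point now) {
    sent_at_ = now;
    pending_ = true;
  }

  // The abort response arrived; writes may resume.
  void OnAbortResponse() { pending_ = false; }

  // True while the abort is outstanding and has not timed out.
  bool IsAborting(Clock::time_point now) {
    if (!pending_) return false;
    if (now - sent_at_ > timeout_) pending_ = false;  // stop waiting
    return pending_;
  }

 private:
  std::chrono::seconds timeout_;
  Clock::time_point sent_at_{};
  bool pending_ = false;
};

// Writes are refused if the node state forbids them or an abort is pending.
inline bool NotWritable(bool state_not_writable, SyncAbortGuard& guard,
                        SyncAbortGuard::Clock::time_point now) {
  return state_not_writable || guard.IsAborting(now);
}
```

With a 6-second timeout, a coord that never receives the abort response becomes writable again after the limit has passed, so a hung or rebooted IMMD cannot block writes indefinitely.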
---
 src/imm/immnd/ImmModel.cc  | 74 ++---
 src/imm/immnd/ImmModel.h   |  2 ++
 src/imm/immnd/immnd_cb.h   |  1 -
 src/imm/immnd/immnd_evt.c  | 82 +++---
 src/imm/immnd/immnd_init.h |  5 +++
 src/imm/immnd/immnd_proc.c | 14 
 6 files changed, 95 insertions(+), 83 deletions(-)

diff --git a/src/imm/immnd/ImmModel.cc b/src/imm/immnd/ImmModel.cc
index fcd354e..4d875f4 100644
--- a/src/imm/immnd/ImmModel.cc
+++ b/src/imm/immnd/ImmModel.cc
@@ -2260,26 +2260,90 @@ void immModel_implementerDelete(IMMND_CB *cb, const char *implementerName) {
   ImmModel::instance(&cb->immModel)->implementerDelete(implementerName);
 }
 
+void immModel_sendSyncAbortAt(IMMND_CB *cb, struct timespec time) {
+  ImmModel::instance(&cb->immModel)->SendSyncAbortAt(time);
+}
+
+void immModel_getSyncAbortRsp(IMMND_CB *cb) {
+  ImmModel::instance(&cb->immModel)->GetSyncAbortRsp();
+}
+
 /**/
 
 ImmModel::ImmModel() : loaderPid(-1) {}
 
+//>
+// When sSyncAbortingAt != {0,0}, it means the SYNC has been aborted
+// by coord and that abort message not yet broadcasted to IMMNDs.
+// Therefore, NODE STATE at IMMND coord (IMM_NODE_FULLY_AVAILABLE) is
+// different with veteran node(s) (IMM_NODE_R_AVAILABLE) at the moment.
+// So, any change to IMM data model at the coord will result data mismatchs
+// comparing with veterans, and lead to veterans restarted at sync finalize
+// later on.
+//
+// In addition, we check the time here to avoid the worst cases
+// such as the abort message arrived at active IMMD, but later on
+// it fails to send the response (e.g: hang IMMD, or reboot IMMD, etc.)
+// For that reason, if we don't receive the response within 6 seconds,
+// consider sending abort failed.
+//
+//<
+
+// Store the start time of sync abort sent
+static struct timespec sSyncAbortSentAt;
+void ImmModel::SendSyncAbortAt(timespec& time) {
+  sSyncAbortSentAt = time;
+}
+
+void ImmModel::GetSyncAbortRsp() {
+  sSyncAbortSentAt.tv_sec  = 0;
+  sSyncAbortSentAt.tv_nsec = 0;
+}
+
+static bool is_sync_aborting() {
+  bool unset = (sSyncAbortSentAt.tv_nsec == 0 &&
+sSyncAbortSentAt.tv_sec == 0);
+
+  if (unset) return false;
+
+  time_t duration_in_second = 0x;
+  struct timespec now, duration;
+  osaf_clock_gettime(CLOCK_MONOTONIC, &now);
+  osaf_timespec_subtract(&now, &sSyncAbortSentAt, &duration);
+  duration_in_second = duration.tv_sec;
+  if (duration_in_second > DEFAULT_TIMEOUT_SEC) {
+    sSyncAbortSentAt.tv_sec  = 0;
+    sSyncAbortSentAt.tv_nsec = 0;
+  }
+
+  return (duration_in_second <= DEFAULT_TIMEOUT_SEC);
+}
+
 bool ImmModel::immNotWritable() {
+  bool notwritable = true;
   switch (sImmNodeState) {
 case IMM_NODE_R_AVAILABLE:
 case IMM_NODE_UNKNOWN:
 case IMM_NODE_ISOLATED:
-  return true;
+  break;
 
 case IMM_NODE_W_AVAILABLE:
 case IMM_NODE_FULLY_AVAILABLE:
-case IMM_NODE_LOADING:
-  return false;
+case IMM_NODE_LOADING: {
+  notwritable = false;
+  break;
+}
 
-default:
+default: {
   LOG_ER("Impossible node state, will terminate");
+  abort();
+}
   }
-  abort();
+
+  // When the sync abort has been sent by coord but the abort sync rsp not yet
+  // arrived at the coord yet, we will not allow to make any change to IMM data
+  // model to avoid data mismatch at finalize sync.
+  return (notwritable || is_sync_aborting());
 }
 
 /* immNotPbeWritable returning true means:
diff --git a/src/imm/immnd/ImmModel.h b/src/imm/immnd/ImmModel.h
index 9e4c54a..6bdd1b9 100644
--- a/src/imm/immnd/ImmModel.h
+++ b/src/imm/immnd/ImmModel.h
@@ -391,6 +391,8 @@ class ImmModel {
   void getNonCriticalCcbs(IdVector& cv);
   void getOldCriticalCcbs(IdVector& cv, SaUint32T* pbeConn,
   unsigned int* pbeNodeId, SaUint32T* pbeId);
+  void SendSyncAbortAt(timespec& time);
+  void GetSyncAbortRsp();
   bool immNotWritable();
   bool immNotPbeWritable(bool isPrtoClient = true);
   void* getPbeOi(SaUint32T* pbeConn, unsigned int* pbeNode,
diff --git a/src/imm/immnd/immnd_cb.h b/src/imm/immnd/immnd_cb.h
index 1a20ac3..7614d27 100644
--- a/src/imm/immnd/immnd_cb.h
+++ b/src/imm/immnd/immnd_cb.h
@@ -129,7 +129,6 @@ typedef struct immnd_cb_tag {
   uint8_t mIntroduced;  //

Re: [devel] [PATCH 1/1] amfd: Avoid IMM reinitialization in OpenSAF components termination phase V2 [#2737]

2018-01-04 Thread Ravi Sekhar Reddy Konda

Hi Minh,

Ack for the patch

Regards,
Ravi

-Original Message-
From: Minh Chau [mailto:minh.c...@dektech.com.au] 
Sent: Friday, December 29, 2017 2:02 PM
To: hans.nordeb...@ericsson.com; ravisekhar.ko...@oracle.com; 
gary@dektech.com.au
Cc: opensaf-devel@lists.sourceforge.net; Minh Chau 
Subject: [PATCH 1/1] amfd: Avoid IMM reinitialization in OpenSAF components 
termination phase V2 [#2737]

This commit reverts commit d231ba43, which depends on the node id to set the
node state to SHUTTING_DOWN. In the node shutdown phase, the node id can already
have been removed from amfd because CLMD is terminated first.
This commit also introduces a new IMM state to determine whether the IMM
service should be reinitialized.
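
A minimal sketch of the state check described above. The enum values mirror the ones in the patch below, but `ImmStatus` and `ShouldStartReinit` are hypothetical names and not amfd code.

```cpp
#include <cassert>

// Mirrors AVD_IMM_INIT_STATUS from the patch, including the new state.
enum class ImmStatus {
  kInitBase = 1,
  kInitOngoing = 2,
  kInitDone = 3,
  kTerminating = 4,
};

// Decide whether a background IMM re-initialization should be started.
// On success the status is moved to kInitOngoing, as the real routine does
// before it spawns its thread.
inline bool ShouldStartReinit(ImmStatus& status) {
  if (status == ImmStatus::kInitOngoing) return false;  // already running
  if (status == ImmStatus::kTerminating) return false;  // IMM is going away
  status = ImmStatus::kInitOngoing;
  return true;
}
```

Keeping the decision in the control block's own status avoids the node lookup, which can fail during shutdown once the node has been removed.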
---
 src/amf/amfd/cb.h |  1 +
 src/amf/amfd/imm.cc   | 13 +++--
 src/amf/amfd/ndfsm.cc |  3 ++-
 3 files changed, 6 insertions(+), 11 deletions(-)

diff --git a/src/amf/amfd/cb.h b/src/amf/amfd/cb.h
index c7d7ddd..60bb554 100644
--- a/src/amf/amfd/cb.h
+++ b/src/amf/amfd/cb.h
@@ -63,6 +63,7 @@ typedef enum {
   AVD_IMM_INIT_BASE = 1,
   AVD_IMM_INIT_ONGOING = 2,
   AVD_IMM_INIT_DONE = 3,
+  AVD_IMM_TERMINATING = 4,
 } AVD_IMM_INIT_STATUS;
 /*
  * Sync state of the Standby.
diff --git a/src/amf/amfd/imm.cc b/src/amf/amfd/imm.cc
index aef988f..47c0e5a 100644
--- a/src/amf/amfd/imm.cc
+++ b/src/amf/amfd/imm.cc
@@ -2185,25 +2185,18 @@ void avd_imm_reinit_bg(void) {
   pthread_t thread;
   pthread_attr_t attr;
   int rc = 0;
-  AVD_AVND *node = nullptr;
 
   TRACE_ENTER();
   if (avd_cb->avd_imm_status == AVD_IMM_INIT_ONGOING) {
 TRACE("Already IMM init is going in another thread");
 return;
   }
-  node = avd_node_find_nodeid(avd_cb->node_id_avd);
-  if (node == nullptr) {
-LOG_ER("%s: invalid node ID (%x)", __FUNCTION__,
-avd_cb->node_id_avd);
-return;
-  }
 
-  if (node->node_state == AVD_AVND_STATE_SHUTTING_DOWN) {
-// the node is shutting down phase, no need to reinitialize
-// IMM service
+  if (avd_cb->avd_imm_status == AVD_IMM_TERMINATING) {
+TRACE("IMMND/IMMD are being terminated by AMFND");
 return;
   }
+
   avd_cb->avd_imm_status = AVD_IMM_INIT_ONGOING;
 
   LOG_NO("Re-initializing with IMM");
diff --git a/src/amf/amfd/ndfsm.cc b/src/amf/amfd/ndfsm.cc
index 8501bad..9d54df1 100644
--- a/src/amf/amfd/ndfsm.cc
+++ b/src/amf/amfd/ndfsm.cc
@@ -587,9 +587,10 @@ void avd_node_down_evh(AVD_CL_CB *cb, AVD_EVT *evt)
 n2d_msg->msg_info.n2d_node_down_info.msg_id) != NCSCC_RC_SUCCESS) {
   /* log error that the director is not able to send the message */
   LOG_ER("%s:%u: %u", __FILE__, __LINE__, node->node_info.nodeId);
+  goto done;
 }
+cb->avd_imm_status = AVD_IMM_TERMINATING;
   }
-  avd_node_state_set(node, AVD_AVND_STATE_SHUTTING_DOWN);
 done:
   avsv_dnd_msg_free(n2d_msg);
   evt->info.avnd_msg = nullptr;
--
2.7.4


Re: [devel] [PATCH 1/1] amfd: Avoid IMM reinitialization in OpenSAF components termination phase V2 [#2737]

2018-01-04 Thread Ravi Sekhar Reddy Konda

Hi Minh,

Started reviewing, will get back to you by tomorrow

Regards,
Ravi
-Original Message-
From: Minh Hon Chau [mailto:minh.c...@dektech.com.au] 
Sent: Thursday, January 04, 2018 4:27 PM
To: hans.nordeb...@ericsson.com; Ravi Sekhar Reddy Konda 
<ravisekhar.ko...@oracle.com>; gary@dektech.com.au
Cc: opensaf-devel@lists.sourceforge.net
Subject: Re: [PATCH 1/1] amfd: Avoid IMM reinitialization in OpenSAF components 
termination phase V2 [#2737]

Hi Ravi,

Have you had time to have a look at this patch?

Thanks,

Minh


On 29/12/17 19:31, Minh Chau wrote:
> This commit reverts commit d231ba43, which depends on the node id to
> set the node state to SHUTTING_DOWN. In the node shutdown phase, the
> node id can already have been removed from amfd because CLMD is
> terminated first.
> This commit also introduces a new IMM state to determine whether the
> IMM service should be reinitialized.
> ---
>   src/amf/amfd/cb.h |  1 +
>   src/amf/amfd/imm.cc   | 13 +++--
>   src/amf/amfd/ndfsm.cc |  3 ++-
>   3 files changed, 6 insertions(+), 11 deletions(-)
>
> diff --git a/src/amf/amfd/cb.h b/src/amf/amfd/cb.h
> index c7d7ddd..60bb554 100644
> --- a/src/amf/amfd/cb.h
> +++ b/src/amf/amfd/cb.h
> @@ -63,6 +63,7 @@ typedef enum {
> AVD_IMM_INIT_BASE = 1,
> AVD_IMM_INIT_ONGOING = 2,
> AVD_IMM_INIT_DONE = 3,
> +  AVD_IMM_TERMINATING = 4,
>   } AVD_IMM_INIT_STATUS;
>   /*
>* Sync state of the Standby.
> diff --git a/src/amf/amfd/imm.cc b/src/amf/amfd/imm.cc
> index aef988f..47c0e5a 100644
> --- a/src/amf/amfd/imm.cc
> +++ b/src/amf/amfd/imm.cc
> @@ -2185,25 +2185,18 @@ void avd_imm_reinit_bg(void) {
> pthread_t thread;
> pthread_attr_t attr;
> int rc = 0;
> -  AVD_AVND *node = nullptr;
>   
> TRACE_ENTER();
> if (avd_cb->avd_imm_status == AVD_IMM_INIT_ONGOING) {
>   TRACE("Already IMM init is going in another thread");
>   return;
> }
> -  node = avd_node_find_nodeid(avd_cb->node_id_avd);
> -  if (node == nullptr) {
> -LOG_ER("%s: invalid node ID (%x)", __FUNCTION__,
> -avd_cb->node_id_avd);
> -return;
> -  }
>   
> -  if (node->node_state == AVD_AVND_STATE_SHUTTING_DOWN) {
> -// the node is shutting down phase, no need to reinitialize
> -// IMM service
> +  if (avd_cb->avd_imm_status == AVD_IMM_TERMINATING) {
> +TRACE("IMMND/IMMD are being terminated by AMFND");
>   return;
> }
> +
> avd_cb->avd_imm_status = AVD_IMM_INIT_ONGOING;
>   
> LOG_NO("Re-initializing with IMM");
> diff --git a/src/amf/amfd/ndfsm.cc b/src/amf/amfd/ndfsm.cc
> index 8501bad..9d54df1 100644
> --- a/src/amf/amfd/ndfsm.cc
> +++ b/src/amf/amfd/ndfsm.cc
> @@ -587,9 +587,10 @@ void avd_node_down_evh(AVD_CL_CB *cb, AVD_EVT *evt)
>   n2d_msg->msg_info.n2d_node_down_info.msg_id) != NCSCC_RC_SUCCESS) {
> /* log error that the director is not able to send the message */
> LOG_ER("%s:%u: %u", __FILE__, __LINE__, 
> node->node_info.nodeId);
> +  goto done;
>   }
> +cb->avd_imm_status = AVD_IMM_TERMINATING;
> }
> -  avd_node_state_set(node, AVD_AVND_STATE_SHUTTING_DOWN);
>   done:
> avsv_dnd_msg_free(n2d_msg);
> evt->info.avnd_msg = nullptr;



Re: [devel] [PATCH 1/1] fmd: convert to C++ [#2750]

2018-01-04 Thread Ravi Sekhar Reddy Konda

Hi Gary,

Ack (Code review only)

Regards,
Ravi

-Original Message-
From: Gary Lee [mailto:gary@dektech.com.au] 
Sent: Thursday, December 28, 2017 2:12 PM
To: ravisekhar.ko...@oracle.com; anders.wid...@ericsson.com
Cc: opensaf-devel@lists.sourceforge.net; Gary Lee 
Subject: [PATCH 1/1] fmd: convert to C++ [#2750]

Source files renamed to .cc.
Apply the changes required to compile successfully with g++.
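
The kinds of changes this usually involves can be sketched as follows. This is a standalone illustration, not fmd code: `Event`, `g_handle`, `kRoleNames`, `NewEvent`, and `c_add_one` are hypothetical names that mirror the patterns visible in the diff below.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>
#include <cstring>

struct Event {  // hypothetical, stands in for a C event struct
  int type;
};

// 1. A global defined in a header becomes a declaration there ("extern"),
//    with exactly one definition in a single .cc file.
extern std::uint32_t g_handle;
std::uint32_t g_handle = 0;

// 2. String literals are const in C++, so tables of them hold const char*.
static const char* const kRoleNames[] = {"UNDEFINED", "ACTIVE", "STANDBY"};

// 3. void* does not convert implicitly in C++, so calloc needs a cast.
Event* NewEvent(int type) {
  Event* evt = static_cast<Event*>(std::calloc(1, sizeof(Event)));
  if (evt != nullptr) evt->type = type;
  return evt;
}

// 4. Functions shared with C code keep C linkage so that the names match
//    at link time.
extern "C" int c_add_one(int x) { return x + 1; }
```

Items 1 to 3 are what the C++ compiler insists on; item 4 only matters where a symbol is defined or called from code that is still compiled as C.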
---
 src/fm/Makefile.am   |  8 
 src/fm/fmd/fm.h  |  2 +-
 src/fm/fmd/{fm_amf.c => fm_amf.cc}   |  6 +++---
 src/fm/fmd/fm_cb.h   |  4 ++--
 src/fm/fmd/{fm_main.c => fm_main.cc} | 21 ++---
 src/fm/fmd/{fm_mds.c => fm_mds.cc}   |  6 +++---
 src/fm/fmd/{fm_rda.c => fm_rda.cc}   |  0
 7 files changed, 23 insertions(+), 24 deletions(-)
 rename src/fm/fmd/{fm_amf.c => fm_amf.cc} (98%)
 rename src/fm/fmd/{fm_main.c => fm_main.cc} (97%)
 rename src/fm/fmd/{fm_mds.c => fm_mds.cc} (99%)
 rename src/fm/fmd/{fm_rda.c => fm_rda.cc} (100%)

diff --git a/src/fm/Makefile.am b/src/fm/Makefile.am
index ad666b905..d48a9146c 100644
--- a/src/fm/Makefile.am
+++ b/src/fm/Makefile.am
@@ -41,10 +41,10 @@ bin_osaffmd_CPPFLAGS = \
$(AM_CPPFLAGS)
 
 bin_osaffmd_SOURCES = \
-   src/fm/fmd/fm_amf.c \
-   src/fm/fmd/fm_main.c \
-   src/fm/fmd/fm_mds.c \
-   src/fm/fmd/fm_rda.c
+   src/fm/fmd/fm_amf.cc \
+   src/fm/fmd/fm_main.cc \
+   src/fm/fmd/fm_mds.cc \
+   src/fm/fmd/fm_rda.cc
 
 bin_osaffmd_LDADD = \
lib/libSaAmf.la \
diff --git a/src/fm/fmd/fm.h b/src/fm/fmd/fm.h
index 79734f241..ce71c4105 100644
--- a/src/fm/fmd/fm.h
+++ b/src/fm/fmd/fm.h
@@ -73,7 +73,7 @@
 #include "fm_evt.h"
 
 extern void amfnd_down_callback(void);
-extern void ava_install_amf_down_cb(void (*cb)(void));
+extern "C" void ava_install_amf_down_cb(void (*cb)(void));
 extern uint32_t initialize_for_assignment(FM_CB *cb, SaAmfHAStateT ha_state);
 extern void fm_tmr_stop(FM_TMR *tmr);
 #endif  // FM_FMD_FM_H_
diff --git a/src/fm/fmd/fm_amf.c b/src/fm/fmd/fm_amf.cc
similarity index 98%
rename from src/fm/fmd/fm_amf.c
rename to src/fm/fmd/fm_amf.cc
index 9e6eaf7de..5be2bf201 100644
--- a/src/fm/fmd/fm_amf.c
+++ b/src/fm/fmd/fm_amf.cc
@@ -34,14 +34,14 @@
 **/
 
 #include "fm.h"
-uint32_t gl_fm_hdl;
+extern uint32_t gl_fm_hdl;
 
 uint32_t fm_amf_init(FM_AMF_CB *fm_amf_cb);
 static uint32_t fm_amf_register(FM_AMF_CB *fm_amf_cb);
 static uint32_t fm_amf_healthcheck_start(FM_AMF_CB *fm_amf_cb);
 static FM_AMF_CB *fm_amf_take_hdl(void);
 static void fm_amf_give_hdl(void);
-static char *ha_role_string[] = {"ACTIVE", "STANDBY", "QUIESCED", "QUIESCING"};
+static const char *ha_role_string[] = {"ACTIVE", "STANDBY", "QUIESCED",
+"QUIESCING"};
 
 void amfnd_down_callback(void)
 {
@@ -79,7 +79,7 @@ FM_AMF_CB *fm_amf_take_hdl(void)
FM_CB *fm_cb = NULL;
 
/* Take handle */
-   fm_cb = ncshm_take_hdl(NCS_SERVICE_ID_GFM, gl_fm_hdl);
+   fm_cb = static_cast<FM_CB *>(ncshm_take_hdl(NCS_SERVICE_ID_GFM,
+gl_fm_hdl));
 
return &fm_cb->fm_amf_cb;
 }
diff --git a/src/fm/fmd/fm_cb.h b/src/fm/fmd/fm_cb.h index 248d70bd5..f8559b9c5 
100644
--- a/src/fm/fmd/fm_cb.h
+++ b/src/fm/fmd/fm_cb.h
@@ -32,7 +32,7 @@
 #include 
 #include 
 
-uint32_t gl_fm_hdl;
+extern uint32_t gl_fm_hdl;
 
 typedef enum {
   FM_TMR_TYPE_MIN,
@@ -108,7 +108,7 @@ typedef struct fm_cb {
   bool peer_node_terminated;
 } FM_CB;
 
-extern char *role_string[];
+extern const char *role_string[];
 extern FM_CB *fm_cb;
 
 /*
diff --git a/src/fm/fmd/fm_main.c b/src/fm/fmd/fm_main.cc
similarity index 97%
rename from src/fm/fmd/fm_main.c
rename to src/fm/fmd/fm_main.cc
index 5b47efe96..db8395ee7 100644
--- a/src/fm/fmd/fm_main.c
+++ b/src/fm/fmd/fm_main.cc
@@ -42,8 +42,8 @@ static const SaClmCallbacksT_4 clm_callbacks = {0, 0};
 enum { FD_TERM = 0, FD_AMF = 1, FD_MBX };
 
 FM_CB *fm_cb = NULL;
-char *role_string[] = {"UNDEFINED", "ACTIVE", "STANDBY", "QUIESCED",
-  "QUIESCING"};
+const char *role_string[] = {"UNDEFINED", "ACTIVE", "STANDBY", "QUIESCED",
+ "QUIESCING"};
 
 /*
  *   *
@@ -97,7 +97,7 @@ void rda_cb(uint32_t cb_hdl, PCS_RDA_CB_INFO *cb_info,
 
TRACE_ENTER();
 
-   evt = calloc(1, sizeof(FM_EVT));
+   evt = static_cast<FM_EVT *>(calloc(1, sizeof(FM_EVT)));
if (NULL == evt) {
LOG_ER("calloc failed");
goto done;
@@ -107,7 +107,7 @@ void rda_cb(uint32_t cb_hdl, PCS_RDA_CB_INFO *cb_info,
evt->info.rda_info.role = cb_info->info.io_role;
 
rc = ncs_ipc_send(&fm_cb->mbx, (NCS_IPC_MSG *)evt,
- MDS_SEND_PRIORITY_HIGH);
+

Re: [devel] [PATCH 1/1] imm: fix IMMND assert at veteran nodes during SYNC [#2748]

2018-01-03 Thread Ravi Sekhar Reddy Konda

Hi Vu,

Ack with minor comments inline
I have not tested as I am not able to reproduce the scenario

Thanks,
Ravi
- Original Message -
From: vu.m.ngu...@dektech.com.au
To: ravisekhar.ko...@oracle.com, zoran.milinko...@ericsson.com
Cc: opensaf-devel@lists.sourceforge.net, vu.m.ngu...@dektech.com.au
Sent: Friday, December 29, 2017 3:38:48 PM GMT +05:30 Chennai, Kolkata, Mumbai, 
New Delhi
Subject: [PATCH 1/1] imm: fix IMMND assert at veteran nodes during SYNC [#2748]

During sync, if a saImmOmAdminOwnerInitialize or saImmOmCcbInitialize message
comes to the active IMMD right after the IMMD_EVT_ND2D_SYNC_START message and
before the IMMND_EVT_D2ND_SYNC_START message has arrived at the IMMNDs, it is
possible that the request is accepted at the IMMND coord but rejected at the
veterans.

This fix introduces a flag indicating that the abort sync has been sent by the
coord but the abort sync response from the active IMMD has not yet been
received. Based on that information, the IMMND coord rejects such messages
so that its result matches the result at the veterans.
---
 src/imm/immnd/immnd_cb.h   |  2 ++
 src/imm/immnd/immnd_evt.c  | 65 +-
 src/imm/immnd/immnd_main.c |  1 +
 src/imm/immnd/immnd_proc.c | 12 +
 4 files changed, 74 insertions(+), 6 deletions(-)

diff --git a/src/imm/immnd/immnd_cb.h b/src/imm/immnd/immnd_cb.h
index 7614d27..e89966c 100644
--- a/src/imm/immnd/immnd_cb.h
+++ b/src/imm/immnd/immnd_cb.h
@@ -129,6 +129,8 @@ typedef struct immnd_cb_tag {
   uint8_t mIntroduced;  // Ack received on introduce message
   uint8_t mSyncRequested;   // true=> I am coord, other req sync
   uint8_t mPendSync;// 1=>sync announced but not received.
+  struct timespec mSyncAbortSentAt;  // Store the start time of sync abort sent
+  bool mSyncAborting;   // true if sync abort sent but not yet get respnse.
   uint8_t mSyncFinalizing;  // 1=>finalizeSync sent but not received.
   uint8_t mSync;// true => this node is being synced (client).
   uint8_t mCanBeCoord;  // If!=0 then SC, 2 => 2pbe arbitration, 4 =>
diff --git a/src/imm/immnd/immnd_evt.c b/src/imm/immnd/immnd_evt.c
index 52d33dc..6b1ce43 100644
--- a/src/imm/immnd/immnd_evt.c
+++ b/src/imm/immnd/immnd_evt.c
@@ -10056,6 +10056,10 @@ uint32_t immnd_evt_proc_abort_sync(IMMND_CB *cb, IMMND_EVT *evt,
osafassert(cb->mRulingEpoch <= evt->info.ctrl.rulingEpoch);
cb->mRulingEpoch = evt->info.ctrl.rulingEpoch;
 
+   cb->mSyncAborting = false;
+   cb->mSyncAbortSentAt.tv_sec  = 0;
+   cb->mSyncAbortSentAt.tv_nsec = 0;
+
LOG_WA("Global ABORT SYNC received for epoch %u", cb->mRulingEpoch);
 
if (cb->mIsCoord) { /* coord should already be up to date. */
@@ -10948,6 +10952,36 @@ static void immnd_evt_proc_discard_node(IMMND_CB *cb, IMMND_EVT *evt,
TRACE_LEAVE();
 }
[Ravi]: As per the general naming convention followed in OpenSAF, we use "_evt"
in the routine name if the routine processes events from IMMD or IMMND.
As this is a utility routine, it is better not to use "_evt". You can
name it "immnd_is_sync_aborting".
Also, you added a good description as comments at the beginning of the
routine. Please add a function header and move the comments below into the
description in that header.
+static bool immnd_evt_is_sync_aborting(IMMND_CB* cb)
+{
+   /*
+* When mSyncAborting = true, it means the SYNC has been aborted
+* by coord, and that abort message not yet broadcasted to IMMNDs.
+* Therefore, NODE STATE at IMMND coord (IMM_NODE_FULLY_AVAILABLE) is
+* different with veteran node(s) (IMM_NODE_R_AVAILABLE) at the moment.
+* So, IMMND_EVT_D2ND_ADMINIT or IMMND_EVT_D2ND_CCBINIT msg comes to
+* IMMND coord has to be rejected to synchronize with the result
+* at veterans.
+*
+* In addition, we check the time here to avoid the worst cases
+* such as the abort message arrived at active IMMD, but later on
+* it fails to send the response (e.g: hang IMMD, or reboot IMMD, etc.)
+* For that reason, if we don't receive the response within 6 seconds,
+* consider sending abort failed.
+*/
+   const unsigned kTimeout = 6; /* Inherit from DEFAULT_TIMEOUT_SEC */
+   time_t duration_in_second = 0x;
+   if (cb->mIsCoord && cb->mSyncAborting == true) {
+   struct timespec now, duration;
+   osaf_clock_gettime(CLOCK_MONOTONIC, &now);
+   osaf_timespec_subtract(&now, &cb->mSyncAbortSentAt, &duration);
+   duration_in_second = duration.tv_sec;
+   if (duration_in_second > kTimeout) cb->mSyncAborting = false;
+   }
+
+   return (duration_in_second <= kTimeout);
+}
+
 /
  * Name  : immnd_evt_proc_adminit_rsp
  *
@@ -10981,9 +11015,19 @@ static void immnd_evt_proc_adminit_rsp(IMMND_CB *cb, 
IMMND_EVT *evt,
conn =

Re: [devel] [PATCH 1/1] amfd: Change LOG_ER to LOG_WA if nodeid is not found in node shutting down [#2737]

2017-12-27 Thread Ravi Sekhar Reddy Konda

Hi Minh,

I think it is better to notify using LOG_NO instead of just logging a warning
and returning, because this routine is called in many flows, not only during
shutting down.

Thanks,
Ravi

-Original Message-
From: Minh Chau [mailto:minh.c...@dektech.com.au] 
Sent: Thursday, December 28, 2017 7:32 AM
To: hans.nordeb...@ericsson.com; ravisekhar.ko...@oracle.com; 
gary@dektech.com.au
Cc: opensaf-devel@lists.sourceforge.net; Minh Chau 
Subject: [PATCH 1/1] amfd: Change LOG_ER to LOG_WA if nodeid is not found in 
node shutting down [#2737]

---
 src/amf/amfd/imm.cc | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/amf/amfd/imm.cc b/src/amf/amfd/imm.cc
index aef988f..f36584b 100644
--- a/src/amf/amfd/imm.cc
+++ b/src/amf/amfd/imm.cc
@@ -2194,7 +2194,7 @@ void avd_imm_reinit_bg(void) {
   }
   node = avd_node_find_nodeid(avd_cb->node_id_avd);
   if (node == nullptr) {
-LOG_ER("%s: invalid node ID (%x)", __FUNCTION__,
+LOG_WA("%s: invalid node ID (%x)", __FUNCTION__,
 avd_cb->node_id_avd);
 return;
   }
--
2.7.4

--
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
___
Opensaf-devel mailing list
Opensaf-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/opensaf-devel

Re: [devel] [PATCH 1/1] amfd: Avoid IMM reinitialization in OpenSAF components termination phase [#2737]

2017-12-20 Thread Ravi Sekhar Reddy Konda

Ack, reviewed & tested

Regards,
Ravi

-Original Message-
From: Minh Chau [mailto:minh.c...@dektech.com.au] 
Sent: Tuesday, December 19, 2017 7:39 AM
To: hans.nordeb...@ericsson.com; ravisekhar.ko...@oracle.com; 
gary@dektech.com.au
Cc: opensaf-devel@lists.sourceforge.net; Minh Chau 
Subject: [PATCH 1/1] amfd: Avoid IMM reinitialization in OpenSAF components 
termination phase [#2737]

---
 src/amf/amfd/imm.cc   | 13 +
 src/amf/amfd/ndfsm.cc |  2 +-
 2 files changed, 14 insertions(+), 1 deletion(-)

diff --git a/src/amf/amfd/imm.cc b/src/amf/amfd/imm.cc
index bf7e3d3..aef988f 100644
--- a/src/amf/amfd/imm.cc
+++ b/src/amf/amfd/imm.cc
@@ -2185,12 +2185,25 @@ void avd_imm_reinit_bg(void) {
   pthread_t thread;
   pthread_attr_t attr;
   int rc = 0;
+  AVD_AVND *node = nullptr;
 
   TRACE_ENTER();
   if (avd_cb->avd_imm_status == AVD_IMM_INIT_ONGOING) {
 TRACE("Already IMM init is going in another thread");
 return;
   }
+  node = avd_node_find_nodeid(avd_cb->node_id_avd);
+  if (node == nullptr) {
+LOG_ER("%s: invalid node ID (%x)", __FUNCTION__,
+avd_cb->node_id_avd);
+return;
+  }
+
+  if (node->node_state == AVD_AVND_STATE_SHUTTING_DOWN) {
+// the node is shutting down phase, no need to reinitialize
+// IMM service
+return;
+  }
   avd_cb->avd_imm_status = AVD_IMM_INIT_ONGOING;
 
   LOG_NO("Re-initializing with IMM");
diff --git a/src/amf/amfd/ndfsm.cc b/src/amf/amfd/ndfsm.cc
index ca2e3f6..8501bad 100644
--- a/src/amf/amfd/ndfsm.cc
+++ b/src/amf/amfd/ndfsm.cc
@@ -589,7 +589,7 @@ void avd_node_down_evh(AVD_CL_CB *cb, AVD_EVT *evt)
   LOG_ER("%s:%u: %u", __FILE__, __LINE__, node->node_info.nodeId);
 }
   }
-
+  avd_node_state_set(node, AVD_AVND_STATE_SHUTTING_DOWN);
 done:
   avsv_dnd_msg_free(n2d_msg);
   evt->info.avnd_msg = nullptr;
--
2.7.4


Re: [devel] [PATCH 1/1] clmd: add dynamically created EEs to PLM entity group on standby [#2730]

2017-12-12 Thread Ravi Sekhar Reddy Konda

Hi Alex,

Ack, Code review only

Regards,
Ravi

-Original Message-
From: Alex Jones [mailto:alex.jo...@genband.com] 
Sent: Wednesday, December 06, 2017 10:45 PM
To: anders.wid...@ericsson.com; hans.nordeb...@ericsson.com; 
mathi.np@gmail.com; Ravi Sekhar Reddy Konda <ravisekhar.ko...@oracle.com>
Cc: opensaf-devel@lists.sourceforge.net; Alex Jones <alex.jo...@genband.com>
Subject: [PATCH 1/1] clmd: add dynamically created EEs to PLM entity group on 
standby [#2730]

If EEs and corresponding CLM nodes are dynamically created, after a middleware 
si-swap when the former standby has become active, then one of those EEs is 
rebooted, clmd has not enabled PLM readiness state tracking on the EE and will 
not know when it comes back. Thus, the node will not be allowed back into the 
cluster because it thinks it is not a member.

The dynamically created EE is not being added to the PLM entity group on the 
standby.

Add the dynamically created EE to the PLM entity group on the standby.
---
 src/clm/clmd/clms_evt.c   |  2 +-
 src/clm/clmd/clms_imm.c   |  2 +-
 src/clm/clmd/clms_mbcsv.c | 20 +++-
 3 files changed, 21 insertions(+), 3 deletions(-)

diff --git a/src/clm/clmd/clms_evt.c b/src/clm/clmd/clms_evt.c
index 4d7010d..b65036d 100644
--- a/src/clm/clmd/clms_evt.c
+++ b/src/clm/clmd/clms_evt.c
@@ -992,7 +992,7 @@ static uint32_t proc_mds_node_evt(CLMSV_CLMS_EVT *evt)
if (delete_existing_nodedown_records(node_id) == true) {
TRACE_LEAVE();
return rc;
-   } else if (node->member == SA_FALSE) {
+   } else if (node->member == SA_FALSE && node->admin_state != SA_CLM_ADMIN_UNLOCKED) {
/* One possibility is that an admin operation has made
 * this a non-member */
TRACE_LEAVE();
diff --git a/src/clm/clmd/clms_imm.c b/src/clm/clmd/clms_imm.c
index 6809ce8..c245f67 100644
--- a/src/clm/clmd/clms_imm.c
+++ b/src/clm/clmd/clms_imm.c
@@ -2291,7 +2291,7 @@ SaAisErrorT clms_node_ccb_apply_cb(CcbUtilOperationData_t 
*opdata)
rc = saPlmEntityGroupRemove(clms_cb->ent_group_hdl,
entityNames, 1);
if (rc != SA_AIS_OK) {
-   LOG_ER("saPlmEntityGroupAdd FAILED rc = %d",
+   LOG_ER("saPlmEntityGroupRemove FAILED rc = %d",
   rc);
return rc;
}
diff --git a/src/clm/clmd/clms_mbcsv.c b/src/clm/clmd/clms_mbcsv.c
index 47e4494..6976b03 100644
--- a/src/clm/clmd/clms_mbcsv.c
+++ b/src/clm/clmd/clms_mbcsv.c
@@ -282,6 +282,9 @@ static uint32_t ckpt_proc_node_csync_rec(CLMS_CB *cb, 
CLMS_CKPT_REC *data)
CLMSV_CKPT_NODE *param = >param.node_csync_rec;
CLMS_CLUSTER_NODE *node = NULL, *tmp_node = NULL;
uint32_t rc = NCSCC_RC_SUCCESS;
+#ifdef ENABLE_AIS_PLM
+   SaNameT *entityNames = NULL;
+#endif
 
TRACE_ENTER2("node_name:%s", param->node_name.value);
 
@@ -315,6 +318,21 @@ static uint32_t ckpt_proc_node_csync_rec(CLMS_CB *cb, 
CLMS_CKPT_REC *data)
LOG_ER("Patricia add failed");
}
}
+#ifdef ENABLE_AIS_PLM
+   /* Add it to the plm entity group */
+   entityNames = >ee_name;
+   if (clms_cb->reg_with_plm == SA_TRUE) {
+   SaAisErrorT aisrc = saPlmEntityGroupAdd(
+   clms_cb->ent_group_hdl,
+   entityNames,
+   1,
+   SA_PLM_GROUP_SINGLE_ENTITY);
+   if (aisrc != SA_AIS_OK) {
+   LOG_ER("saPlmEntityGroupAdd FAILED rc = %d",
+   aisrc);
+   }
+   }
+#endif
}
TRACE_LEAVE();
return NCSCC_RC_SUCCESS;
@@ -357,7 +375,7 @@ static uint32_t ckpt_proc_node_del_rec(CLMS_CB *cb, 
CLMS_CKPT_REC *data)
rc = saPlmEntityGroupRemove(clms_cb->ent_group_hdl, entityNames,
1);
if (rc != SA_AIS_OK) {
-   LOG_ER("saPlmEntityGroupAdd FAILED rc = %d", rc);
+   LOG_ER("saPlmEntityGroupRemove FAILED rc = %d", rc);
return rc;
}
}
--
2.9.5



Re: [devel] [PATCH 1/1] clmd: add dynamically created EEs to PLM entity group on standby [#2730]

2017-12-11 Thread Ravi Sekhar Reddy Konda

Hi Alex,

Sorry for the delay. I will review today and let you know.

Thanks,
Ravi

From: Alex Jones [mailto:alex.jo...@genband.com] 
Sent: Monday, December 11, 2017 9:01 PM
To: Anders Widell <anders.wid...@ericsson.com>; hans.nordeb...@ericsson.com; 
mathi.np@gmail.com; Ravi Sekhar Reddy Konda <ravisekhar.ko...@oracle.com>
Cc: opensaf-devel@lists.sourceforge.net
Subject: Re: [PATCH 1/1] clmd: add dynamically created EEs to PLM entity group 
on standby [#2730]

 

Ravi and Mathi,

    If you have no PLM comments, I will push the patch tomorrow.

Alex

 

On 12/11/2017 10:28 AM, Anders Widell wrote:


Ack, tested without PLM.

regards,

Anders Widell


On 12/06/2017 06:14 PM, Alex Jones wrote:
> If EEs and corresponding CLM nodes are dynamically created, after a middleware
> si-swap when the former standby has become active, then one of those EEs is
> rebooted, clmd has not enabled PLM readiness state tracking on the EE and will
> not know when it comes back. Thus, the node will not be allowed back into the
> cluster because it thinks it is not a member.
>
> The dynamically created EE is not being added to the PLM entity group on the
> standby.
>
> Add the dynamically created EE to the PLM entity group on the standby.
> ---
> src/clm/clmd/clms_evt.c | 2 +-
> src/clm/clmd/clms_imm.c | 2 +-
> src/clm/clmd/clms_mbcsv.c | 20 +++-
> 3 files changed, 21 insertions(+), 3 deletions(-)
>
> diff --git a/src/clm/clmd/clms_evt.c b/src/clm/clmd/clms_evt.c
> index 4d7010d..b65036d 100644
> --- a/src/clm/clmd/clms_evt.c
> +++ b/src/clm/clmd/clms_evt.c
> @@ -992,7 +992,7 @@ static uint32_t proc_mds_node_evt(CLMSV_CLMS_EVT *evt)
> if (delete_existing_nodedown_records(node_id) == true) {
> TRACE_LEAVE();
> return rc;
> - } else if (node->member == SA_FALSE) {
> + } else if (node->member == SA_FALSE && node->admin_state != 
> SA_CLM_ADMIN_UNLOCKED) {
> /* One possibility is that an admin operation has made
> * this a non-member */
> TRACE_LEAVE();
> diff --git a/src/clm/clmd/clms_imm.c b/src/clm/clmd/clms_imm.c
> index 6809ce8..c245f67 100644
> --- a/src/clm/clmd/clms_imm.c
> +++ b/src/clm/clmd/clms_imm.c
> @@ -2291,7 +2291,7 @@ SaAisErrorT 
> clms_node_ccb_apply_cb(CcbUtilOperationData_t *opdata)
> rc = saPlmEntityGroupRemove(clms_cb->ent_group_hdl,
> entityNames, 1);
> if (rc != SA_AIS_OK) {
> - LOG_ER("saPlmEntityGroupAdd FAILED rc = %d",
> + LOG_ER("saPlmEntityGroupRemove FAILED rc = %d",
> rc);
> return rc;
> }
> diff --git a/src/clm/clmd/clms_mbcsv.c b/src/clm/clmd/clms_mbcsv.c
> index 47e4494..6976b03 100644
> --- a/src/clm/clmd/clms_mbcsv.c
> +++ b/src/clm/clmd/clms_mbcsv.c
> @@ -282,6 +282,9 @@ static uint32_t ckpt_proc_node_csync_rec(CLMS_CB *cb, 
> CLMS_CKPT_REC *data)
> CLMSV_CKPT_NODE *param = >param.node_csync_rec;
> CLMS_CLUSTER_NODE *node = NULL, *tmp_node = NULL;
> uint32_t rc = NCSCC_RC_SUCCESS;
> +#ifdef ENABLE_AIS_PLM
> + SaNameT *entityNames = NULL;
> +#endif
> 
> TRACE_ENTER2("node_name:%s", param->node_name.value);
> 
> @@ -315,6 +318,21 @@ static uint32_t ckpt_proc_node_csync_rec(CLMS_CB *cb, 
> CLMS_CKPT_REC *data)
> LOG_ER("Patricia add failed");
> }
> }
> +#ifdef ENABLE_AIS_PLM
> + /* Add it to the plm entity group */
> + entityNames = >ee_name;
> + if (clms_cb->reg_with_plm == SA_TRUE) {
> + SaAisErrorT aisrc = saPlmEntityGroupAdd(
> + clms_cb->ent_group_hdl,
> + entityNames,
> + 1,
> + SA_PLM_GROUP_SINGLE_ENTITY);
> + if (aisrc != SA_AIS_OK) {
> + LOG_ER("saPlmEntityGroupAdd FAILED rc = %d",
> + aisrc);
> + }
> + }
> +#endif
> }
> TRACE_LEAVE();
> return NCSCC_RC_SUCCESS;
> @@ -357,7 +375,7 @@ static uint32_t ckpt_proc_node_del_rec(CLMS_CB *cb, 
> CLMS_CKPT_REC *data)
> rc = saPlmEntityGroupRemove(clms_cb->ent_group_hdl, entityNames,
> 1);
> if (rc != SA_AIS_OK) {
> - LOG_ER("saPlmEntityGroupAdd FAILED rc = %d", rc);
> + LOG_ER("saPlmEntityGroupRemove FAILED rc = %d", rc);
> return rc;
> }
> }

 

Re: [devel] [PATCH 1/1] dtm: Support pretty-printing OpenSAF logs using the osaflog command [#2709]

2017-12-10 Thread Ravi Sekhar Reddy Konda

Hi Anders,

Ack, Code review only

Regards,
Ravi

-Original Message-
From: Anders Widell [mailto:anders.wid...@ericsson.com] 
Sent: Friday, December 01, 2017 3:26 PM
To: Ravi Sekhar Reddy Konda <ravisekhar.ko...@oracle.com>
Cc: opensaf-devel@lists.sourceforge.net; Anders Widell 
<anders.wid...@ericsson.com>
Subject: [PATCH 1/1] dtm: Support pretty-printing OpenSAF logs using the 
osaflog command [#2709]

Add support to the osaflog command for parsing and pretty-printing the OpenSAF 
log messages. Initially, we only support simple pretty-printing by removing 
information which is not frequently needed when reading the logs. In future 
ticket(s), we can also add support for filtering log messages.

The following example command will print the MDS log:

osaflog mds.log
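
The field extraction that the pretty-printer relies on (GetField in the
patch below) can be sketched in plain C as follows. This is only an
illustration of the technique under the assumption that fields are separated
by single spaces; the name get_field is hypothetical and the code is not
taken from the patch.

```c
#include <stddef.h>
#include <string.h>

/*
 * Returns a pointer to the space-separated field number field_no (0-based)
 * inside buf[0..size), or NULL if buf has too few fields. The length of the
 * field is stored in *field_size.
 */
static const char *get_field(const char *buf, size_t size, int field_no,
                             size_t *field_size)
{
    const char *pos;

    while (field_no != 0) {
        pos = memchr(buf, ' ', size);
        if (pos == NULL)
            return NULL;
        ++pos; /* skip the separator */
        size -= (size_t)(pos - buf);
        buf = pos;
        --field_no;
    }
    pos = memchr(buf, ' ', size);
    *field_size = (pos != NULL) ? (size_t)(pos - buf) : size;
    return buf;
}
```

For a line such as "aa bbb c", asking for field 1 yields a pointer to "bbb"
with a field size of 3, and asking for field 5 yields NULL.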
---
 src/dtm/common/osaflog_protocol.h |  20 
 src/dtm/tools/osaflog.cc  | 220 ++
 src/dtm/transport/log_server.cc   |  17 +--
 src/dtm/transport/log_server.h|   2 -
 4 files changed, 196 insertions(+), 63 deletions(-)

diff --git a/src/dtm/common/osaflog_protocol.h b/src/dtm/common/osaflog_protocol.h
index d580b61bc..61e9f6f39 100644
--- a/src/dtm/common/osaflog_protocol.h
+++ b/src/dtm/common/osaflog_protocol.h
@@ -17,7 +17,9 @@
 #define DTM_COMMON_OSAFLOG_PROTOCOL_H_
 
 #include 
+#include 
 #include 
+#include 
 #include "osaf/configmake.h"
 
 namespace Osaflog {
@@ -45,6 +47,24 @@ struct __attribute__((__packed__)) ClientAddress {
   uint32_t pid;
 };
 
+// Returns a pointer the space-separated field with index @a field_no within
+// @buf of size @a size, or nullptr if @a buf does not contain enough fields. @a
+// field_size is an output parameter where the length of the field is returned.
+static inline const char* GetField(const char* buf, size_t size, int field_no,
+   size_t* field_size) {
+  while (field_no != 0) {
+const char* pos = static_cast<const char*>(memchr(buf, ' ', size));
+if (pos == nullptr) return nullptr;
+++pos;
+size -= pos - buf;
+buf = pos;
+--field_no;
+  }
+  const char* pos = static_cast<const char*>(memchr(buf, ' ', size));
+  *field_size = (pos != nullptr) ? (pos - buf) : size;
+  return buf;
+}
+
 }  // namespace Osaflog
 
 #endif  // DTM_COMMON_OSAFLOG_PROTOCOL_H_
diff --git a/src/dtm/tools/osaflog.cc b/src/dtm/tools/osaflog.cc
index c81df8bcd..385447b09 100644
--- a/src/dtm/tools/osaflog.cc
+++ b/src/dtm/tools/osaflog.cc
@@ -13,9 +13,11 @@
  *
  */
 
+#include 
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -24,62 +26,72 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
+#include 
 #include "base/time.h"
 #include "base/unix_server_socket.h"
 #include "dtm/common/osaflog_protocol.h"
+#include "osaf/configmake.h"
 
 namespace {
 
-uint64_t Random64Bits(uint64_t seed) {
-  std::mt19937_64 generator{base::TimespecToNanos(base::ReadRealtimeClock()) *
-seed};
-  return generator();
-}
+void PrintUsage(const char* program_name);
+bool Flush();
+base::UnixServerSocket* CreateSocket();
+uint64_t Random64Bits(uint64_t seed);
+bool PrettyPrint(const std::string& log_stream);
+std::list OpenLogFiles(const std::string& log_stream);
+std::string PathName(const std::string& log_stream, int suffix);
+uint64_t GetInode(int fd);
+bool PrettyPrint(FILE* stream);
+bool PrettyPrint(const char* line, size_t size);
 
-void PrintUsage(const char* program_name) {
-  printf(
-  "Usage: %s \n"
-  "Send a command to the node-local internal OpenSAF log server\n"
-  "\n"
-  "Opions:\n"
-  "  --flushFlush all buffered log messages from memory to disk\n",
-  program_name);
-}
-
-base::UnixServerSocket* CreateSocket() {
-  base::UnixServerSocket* sock = nullptr;
-  Osaflog::ClientAddress addr{};
-  addr.pid = getpid();
-  for (uint64_t i = 1; i <= 1000; ++i) {
-addr.random = Random64Bits(i);
-sock = new base::UnixServerSocket(addr.sockaddr(), addr.sockaddr_length(),
-  base::UnixSocket::kNonblocking);
-if (sock->fd() >= 0 || errno != EADDRINUSE) break;
-delete sock;
-sock = nullptr;
-sched_yield();
-  }
-  if (sock != nullptr && sock->fd() < 0) {
-delete sock;
-sock = nullptr;
-  }
-  return sock;
-}
+char buf[65 * 1024];
 
 }  // namespace
 
 int main(int argc, char** argv) {
-  if (argc != 2 || strcmp(argv[1], "--flush") != 0) {
+  bool flush_option = false;
+  if (argc >= 2 && strcmp(argv[1], "--flush") == 0) {
+flush_option = true;
+--argc;
+++argv;
+  }
+  if ((argc != 2) && (argc != 1 || flush_option == false)) {
 PrintUsage(argv[0]);
 exit(EXIT_FAILURE);
   }
+  bool flush_result = Flush();
+  bool print_result = true;
+  if (argc == 2) print_result = PrettyP

Re: [devel] [PATCH 1/1] plmd: fix mbc in PLM [#2724]

2017-12-03 Thread Ravi Sekhar Reddy Konda

Hi Alex,

Ack, reviewed & tested

Thanks,
Ravi
- Original Message -
From: alex.jo...@genband.com
To: mathi.np@gmail.com, ravisekhar.ko...@oracle.com
Cc: opensaf-devel@lists.sourceforge.net, alex.jo...@genband.com
Sent: Saturday, December 2, 2017 1:17:54 AM GMT +05:30 Chennai, Kolkata, 
Mumbai, New Delhi
Subject: [PATCH 1/1] plmd: fix mbc in PLM [#2724]

MBC isn't working in PLM, so no info is being checkpointed to the standby plmd.

When the code to handle more than 2 SCs was put in to PLM, the MBC selection
object was gotten at a later time -- after the while loop containing the "poll"
system call. Thus, the mbc file descriptor was never being set in the poll call.

Move the setting of the mbc file descriptor to inside the while loop, so it gets
set.
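
The effect of the fix can be illustrated with a small sketch: the descriptor
is reloaded from the control block on every iteration, so a selection object
that only becomes valid after start-up is still picked up by poll(). The
names ctrl_block, poll_once and demo_late_fd are hypothetical, and a pipe
stands in for the MBCSv selection object.

```c
#include <poll.h>
#include <unistd.h>

/* Hypothetical control block: sel_obj is -1 until the service is set up. */
struct ctrl_block {
    int sel_obj;
};

/*
 * One iteration of the main loop: reload the descriptor from the control
 * block before polling. Returns the poll() result.
 */
static int poll_once(const struct ctrl_block *cb, struct pollfd *pfd,
                     int timeout_ms)
{
    pfd->fd = cb->sel_obj; /* negative descriptors are ignored by poll() */
    pfd->events = POLLIN;
    pfd->revents = 0;
    return poll(pfd, 1, timeout_ms);
}

/* Demo: the descriptor appears only after the first iteration.
 * Returns 1 if the late descriptor was seen as readable, 0 otherwise. */
static int demo_late_fd(void)
{
    int p[2];
    int ready = 0;
    struct ctrl_block cb = { -1 };
    struct pollfd pfd;

    if (poll_once(&cb, &pfd, 0) != 0) /* nothing to poll yet */
        return 0;
    if (pipe(p) != 0)
        return 0;
    cb.sel_obj = p[0]; /* selection object obtained late */
    if (write(p[1], "x", 1) == 1)
        ready = poll_once(&cb, &pfd, 0) == 1 && (pfd.revents & POLLIN);
    close(p[0]);
    close(p[1]);
    return ready;
}
```

Had the descriptor been copied into the pollfd only once before the loop, the
second poll would still have used -1 and the event would never be seen.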
---
 src/plm/plmd/plms_main.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/src/plm/plmd/plms_main.c b/src/plm/plmd/plms_main.c
index b512741..23b0194 100644
--- a/src/plm/plmd/plms_main.c
+++ b/src/plm/plmd/plms_main.c
@@ -482,12 +482,13 @@ int main(int argc, char *argv[])
fds[FD_AMF].fd = plms_cb->nid_started ? plms_cb->usr1_sel_obj.rmv_obj
  : plms_cb->amf_sel_obj;
fds[FD_AMF].events = POLLIN;
-   fds[FD_MBCSV].fd = plms_cb->mbcsv_sel_obj;
-   fds[FD_MBCSV].events = POLLIN;
fds[FD_MBX].fd = mbx_fd.rmv_obj;
fds[FD_MBX].events = POLLIN;
 
while (1) {
+   fds[FD_MBCSV].fd = plms_cb->mbcsv_sel_obj;
+   fds[FD_MBCSV].events = POLLIN;
+
if (plms_cb->oi_hdl != 0) {
fds[FD_IMM].fd = plms_cb->imm_sel_obj;
fds[FD_IMM].events = POLLIN;
-- 
2.9.5



Re: [devel] [PATCH 1/1] plm: setup immutil wrapper profile [#2708]

2017-11-28 Thread Ravi Sekhar Reddy Konda

Hi Alex,

Ack, Code review only

Regards,
Ravi
- Original Message -
From: alex.jo...@genband.com
To: mathi.np@gmail.com, ravisekhar.ko...@oracle.com
Cc: opensaf-devel@lists.sourceforge.net, alex.jo...@genband.com
Sent: Tuesday, November 28, 2017 10:34:48 PM GMT +05:30 Chennai, Kolkata, 
Mumbai, New Delhi
Subject: [PATCH 1/1] plm: setup immutil wrapper profile [#2708]

If an immutil_XXX call fails, PLM asserts which reboots the node.

immutil wrapper profile is not set.

Set the immutil wrapper profile.
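
As a rough illustration of what such a profile controls, here is a minimal
retry wrapper. It assumes the three fields mean "abort on final failure",
"milliseconds between attempts" and "maximum number of attempts"; the struct,
the return codes and the helper names are hypothetical and are not the
immutil implementation.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <time.h>

/* Hypothetical mirror of the three profile fields set in the patch. */
struct retry_profile {
    bool errors_are_fatal;      /* abort on final failure */
    unsigned retry_interval_ms; /* sleep between attempts */
    unsigned n_tries;           /* maximum number of attempts */
};

enum { RC_OK = 0, RC_TRY_AGAIN = 1, RC_FAILED = 2 }; /* illustrative codes */

/* Calls op(arg) until it stops returning RC_TRY_AGAIN or tries run out. */
static int call_with_retry(const struct retry_profile *p,
                           int (*op)(void *), void *arg)
{
    int rc = op(arg);
    unsigned tries = 1;

    while (rc == RC_TRY_AGAIN && tries < p->n_tries) {
        struct timespec ts;
        ts.tv_sec = p->retry_interval_ms / 1000;
        ts.tv_nsec = (long)(p->retry_interval_ms % 1000) * 1000000L;
        nanosleep(&ts, NULL);
        rc = op(arg);
        ++tries;
    }
    if (rc != RC_OK && p->errors_are_fatal)
        abort(); /* with errors_are_fatal = false the caller sees rc */
    return rc;
}

/* Demo operation: fails with RC_TRY_AGAIN *left more times, then succeeds. */
static int fail_n_times(void *arg)
{
    int *left = arg;
    return (*left)-- > 0 ? RC_TRY_AGAIN : RC_OK;
}
```

With errors_are_fatal set to false, a call that keeps failing returns its
last error code to the caller instead of aborting the process.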
---
 src/plm/plmd/plms_imm.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/src/plm/plmd/plms_imm.c b/src/plm/plmd/plms_imm.c
index ef61b57..e3882e4 100644
--- a/src/plm/plmd/plms_imm.c
+++ b/src/plm/plmd/plms_imm.c
@@ -4384,6 +4384,12 @@ SaAisErrorT plms_imm_init(void)
 {
SaAisErrorT rc = SA_AIS_OK;
 
+   extern struct ImmutilWrapperProfile immutilWrapperProfile;
+
+   immutilWrapperProfile.errorsAreFatal = false;
+   immutilWrapperProfile.retryInterval = 1000;
+   immutilWrapperProfile.nTries = 180;
+
TRACE_ENTER();
 
do {
-- 
2.9.5



Re: [devel] [PATCH 1/1] clm: WA Two active controllers observed at cluster [#2677]

2017-11-15 Thread Ravi Sekhar Reddy Konda

Hi Hans,

Ack, reviewed and tested

Thanks,
Ravi

-Original Message-
From: Hans Nordeback [mailto:hans.nordeb...@ericsson.com] 
Sent: Friday, November 10, 2017 1:59 PM
To: anders.wid...@ericsson.com; Ravi Sekhar Reddy Konda 
<ravisekhar.ko...@oracle.com>
Cc: opensaf-devel@lists.sourceforge.net; Hans Nordeback 
<hans.nordeb...@ericsson.com>
Subject: [PATCH 1/1] clm: WA Two active controllers observed at cluster [#2677]

With the current #2542 solution there is a window in the nid phase where reboots
overlap, leading to two active controllers. This patch solves the problem.
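
The wait time is read from an environment variable with a default of 3
seconds (see the README text in the patch). A slightly more defensive way to
parse such a value could look like the sketch below. The helper name
wait_time_from_env and the upper bound of 3600 seconds are assumptions made
for this example, not part of the patch.

```c
#include <errno.h>
#include <stdlib.h>

enum { DEFAULT_WAIT_SEC = 3, MAX_WAIT_SEC = 3600 }; /* bound is an assumption */

/*
 * Returns the wait time in seconds taken from the environment variable
 * `name`, or DEFAULT_WAIT_SEC if it is unset, not a number, or out of range.
 */
static unsigned wait_time_from_env(const char *name)
{
    const char *s = getenv(name);
    char *end = NULL;
    long v;

    if (s == NULL || *s == '\0')
        return DEFAULT_WAIT_SEC;
    errno = 0;
    v = strtol(s, &end, 0);
    if (errno != 0 || *end != '\0' || v < 0 || v > MAX_WAIT_SEC)
        return DEFAULT_WAIT_SEC;
    return (unsigned)v;
}
```

The design choice here is to fall back to the default on any malformed input
rather than to sleep for an unintended duration.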
---
 00-README.conf  | 11 ++-
 src/base/osaf_utility.c | 44 
 src/base/osaf_utility.h |  5 +
 src/clm/clmnd/main.c|  1 +
 src/nid/nodeinit.cc |  2 ++
 5 files changed, 62 insertions(+), 1 deletion(-)

diff --git a/00-README.conf b/00-README.conf
index 630660e3e..5861f1089 100644
--- a/00-README.conf
+++ b/00-README.conf
@@ -643,4 +643,13 @@ If the latency exceeds 4 seconds a sigalrm will be sent and the process will be
 # echo 100 >  /proc/sys/net/ipv4/igmp_max_memberships
 # To collect gcov data compile and use program tools/devel/gcov_collect/osaf_gcov_dump.c.
 # Check the MULTICAST_PORT and MULTICAST_GROUP settings are the same as multicast group and port
-# above.
\ No newline at end of file
+# above.
+
+If clm adm command for cluster reboot is issued an environment variable 
+OPENSAF_CLUSTER_REBOOT_WAIT_TIME_SEC can be set in opensafd script to 
+specify the time to wait for nodes to be started, except for the active node.
+Default is 3 seconds. A file, "clm_cluster_reboot_in_progress", is 
+created on each node, except on the active node. This file indicates 
+that a cluster reboot is in progress and all nodes needs to delay their 
+start, this to give the active a lead.
+
diff --git a/src/base/osaf_utility.c b/src/base/osaf_utility.c
index f19871139..fc321dbc7 100644
--- a/src/base/osaf_utility.c
+++ b/src/base/osaf_utility.c
@@ -23,10 +23,54 @@
 #include 
 #include 
 #include 
+
+#include 
+#include 
+#include 
+#include 
 #include "base/ncssysf_def.h"
 #include "base/osaf_time.h"
 #include "osaf/configmake.h"
 
+void osaf_wait_for_active_to_start(void)
+{
+   struct stat statbuf;
+   static char file[NAME_MAX];
+   const char *wait_time_str = NULL;
+   unsigned int wait_time = kDfltClusterRebootWaitTimeSec;
+
+   if ((wait_time_str = getenv("OPENSAF_CLUSTER_REBOOT_WAIT_TIME_SEC")) != NULL) {
+   wait_time = strtol(wait_time_str, NULL, 0);
+   }
+   snprintf(file, sizeof(file), PKGLOGDIR "/%s", kClmClusterRebootInProgress);
+
+   if (stat(file, &statbuf) != 0) {
+   syslog(LOG_NOTICE, "Reboot file %s not found, startup continue...", file);
+   return;
+   }
+
+   syslog(LOG_NOTICE, "Cluster reboot in progress, this node will start in %u second(s)", wait_time);
+
+   sleep(wait_time);
+
+   if (unlink(file) == -1) {
+   syslog(LOG_ERR, "cannot remove file %s: %s", file, strerror(errno));
+   }
+}
+
+void osaf_create_cluster_reboot_in_progress_file(void)
+{
+   static char file[NAME_MAX];
+   snprintf(file, sizeof(file), PKGLOGDIR "/%s", kClmClusterRebootInProgress);
+   int fd;
+
+   if ((fd = open(file, O_RDWR | O_CREAT, 0644)) < 0) {
+   syslog(LOG_ERR, "Open %s failed, %s", file, strerror(errno));
+   return;
+   }
+   close(fd);
+}
+
 void osaf_abort(long i_cause)
 {
syslog(LOG_ERR, "osaf_abort(%ld) called from %p with errno=%d", i_cause,
diff --git a/src/base/osaf_utility.h b/src/base/osaf_utility.h
index b935c5003..825cf07d2 100644
--- a/src/base/osaf_utility.h
+++ b/src/base/osaf_utility.h
@@ -30,6 +30,8 @@
 extern "C" {
 #endif
 
+#define kClmClusterRebootInProgress "clm_cluster_reboot_in_progress"
+enum { kDfltClusterRebootWaitTimeSec = 3 };
 enum { kOsafUseSafeReboot = 1 };
 
 /**
@@ -71,6 +73,9 @@ extern void osaf_abort(long i_cause) __attribute__((
 
 extern void osaf_safe_reboot(void) __attribute__((nothrow));
 
+extern void osaf_wait_for_active_to_start(void);
+extern void osaf_create_cluster_reboot_in_progress_file(void);
+
 static inline void osaf_mutex_lock_ordie(pthread_mutex_t* io_mutex) {
   int result = pthread_mutex_lock(io_mutex);
   if (result != 0) osaf_abort(result);
diff --git a/src/clm/clmnd/main.c b/src/clm/clmnd/main.c
index 926c5b718..fa46b780f 100644
--- a/src/clm/clmnd/main.c
+++ b/src/clm/clmnd/main.c
@@ -125,6 +125,7 @@ static uint32_t clmna_mds_dec(struct ncsmds_callback_info 
*info)
// Reboot will be performed by CLMS for this node.
if (clmna_cb->node_info.node_id !=
msg->info.reboot_info.node_id) {
+   osaf_creat

Re: [devel] [PATCH 1/1] plm: terminate child EEs when parent is terminated [#2572]

2017-11-12 Thread Ravi Sekhar Reddy Konda

Hi Alex,

Ack, with one minor change.
In the utility routine plms_is_chld, the return variable is named "ret_err";
please rename it to "is_child". The routine returns true or false depending on
whether the entity is a child, and does not return an error.
I understand this is existing code, but since you are modifying it anyway, it
is better to correct it.
Thanks,
Ravi

-Original Message-
From: Alex Jones [mailto:alex.jo...@genband.com] 
Sent: Friday, November 10, 2017 7:55 PM
To: mathi.np....@gmail.com; Ravi Sekhar Reddy Konda 
<ravisekhar.ko...@oracle.com>
Cc: opensaf-devel@lists.sourceforge.net; Alex Jones <alex.jo...@genband.com>
Subject: [PATCH 1/1] plm: terminate child EEs when parent is terminated [#2572]

If the hypervisor EE is terminated by shutting down plmcd on the host, the 
child EEs (VMs) are not terminated and put into Uninstantiated state. Then when 
plmcd is restarted on the host, the child VMs are not started or restarted 
because PLM thinks they are instantiated.

Logic is wrong in plms_plmc_tcp_disconnect_process when checking for child EEs.

Terminate child EEs when the host is terminated.
---
 src/plm/common/plms_utils.h | 2 +-
 src/plm/plmd/plms_plmc.c| 6 ++
 src/plm/plmd/plms_utils.c   | 4 ++--
 3 files changed, 5 insertions(+), 7 deletions(-)

diff --git a/src/plm/common/plms_utils.h b/src/plm/common/plms_utils.h
index 551e1f8..5292e15 100644
--- a/src/plm/common/plms_utils.h
+++ b/src/plm/common/plms_utils.h
@@ -17,7 +17,7 @@
 #define PLM_COMMON_PLMS_UTILS_H_
 
 uint32_t plms_anc_chld_dep_adm_flag_is_set(PLMS_ENTITY *, PLMS_GROUP_ENTITY *);
-uint32_t plms_is_chld(PLMS_ENTITY *, PLMS_ENTITY *);
+bool plms_is_chld(PLMS_ENTITY *, PLMS_ENTITY *);
 void plms_affected_ent_list_get(PLMS_ENTITY *, PLMS_GROUP_ENTITY **, SaBoolT);
 uint32_t plms_chld_get(PLMS_ENTITY *, PLMS_GROUP_ENTITY **);
 uint32_t plms_aff_he_find(PLMS_GROUP_ENTITY *, PLMS_GROUP_ENTITY **);
diff --git a/src/plm/plmd/plms_plmc.c b/src/plm/plmd/plms_plmc.c
index 8c694d9..0667099 100644
--- a/src/plm/plmd/plms_plmc.c
+++ b/src/plm/plmd/plms_plmc.c
@@ -615,8 +615,7 @@ SaUint32T plms_plmc_tcp_disconnect_process(PLMS_ENTITY *ent)
}
 
/* Terminate all the dependent EEs.*/
-   if (NCSCC_RC_SUCCESS !=
-   plms_is_chld(ent, head->plm_entity)) {
+   if (plms_is_chld(ent, head->plm_entity)) {
ret_err = plms_ee_term(head->plm_entity, false,
   true /*mngt_cbk*/);
if (NCSCC_RC_SUCCESS != ret_err) {
@@ -2605,8 +2604,7 @@ SaUint32T plms_ee_term_failed_tmr_exp(PLMS_ENTITY *ent)
SA_PLM_NTFID_STATE_CHANGE_DEP);
 
/* Terminate all the dependent EEs.*/
-   if (NCSCC_RC_SUCCESS !=
-   plms_is_chld(ent, head->plm_entity)) {
+   if (plms_is_chld(ent, head->plm_entity)) {
ret_err = plms_ee_term(head->plm_entity, false,
   true /*mngt_cbk*/);
if (NCSCC_RC_SUCCESS != ret_err) {
diff --git a/src/plm/plmd/plms_utils.c b/src/plm/plmd/plms_utils.c
index 3d3b5eb..267b88b 100644
--- a/src/plm/plmd/plms_utils.c
+++ b/src/plm/plmd/plms_utils.c
@@ -840,9 +840,9 @@ void plms_aff_chld_ent_list_get(PLMS_ENTITY *root_ent, 
PLMS_ENTITY *ent,
 @return: false - If ent is not a child of root.
  true - If ent is a child of root.
 **/
-SaUint32T plms_is_chld(PLMS_ENTITY *root, PLMS_ENTITY *ent)
+bool plms_is_chld(PLMS_ENTITY *root, PLMS_ENTITY *ent)
 {
-   SaUint32T ret_err;
+   bool ret_err;
PLMS_GROUP_ENTITY *chld_list = NULL;
 
TRACE_ENTER2("Root Entity: %s", root->dn_name_str);
--
2.9.5



Re: [devel] [PATCH 1/1] amfnd: Return TRY_AGAIN for pg track start/stop during headless sync [#2660]

2017-11-05 Thread Ravi Sekhar Reddy Konda

Hi Minh,

Ack, Reviewed & Tested.

Regards,
Ravi
-Original Message-
From: Minh Chau [mailto:minh.c...@dektech.com.au] 
Sent: Monday, November 06, 2017 8:15 AM
To: hans.nordeb...@ericsson.com; ravisekhar.ko...@oracle.com; 
gary@dektech.com.au
Cc: opensaf-devel@lists.sourceforge.net; Minh Chau 
Subject: [PATCH 1/1] amfnd: Return TRY_AGAIN for pg track start/stop during 
headless sync [#2660]

The problem with pg track start/stop is similar to the data_update msg that
caused msg id out of order in ticket #2601. Amfnd could buffer the pg track
start/stop messages in the same way as data_update; alternatively, amfnd can
return TRY_AGAIN while the headless sync is still going on. That avoids
buffering the messages, re-attaching msg_id, and resending the pg track
messages after headless sync. The retry is done on the application side, which
should already handle the TRY_AGAIN return code.
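
The condition that the patch introduces can be summarised by the following
sketch. The struct node_state, the return codes and the function name are
hypothetical simplifications of the amfnd control block and SA_AIS codes, used
only to show when the application is told to retry.

```c
#include <stdbool.h>

enum { PG_OK = 0, PG_TRY_AGAIN = 1 }; /* illustrative codes, not SA_AIS values */

/* Hypothetical subset of the amfnd control block. */
struct node_state {
    bool director_down;  /* amfd is not reachable (headless) */
    bool sync_required;  /* headless sync with amfd not finished */
};

/*
 * Decides whether a pg track start/stop request may be forwarded now.
 * While the director is down or the sync is pending, the request is
 * rejected with PG_TRY_AGAIN so nothing has to be buffered.
 */
static int pg_track_gate(const struct node_state *st)
{
    if (st->director_down || st->sync_required)
        return PG_TRY_AGAIN;
    return PG_OK;
}
```

The application is expected to repeat the call after a short delay until it
no longer receives the try-again result.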
---
 src/amf/amfd/pg.cc  |  2 +-
 src/amf/amfnd/pg.cc | 10 ++
 2 files changed, 7 insertions(+), 5 deletions(-)

diff --git a/src/amf/amfd/pg.cc b/src/amf/amfd/pg.cc
index c087d021..4d1a292 100644
--- a/src/amf/amfd/pg.cc
+++ b/src/amf/amfd/pg.cc
@@ -74,7 +74,7 @@ void avd_pg_trk_act_evh(AVD_CL_CB *cb, AVD_EVT *evt) {
 goto done;
 
   /* Update the receive id for the node */
-  m_AVD_SET_AVND_RCV_ID(cb, node, (n2d_msg->msg_info.n2d_reg_su.msg_id));
+  m_AVD_SET_AVND_RCV_ID(cb, node, 
+ (n2d_msg->msg_info.n2d_pg_trk_act.msg_id));
 
   if ((node->node_state == AVD_AVND_STATE_ABSENT) ||
   (node->node_state == AVD_AVND_STATE_GO_DOWN)) {
diff --git a/src/amf/amfnd/pg.cc b/src/amf/amfnd/pg.cc
index 441ddb2..2f4dbad 100644
--- a/src/amf/amfnd/pg.cc
+++ b/src/amf/amfnd/pg.cc
@@ -149,8 +149,9 @@ uint32_t avnd_evt_ava_pg_start_evh(AVND_CB *cb, AVND_EVT 
*evt) {
   TRACE_ENTER();
 
   // if headless, return TRY_AGAIN to application
-  if (cb->is_avd_down == true) {
-LOG_NO("Director is down. Return try again for PG start.");
+  if (cb->is_avd_down == true || cb->amfd_sync_required == true) {
+LOG_NO("Director is down(%d), or sync is required(%d). Return try again"
+" for PG start.", cb->is_avd_down, cb->amfd_sync_required);
 rc = avnd_amf_resp_send(cb, AVSV_AMF_PG_START, SA_AIS_ERR_TRY_AGAIN, 0,
 _info->dest, >mds_ctxt, nullptr, false);
 TRACE_LEAVE();
@@ -242,8 +243,9 @@ uint32_t avnd_evt_ava_pg_stop_evh(AVND_CB *cb, AVND_EVT 
*evt) {
   TRACE_ENTER();
 
   // if headless, return TRY_AGAIN to application
-  if (cb->is_avd_down == true) {
-LOG_NO("Director is down. Return try again for PG stop.");
+  if (cb->is_avd_down == true || cb->amfd_sync_required == true) {
+LOG_NO("Director is down(%d), or sync is required(%d). Return try again"
+" for PG stop.", cb->is_avd_down, cb->amfd_sync_required);
 rc = avnd_amf_resp_send(cb, AVSV_AMF_PG_STOP, SA_AIS_ERR_TRY_AGAIN, 0,
 _info->dest, >mds_ctxt, nullptr, false);
 TRACE_LEAVE();
--
2.7.4


Re: [devel] [PATCH 1/1] amfd: Add retry mechanism for ClmTrackStart/Stop as job queue V2 [#2631]

2017-11-03 Thread Ravi Sekhar Reddy Konda

Hi Minh,

Ack, reviewed and tested

Thanks,
Ravi 

-Original Message-
From: Minh Chau [mailto:minh.c...@dektech.com.au] 
Sent: Friday, November 03, 2017 12:28 PM
To: hans.nordeb...@ericsson.com; ravisekhar.ko...@oracle.com; 
gary@dektech.com.au
Cc: opensaf-devel@lists.sourceforge.net; Minh Chau 
Subject: [PATCH 1/1] amfd: Add retry mechanism for ClmTrackStart/Stop as job 
queue V2 [#2631]

Currently amfd does not retry ClmTrackStart/Stop after switchover if the CLM 
APIs are still unavailable due to a highly loaded system or a connectivity 
issue. This patch adds a retry mechanism for CLM APIs, like the existing one 
for IMM/NTF APIs.
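
The retry behaviour described above can be sketched as a small job queue. This is only an illustration under assumptions: `JobResult`, `Job`, `JobQueue` and `run_pending` are hypothetical names, not the amfd classes. Only the three outcomes (executed, try again, error) are modelled on the result codes that appear in the patch.

```cpp
#include <cstddef>
#include <deque>
#include <functional>
#include <utility>

// Hypothetical result codes, modelled on JOB_EXECUTED / JOB_ETRYAGAIN /
// JOB_ERR in the patch.
enum class JobResult { kExecuted, kTryAgain, kError };

// A job wraps one API call that may be temporarily unavailable.
using Job = std::function<JobResult()>;

class JobQueue {
 public:
  void queue(Job job) { jobs_.push_back(std::move(job)); }

  // Runs jobs from the front. A job that asks for a retry stays at the
  // front and stops the run, so ordering is preserved. A job that succeeds
  // or fails permanently is removed from the queue.
  JobResult run_pending() {
    while (!jobs_.empty()) {
      JobResult res = jobs_.front()();
      if (res == JobResult::kTryAgain) return res;
      jobs_.pop_front();
      if (res == JobResult::kError) return res;
    }
    return JobResult::kExecuted;
  }

  std::size_t size() const { return jobs_.size(); }

 private:
  std::deque<Job> jobs_;
};
```

A caller would invoke `run_pending()` periodically (for example from a main loop tick) until the queue is empty. Whether a permanent error should stop the run, as it does here, is a design choice of this sketch.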
---
 src/amf/amfd/clm.cc   | 79 +--
 src/amf/amfd/clm.h| 27 ++--
 src/amf/amfd/imm.cc   | 28 +
 src/amf/amfd/imm.h|  3 ++
 src/amf/amfd/role.cc  | 16 --
 src/amf/amfd/sg_2n_fsm.cc |  3 +-
 src/amf/amfd/sgproc.cc|  3 +-
 7 files changed, 135 insertions(+), 24 deletions(-)

diff --git a/src/amf/amfd/clm.cc b/src/amf/amfd/clm.cc
index d8342ca..2bcea2d 100644
--- a/src/amf/amfd/clm.cc
+++ b/src/amf/amfd/clm.cc
@@ -475,46 +475,63 @@ done:
   return error;
 }
 
-SaAisErrorT avd_clm_track_start(void) {
+SaAisErrorT avd_clm_track_start(AVD_CL_CB* cb) {
   SaUint8T trackFlags = SA_TRACK_CURRENT | SA_TRACK_CHANGES_ONLY |
 SA_TRACK_VALIDATE_STEP | SA_TRACK_START_STEP;
 
   TRACE_ENTER();
-  SaAisErrorT error =
-  saClmClusterTrack_4(avd_cb->clmHandle, trackFlags, nullptr);
+  SaAisErrorT error = SA_AIS_OK;
+
+  if (cb->is_clm_track_started == true) {
+// abort all pending and unsuccessful jobs that stop tracking
+// because at this moment, amfd wants to start cluster tracking
+Fifo::remove(cb, JOB_TYPE_CLM);
+  }
+
+  error = saClmClusterTrack_4(cb->clmHandle, trackFlags, nullptr);
   if (error != SA_AIS_OK) {
 if (error == SA_AIS_ERR_TRY_AGAIN || error == SA_AIS_ERR_TIMEOUT ||
 error == SA_AIS_ERR_UNAVAILABLE) {
   LOG_WA("Failed to start cluster tracking %u", error);
+  error = SA_AIS_ERR_TRY_AGAIN;
 } else {
   LOG_ER("Failed to start cluster tracking %u", error);
 }
   } else {
-avd_cb->is_clm_track_started = true;
+cb->is_clm_track_started = true;
   }
   TRACE_LEAVE();
   return error;
 }
 
-SaAisErrorT avd_clm_track_stop(void) {
+SaAisErrorT avd_clm_track_stop(AVD_CL_CB* cb) {
   TRACE_ENTER();
-  SaAisErrorT error = saClmClusterTrackStop(avd_cb->clmHandle);
+  SaAisErrorT error = SA_AIS_OK;
+
+  if (cb->is_clm_track_started == false) {
+// abort all pending and unsuccessful jobs that start tracking
+// because at this moment, amfd wants to stop cluster tracking
+Fifo::remove(cb, JOB_TYPE_CLM);
+  }
+
+  error = saClmClusterTrackStop(cb->clmHandle);
   if (error != SA_AIS_OK) {
 if (error == SA_AIS_ERR_TRY_AGAIN || error == SA_AIS_ERR_TIMEOUT ||
 error == SA_AIS_ERR_UNAVAILABLE) {
   LOG_WA("Failed to stop cluster tracking %u", error);
+  error = SA_AIS_ERR_TRY_AGAIN;
 } else if (error == SA_AIS_ERR_NOT_EXIST) {
   /* track changes was not started or stopped successfully */
   LOG_WA("Failed to stop cluster tracking %u", error);
-  avd_cb->is_clm_track_started = false;
+  cb->is_clm_track_started = false;
+  error = SA_AIS_OK;
 } else {
   LOG_ER("Failed to stop cluster tracking %u", error);
 }
   } else {
 TRACE("Sucessfully stops cluster tracking");
-avd_cb->is_clm_track_started = false;
+cb->is_clm_track_started = false;
   }
-
   TRACE_LEAVE();
   return error;
 }
@@ -550,7 +567,7 @@ static void *avd_clm_init_thread(void *arg) {
 
   if (cb->avail_state_avd == SA_AMF_HA_ACTIVE) {
 for (;;) {
-  error = avd_clm_track_start();
+  error = avd_clm_track_start(cb);
   if (error == SA_AIS_ERR_TRY_AGAIN || error == SA_AIS_ERR_TIMEOUT ||
   error == SA_AIS_ERR_UNAVAILABLE) {
 osaf_nanosleep();
@@ -584,3 +601,45 @@ SaAisErrorT avd_start_clm_init_bg(void) {
   pthread_attr_destroy();
   return SA_AIS_OK;
 }
+
+AvdJobDequeueResultT ClmTrackStart::exec(const AVD_CL_CB* cb) {
+  AvdJobDequeueResultT res;
+  TRACE_ENTER();
+
+  SaAisErrorT rc = avd_clm_track_start(const_cast<AVD_CL_CB*>(cb));
+  if (rc == SA_AIS_OK) {
+delete Fifo::dequeue();
+res = JOB_EXECUTED;
+  } else if (rc == SA_AIS_ERR_TRY_AGAIN) {
+TRACE("TRY-AGAIN");
+res = JOB_ETRYAGAIN;
+  } else {
+delete Fifo::dequeue();
+LOG_ER("%s: ClmTrackStart FAILED %u", __FUNCTION__, rc);
+res = JOB_ERR;
+  }
+
+  TRACE_LEAVE();
+  return res;
+}
+
+AvdJobDequeueResultT ClmTrackStop::exec(const AVD_CL_CB* cb) {
+  AvdJobDequeueResultT res;
+  TRACE_ENTER();
+
+  SaAisErrorT rc = avd_clm_track_stop(const_cast<AVD_CL_CB*>(cb));
+  if (rc == SA_AIS_OK) {
+delete Fifo::dequeue();
+res = JOB_EXECUTED;
+  } else if (rc == SA_AIS_ERR_TRY_AGAIN) {
+TRACE("TRY-AGAIN");
+res =

Re: [devel] [PATCH 1/1] amfnd: do not refresh opensaf components [#2627]

2017-11-03 Thread Ravi Sekhar Reddy Konda

Hi Gary,

Ack for the patch

Thanks,
Ravi



-Original Message-
From: Gary Lee [mailto:gary@dektech.com.au] 
Sent: Wednesday, October 25, 2017 12:22 PM
To: Ravi Sekhar Reddy Konda <ravisekhar.ko...@oracle.com>; Hans Nordeback 
<hans.nordeb...@ericsson.com>; Minh Hon Chau <minh.c...@dektech.com.au>
Cc: opensaf-devel@lists.sourceforge.net; Gary Lee <gary@dektech.com.au>
Subject: [PATCH 1/1] amfnd: do not refresh opensaf components [#2627]

If an OpenSAF component requires reload after an upgrade, then AMF should also 
be upgraded, requiring a node reboot. Thus there is no need to refresh 
component config.

The patch fixes a problem where OpenSAF has been upgraded and is about to be 
rebooted. Before the reboot occurs, IMMND aborts due to message loss, and AMFND 
is trying to reload IMMND's configuration. Since IMMND is down, AMFND is stuck 
in saImmOmInitialize(), before eventually being killed by the AMF watchdog.
---
 src/amf/amfnd/compdb.cc | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/amf/amfnd/compdb.cc b/src/amf/amfnd/compdb.cc
index 1aa3863ec..1ecd1a1c9 100644
--- a/src/amf/amfnd/compdb.cc
+++ b/src/amf/amfnd/compdb.cc
@@ -1687,7 +1687,7 @@ int avnd_comp_config_reinit(AVND_COMP *comp) {
   ** At first time instantiation of OpenSAF components we cannot go
   ** to IMM since we would deadloack.
   */
-  if (comp->config_is_valid) {
+  if (comp->config_is_valid || comp->su->is_ncs == true) {
 res = 0;
 goto done1;
   }
--
2.11.0


Re: [devel] [PATCH 1/1] amfd: Add retry mechanism for ClmTrackStart/Stop as job queue [#2631]

2017-10-31 Thread Ravi Sekhar Reddy Konda

Hi Minh,

I am fine with the patch, but I have a couple of queries; please see my 
comments inline.

Regards,
Ravi

-Original Message-
From: Minh Chau [mailto:minh.c...@dektech.com.au] 
Sent: Friday, October 20, 2017 4:06 AM
To: hans.nordeb...@ericsson.com; ravisekhar.ko...@oracle.com; 
gary@dektech.com.au
Cc: opensaf-devel@lists.sourceforge.net; Minh Chau 
Subject: [PATCH 1/1] amfd: Add retry mechanism for ClmTrackStart/Stop as job 
queue [#2631]

Currently amfd does not retry ClmTrackStart/Stop after switchover if the CLM 
APIs are still unavailable due to a highly loaded system or a connectivity 
issue. This patch adds a retry mechanism for CLM APIs, like the existing one 
for IMM/NTF APIs.
---
 src/amf/amfd/clm.cc   | 83 +--
 src/amf/amfd/clm.h| 27 +--
 src/amf/amfd/imm.h|  1 +
 src/amf/amfd/role.cc  | 15 +++--
 src/amf/amfd/sg_2n_fsm.cc |  3 +-
 src/amf/amfd/sgproc.cc|  3 +-
 6 files changed, 108 insertions(+), 24 deletions(-)

diff --git a/src/amf/amfd/clm.cc b/src/amf/amfd/clm.cc
index d8342ca..1c82620 100644
--- a/src/amf/amfd/clm.cc
+++ b/src/amf/amfd/clm.cc
@@ -475,13 +475,16 @@ done:
   return error;
 }
 
-SaAisErrorT avd_clm_track_start(void) {
+SaAisErrorT avd_clm_track_start(AVD_CL_CB* cb) {
   SaUint8T trackFlags = SA_TRACK_CURRENT | SA_TRACK_CHANGES_ONLY |
 SA_TRACK_VALIDATE_STEP | SA_TRACK_START_STEP;
 
   TRACE_ENTER();
-  SaAisErrorT error =
-  saClmClusterTrack_4(avd_cb->clmHandle, trackFlags, nullptr);
+  SaAisErrorT error = SA_AIS_ERR_TRY_AGAIN;
+
+  if (cb->is_clm_track_started == true) goto done;
+
+  error = saClmClusterTrack_4(cb->clmHandle, trackFlags, nullptr);
   if (error != SA_AIS_OK) {
 if (error == SA_AIS_ERR_TRY_AGAIN || error == SA_AIS_ERR_TIMEOUT ||
 error == SA_AIS_ERR_UNAVAILABLE) {
@@ -490,15 +493,20 @@ SaAisErrorT avd_clm_track_start(void) {
   LOG_ER("Failed to start cluster tracking %u", error);
 }
   } else {
-avd_cb->is_clm_track_started = true;
+cb->is_clm_track_started = true;
   }
+done:
   TRACE_LEAVE();
   return error;
 }
 
-SaAisErrorT avd_clm_track_stop(void) {
+SaAisErrorT avd_clm_track_stop(AVD_CL_CB* cb) {
   TRACE_ENTER();
-  SaAisErrorT error = saClmClusterTrackStop(avd_cb->clmHandle);
+  SaAisErrorT error = SA_AIS_ERR_TRY_AGAIN;
+
+  if (cb->is_clm_track_started == false) goto done;
+
+  error = saClmClusterTrackStop(cb->clmHandle);
   if (error != SA_AIS_OK) {
 if (error == SA_AIS_ERR_TRY_AGAIN || error == SA_AIS_ERR_TIMEOUT ||
 error == SA_AIS_ERR_UNAVAILABLE) {
@@ -506,15 +514,16 @@ SaAisErrorT avd_clm_track_stop(void) {
 } else if (error == SA_AIS_ERR_NOT_EXIST) {
   /* track changes was not started or stopped successfully */
   LOG_WA("Failed to stop cluster tracking %u", error);
-  avd_cb->is_clm_track_started = false;
+  cb->is_clm_track_started = false;
+  error = SA_AIS_OK;
 } else {
   LOG_ER("Failed to stop cluster tracking %u", error);
 }
   } else {
 TRACE("Sucessfully stops cluster tracking");
-avd_cb->is_clm_track_started = false;
+cb->is_clm_track_started = false;
   }
-
+done:
   TRACE_LEAVE();
   return error;
 }
@@ -550,7 +559,7 @@ static void *avd_clm_init_thread(void *arg) {
 
   if (cb->avail_state_avd == SA_AMF_HA_ACTIVE) {
 for (;;) {
-  error = avd_clm_track_start();
+  error = avd_clm_track_start(cb);
   if (error == SA_AIS_ERR_TRY_AGAIN || error == SA_AIS_ERR_TIMEOUT ||
   error == SA_AIS_ERR_UNAVAILABLE) {
 osaf_nanosleep();
@@ -584,3 +593,57 @@ SaAisErrorT avd_start_clm_init_bg(void) {
   pthread_attr_destroy();
   return SA_AIS_OK;
 }
+
+AvdJobDequeueResultT ClmTrackStart::exec(const AVD_CL_CB* cb) {
+  AvdJobDequeueResultT res;
+  TRACE_ENTER();
+
+  SaAisErrorT rc = avd_clm_track_start(const_cast<AVD_CL_CB*>(cb));
+  if (rc == SA_AIS_OK) {
+delete Fifo::dequeue();
+res = JOB_EXECUTED;
+  } else if (rc == SA_AIS_ERR_TRY_AGAIN) {
+TRACE("TRY-AGAIN");
+res = JOB_ETRYAGAIN;
+  } else if (rc == SA_AIS_ERR_TIMEOUT) {
+TRACE("TIMEOUT");
+res = JOB_ETRYAGAIN;
+  } else if (rc == SA_AIS_ERR_UNAVAILABLE) {
+TRACE("UNAVAILABLE");
+res = JOB_ETRYAGAIN;
+  } else {
+delete Fifo::dequeue();
+LOG_ER("%s: ClmTrackStart FAILED %u", __FUNCTION__, rc);
+res = JOB_ERR;
+  }
+
+  TRACE_LEAVE();
+  return res;
+}
+
+AvdJobDequeueResultT ClmTrackStop::exec(const AVD_CL_CB* cb) {
+  AvdJobDequeueResultT res;
+  TRACE_ENTER();
+
+  SaAisErrorT rc = avd_clm_track_stop(const_cast<AVD_CL_CB*>(cb));
+  if (rc == SA_AIS_OK) {
+delete Fifo::dequeue();
+res = JOB_EXECUTED;
+  } else if (rc == SA_AIS_ERR_TRY_AGAIN) {
+TRACE("TRY-AGAIN");
+res = JOB_ETRYAGAIN;
+  } else if (rc == SA_AIS_ERR_TIMEOUT) {
+TRACE("TIMEOUT");
+res = JOB_ETRYAGAIN;
+  } else if (rc == SA_AIS_ERR_UNAVAILABLE) {
+TRACE("UNAVAILABLE");
+res =

Re: [devel] [PATCH 1/1] amfnd: fix segv in ncs_tmr_stop V2 [#2658]

2017-10-30 Thread Ravi Sekhar Reddy Konda

Hi Hans,

Ack (Code review only)

Regards,
Ravi
- Original Message -
From: hans.nordeb...@ericsson.com
To: gary@dektech.com.au, ravisekhar.ko...@oracle.com
Cc: opensaf-devel@lists.sourceforge.net, hans.nordeb...@ericsson.com
Sent: Monday, October 30, 2017 8:27:49 PM GMT +05:30 Chennai, Kolkata, Mumbai, 
New Delhi
Subject: [PATCH 1/1] amfnd: fix segv in ncs_tmr_stop V2 [#2658]

---
 src/amf/amfnd/di.cc | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/src/amf/amfnd/di.cc b/src/amf/amfnd/di.cc
index 7aac34260..2043c6064 100644
--- a/src/amf/amfnd/di.cc
+++ b/src/amf/amfnd/di.cc
@@ -1300,13 +1300,14 @@ void avnd_di_msg_ack_process(AVND_CB *cb, uint32_t mid) {
 
 // matching record
 if (msg_id == mid) {
+  cb->dnd_list.erase(iter);
+  // iter is now invalid, exit iterator loop asap
   if (rec->msg.info.avd->msg_type == AVSV_N2D_NODE_DOWN_MSG) {
 // first to stop timer to avoid processing timeout event
 // then perform last step clean up
 avnd_stop_tmr(cb, &rec->resp_tmr);
 avnd_last_step_clean(cb);
   }
-  cb->dnd_list.erase(iter);
   TRACE("remove msg %u from queue", msg_id);
   avnd_diq_rec_del(cb, rec);
   break;
-- 
2.14.2
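
The ordering issue addressed by this patch can be illustrated with a small sketch. It assumes a `std::list` of record pointers and a cleanup routine that may empty that list; `Record`, `ack_message` and `last_step_clean` are hypothetical stand-ins written for this example, not the amfnd code, and the reason given for the crash is an assumption.

```cpp
#include <cstdint>
#include <list>

struct Record {
  std::uint32_t msg_id;
  bool is_node_down;
};

// Stand-in for a cleanup step that empties the pending list (assumption).
// Records still linked in the list are freed here.
inline void last_step_clean(std::list<Record *> &pending) {
  for (Record *r : pending) delete r;
  pending.clear();
}

// Removes the record matching `mid`. The record is unlinked from the list
// before any cleanup runs, so the iterator is never used after the list
// has been modified by someone else. Returns true if a record was found.
inline bool ack_message(std::list<Record *> &pending, std::uint32_t mid) {
  for (auto iter = pending.begin(); iter != pending.end(); ++iter) {
    Record *rec = *iter;
    if (rec->msg_id != mid) continue;
    pending.erase(iter);  // iter is invalid from here on
    if (rec->is_node_down) last_step_clean(pending);
    delete rec;  // rec was unlinked first, so cleanup did not free it
    return true;
  }
  return false;
}
```

Returning right after the erase mirrors the `break` in the patch: the loop never advances an invalidated iterator.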



Re: [devel] [PATCH 1/1] amf: Buffer and resend data req messages in Headless state [#2601]

2017-10-25 Thread Ravi Sekhar Reddy Konda

Hi Minh & Gary,

As suggested, it's better to handle D2N_PRESENCE_SU and PG_TRACK in a separate 
ticket.

For DATA_REQUEST messages, I agree with the suggestions. I incorporated the 
suggestions and tested; it's working fine.
I am sending the diff patch on top of the original patch. If you are fine with 
it, I will push the patch.

Thanks,
Ravi

- Original Message -
From: minh.c...@dektech.com.au
To: gary@dektech.com.au, ravisekhar.ko...@oracle.com, 
minh-c...@users.sf.net, hans.nordeb...@ericsson.com
Cc: opensaf-devel@lists.sourceforge.net
Sent: Wednesday, October 25, 2017 11:15:58 AM GMT +05:30 Chennai, Kolkata, 
Mumbai, New Delhi
Subject: Re: [devel] [PATCH 1/1] amf: Buffer and resend data req messages in 
Headless state [#2601]

Hi,

Agree with Ravi that the msg N2D_REG_SU should be queued too, otherwise 
there will be no D2N_PRESENCE_SU msg.

So the condition to queue N2D_INFO_SU_SI_ASSIGN, N2D_OPERATION_STATE and 
D2N_PRESENCE_SU should be:

cb->is_avd_down == true || cb->amfd_sync_required == true

The reason is the above 3 messages won't be processed until amfd 
finishes sync

Also agree with Gary that N2D_DATA_REQUEST should be queued with: 
cb->is_avd_down == false && cb->amfd_sync_required == true
The reason is the DATA is carried in sync_info msg. After amfd is up and 
before node_up is accepted, if the DATA_REQUEST msg is sent and rejected 
by amfd, it's still in the queue and will be resent after sync by 
avnd_diq_rec_send_buffered_msg(). However, the msgs in queue could have 
the msg_id not in ascending order.
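
The queueing conditions proposed above can be written down as a small decision function. This is a sketch of the proposal in this thread, not the committed amfnd code: `MsgKind`, `Action` and `decide` are hypothetical names, the register-SU message is treated like the other two queued messages as suggested in the first paragraph, and dropping DATA_REQUEST while headless follows Gary's comment.

```cpp
// Hypothetical message kinds, named after the N2D messages in the thread.
enum class MsgKind { kSuSiAssign, kOperationState, kRegSu, kDataRequest };

enum class Action { kSend, kBuffer, kDrop };

// Decides what the node director does with an outgoing message, given the
// two flags discussed in the thread.
inline Action decide(MsgKind kind, bool is_avd_down, bool amfd_sync_required) {
  if (kind == MsgKind::kDataRequest) {
    // While headless the data is carried later by the sync_info messages,
    // so the request itself is not kept.
    if (is_avd_down) return Action::kDrop;
    return amfd_sync_required ? Action::kBuffer : Action::kSend;
  }
  // The remaining kinds are not processed until amfd finishes sync.
  return (is_avd_down || amfd_sync_required) ? Action::kBuffer
                                             : Action::kSend;
}
```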

For the PG track msg, I think we probably need to return TRY_AGAIN on 
waiting for acceptance of node_up too, because the pg track is resent 
when node_up is accepted (on stop resp_tmr timer), so the PG track may 
be duplicated if PG track is called just right after amfd is up. Also, 
it could cause the msg id out of order in the same way of DATA_REQUEST 
in this ticket too.

I can check the D2N_PRESENCE_SU and PG track and raise separate tickets 
if there are bugs (most likely).
The solution of the patch looks ok to me with Gary's comment.

Thanks,
Minh
On 25/10/17 14:58, Gary Lee wrote:
> Hi Ravi
>
> >From what I can see, the main problem is AMFND is sending messages after 
> >AMFD has come up, but before NODE_UP has been accepted by AMFD.
> If that is correct, then PG track messages, like DATA_REQUEST, should be in 
> the same situation as it’s only checking is_avd_down.
>
> uint32_t avnd_evt_ava_pg_start_evh(AVND_CB *cb, AVND_EVT *evt) {
>
>// if headless, return TRY_AGAIN to application
>if (cb->is_avd_down == true) {
>  LOG_NO("Director is down. Return try again for PG start.");
>   ..
>}
>..
> }
>
> I think we should only buffer these messages if “is_avd_down == false && 
> amfd_sync_required == true”.
> We should be able to drop DATA_REQUEST when it’s headless, since that type of 
> information is sync’ed with AVSV_N2D_ND_CSICOMP_STATE_INFO_MSG and 
> AVSV_N2D_ND_SISU_STATE_INFO_MSG.
>
> Thanks
> Gary
>
> On 24/10/17, 11:14 pm, "Ravi Sekhar Reddy Konda" 
> <ravisekhar.ko...@oracle.com> wrote:
>
>  Hi Gary,
>  
>  There are some messages which does not required to be queued like 
> PG_TRACK & NODE_DOWN messages
>  During SC absence period, we are returning SA_AMF_ERROR_TRY_AGAIN for PG 
> Track operations
>  
>  Currently along with this Fix we are queuing the following messages
>  AVSV_N2D_INFO_SU_SI_ASSIGN_MSG
>  AVSV_N2D_OPERATION_STATE_MSG
>  AVSV_N2D_DATA_REQUEST_MSG
>  
>  only missing event is Reg SU response event (AVSV_N2D_REG_SU_MSG), I 
> think this event has to be queued.
>  
>  if you agree with this. I will update the patch by adding queuing for 
> Reg SU response event also
>  
>  
>  Thanks,
>  Ravi
>  
>  -Original Message-
>  From: Gary Lee [mailto:gary@dektech.com.au]
>  Sent: Wednesday, October 18, 2017 10:57 AM
>  To: ravi-sekhar <ravisekhar.ko...@oracle.com>; 
> hans.nordeb...@ericsson.com
>  Cc: opensaf-devel@lists.sourceforge.net
>  Subject: Re: [devel] [PATCH 1/1] amf: Buffer and resend data req 
> messages in Headless state [#2601]
>  
>  Hi Ravi
>  
>  I’ve started looking at this. My initial thought is that perhaps we need 
> to queue up all messages when is_avd_down == false && amfd_sync_required == 
> true (ie. AMFD has come up but haven’t accepted node_up). What do you think?
>  
>  Will get back to you.
>  
>  /Gary
>  
>  -Original Message-
>  From: ravi-sekhar <ravisekhar.ko...@oracle.com>
>  Date: Tuesday, 17 October 2017 at 10:39 pm
>  To: <h

Re: [devel] [PATCH 1/1] amf: Buffer and resend data req messages in Headless state [#2601]

2017-10-24 Thread Ravi Sekhar Reddy Konda

Hi Gary,

There are some messages which do not need to be queued, like PG_TRACK & 
NODE_DOWN messages.
During the SC absence period, we are returning SA_AIS_ERR_TRY_AGAIN for PG 
Track operations.

Currently, along with this fix, we are queuing the following messages:
AVSV_N2D_INFO_SU_SI_ASSIGN_MSG
AVSV_N2D_OPERATION_STATE_MSG
AVSV_N2D_DATA_REQUEST_MSG

The only missing event is the Reg SU response event (AVSV_N2D_REG_SU_MSG); I 
think this event has to be queued.

If you agree with this, I will update the patch by adding queuing for the 
Reg SU response event also.


Thanks,
Ravi

-Original Message-
From: Gary Lee [mailto:gary@dektech.com.au] 
Sent: Wednesday, October 18, 2017 10:57 AM
To: ravi-sekhar ; hans.nordeb...@ericsson.com
Cc: opensaf-devel@lists.sourceforge.net
Subject: Re: [devel] [PATCH 1/1] amf: Buffer and resend data req messages in 
Headless state [#2601]

Hi Ravi

I’ve started looking at this. My initial thought is that perhaps we need to 
queue up all messages when is_avd_down == false && amfd_sync_required == true 
(ie. AMFD has come up but haven’t accepted node_up). What do you think?

Will get back to you.

/Gary

-Original Message-
From: ravi-sekhar 
Date: Tuesday, 17 October 2017 at 10:39 pm
To: 
Cc: 
Subject: [devel] [PATCH 1/1] amf: Buffer and resend data req messages in 
Headless state [#2601]

---
 src/amf/amfnd/di.cc | 42 +++---
 1 file changed, 31 insertions(+), 11 deletions(-)

diff --git a/src/amf/amfnd/di.cc b/src/amf/amfnd/di.cc
index 2dc023c..1e0d682 100644
--- a/src/amf/amfnd/di.cc
+++ b/src/amf/amfnd/di.cc
@@ -998,21 +998,30 @@ uint32_t avnd_di_object_upd_send(AVND_CB *cb, AVSV_PARAM_INFO *param) {
   uint32_t rc = NCSCC_RC_SUCCESS;
   TRACE_ENTER2("Comp '%s'", osaf_extended_name_borrow(&param->name));
 
-  if (cb->is_avd_down == true) {
-TRACE_LEAVE2("AVD is down. %u", rc);
-return rc;
-  }
-
-  memset(&msg, 0, sizeof(AVND_MSG));
-
   /* populate the msg */
+  memset(&msg, 0, sizeof(AVND_MSG));
   msg.info.avd = static_cast<AVSV_DND_MSG *>(calloc(1, sizeof(AVSV_DND_MSG)));
   msg.type = AVND_MSG_AVD;
   msg.info.avd->msg_type = AVSV_N2D_DATA_REQUEST_MSG;
-  msg.info.avd->msg_info.n2d_data_req.msg_id = ++(cb->snd_msg_id);
   msg.info.avd->msg_info.n2d_data_req.node_id = cb->node_info.nodeId;
   msg.info.avd->msg_info.n2d_data_req.param_info = *param;
 
+  if ((cb->is_avd_down == true) || (cb->amfd_sync_required == true)) {
+msg.info.avd->msg_info.n2d_data_req.msg_id = 0;
+if (avnd_diq_rec_add(cb, &msg) == nullptr) {
+  rc = NCSCC_RC_FAILURE;
+}
+LOG_NO(
+"avnd_di_object_upd_send() deferred as AMF director is offline(%d),"
+" or sync is required(%d)",
+cb->is_avd_down, cb->amfd_sync_required);
+
+TRACE_LEAVE2("AVD is down. %u", rc);
+return rc;
+  } else {
+msg.info.avd->msg_info.n2d_data_req.msg_id = ++(cb->snd_msg_id);
+  }
+
   /* send the msg to AvD */
   rc = avnd_di_msg_send(cb, &msg);
   if (NCSCC_RC_SUCCESS == rc) msg.info.avd = 0;
@@ -1515,9 +1524,20 @@ void avnd_diq_rec_send_buffered_msg(AVND_CB *cb) {
 pending_rec->msg.info.avd->msg_info.n2d_opr_state.rec_rcvr
 .raw);
 ++iter;
-} else {
-  ++iter;
-}
+} else if (pending_rec->msg.info.avd->msg_type == AVSV_N2D_DATA_REQUEST_MSG &&
+   pending_rec->msg.info.avd->msg_info.n2d_data_req.msg_id == 0) {
+pending_rec->msg.info.avd->msg_info.n2d_data_req.msg_id =
+  ++(cb->snd_msg_id);
+
+LOG_NO(
+"Found and resend buffered Data Req msg for SU:'%s', msg_id:'%u'",
+osaf_extended_name_borrow(&pending_rec->msg.info.avd->msg_info
+   .n2d_data_req.param_info.name),
+pending_rec->msg.info.avd->msg_info.n2d_data_req.msg_id);
+   ++iter;
+ } else {
+   ++iter;
+ }
   }
 
   TRACE("retransmit message to amfd");
-- 
1.9.1




Re: [devel] [PATCH 1/1] amfnd: store pid before sending event [#2650]

2017-10-23 Thread Ravi Sekhar Reddy Konda

Ack ( Code review only)

Regards,
Ravi

-Original Message-
From: Gary Lee [mailto:gary@dektech.com.au] 
Sent: Monday, October 23, 2017 8:48 AM
To: hans.nordeb...@ericsson.com; minh.c...@dektech.com.au; 
ravisekhar.ko...@oracle.com
Cc: opensaf-devel@lists.sourceforge.net; Gary Lee 
Subject: [PATCH 1/1] amfnd: store pid before sending event [#2650]

The event may be processed and pm_rec
deleted by the main thread, before it is read here.
---
 src/amf/amfnd/mon.cc | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/src/amf/amfnd/mon.cc b/src/amf/amfnd/mon.cc
index 5d5cc2393..9cdfc3797 100644
--- a/src/amf/amfnd/mon.cc
+++ b/src/amf/amfnd/mon.cc
@@ -263,6 +263,7 @@ err:
 uint32_t avnd_send_pid_exit_evt(AVND_CB *cb, AVND_COMP_PM_REC *pm_rec) {
   AVND_EVT *evt;
   uint32_t rc = NCSCC_RC_FAILURE;
+  const SaUint64T pid = pm_rec->pid;
 
   /* create & send the timer event */
   evt = avnd_evt_create(cb, AVND_EVT_PID_EXIT, 0, 0, (void *)pm_rec, 0, 0);
@@ -271,9 +272,9 @@ uint32_t avnd_send_pid_exit_evt(AVND_CB *cb, AVND_COMP_PM_REC *pm_rec) {
   }
 
   if (rc == NCSCC_RC_SUCCESS) {
-TRACE_1("Sent PM (PID: %lld) Exit event", pm_rec->pid);
+TRACE_1("Sent PM (PID: %lld) Exit event", pid);
   } else {
-LOG_ER("Failed to send PM (PID: %lld) exit event", pm_rec->pid);
+LOG_ER("Failed to send PM (PID: %lld) exit event", pid);
   }
 
   return rc;
--
2.11.0
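
The pattern used by this patch, copying a field before handing the record to another thread, can be sketched as follows. This is a minimal illustration under assumptions: `PmRecord` and `send_pid_exit` are hypothetical names, and the `send` callback stands in for whatever posts the event to the main thread.

```cpp
#include <cstdint>
#include <functional>
#include <string>

struct PmRecord {
  std::uint64_t pid;
};

// Hands `rec` to `send`, which may pass it to another thread that frees it
// before this function continues. The pid is therefore copied first and
// `rec` is not dereferenced after the hand-off.
inline std::string send_pid_exit(PmRecord *rec,
                                 const std::function<bool(PmRecord *)> &send) {
  const std::uint64_t pid = rec->pid;  // copy while rec is still ours
  const bool ok = send(rec);           // rec may already be gone after this
  return (ok ? "sent pid " : "failed pid ") + std::to_string(pid);
}
```

A callback that deletes the record immediately models the worst case of the race; the returned message is still built safely from the copied value.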


[devel] CLM issue #2088 not observed now

2017-10-20 Thread Ravi Sekhar Reddy Konda

Hi Mathi & Anders,


Regarding CLM issue #2088 "saClmClusterNodeGetAsync returns OK on a non member 
node":

I am not observing this issue now.

Here are the tests performed:

On a two-node system, I locked the CLM node SC-2 and then invoked the async 
API, which returns ERR_UNAVAILABLE.


root@SC-2:~# clmprint -a 0x2020f
node_id:131599(2020f)

===CLM NODE GET CALLBACK STARTS==
Error: SA_AIS_ERR_UNAVAILABLE (31)
Invocation: 0
error - InvocationId wrong,expected: , received: 0
===CLM NODE GET CALLBACK ENDS==

Let me know if the issue can be closed now


Thanks,
Ravi


Re: [devel] develop branch build is failing

2017-10-20 Thread Ravi Sekhar Reddy Konda

Hi All,

It's building fine after doing make uninstall and then make.
I think it's a problem with an old .so.

Thanks,
Ravi
- Original Message -
From: ravisekhar.ko...@oracle.com
To: opensaf-devel@lists.sourceforge.net
Sent: Friday, October 20, 2017 2:03:17 PM GMT +05:30 Chennai, Kolkata, Mumbai, 
New Delhi
Subject: [devel] develop branch build is failing


Hi All,

Make is failing with the following error:

  CCLD bin/immadm
/usr/local/lib/opensaf/libimm_common.so.0: undefined reference to 
`_logtrace_trace'
/usr/local/lib/opensaf/libimm_common.so.0: undefined reference to 
`_logtrace_log'
collect2: error: ld returned 1 exit status
make[2]: *** [bin/immadm] Error 1
make[2]: Leaving directory `/root/osaf_latest'
make[1]: *** [all-recursive] Error 1
make[1]: Leaving directory `/root/osaf_latest'
make: *** [all] Error 2



Here is the last changeset, with which I am building 

root@PL-5:~/osaf_latest# git log | more
commit 1c58a2106a55ad212a8e296424b1f20508eeb9cd
Author: Lennart Lund 
Date:   Thu Oct 19 15:17:27 2017 +0200

smf: coredump and syslog flood after immnd crash [#2441]

Thanks,
Ravi



[devel] develop branch build is failing

2017-10-20 Thread Ravi Sekhar Reddy Konda


Hi All,

Make is failing with the following error:

  CCLD bin/immadm
/usr/local/lib/opensaf/libimm_common.so.0: undefined reference to 
`_logtrace_trace'
/usr/local/lib/opensaf/libimm_common.so.0: undefined reference to 
`_logtrace_log'
collect2: error: ld returned 1 exit status
make[2]: *** [bin/immadm] Error 1
make[2]: Leaving directory `/root/osaf_latest'
make[1]: *** [all-recursive] Error 1
make[1]: Leaving directory `/root/osaf_latest'
make: *** [all] Error 2



Here is the last changeset, with which I am building 

root@PL-5:~/osaf_latest# git log | more
commit 1c58a2106a55ad212a8e296424b1f20508eeb9cd
Author: Lennart Lund 
Date:   Thu Oct 19 15:17:27 2017 +0200

smf: coredump and syslog flood after immnd crash [#2441]

Thanks,
Ravi


[devel] develop branch build is failing

2017-10-20 Thread Ravi Sekhar Reddy Konda

Hi All,

Make is failing with the following error:

commit 1c58a2106a55ad212a8e296424b1f20508eeb9cd
Author: Lennart Lund 
Date:   Thu Oct 19 15:17:27 2017 +0200



Re: [devel] [PATCH 2/2] amf: improve error checking and display [#2628]

2017-10-17 Thread Ravi Sekhar Reddy Konda

Ack, Code review only

Thanks,
Ravi

-Original Message-
From: Gary Lee [mailto:gary@dektech.com.au] 
Sent: Tuesday, October 17, 2017 12:44 PM
To: hans.nordeb...@ericsson.com; ravisekhar.ko...@oracle.com; 
minh.c...@dektech.com.au
Cc: opensaf-devel@lists.sourceforge.net; Gary Lee 
Subject: [PATCH 2/2] amf: improve error checking and display [#2628]

* nodes are prefixed with '0x', eg '0x2010f' instead of '2010f' to match input
* options 's', 'c', 'u' are incompatible and are rejected when used together
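
The two checks listed above can be sketched in isolation. This is an illustration only: `modes_compatible` and `format_node_id` are hypothetical helpers that work on a plain argument list, whereas the tool itself counts options inside its option-parsing loop.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Returns true when at most one of the mode options -s, -c, -u is present.
inline bool modes_compatible(const std::vector<std::string> &args) {
  unsigned int options_found = 0;
  for (const std::string &arg : args) {
    if (arg == "-s" || arg == "-c" || arg == "-u") ++options_found;
  }
  return options_found <= 1;
}

// Formats a node id with a 0x prefix so the output matches accepted input.
inline std::string format_node_id(unsigned int node_id) {
  std::ostringstream out;
  out << "0x" << std::hex << node_id;
  return out.str();
}
```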
---
 src/amf/tools/amf_cluster_status.cc | 26 --
 1 file changed, 20 insertions(+), 6 deletions(-)

diff --git a/src/amf/tools/amf_cluster_status.cc b/src/amf/tools/amf_cluster_status.cc
index 668e43c01..6b86b03be 100644
--- a/src/amf/tools/amf_cluster_status.cc
+++ b/src/amf/tools/amf_cluster_status.cc
@@ -375,6 +375,7 @@ int main(int argc, char *argv[]) {
   char *ptr = NULL;
   NODE_ID node_id = 0;
   bool user_node = false;
+  unsigned int options_found = 0;
 
   struct option long_options[] = {
 {"help",no_argument,   0, 'h'},
@@ -391,10 +392,12 @@ int main(int argc, char *argv[]) {
 
  switch (option) {
  case 's':
+ ++options_found;
  opt_char = option;
  break;
  case 'c':
  case 'u':
+ ++options_found;
  opt_char = option;
  if (optarg == NULL) {
 if ((argv[optind] != NULL) &&
@@ -427,9 +430,16 @@ int main(int argc, char *argv[]) {
 exit(EXIT_FAILURE);
   }
 
-  if (user_node == false)
+  if (user_node == false) {
 node_id = m_NCS_GET_NODE_ID; //Get local node_id
-
+  } else if (node_id == 0) {
+std::cerr << "Invalid node" << std::endl;
+exit(EXIT_FAILURE);
+  }
+  if (options_found > 1) {
+std::cerr << "Too many options specified" << std::endl;
+exit(EXIT_FAILURE);
+  }
   if (mds_get_handle() != NCSCC_RC_SUCCESS) exit(EXIT_FAILURE);
   if (mds_init() != NCSCC_RC_SUCCESS) exit(EXIT_FAILURE);
   if (get_clm_nodes() != SA_AIS_OK) exit(EXIT_FAILURE);
@@ -454,7 +464,8 @@ int main(int argc, char *argv[]) {
   if (user_node == false)
 std::cout << "This is a Controller node"<

Re: [devel] [PATCH 1/1] amf: Allow SI and SI Dependency object to be deleted in same ccb [#2585]

2017-10-10 Thread Ravi Sekhar Reddy Konda

Hi Minh,

Ack for the patch as well as the AMF PR doc update

Thanks,
Ravi

-Original Message-
From: Minh Chau [mailto:minh.c...@dektech.com.au] 
Sent: Thursday, September 28, 2017 6:24 PM
To: hans.nordeb...@ericsson.com; gary@dektech.com.au; 
praveen.malv...@oracle.com; ravisekhar.ko...@oracle.com
Cc: opensaf-devel@lists.sourceforge.net; Minh Chau 
Subject: [PATCH 1/1] amf: Allow SI and SI Dependency object to be deleted in 
same ccb [#2585]

CCB is allowed if:
. All SIs in the CCB are unassigned
. The CCB includes all safDepend objects that are related to those SIs
CCB is aborted if one of the above conditions is not met
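
The second condition above (every related dependency object must be part of the same CCB) can be sketched as a lookup over plain containers. This is a simplified model, not the amfd implementation: `SiDep` and `first_missing_dep` are hypothetical, and DNs are represented as ordinary strings.

```cpp
#include <set>
#include <string>
#include <vector>

// Hypothetical model of an SI dependency object: its DN plus the two SIs
// it links.
struct SiDep {
  std::string dn;
  std::string sponsor;
  std::string dependent;
};

// Returns the DN of the first dependency object that involves `si` but is
// missing from the CCB, or an empty string when every related dependency
// object is deleted in the same CCB.
inline std::string first_missing_dep(const std::string &si,
                                     const std::vector<SiDep> &all_deps,
                                     const std::set<std::string> &ccb_dns) {
  for (const SiDep &dep : all_deps) {
    if (dep.sponsor != si && dep.dependent != si) continue;
    if (ccb_dns.count(dep.dn) == 0) return dep.dn;
  }
  return std::string();
}
```

A non-empty result corresponds to the validation error case, where the CCB is rejected and the offending dependency object is named in the message.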
---
 src/amf/amfd/si.cc | 66 ++
 src/amf/amfd/si.h  |  2 ++
 src/amf/amfd/si_dep.cc | 15 +---
 3 files changed, 74 insertions(+), 9 deletions(-)

diff --git a/src/amf/amfd/si.cc b/src/amf/amfd/si.cc
index 74b465314..cf026d7b8 100644
--- a/src/amf/amfd/si.cc
+++ b/src/amf/amfd/si.cc
@@ -1091,11 +1091,40 @@ static SaAisErrorT si_ccb_completed_cb(CcbUtilOperationData_t *opdata) {
 goto done;
   }
   /* check for any SI-SI dependency configurations */
-  if (0 != si->num_dependents || si->spons_si_list != nullptr) {
-report_ccb_validation_error(
-opdata, "Sponsors or Dependents Exist; Cannot delete '%s'",
-si->name.c_str());
-goto done;
+  if (si->num_dependents != 0) {
+if (si->is_all_dependent_si_unassigned() == false) {
+  report_ccb_validation_error(
+  opdata, "Dependent SI still has assignment; Cannot delete '%s'",
+  si->name.c_str());
+  goto done;
+}
+  }
+  if (si->spons_si_list != nullptr) {
+if (si->is_all_sponsor_si_unassigned() == false) {
+  report_ccb_validation_error(
+  opdata, "Sponsor SI still has assignment; Cannot delete '%s'",
+  si->name.c_str());
+  goto done;
+}
+  }
+  if (si->num_dependents != 0 || si->spons_si_list != nullptr) {
+/* loop through sidep_db
+ * if any sidep can't be found in ccbutildata, reject
+ */
+for (const auto &value : *sidep_db) {
+  const AVD_SI_DEP *sidep = value.second;
+  if (si == sidep->spons_si || si == sidep->dep_si) {
+SaNameT sidepDn;
+osaf_extended_name_lend(sidep->name.c_str(), &sidepDn);
+if (ccbutil_getCcbOpDataByDN(opdata->ccbId, &sidepDn) == nullptr) {
+  report_ccb_validation_error(
+  opdata, "Dependency object '%s' must be deleted in same ccb;"
+  " Cannot delete '%s'", sidep->name.c_str(),
+  si->name.c_str());
+  goto done;
+}
+  }
+}
   }
   rc = SA_AIS_OK;
   opdata->userData = si; /* Save for later use in apply */
@@ -1573,6 +1602,33 @@ bool AVD_SI::is_sirank_valid(uint32_t newSiRank) const {
 }
 
 /*
+ * @brief Check if all sponsor SIs are unassigned
+ * @return true if all are unassigned
+ */
+bool AVD_SI::is_all_sponsor_si_unassigned() const {
+  AVD_SPONS_SI_NODE *node;
+
+  for (node = spons_si_list; node; node = node->next) {
+if (node->si->list_of_sisu != nullptr) return false;
+  }
+  return true;
+}
+
+/*
+ * @brief Check if all dependent SIs are unassigned
+ * @return true if all are unassigned
+ */
+bool AVD_SI::is_all_dependent_si_unassigned() const {
+  std::list<AVD_SI *> depsi_list;
+  get_dependent_si_list(name, depsi_list);
+  for (std::list<AVD_SI *>::const_iterator it = depsi_list.begin();
+   it != depsi_list.end(); ++it) {
+if ((*it)->list_of_sisu != nullptr) return false;
+  }
+  return true;
+}
+
+/*
  * @brief Update saAmfSIRank by new value of @newSiRank, and update the
  *the si list which is hold by the sg
  * @param [in] @newSiRank: rank of si to be updated
diff --git a/src/amf/amfd/si.h b/src/amf/amfd/si.h
index af14363b6..45b37cc33 100644
--- a/src/amf/amfd/si.h
+++ b/src/amf/amfd/si.h
@@ -153,6 +153,8 @@ class AVD_SI {
   bool is_active() const;
   SaAisErrorT si_swap_validate();
   uint32_t count_sisu_with(SaAmfHAStateT ha);
+  bool is_all_sponsor_si_unassigned() const;
+  bool is_all_dependent_si_unassigned() const;
 
  private:
   bool is_assigned() const { return list_of_sisu ? true : false; }
diff --git a/src/amf/amfd/si_dep.cc b/src/amf/amfd/si_dep.cc
index 55fcb28cb..a4ccbe7c4 100644
--- a/src/amf/amfd/si_dep.cc
+++ b/src/amf/amfd/si_dep.cc
@@ -1230,6 +1230,7 @@ static AVD_SI_DEP *sidep_new(const std::string &sidep_name,
 
   sidep = new AVD_SI_DEP();
   avd_sidep_indx_init(sidep_name, sidep);
+  sidep->name = sidep_name;
   osafassert(sidep->dep_si != nullptr);
   osafassert(sidep->spons_si != nullptr);
   sidep_db->insert(make_pair(sidep->spons_name, sidep->dep_name), sidep);
@@ -1376,10 +1377,16 @@ static void sidep_ccb_apply_cb(CcbUtilOperationData_t *opdata) {
   sidep_spons_list_del(sidep);

Re: [devel] [PATCH 1/1] amfd: remove node_up variable from AVD_AVND [#2595]

2017-09-28 Thread Ravi Sekhar Reddy Konda

Ack, Code review only

Thanks,
Ravi

- Original Message -
From: gary@dektech.com.au
To: hans.nordeb...@ericsson.com, minh.c...@dektech.com.au, 
ravisekhar.ko...@oracle.com, praveen.malv...@oracle.com
Cc: opensaf-devel@lists.sourceforge.net, gary@dektech.com.au
Sent: Wednesday, September 27, 2017 12:19:28 PM GMT +05:30 Chennai, Kolkata, 
Mumbai, New Delhi
Subject: [PATCH 1/1] amfd: remove node_up variable from AVD_AVND [#2595]

node_up is not checkpointed. Replace with node_state.
---
 src/amf/amfd/clm.cc| 6 +++---
 src/amf/amfd/ndfsm.cc  | 1 -
 src/amf/amfd/ndproc.cc | 1 -
 src/amf/amfd/node.cc   | 1 -
 src/amf/amfd/node.h| 1 -
 5 files changed, 3 insertions(+), 7 deletions(-)

diff --git a/src/amf/amfd/clm.cc b/src/amf/amfd/clm.cc
index 7340c4bf3..d8342ca88 100644
--- a/src/amf/amfd/clm.cc
+++ b/src/amf/amfd/clm.cc
@@ -247,7 +247,7 @@ static void clm_track_cb(
   case SA_CLM_CHANGE_VALIDATE:
 if (notifItem->clusterChange == SA_CLM_NODE_LEFT) {
   node = avd_node_find_nodeid(notifItem->clusterNode.nodeId);
-  if (node == nullptr || node->node_up == false) {
+  if (node == nullptr || node->node_state == AVD_AVND_STATE_ABSENT) {
 LOG_IN("%s: CLM node '%s' is not an AMF cluster member",
__FUNCTION__, node_name.c_str());
 goto done;
@@ -263,7 +263,7 @@ static void clm_track_cb(
 
   case SA_CLM_CHANGE_START:
 node = avd_node_find_nodeid(notifItem->clusterNode.nodeId);
-if (node == nullptr || node->node_up == false) {
+if (node == nullptr || node->node_state == AVD_AVND_STATE_ABSENT) {
   LOG_IN("%s: CLM node '%s' is not an AMF cluster member", __FUNCTION__,
  node_name.c_str());
   goto done;
@@ -294,7 +294,7 @@ static void clm_track_cb(
 LOG_IN("%s: CLM node '%s' is not an AMF cluster member",
__FUNCTION__, node_name.c_str());
 goto done;
-  } else if (node->node_up == false) {
+  } else if (node->node_state == AVD_AVND_STATE_ABSENT) {
 LOG_IN("%s: CLM node '%s' is not an AMF cluster member; MDS down received",
__FUNCTION__, node_name.c_str());
 avd_node_delete_nodeid(node);
diff --git a/src/amf/amfd/ndfsm.cc b/src/amf/amfd/ndfsm.cc
index 223f57f20..ca2e3f698 100644
--- a/src/amf/amfd/ndfsm.cc
+++ b/src/amf/amfd/ndfsm.cc
@@ -247,7 +247,6 @@ void record_node_up_msg_info(AVD_AVND *avnd, const AVD_DND_MSG *n2d_msg) {
   osafassert(avnd != nullptr);
 
   avnd->adest = n2d_msg->msg_info.n2d_node_up.adest_address;
-  avnd->node_up = true;
 
   if (n2d_msg->msg_info.n2d_node_up.msg_id >= avnd->rcv_msg_id) {
 LOG_NO("Received node_up from %x: msg_id %u",
diff --git a/src/amf/amfd/ndproc.cc b/src/amf/amfd/ndproc.cc
index 2edb9b16e..0c6316627 100644
--- a/src/amf/amfd/ndproc.cc
+++ b/src/amf/amfd/ndproc.cc
@@ -1221,6 +1221,5 @@ void avd_node_failover(AVD_AVND *node) {
   avd_pg_node_csi_del_all(avd_cb, node);
   avd_node_down_mw_susi_failover(avd_cb, node);
   avd_node_down_appl_susi_failover(avd_cb, node);
-  node->node_up = false; // postpone deletion from node_id_db
   TRACE_LEAVE();
 }
diff --git a/src/amf/amfd/node.cc b/src/amf/amfd/node.cc
index e7eb709f0..0ffcfb782 100644
--- a/src/amf/amfd/node.cc
+++ b/src/amf/amfd/node.cc
@@ -120,7 +120,6 @@ void AVD_AVND::initialize() {
   clm_change_start_preceded = {};
   recvr_fail_sw = {};
   admin_ng = {};
-  node_up = false;
 }
 
 //
diff --git a/src/amf/amfd/node.h b/src/amf/amfd/node.h
index 4cee956cc..e64bf8c93 100644
--- a/src/amf/amfd/node.h
+++ b/src/amf/amfd/node.h
@@ -148,7 +148,6 @@ class AVD_AVND {
   bool is_campaign_set_for_all_sus() const;
   // Member functions.
   void node_sus_termstate_set(bool term_state) const;
-  bool node_up; // true if MDS is up, false if MDS is down
 
  private:
   void initialize();
-- 
2.11.0
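
The reasoning in the commit message ("node_up is not checkpointed") can be shown with a small model. This is a sketch under assumptions: NodeRecord and both helper functions are hypothetical, and it assumes that node_state is part of the checkpointed data, which the commit message implies but the diff does not show.

```cpp
#include <cassert>

// Hypothetical, simplified model; the real AVD_AVND has many more fields.
enum class NodeState { Absent, Present };

struct NodeRecord {
  NodeState node_state = NodeState::Absent;  // assumed to be checkpointed
  bool node_up = false;                      // local only, never checkpointed
};

// Models the record a standby would hold after a checkpoint update:
// only the checkpointed field is copied across.
NodeRecord replicate_to_standby(const NodeRecord &active) {
  NodeRecord standby;
  standby.node_state = active.node_state;
  assert(standby.node_up == false);  // the local flag is lost on the standby
  return standby;
}

// The membership check after the patch, derived from checkpointed state only.
bool is_amf_member(const NodeRecord &node) {
  return node.node_state != NodeState::Absent;
}
```

In this model a check on node_up would give a different answer on the standby than on the active side, while a check on node_state gives the same answer on both.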


--
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
___
Opensaf-devel mailing list
Opensaf-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/opensaf-devel

Re: [devel] [PATCH 1/1] amfd: choose unlocked instantiable SU for instantiation [#2462]

2017-09-25 Thread Ravi Sekhar Reddy Konda

Hi Minh,

I have addressed your comments; please find the attached patch (it applies on
top of the original patch).
If you are fine with it, I will push the patch.

Thanks,
Ravi

- Original Message -
From: minh.c...@dektech.com.au
To: ravi.sek...@oracle.com, hans.nordeb...@ericsson.com, gary@dektech.com.au
Cc: opensaf-devel@lists.sourceforge.net
Sent: Wednesday, September 13, 2017 10:36:55 AM GMT +05:30 Chennai, Kolkata, 
Mumbai, New Delhi
Subject: Re: [devel] [PATCH 1/1] amfd: choose unlocked instantiable SU for 
instantiation [#2462]

Hi Ravi,

Minor comments in line.

Thanks,
Minh
On 08/09/17 17:03, Ravi Sekhar wrote:
> ---
>   src/amf/amfd/sgproc.cc | 23 +--
>   1 file changed, 21 insertions(+), 2 deletions(-)
>
> diff --git a/src/amf/amfd/sgproc.cc b/src/amf/amfd/sgproc.cc
> index 6ca4261..0dbaa59 100644
> --- a/src/amf/amfd/sgproc.cc
> +++ b/src/amf/amfd/sgproc.cc
> @@ -1924,7 +1924,20 @@ uint32_t in_serv_su(AVD_SG *sg) {
> TRACE_LEAVE2("%u", in_serv);
> return in_serv;
>   }
> -
> +/**
> + * @brief   This function checks if there is any same ranked SU
> + *  which is Unlocked and can be Instantiated
> + *
> + * @param   pointer to su
> + *
> + */
> +uint32_t find_instantiable_same_rank_su(AVD_SU *su) {
> +  for (const auto &i_su : su->sg_of_su->list_of_su) {
> +if (i_su->is_instantiable() && (i_su->saAmfSURank == su->saAmfSURank))
> +  return true;
> +  }
> +  return false;
> +}
[Minh]: The return type of the function should be bool. If possible, please 
consider making the new find_instantiable_same_rank_su() a method of the class.
>   
> /*
>* Function: avd_sg_app_su_inst_func
>*
> @@ -2011,7 +2024,13 @@ uint32_t avd_sg_app_su_inst_func(AVD_CL_CB *cb, AVD_SG *sg) {
>   TRACE("%u, %u", sg->pref_inservice_sus(), num_try_insvc_su);
>   if (sg->pref_inservice_sus() >
>   (sg_instantiated_su_count(i_su->sg_of_su) + num_try_insvc_su)) {
> -  /* Try to Instantiate this SU */
> +  /* If SU is in Locked State, find if there is any other SU in the same rank
> +   * which can provide Service(Unlocked SU)
> +   */
> +  if(i_su->saAmfSUAdminState == SA_AMF_ADMIN_LOCKED) {
> +if (find_instantiable_same_rank_su(i_su))
> +  continue;
> +  }
[Minh]: The patch works fine with the reported scenario. However, if 
"lock SU1, lock SU2" in the ticket's scenario is replaced by 
"lock SC1, lock SC2", we have the same problem: the SI is still 
PARTIALLY_ASSIGNED.
If I change the *if* as below, it works for me in both cases.
   if(i_su->is_instantiable() == false) {
 if (find_instantiable_same_rank_su(i_su))
   continue;
   }
> if (avd_snd_presence_msg(cb, i_su, false) == NCSCC_RC_SUCCESS) {
>   num_try_insvc_su++;
> }
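
Taken together, the two review comments could look roughly like the sketch below. It is not the attached follow-up patch: Su, Sg, has_instantiable_same_rank_su() and should_skip() are hypothetical names, and the real AVD_SU::is_instantiable() checks more than a single flag.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical, trimmed-down SU; only what the sketch needs.
struct Su {
  uint32_t rank;
  bool instantiable;
  bool is_instantiable() const { return instantiable; }
};

struct Sg {
  std::vector<Su *> list_of_su;

  // First comment: a bool-returning method instead of a free function
  // that returns uint32_t.
  bool has_instantiable_same_rank_su(const Su &su) const {
    for (const Su *other : list_of_su) {
      assert(other != nullptr);
      if (other->is_instantiable() && other->rank == su.rank) return true;
    }
    return false;
  }

  // Second comment: skip any SU that is not instantiable (not only an SU
  // that is admin-locked) when a same-rank SU can be instantiated instead.
  bool should_skip(const Su &su) const {
    return !su.is_instantiable() && has_instantiable_same_rank_su(su);
  }
};
```

Testing is_instantiable() rather than the admin state alone is what covers the "lock SC1, lock SC2" case above, where the SU itself is not the locked object.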




amfd_2462_2.patch
Description: Binary data

Re: [devel] [PATCH 1/1] amf: Fix amf_demo program compilation error [#2578]

2017-09-12 Thread Ravi Sekhar Reddy Konda

Hi Hans,

Ack

Regards,
Ravi
- Original Message -
From: hans.nordeb...@ericsson.com
To: gary@dektech.com.au, ravisekhar.ko...@oracle.com
Cc: opensaf-devel@lists.sourceforge.net, hans.nordeb...@ericsson.com
Sent: Friday, September 8, 2017 6:23:28 PM GMT +05:30 Chennai, Kolkata, Mumbai, 
New Delhi
Subject: [PATCH 1/1] amf: Fix amf_demo program compilation error [#2578]

---
 tools/cluster_sim_uml/build_uml | 1 +
 1 file changed, 1 insertion(+)

diff --git a/tools/cluster_sim_uml/build_uml b/tools/cluster_sim_uml/build_uml
index 6ad205f89..39b0bee69 100755
--- a/tools/cluster_sim_uml/build_uml
+++ b/tools/cluster_sim_uml/build_uml
@@ -107,6 +107,7 @@ cmd_install_testprog() {
 cp $src/amf_demo_script $installd
 gcc -g -O2 -Wall -fPIC -I$opensaf_home/src/amf/saf \
-I$opensaf_home/src/ntf/saf \
+   -I$opensaf_home/src/ais/include \
-I$opensaf_home/src/osaf/saf \
-DSA_EXTENDED_NAME_SOURCE \
-o $installd/amf_demo $src/amf_demo.c \
-- 
2.14.1

