[devel] [PATCH 0/2] Review Request for osaf: ensure takeover_requests have a lease [#2954]

2018-11-12 Thread Gary Lee
Summary: osaf: ensure takeover_requests have a lease [#2954]
Review request for Ticket(s): 2954
Peer Reviewer(s): Canh, Hans, Nagu 
Pull request to: *** LIST THE PERSON WITH PUSH ACCESS HERE ***
Affected branch(es): develop
Development branch: ticket-2954
Base revision: 5bb2174a323a97f626ce354d553a1dc4d1673899
Personal repository: git://git.code.sf.net/u/userid-2226215/review


Impacted area   Impact y/n

 Docsn
 Build systemn
 RPM/packaging   n
 Configuration files n
 Startup scripts n
 SAF servicesn
 OpenSAF servicesn
 Core libraries  y 
 Samples n
 Tests   n
 Other   n


Comments (indicate scope for each "y" above):
-

Acknowledgement: Canh noticed this problem and did
the initial patches


revision 81650ab440f6b140fd4d5d485d13e573b4e070c0
Author: Gary Lee 
Date:   Tue, 13 Nov 2018 07:30:45 +

osaf: update etcd2 and sample plugins [#2954]

add timeout parameter to set and set_if_prev



revision f3a64996eeb739904c7d8e1abcf4f469eeea78b4
Author: Gary Lee 
Date:   Tue, 13 Nov 2018 06:38:27 +

osaf: ensure takeover_requests have a lease [#2954]

In CreateTakeoverRequest(), if the initial attempt fails,
then the takeover_request is created without a lease.

Furthermore, when the takeover_request result is set,
it is being set without a lease, and the takeover_request
is not automatically removed.

Add  parameter to KeyValue::Set, and remove
default value for the  parameter to KeyValue::Create to
ensure a timeout is always set.



Complete diffstat:
--
 src/osaf/consensus/consensus.cc  | 10 ---
 src/osaf/consensus/key_value.cc  | 11 +---
 src/osaf/consensus/key_value.h   |  8 --
 src/osaf/consensus/plugins/etcd.plugin   | 20 +++--
 src/osaf/consensus/plugins/etcd3.plugin  | 48 ++--
 src/osaf/consensus/plugins/sample.plugin | 16 +++
 6 files changed, 79 insertions(+), 34 deletions(-)


Testing Commands:
-
*** LIST THE COMMAND LINE TOOLS/STEPS TO TEST YOUR CHANGES ***


Testing, Expected Results:
--
*** PASTE COMMAND OUTPUTS / TEST RESULTS ***


Conditions of Submission:
-
*** HOW MANY DAYS BEFORE PUSHING, CONSENSUS ETC ***


Arch  Built StartedLinux distro
---
mipsn  n
mips64  n  n
x86 n  n
x86_64  y  y  
powerpc n  n
powerpc64   n  n


Reviewer Checklist:
---
[Submitters: make sure that your review doesn't trigger any checkmarks!]


Your checkin has not passed review because (see checked entries):

___ Your RR template is generally incomplete; it has too many blank entries
that need proper data filled in.

___ You have failed to nominate the proper persons for review and push.

___ Your patches do not have proper short+long header

___ You have grammar/spelling in your header that is unacceptable.

___ You have exceeded a sensible line length in your headers/comments/text.

___ You have failed to put in a proper Trac Ticket # into your commits.

___ You have incorrectly put/left internal data in your comments/files
(i.e. internal bug tracking tool IDs, product names etc)

___ You have not given any evidence of testing beyond basic build tests.
Demonstrate some level of runtime or other sanity testing.

___ You have ^M present in some of your files. These have to be removed.

___ You have needlessly changed whitespace or added whitespace crimes
like trailing spaces, or spaces before tabs.

___ You have mixed real technical changes with whitespace and other
cosmetic code cleanup changes. These have to be separate commits.

___ You need to refactor your submission into logical chunks; there is
too much content into a single commit.

___ You have extraneous garbage in your review (merge commits etc)

___ You have giant attachments which should never have been sent;
Instead you should place your content in a public tree to be pulled.

___ You have too many commits attached to an e-mail; resend as threaded
commits, or place in a public tree for a pull.

___ You have resent this content multiple times without a clear indication
of what has changed between each re-send.

___ You have failed to adequately and individually address all of the
comments and change requests that were proposed in the initial review.

___ You have a misconfigured ~/.gitconfig file (i.e. user.name, user.email etc)

___ Your computer have a badly configured date and time; confusing the
the threaded patch review.

___ Your changes affect IPC mechanism, and you don't present any results
for in-service upgradability test.

___ Your changes affect user 

[devel] [PATCH 1/2] osaf: ensure takeover_requests have a lease [#2954]

2018-11-12 Thread Gary Lee
In CreateTakeoverRequest(), if the initial attempt fails,
then the takeover_request is created without a lease.

Furthermore, when the takeover_request result is set,
it is being set without a lease, and the takeover_request
is not automatically removed.

Add  parameter to KeyValue::Set, and remove
default value for the  parameter to KeyValue::Create to
ensure a timeout is always set.
---
 src/osaf/consensus/consensus.cc | 10 +++---
 src/osaf/consensus/key_value.cc | 11 +++---
 src/osaf/consensus/key_value.h  |  8 +++--
 src/osaf/consensus/plugins/etcd3.plugin | 48 -
 4 files changed, 57 insertions(+), 20 deletions(-)

diff --git a/src/osaf/consensus/consensus.cc b/src/osaf/consensus/consensus.cc
index 1136c3724..112af7df0 100644
--- a/src/osaf/consensus/consensus.cc
+++ b/src/osaf/consensus/consensus.cc
@@ -181,11 +181,11 @@ bool Consensus::IsWritable() const {
   SaAisErrorT rc;
   uint32_t retries = 0;
   constexpr uint32_t kMaxTestRetry = 3;
-  rc = KeyValue::Set(kTestKeyname, base::Conf::NodeName());
+  rc = KeyValue::Set(kTestKeyname, base::Conf::NodeName(), 0);
   while (rc != SA_AIS_OK && retries < kMaxTestRetry) {
 ++retries;
 std::this_thread::sleep_for(kSleepInterval);
-rc = KeyValue::Set(kTestKeyname, base::Conf::NodeName());
+rc = KeyValue::Set(kTestKeyname, base::Conf::NodeName(), 0);
   }
 
   if (rc == SA_AIS_OK) {
@@ -338,7 +338,8 @@ SaAisErrorT Consensus::CreateTakeoverRequest(const 
std::string& current_owner,
   while (rc == SA_AIS_ERR_FAILED_OPERATION && retries < kMaxRetry) {
 ++retries;
 std::this_thread::sleep_for(kSleepInterval);
-rc = KeyValue::Create(kTakeoverRequestKeyname, takeover_request);
+rc = KeyValue::Create(kTakeoverRequestKeyname, takeover_request,
+  takeover_valid_time);
   }
 
   if (rc == SA_AIS_ERR_EXIST) {
@@ -435,7 +436,8 @@ SaAisErrorT Consensus::WriteTakeoverResult(
 
   // previous value must match
   rc =
-  KeyValue::Set(kTakeoverRequestKeyname, takeover_result, 
takeover_request);
+  KeyValue::Set(kTakeoverRequestKeyname, takeover_result,
+takeover_request, takeover_valid_time);
 
   return rc;
 }
diff --git a/src/osaf/consensus/key_value.cc b/src/osaf/consensus/key_value.cc
index cf5c02213..109cf9f69 100644
--- a/src/osaf/consensus/key_value.cc
+++ b/src/osaf/consensus/key_value.cc
@@ -58,13 +58,14 @@ SaAisErrorT KeyValue::Get(const std::string& key, 
std::string& value) {
   }
 }
 
-SaAisErrorT KeyValue::Set(const std::string& key, const std::string& value) {
+SaAisErrorT KeyValue::Set(const std::string& key, const std::string& value,
+  const unsigned int timeout) {
   TRACE_ENTER();
 
   const std::string kv_store_cmd =
   base::GetEnv("FMS_KEYVALUE_STORE_PLUGIN_CMD", "");
   const std::string command(kv_store_cmd + " set \"" + key + "\" \"" + value +
-"\"");
+"\" " + std::to_string(timeout));
   std::string output;
   int rc = KeyValue::Execute(command, output);
 
@@ -76,11 +77,13 @@ SaAisErrorT KeyValue::Set(const std::string& key, const 
std::string& value) {
 }
 
 SaAisErrorT KeyValue::Set(const std::string& key, const std::string& value,
-  const std::string& prev_value) {
+  const std::string& prev_value,
+  const unsigned int timeout) {
   const std::string kv_store_cmd =
   base::GetEnv("FMS_KEYVALUE_STORE_PLUGIN_CMD", "");
   const std::string command(kv_store_cmd + " set_if_prev \"" + key + "\" \"" +
-value + "\" \"" + prev_value + "\"");
+value + "\" \"" + prev_value +
+"\" " + std::to_string(timeout));
   std::string output;
   int rc = KeyValue::Execute(command, output);
 
diff --git a/src/osaf/consensus/key_value.h b/src/osaf/consensus/key_value.h
index af9717595..043a964ae 100644
--- a/src/osaf/consensus/key_value.h
+++ b/src/osaf/consensus/key_value.h
@@ -30,15 +30,17 @@ class KeyValue {
   static SaAisErrorT Get(const std::string& key, std::string& value);
 
   // Set key to value
-  static SaAisErrorT Set(const std::string& key, const std::string& value);
+  static SaAisErrorT Set(const std::string& key, const std::string& value,
+ const unsigned int timeout);
 
   // Set key to value only if prev value matches
   static SaAisErrorT Set(const std::string& key, const std::string& value,
- const std::string& prev_value);
+ const std::string& prev_value,
+ const unsigned int timeout);
 
   // Create key, and set to value. Fails if key already exists.
   static SaAisErrorT Create(const std::string& key, const std::string& value,
-const unsigned int timeout = 0);
+const unsigned int timeout);
 
   // Erase key
   static 

[devel] [PATCH 2/2] osaf: update etcd2 and sample plugins [#2954]

2018-11-12 Thread Gary Lee
add timeout parameter to set and set_if_prev
---
 src/osaf/consensus/plugins/etcd.plugin   | 20 
 src/osaf/consensus/plugins/sample.plugin | 16 ++--
 2 files changed, 22 insertions(+), 14 deletions(-)

diff --git a/src/osaf/consensus/plugins/etcd.plugin 
b/src/osaf/consensus/plugins/etcd.plugin
index b585bb657..f62cc892b 100644
--- a/src/osaf/consensus/plugins/etcd.plugin
+++ b/src/osaf/consensus/plugins/etcd.plugin
@@ -45,15 +45,17 @@ get() {
 # params:
 #   $1 - 
 #   $2 - 
+#   $3 - 
 # returns:
 #   0 - success
 #   non-zero - failure
 setkey() {
   readonly key="$1"
   readonly value="$2"
+  readonly timeout="$3"
 
   if etcdctl $etcd_options --timeout $etcd_timeout set "$directory$key" \
-"$value" >/dev/null
+"$value" --ttl "$timeout" >/dev/null
   then
 return 0
   else
@@ -98,6 +100,7 @@ create_key() {
 #   $1 - 
 #   $2 - 
 #   $3 - 
+#   $4 - 
 # returns:
 #   0 - success
 #   non-zero - failure
@@ -105,9 +108,10 @@ setkey_match_prev() {
   readonly key="$1"
   readonly value="$2"
   readonly prev="$3"
+  readonly timeout="$4"
 
   if etcdctl $etcd_options --timeout $etcd_timeout set "$directory$key" \
-"$value" --swap-with-value "$prev" >/dev/null
+"$value" --swap-with-value "$prev" --ttl "$timeout" >/dev/null
   then
 return 0
   else
@@ -263,19 +267,19 @@ case "$1" in
 exit $?
 ;;
   set)
-if [ "$#" -ne 3 ]; then
-  echo "Usage: $0 set  "
+if [ "$#" -ne 4 ]; then
+  echo "Usage: $0 set   "
   exit 1
 fi
-setkey "$2" "$3"
+setkey "$2" "$3" "$4"
 exit $?
 ;;
   set_if_prev)
-if [ "$#" -ne 4 ]; then
-  echo "Usage: $0 set   "
+if [ "$#" -ne 5 ]; then
+  echo "Usage: $0 set"
   exit 1
 fi
-setkey_match_prev "$2" "$3" "$4"
+setkey_match_prev "$2" "$3" "$4" "$5"
 exit $?
 ;;
   create)
diff --git a/src/osaf/consensus/plugins/sample.plugin 
b/src/osaf/consensus/plugins/sample.plugin
index 6f6c71f6f..fc4c54c17 100644
--- a/src/osaf/consensus/plugins/sample.plugin
+++ b/src/osaf/consensus/plugins/sample.plugin
@@ -35,12 +35,14 @@ get() {
 # params:
 #   $1 - 
 #   $2 - 
+#   $3 - 
 # returns:
 #   0 - success
 #   non-zero - failure
 setkey() {
   readonly key="$1"
   readonly value="$2"
+  readonly timeout="$3"
   ...
 }
 
@@ -69,6 +71,7 @@ create_key() {
 #   $1 - 
 #   $2 - 
 #   $3 - 
+#   $4 - 
 # returns:
 #   0 - success
 #   non-zero - failure
@@ -76,6 +79,7 @@ setkey_match_prev() {
   readonly key="$1"
   readonly value="$2"
   readonly prev="$3"
+  readonly timeout="$4"
   ...
 }
 
@@ -155,19 +159,19 @@ case "$1" in
 exit $?
 ;;
   set)
-if [ "$#" -ne 3 ]; then
-  echo "Usage: $0 set  "
+if [ "$#" -ne 4 ]; then
+  echo "Usage: $0 set   "
   exit 1
 fi
-setkey "$2" "$3"
+setkey "$2" "$3" "$4"
 exit $?
 ;;
   set_if_prev)
-if [ "$#" -ne 4 ]; then
-  echo "Usage: $0 set   "
+if [ "$#" -ne 5 ]; then
+  echo "Usage: $0 set"
   exit 1
 fi
-setkey_match_prev "$2" "$3" "$4"
+setkey_match_prev "$2" "$3" "$4" "$5"
 exit $?
 ;;
   create)
-- 
2.17.1



___
Opensaf-devel mailing list
Opensaf-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/opensaf-devel


[devel] [PATCH 1/1] amfd: check consensus service is writable [#2957]

2018-11-12 Thread Gary Lee
A check to make sure the consensus service is writable (ie. the SC is in a 
partition with quorum)
is present in avd_node_failover(). However, [#2918] means this function is not 
always being called.
We need to move it.
---
 src/amf/amfd/ndfsm.cc  |  1 +
 src/amf/amfd/ndproc.cc | 10 +++---
 src/amf/amfd/proc.h|  1 +
 3 files changed, 9 insertions(+), 3 deletions(-)

diff --git a/src/amf/amfd/ndfsm.cc b/src/amf/amfd/ndfsm.cc
index 301de835b..218551cdd 100644
--- a/src/amf/amfd/ndfsm.cc
+++ b/src/amf/amfd/ndfsm.cc
@@ -817,6 +817,7 @@ void avd_mds_avnd_down_evh(AVD_CL_CB *cb, AVD_EVT *evt) {
   if (cb->node_failover_delay == 0) {
 avd_node_failover(node);
   }
+  check_quorum();
   node->node_info.member = SA_FALSE;
   // Update standby out of sync if standby sc goes down
   if (avd_cb->node_id_avd_other == node->node_info.nodeId) {
diff --git a/src/amf/amfd/ndproc.cc b/src/amf/amfd/ndproc.cc
index 853a68b6e..c4eebb174 100644
--- a/src/amf/amfd/ndproc.cc
+++ b/src/amf/amfd/ndproc.cc
@@ -1242,6 +1242,12 @@ void avd_node_failover(AVD_AVND *node, const bool 
mw_only) {
 avd_node_down_appl_susi_failover(avd_cb, node);
   }
 
+  TRACE_LEAVE();
+}
+
+void check_quorum() {
+  TRACE_ENTER();
+
   Consensus consensus_service;
   if (consensus_service.IsRemoteFencingEnabled() == false &&
   consensus_service.IsWritable() == false) {
@@ -1250,6 +1256,4 @@ void avd_node_failover(AVD_AVND *node, const bool 
mw_only) {
 opensaf_reboot(0, nullptr,
   "Quorum lost. Rebooting this node to prevent split-brain");
   }
-
-  TRACE_LEAVE();
-}
+}
\ No newline at end of file
diff --git a/src/amf/amfd/proc.h b/src/amf/amfd/proc.h
index 99d1cbfc2..a37821829 100644
--- a/src/amf/amfd/proc.h
+++ b/src/amf/amfd/proc.h
@@ -96,6 +96,7 @@ void avd_process_hb_event(AVD_CL_CB *cb_now, struct AVD_EVT 
*evt);
 extern void avd_node_mark_absent(AVD_AVND *node);
 extern void avd_tmr_snd_hb_evh(AVD_CL_CB *cb, AVD_EVT *evt);
 extern void avd_node_failover(AVD_AVND *node, const bool mw_only = false);
+extern void check_quorum();
 extern AVD_SU *get_other_su_from_oper_list(AVD_SU *su);
 extern void su_complete_admin_op(AVD_SU *su, SaAisErrorT result);
 extern void comp_complete_admin_op(AVD_COMP *comp, SaAisErrorT result);
-- 
2.17.1



___
Opensaf-devel mailing list
Opensaf-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/opensaf-devel


[devel] [PATCH 0/1] Review Request for amfd: check consensus service is writable [#2957]

2018-11-12 Thread Gary Lee
Summary: amfd: check consensus service is writable [#2957]
Review request for Ticket(s): 2957
Peer Reviewer(s): Hans, Nagu, Minh
Pull request to: *** LIST THE PERSON WITH PUSH ACCESS HERE ***
Affected branch(es): develop
Development branch: ticket-2957
Base revision: 5bb2174a323a97f626ce354d553a1dc4d1673899
Personal repository: git://git.code.sf.net/u/userid-2226215/review


Impacted area   Impact y/n

 Docsn
 Build systemn
 RPM/packaging   n
 Configuration files n
 Startup scripts n
 SAF servicesy 
 OpenSAF servicesn
 Core libraries  n
 Samples n
 Tests   n
 Other   n


Comments (indicate scope for each "y" above):
-

revision 34dc36653385a670198acdbc66f4b53699524696
Author: Gary Lee 
Date:   Tue, 13 Nov 2018 06:23:52 +

amfd: check consensus service is writable [#2957]

A check to make sure the consensus service is writable (ie. the SC is in a 
partition with quorum)
is present in avd_node_failover(). However, [#2918] means this function is not 
always being called.
We need to move it.



Complete diffstat:
--
 src/amf/amfd/ndfsm.cc  |  1 +
 src/amf/amfd/ndproc.cc | 10 +++---
 src/amf/amfd/proc.h|  1 +
 3 files changed, 9 insertions(+), 3 deletions(-)


Testing Commands:
-
*** LIST THE COMMAND LINE TOOLS/STEPS TO TEST YOUR CHANGES ***


Testing, Expected Results:
--
*** PASTE COMMAND OUTPUTS / TEST RESULTS ***


Conditions of Submission:
-
*** HOW MANY DAYS BEFORE PUSHING, CONSENSUS ETC ***


Arch  Built StartedLinux distro
---
mipsn  n
mips64  n  n
x86 n  n
x86_64  y  y 
powerpc n  n
powerpc64   n  n


Reviewer Checklist:
---
[Submitters: make sure that your review doesn't trigger any checkmarks!]


Your checkin has not passed review because (see checked entries):

___ Your RR template is generally incomplete; it has too many blank entries
that need proper data filled in.

___ You have failed to nominate the proper persons for review and push.

___ Your patches do not have proper short+long header

___ You have grammar/spelling in your header that is unacceptable.

___ You have exceeded a sensible line length in your headers/comments/text.

___ You have failed to put in a proper Trac Ticket # into your commits.

___ You have incorrectly put/left internal data in your comments/files
(i.e. internal bug tracking tool IDs, product names etc)

___ You have not given any evidence of testing beyond basic build tests.
Demonstrate some level of runtime or other sanity testing.

___ You have ^M present in some of your files. These have to be removed.

___ You have needlessly changed whitespace or added whitespace crimes
like trailing spaces, or spaces before tabs.

___ You have mixed real technical changes with whitespace and other
cosmetic code cleanup changes. These have to be separate commits.

___ You need to refactor your submission into logical chunks; there is
too much content into a single commit.

___ You have extraneous garbage in your review (merge commits etc)

___ You have giant attachments which should never have been sent;
Instead you should place your content in a public tree to be pulled.

___ You have too many commits attached to an e-mail; resend as threaded
commits, or place in a public tree for a pull.

___ You have resent this content multiple times without a clear indication
of what has changed between each re-send.

___ You have failed to adequately and individually address all of the
comments and change requests that were proposed in the initial review.

___ You have a misconfigured ~/.gitconfig file (i.e. user.name, user.email etc)

___ Your computer have a badly configured date and time; confusing the
the threaded patch review.

___ Your changes affect IPC mechanism, and you don't present any results
for in-service upgradability test.

___ Your changes affect user manual and documentation, your patch series
do not contain the patch that updates the Doxygen manual.



___
Opensaf-devel mailing list
Opensaf-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/opensaf-devel


Re: [devel] [PATCH 1/1] amf: active amfd should check nodes after reinit with imm [#2949]

2018-11-12 Thread Gary Lee

Hi Thuan

Ack with one comment (review only), I guess 
avd_check_nodes_after_renit_imn() and should be 
avd_check_nodes_after_reinit_imm()?


I'll change it and push on your behalf.

Thanks

Gary

On 12/11/18 7:25 pm, thuan.tran wrote:

- When AMFD got IMM BAD_HANDLE, it will try to finalize current OI
and reinit new OI, it make some callbacks are removed without execution.
Try to dispatch OI before finalize it to reinit.
- After reinit OI, check node db to find out node which is not exist
in IMM (in case ccb apply delete node miss execution) then cleanup
SU/COMP relate to the node and delete the node.
---
  src/amf/amfd/imm.cc  |  4 +++
  src/amf/amfd/node.cc | 80 
  src/amf/amfd/node.h  |  1 +
  src/amf/amfd/su.cc   | 79 +++
  4 files changed, 72 insertions(+), 92 deletions(-)

diff --git a/src/amf/amfd/imm.cc b/src/amf/amfd/imm.cc
index 82d2b13..1b14383 100644
--- a/src/amf/amfd/imm.cc
+++ b/src/amf/amfd/imm.cc
@@ -41,6 +41,7 @@
  #include "amf/common/amf_defs.h"
  #include "amf/amfd/imm.h"
  #include "amf/amfd/cluster.h"
+#include "amf/amfd/node.h"
  #include "amf/amfd/app.h"
  #include "amf/amfd/sgtype.h"
  #include "amf/amfd/sg.h"
@@ -2132,6 +2133,8 @@ static void *avd_imm_reinit_bg_thread(void *_cb) {
  
immutilWrapperProfile.errorsAreFatal = 0;
  
+  /* Try to dispatch imm cb if any, e.g: apply cb, before finalize OI */

+  (void)saImmOiDispatch(avd_cb->immOiHandle, SA_DISPATCH_ALL);
while (++no_of_retries < MAX_NO_RETRIES) {
  (void)saImmOiFinalize(avd_cb->immOiHandle);
  
@@ -2167,6 +2170,7 @@ static void *avd_imm_reinit_bg_thread(void *_cb) {

  osaf_mutex_unlock_ordie(_reinit_mutex);
  exit(EXIT_FAILURE);
}
+  avd_check_nodes_after_renit_imm();
  } else {
/* become applier and re-read the config */
rc = avd_imm_applier_set();
diff --git a/src/amf/amfd/node.cc b/src/amf/amfd/node.cc
index 201f1fc..b6ebcc8 100644
--- a/src/amf/amfd/node.cc
+++ b/src/amf/amfd/node.cc
@@ -25,6 +25,7 @@
  #include "amf/amfd/amfd.h"
  #include "amf/amfd/cluster.h"
  #include "amf/amfd/imm.h"
+#include "amf/amfd/util.h"
  #include 
  
  AmfDb *node_name_db = 0; /* SaNameT index */

@@ -153,57 +154,23 @@ AVD_AVND *avd_node_new(const std::string ) {
  void avd_node_delete(AVD_AVND *node) {
TRACE_ENTER();
osafassert(node->pg_csi_list.n_nodes == 0);
-  if (node->node_info.nodeId) avd_node_delete_nodeid(node);
-  /* Check if the SUs and related objects are still left. This can
- happen on Standby Amfd when it has just read the configuration
- and before it becomes applier, Act Amfd deletes SUs. Those SUs
- will be left out at Standby Amfd. Though this could be rare.*/
-  if (avd_cb->avail_state_avd != SA_AMF_HA_ACTIVE) {
-if (node->list_of_su.empty() != true) {
-  std::set su_list;
-  std::set comp_list;
-  for (const auto  : node->list_of_su) su_list.insert(su->name);
-  for (std::set::const_iterator iter = su_list.begin();
-   iter != su_list.end(); ++iter) {
-AVD_SU *su = su_db->find(*iter);
-TRACE("Standby Amfd, su '%s' not deleted", su->name.c_str());
-for (const auto  : su->list_of_comp)
-  comp_list.insert(Amf::to_string(>comp_info.name));
-for (std::set::const_iterator iter1 = comp_list.begin();
- iter1 != comp_list.end(); ++iter1) {
-  AVD_COMP *comp = comp_db->find(*iter1);
-  TRACE("Standby Amfd, comp '%s' not deleted",
-osaf_extended_name_borrow(>comp_info.name));
-
-  std::map::iterator it =
-  compcstype_db->begin();
-  while (it != compcstype_db->end()) {
-AVD_COMPCS_TYPE *compcstype = it->second;
-if (compcstype->comp == comp) {
-  TRACE("Standby Amfd, compcstype '%s' not deleted",
-compcstype->name.c_str());
-  it = compcstype_db->erase(it);
-  delete compcstype;
-} else
-  ++it;
-  }
+  if (node->node_info.nodeId) {
+avd_node_delete_nodeid(node);
+  }
  
-  /* Delete the Comp. */

-  struct CcbUtilOperationData opdata;
-  osaf_extended_name_alloc(
-  osaf_extended_name_borrow(>comp_info.name),
-  );
-  comp_ccb_apply_delete_hdlr();
-}
-comp_list.clear();
-/* Delete the SU. */
-struct CcbUtilOperationData opdata;
-opdata.userData = su;
-su_ccb_apply_delete_hdlr();
-  }
-  su_list.clear();
-}
+  std::set su_list;
+  for (const auto  : node->list_of_ncs_su) su_list.insert(su->name);
+  for (const auto  : node->list_of_su) su_list.insert(su->name);
+  for (std::set::const_iterator iter = su_list.begin();
+   iter != su_list.end(); ++iter) {
+AVD_SU *su = su_db->find(*iter);
+LOG_WA("su '%s' not deleted, delete it", su->name.c_str());
+struct 

Re: [devel] Review Request for amf: Update PR [#2929]

2018-11-12 Thread Nagendra Kumar
Ok, thanks
-Nagu

-Original Message-
From: Minh Hon Chau [mailto:minh.c...@dektech.com.au] 
Sent: 12 November 2018 17:51
To: Nagendra Kumar; 'Hans Nordeback'; 'Gary Lee'
Cc: opensaf-devel@lists.sourceforge.net
Subject: Re: Review Request for amf: Update PR [#2929]

Hi Nagu,

It's not the recovery in specification, I mean a new attribute.

Thanks,

Minh

On 12/11/18 9:53 pm, Nagendra Kumar wrote:
> Hi Minh,
> Thanks for your response.
>>> In future, I think we can make it as configurable recovery method, so up to 
>>> applications to choose from.
> You mean recommended recovery option? But how will it work?
>
>
> Thanks
> -Nagendra
> High Availability Solutions
> www.hasolutions.in
> cont...@hasolutions.in
> Hyderabad, India: +91-9866424860   |   Delaware, USA: +1 508-422-7725
>
> -Original Message-
> From: Minh Hon Chau [mailto:minh.c...@dektech.com.au]
> Sent: 12 November 2018 16:16
> To: Nagendra Kumar; 'Hans Nordeback'; 'Gary Lee'
> Cc: opensaf-devel@lists.sourceforge.net
> Subject: Re: Review Request for amf: Update PR [#2929]
>
> Hi Nagu,
>
> Agree with you that we can do it for 2N. However the mutual active
> workload has to be exclusively one at a time, so there may be some sort
> of corruption to applications. But it also depends on how internal
> application logics are implemented. So reboot the node is a choice of
> safety for now. In future, I think we can make it as configurable
> recovery method, so up to applications to choose from.
>
> Thanks,
>
> Minh
>
> On 12/11/18 7:49 pm, Nagendra Kumar wrote:
>> Hi Minh,
>> Ack from me.
>> Btw, why did you opt to remove assignments and restart admin operation for 
>> Nway Act and No Red.
>> The same could have done in 2N by removing the assignments and restart and 
>> then provide fresh assignments.
>>
>> Thanks
>> -Nagendra
>> High Availability Solutions
>> www.hasolutions.in
>> cont...@hasolutions.in
>> Hyderabad, India: +91-9866424860   |   Delaware, USA: +1 508-422-7725
>>
>> -Original Message-
>> From: Minh Hon Chau [mailto:minh.c...@dektech.com.au]
>> Sent: 12 November 2018 08:04
>> To: Hans Nordeback; Nagendra Kumar; Gary Lee
>> Cc: opensaf-devel@lists.sourceforge.net
>> Subject: Review Request for amf: Update PR [#2929]
>>
>> Hi all,
>>
>> Document update for #2929 in item 2.2.18 to be reviewed.
>>
>> https://sourceforge.net/p/opensaf/tickets/2929/attachment/OpenSAF_AMF_PR_2929.odt
>>
>> Thanks,
>>
>> Minh
>>
>>
>>
>



___
Opensaf-devel mailing list
Opensaf-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/opensaf-devel


Re: [devel] Review Request for amf: Update PR [#2929]

2018-11-12 Thread Minh Hon Chau

Hi Nagu,

It's not the recovery in specification, I mean a new attribute.

Thanks,

Minh

On 12/11/18 9:53 pm, Nagendra Kumar wrote:

Hi Minh,
Thanks for your response.

In future, I think we can make it as configurable recovery method, so up to 
applications to choose from.

You mean recommended recovery option? But how will it work?


Thanks
-Nagendra
High Availability Solutions
www.hasolutions.in
cont...@hasolutions.in
Hyderabad, India: +91-9866424860   |   Delaware, USA: +1 508-422-7725

-Original Message-
From: Minh Hon Chau [mailto:minh.c...@dektech.com.au]
Sent: 12 November 2018 16:16
To: Nagendra Kumar; 'Hans Nordeback'; 'Gary Lee'
Cc: opensaf-devel@lists.sourceforge.net
Subject: Re: Review Request for amf: Update PR [#2929]

Hi Nagu,

Agree with you that we can do it for 2N. However the mutual active
workload has to be exclusively one at a time, so there may be some sort
of corruption to applications. But it also depends on how internal
application logics are implemented. So reboot the node is a choice of
safety for now. In future, I think we can make it as configurable
recovery method, so up to applications to choose from.

Thanks,

Minh

On 12/11/18 7:49 pm, Nagendra Kumar wrote:

Hi Minh,
Ack from me.
Btw, why did you opt to remove assignments and restart admin operation for Nway 
Act and No Red.
The same could have done in 2N by removing the assignments and restart and then 
provide fresh assignments.

Thanks
-Nagendra
High Availability Solutions
www.hasolutions.in
cont...@hasolutions.in
Hyderabad, India: +91-9866424860   |   Delaware, USA: +1 508-422-7725

-Original Message-
From: Minh Hon Chau [mailto:minh.c...@dektech.com.au]
Sent: 12 November 2018 08:04
To: Hans Nordeback; Nagendra Kumar; Gary Lee
Cc: opensaf-devel@lists.sourceforge.net
Subject: Review Request for amf: Update PR [#2929]

Hi all,

Document update for #2929 in item 2.2.18 to be reviewed.

https://sourceforge.net/p/opensaf/tickets/2929/attachment/OpenSAF_AMF_PR_2929.odt

Thanks,

Minh








___
Opensaf-devel mailing list
Opensaf-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/opensaf-devel


Re: [devel] Review Request for amf: Update PR [#2929]

2018-11-12 Thread Nagendra Kumar
Hi Minh,
Thanks for your response.
>> In future, I think we can make it as configurable recovery method, so up to 
>> applications to choose from.
You mean recommended recovery option? But how will it work?


Thanks
-Nagendra
High Availability Solutions
www.hasolutions.in
cont...@hasolutions.in
Hyderabad, India: +91-9866424860   |   Delaware, USA: +1 508-422-7725

-Original Message-
From: Minh Hon Chau [mailto:minh.c...@dektech.com.au] 
Sent: 12 November 2018 16:16
To: Nagendra Kumar; 'Hans Nordeback'; 'Gary Lee'
Cc: opensaf-devel@lists.sourceforge.net
Subject: Re: Review Request for amf: Update PR [#2929]

Hi Nagu,

Agree with you that we can do it for 2N. However the mutual active 
workload has to be exclusively one at a time, so there may be some sort 
of corruption to applications. But it also depends on how internal 
application logics are implemented. So reboot the node is a choice of 
safety for now. In future, I think we can make it as configurable 
recovery method, so up to applications to choose from.

Thanks,

Minh

On 12/11/18 7:49 pm, Nagendra Kumar wrote:
> Hi Minh,
> Ack from me.
> Btw, why did you opt to remove assignments and restart admin operation for 
> Nway Act and No Red.
> The same could have done in 2N by removing the assignments and restart and 
> then provide fresh assignments.
>
> Thanks
> -Nagendra
> High Availability Solutions
> www.hasolutions.in
> cont...@hasolutions.in
> Hyderabad, India: +91-9866424860   |   Delaware, USA: +1 508-422-7725
>
> -Original Message-
> From: Minh Hon Chau [mailto:minh.c...@dektech.com.au]
> Sent: 12 November 2018 08:04
> To: Hans Nordeback; Nagendra Kumar; Gary Lee
> Cc: opensaf-devel@lists.sourceforge.net
> Subject: Review Request for amf: Update PR [#2929]
>
> Hi all,
>
> Document update for #2929 in item 2.2.18 to be reviewed.
>
> https://sourceforge.net/p/opensaf/tickets/2929/attachment/OpenSAF_AMF_PR_2929.odt
>
> Thanks,
>
> Minh
>
>
>



___
Opensaf-devel mailing list
Opensaf-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/opensaf-devel


Re: [devel] Review Request for amf: Update PR [#2929]

2018-11-12 Thread Minh Hon Chau

Hi Nagu,

Agree with you that we can do it for 2N. However the mutual active 
workload has to be exclusively one at a time, so there may be some sort 
of corruption to applications. But it also depends on how internal 
application logics are implemented. So reboot the node is a choice of 
safety for now. In future, I think we can make it as configurable 
recovery method, so up to applications to choose from.


Thanks,

Minh

On 12/11/18 7:49 pm, Nagendra Kumar wrote:

Hi Minh,
Ack from me.
Btw, why did you opt to remove assignments and restart admin operation for Nway 
Act and No Red.
The same could have done in 2N by removing the assignments and restart and then 
provide fresh assignments.

Thanks
-Nagendra
High Availability Solutions
www.hasolutions.in
cont...@hasolutions.in
Hyderabad, India: +91-9866424860   |   Delaware, USA: +1 508-422-7725

-Original Message-
From: Minh Hon Chau [mailto:minh.c...@dektech.com.au]
Sent: 12 November 2018 08:04
To: Hans Nordeback; Nagendra Kumar; Gary Lee
Cc: opensaf-devel@lists.sourceforge.net
Subject: Review Request for amf: Update PR [#2929]

Hi all,

Document update for #2929 in item 2.2.18 to be reviewed.

https://sourceforge.net/p/opensaf/tickets/2929/attachment/OpenSAF_AMF_PR_2929.odt

Thanks,

Minh






___
Opensaf-devel mailing list
Opensaf-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/opensaf-devel


Re: [devel] Review Request for amf: Update PR [#2929]

2018-11-12 Thread Nagendra Kumar
Hi Minh,
Ack from me.
Btw, why did you opt to remove assignments and restart admin operation for Nway 
Act and No Red.
The same could have done in 2N by removing the assignments and restart and then 
provide fresh assignments.

Thanks
-Nagendra
High Availability Solutions
www.hasolutions.in
cont...@hasolutions.in
Hyderabad, India: +91-9866424860   |   Delaware, USA: +1 508-422-7725

-Original Message-
From: Minh Hon Chau [mailto:minh.c...@dektech.com.au] 
Sent: 12 November 2018 08:04
To: Hans Nordeback; Nagendra Kumar; Gary Lee
Cc: opensaf-devel@lists.sourceforge.net
Subject: Review Request for amf: Update PR [#2929]

Hi all,

Document update for #2929 in item 2.2.18 to be reviewed.

https://sourceforge.net/p/opensaf/tickets/2929/attachment/OpenSAF_AMF_PR_2929.odt

Thanks,

Minh




___
Opensaf-devel mailing list
Opensaf-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/opensaf-devel


[devel] [PATCH 0/1] Review Request for amf: active amfd should check nodes after reinit with imm [#2949] V4

2018-11-12 Thread thuan.tran
Summary: amf: active amfd should check nodes after reinit with imm [#2949]
Review request for Ticket(s): 2949
Peer Reviewer(s): Gary, Minh
Pull request to: *** LIST THE PERSON WITH PUSH ACCESS HERE ***
Affected branch(es): develop
Development branch: ticket-2949
Base revision: f8a6848a1cdbff0b518c3db951e4689e260226c7
Personal repository: git://git.code.sf.net/u/thuantr/review


Impacted area   Impact y/n

 Docsn
 Build systemn
 RPM/packaging   n
 Configuration files n
 Startup scripts n
 SAF servicesy
 OpenSAF servicesn
 Core libraries  n
 Samples n
 Tests   n
 Other   n


Comments (indicate scope for each "y" above):
-
N/A

revision b977065a3257758e7b043d60174487504dfba70c
Author: thuan.tran 
Date:   Mon, 12 Nov 2018 15:17:54 +0700

amf: active amfd should check nodes after reinit with imm [#2949]

- When AMFD got IMM BAD_HANDLE, it will try to finalize current OI
and reinit new OI, it make some callbacks are removed without execution.
Try to dispatch OI before finalize it to reinit.
- After reinit OI, check node db to find out node which is not exist
in IMM (in case ccb apply delete node miss execution) then cleanup
SU/COMP relate to the node and delete the node.



Complete diffstat:
--
 src/amf/amfd/imm.cc  |  4 +++
 src/amf/amfd/node.cc | 80 
 src/amf/amfd/node.h  |  1 +
 src/amf/amfd/su.cc   | 79 +++
 4 files changed, 72 insertions(+), 92 deletions(-)


Testing Commands:
-
N/A

Testing, Expected Results:
--
N/A

Conditions of Submission:
-
ACK by reviewers

Arch  Built StartedLinux distro
---
mipsn  n
mips64  n  n
x86 n  n
x86_64  y  y
powerpc n  n
powerpc64   n  n


Reviewer Checklist:
---
[Submitters: make sure that your review doesn't trigger any checkmarks!]


Your checkin has not passed review because (see checked entries):

___ Your RR template is generally incomplete; it has too many blank entries
that need proper data filled in.

___ You have failed to nominate the proper persons for review and push.

___ Your patches do not have proper short+long header

___ You have grammar/spelling in your header that is unacceptable.

___ You have exceeded a sensible line length in your headers/comments/text.

___ You have failed to put in a proper Trac Ticket # into your commits.

___ You have incorrectly put/left internal data in your comments/files
(i.e. internal bug tracking tool IDs, product names etc)

___ You have not given any evidence of testing beyond basic build tests.
Demonstrate some level of runtime or other sanity testing.

___ You have ^M present in some of your files. These have to be removed.

___ You have needlessly changed whitespace or added whitespace crimes
like trailing spaces, or spaces before tabs.

___ You have mixed real technical changes with whitespace and other
cosmetic code cleanup changes. These have to be separate commits.

___ You need to refactor your submission into logical chunks; there is
too much content into a single commit.

___ You have extraneous garbage in your review (merge commits etc)

___ You have giant attachments which should never have been sent;
Instead you should place your content in a public tree to be pulled.

___ You have too many commits attached to an e-mail; resend as threaded
commits, or place in a public tree for a pull.

___ You have resent this content multiple times without a clear indication
of what has changed between each re-send.

___ You have failed to adequately and individually address all of the
comments and change requests that were proposed in the initial review.

___ You have a misconfigured ~/.gitconfig file (i.e. user.name, user.email etc)

___ Your computer have a badly configured date and time; confusing the
the threaded patch review.

___ Your changes affect IPC mechanism, and you don't present any results
for in-service upgradability test.

___ Your changes affect user manual and documentation, your patch series
do not contain the patch that updates the Doxygen manual.



___
Opensaf-devel mailing list
Opensaf-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/opensaf-devel


[devel] [PATCH 1/1] amf: active amfd should check nodes after reinit with imm [#2949]

2018-11-12 Thread thuan.tran
- When AMFD got IMM BAD_HANDLE, it will try to finalize current OI
and reinit new OI, it make some callbacks are removed without execution.
Try to dispatch OI before finalize it to reinit.
- After reinit OI, check node db to find out node which is not exist
in IMM (in case ccb apply delete node miss execution) then cleanup
SU/COMP relate to the node and delete the node.
---
 src/amf/amfd/imm.cc  |  4 +++
 src/amf/amfd/node.cc | 80 
 src/amf/amfd/node.h  |  1 +
 src/amf/amfd/su.cc   | 79 +++
 4 files changed, 72 insertions(+), 92 deletions(-)

diff --git a/src/amf/amfd/imm.cc b/src/amf/amfd/imm.cc
index 82d2b13..1b14383 100644
--- a/src/amf/amfd/imm.cc
+++ b/src/amf/amfd/imm.cc
@@ -41,6 +41,7 @@
 #include "amf/common/amf_defs.h"
 #include "amf/amfd/imm.h"
 #include "amf/amfd/cluster.h"
+#include "amf/amfd/node.h"
 #include "amf/amfd/app.h"
 #include "amf/amfd/sgtype.h"
 #include "amf/amfd/sg.h"
@@ -2132,6 +2133,8 @@ static void *avd_imm_reinit_bg_thread(void *_cb) {
 
   immutilWrapperProfile.errorsAreFatal = 0;
 
+  /* Try to dispatch imm cb if any, e.g: apply cb, before finalize OI */
+  (void)saImmOiDispatch(avd_cb->immOiHandle, SA_DISPATCH_ALL);
   while (++no_of_retries < MAX_NO_RETRIES) {
 (void)saImmOiFinalize(avd_cb->immOiHandle);
 
@@ -2167,6 +2170,7 @@ static void *avd_imm_reinit_bg_thread(void *_cb) {
 osaf_mutex_unlock_ordie(_reinit_mutex);
 exit(EXIT_FAILURE);
   }
+  avd_check_nodes_after_renit_imm();
 } else {
   /* become applier and re-read the config */
   rc = avd_imm_applier_set();
diff --git a/src/amf/amfd/node.cc b/src/amf/amfd/node.cc
index 201f1fc..b6ebcc8 100644
--- a/src/amf/amfd/node.cc
+++ b/src/amf/amfd/node.cc
@@ -25,6 +25,7 @@
 #include "amf/amfd/amfd.h"
 #include "amf/amfd/cluster.h"
 #include "amf/amfd/imm.h"
+#include "amf/amfd/util.h"
 #include 
 
 AmfDb *node_name_db = 0; /* SaNameT index */
@@ -153,57 +154,23 @@ AVD_AVND *avd_node_new(const std::string ) {
 void avd_node_delete(AVD_AVND *node) {
   TRACE_ENTER();
   osafassert(node->pg_csi_list.n_nodes == 0);
-  if (node->node_info.nodeId) avd_node_delete_nodeid(node);
-  /* Check if the SUs and related objects are still left. This can
- happen on Standby Amfd when it has just read the configuration
- and before it becomes applier, Act Amfd deletes SUs. Those SUs
- will be left out at Standby Amfd. Though this could be rare.*/
-  if (avd_cb->avail_state_avd != SA_AMF_HA_ACTIVE) {
-if (node->list_of_su.empty() != true) {
-  std::set su_list;
-  std::set comp_list;
-  for (const auto  : node->list_of_su) su_list.insert(su->name);
-  for (std::set::const_iterator iter = su_list.begin();
-   iter != su_list.end(); ++iter) {
-AVD_SU *su = su_db->find(*iter);
-TRACE("Standby Amfd, su '%s' not deleted", su->name.c_str());
-for (const auto  : su->list_of_comp)
-  comp_list.insert(Amf::to_string(>comp_info.name));
-for (std::set::const_iterator iter1 = comp_list.begin();
- iter1 != comp_list.end(); ++iter1) {
-  AVD_COMP *comp = comp_db->find(*iter1);
-  TRACE("Standby Amfd, comp '%s' not deleted",
-osaf_extended_name_borrow(>comp_info.name));
-
-  std::map::iterator it =
-  compcstype_db->begin();
-  while (it != compcstype_db->end()) {
-AVD_COMPCS_TYPE *compcstype = it->second;
-if (compcstype->comp == comp) {
-  TRACE("Standby Amfd, compcstype '%s' not deleted",
-compcstype->name.c_str());
-  it = compcstype_db->erase(it);
-  delete compcstype;
-} else
-  ++it;
-  }
+  if (node->node_info.nodeId) {
+avd_node_delete_nodeid(node);
+  }
 
-  /* Delete the Comp. */
-  struct CcbUtilOperationData opdata;
-  osaf_extended_name_alloc(
-  osaf_extended_name_borrow(>comp_info.name),
-  );
-  comp_ccb_apply_delete_hdlr();
-}
-comp_list.clear();
-/* Delete the SU. */
-struct CcbUtilOperationData opdata;
-opdata.userData = su;
-su_ccb_apply_delete_hdlr();
-  }
-  su_list.clear();
-}
+  std::set su_list;
+  for (const auto  : node->list_of_ncs_su) su_list.insert(su->name);
+  for (const auto  : node->list_of_su) su_list.insert(su->name);
+  for (std::set::const_iterator iter = su_list.begin();
+   iter != su_list.end(); ++iter) {
+AVD_SU *su = su_db->find(*iter);
+LOG_WA("su '%s' not deleted, delete it", su->name.c_str());
+struct CcbUtilOperationData opdata;
+opdata.userData = su;
+su_ccb_apply_delete_hdlr();
   }
+  su_list.clear();
+
   m_AVSV_SEND_CKPT_UPDT_ASYNC_RMV(avd_cb, node, AVSV_CKPT_AVD_NODE_CONFIG);
   node_name_db->erase(node->name);
   delete node;
@@ -1678,3 +1645,18 @@ bool