Re: [devel] [PATCH 5/5] build: fix compile errors with gcc 9.x [#3134]

2020-02-03 Thread Tran Thuan
Hi Alex,

 

About test_ntf_imcn.cc, please also update the following.

Since you added the “return”, the static code checker reports a leak of “f”.

 

@@ -6202,6 +6202,7 @@ __attribute__((constructor)) static void 
ntf_imcn_constructor(void) {

 snprintf(cp_cmd, sizeof(cp_cmd), "cp ");

 if ((strlen(line) - 1) > (sizeof(cp_cmd) - sizeof("cp "))) {

   printf("line: %s too long", line);

+  fclose(f);

   return;

 }
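
The leak pattern above can be sketched in isolation as follows. This is only a minimal illustration, not the test's actual code: first_line_fits() and its parameters are hypothetical names. The point is that every early return after a successful fopen() has to close the stream.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: returns true if the first line of `path` fits into a
 * buffer of `max_len` bytes (terminator included). Every exit path after a
 * successful fopen() closes the stream. */
bool first_line_fits(const char *path, size_t max_len)
{
	char line[256];
	FILE *f = fopen(path, "r");

	if (f == NULL)
		return false; /* nothing was opened, nothing to close */

	if (fgets(line, sizeof(line), f) == NULL) {
		fclose(f); /* early return: close before leaving */
		return false;
	}

	if (strlen(line) >= max_len) {
		fclose(f); /* early return: close before leaving */
		return false;
	}

	fclose(f);
	return true;
}
```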

 

About SmfUtils.cc:

 

- strncpy(*((SaStringT *)*i_value), i_str, len - 1);
+ strncpy(*((SaStringT *)*i_value), i_str, len + 1);
(*((SaStringT *)*i_value))[len] = '\0';

 

=> This calls strncpy with “len + 1” and then overwrites the last byte with ‘\0’ anyway.

I suggest using strncpy with “len”, as in the original code, to avoid redundant changes.
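
As a rough sketch of the point about redundancy (copy_string() is a hypothetical helper, not part of SmfUtils.cc): copying exactly len bytes and then writing the terminator touches each byte once. The sketch uses memcpy() instead of strncpy(), which is my substitution; newer gcc versions may warn (-Wstringop-truncation) when the strncpy() bound is derived from the length of the source string.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: duplicate `src` into a freshly allocated buffer.
 * Copying `len + 1` bytes would already include the terminator, which makes
 * a following explicit '\0' assignment redundant. Here only `len` bytes are
 * copied and the terminator is written once. */
char *copy_string(const char *src)
{
	size_t len = strlen(src);
	char *dst = malloc(len + 1);

	if (dst == NULL)
		return NULL;

	memcpy(dst, src, len); /* copies the characters, not the terminator */
	dst[len] = '\0';
	return dst;
}
```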

 

Best Regards,

ThuanTr

 

From: Alex Jones  
Sent: Monday, February 3, 2020 10:39 PM
To: thuan.t...@dektech.com.au; vu.m.ngu...@dektech.com.au
Cc: opensaf-devel@lists.sourceforge.net; Alex Jones 
Subject: [PATCH 5/5] build: fix compile errors with gcc 9.x [#3134]

 

Rework fixes in NTF and SMF.
---
src/ntf/apitest/test_ntf_imcn.cc | 2 +-
src/smf/smfd/SmfUtils.cc | 2 +-
2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/src/ntf/apitest/test_ntf_imcn.cc b/src/ntf/apitest/test_ntf_imcn.cc
index 51b9076c6..04f155074 100644
--- a/src/ntf/apitest/test_ntf_imcn.cc
+++ b/src/ntf/apitest/test_ntf_imcn.cc
@@ -1140,7 +1140,7 @@ static SaAisErrorT set_add_info(
>additionalInfo[idx].infoValue);
if (error == SA_AIS_OK) {
strcpy(reinterpret_cast(temp), infoValue);
- temp[strlen(infoValue) - 1] = '\0';
+ //temp[strlen(infoValue)] = '\0';
nHeader->additionalInfo[idx].infoId = infoId;
nHeader->additionalInfo[idx].infoType = SA_NTF_VALUE_STRING;
}
diff --git a/src/smf/smfd/SmfUtils.cc b/src/smf/smfd/SmfUtils.cc
index 2d539e7c2..f1593b4cf 100644
--- a/src/smf/smfd/SmfUtils.cc
+++ b/src/smf/smfd/SmfUtils.cc
@@ -993,7 +993,7 @@ bool smf_stringToValue(SaImmValueTypeT i_type, 
SaImmAttrValueT *i_value,
len = strlen(i_str);
*i_value = malloc(sizeof(SaStringT));
*((SaStringT *)*i_value) = (SaStringT)malloc(len + 1);
- strncpy(*((SaStringT *)*i_value), i_str, len - 1);
+ strncpy(*((SaStringT *)*i_value), i_str, len + 1);
(*((SaStringT *)*i_value))[len] = '\0';
break;
case SA_IMM_ATTR_SAANYT:
-- 
2.21.1





___
Opensaf-devel mailing list
Opensaf-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/opensaf-devel


Re: [devel] [PATCH 1/1] dtm: rotate all logtraces on demand [#3133]

2020-01-07 Thread Tran Thuan
Hi,

I have updated my inline comment.

Best Regards,
ThuanTr

-Original Message-
From: Tran Thuan  
Sent: Tuesday, January 7, 2020 4:22 PM
To: 'phuc.h.chau' ; 'thang.d.ngu...@dektech.com.au' 
;
'minh.c...@dektech.com.au' ; 
'vu.m.ngu...@dektech.com.au' ;
'gary@dektech.com.au' 
Cc: 'opensaf-devel@lists.sourceforge.net' 
Subject: RE: [devel] [PATCH 1/1] dtm: rotate all logtraces on demand [#3133]

Hi Phuc,

I have minor comments, marked with [Thuan].

Best Regards,
ThuanTr

-Original Message-
From: phuc.h.chau  
Sent: Tuesday, January 7, 2020 3:11 PM
To: thang.d.ngu...@dektech.com.au; minh.c...@dektech.com.au; 
vu.m.ngu...@dektech.com.au; gary@dektech.com.au
Cc: opensaf-devel@lists.sourceforge.net
Subject: [devel] [PATCH 1/1] dtm: rotate all logtraces on demand [#3133]

Adding a new option '--all' that means to rotate all existing logtrace
if given along with the '--rotate' option.

This patch also corrects wrong indentation in osaflog.cc file.
---
 src/base/log_writer.h   |  1 +
 src/dtm/README  | 15 +-
 src/dtm/tools/osaflog.cc| 44 +++--
 src/dtm/transport/log_server.cc | 12 ++-
 src/dtm/transport/log_server.h  |  8 ++--
 5 files changed, 61 insertions(+), 19 deletions(-)

diff --git a/src/base/log_writer.h b/src/base/log_writer.h
index ab2bf32..abd6d47 100644
--- a/src/base/log_writer.h
+++ b/src/base/log_writer.h
@@ -47,6 +47,7 @@ class LogWriter {
   void Flush();
   void RotateLog();
   void SetLogFile(const std::string& log_file) { log_file_ = log_file; }
+  size_t file_size() const { return current_file_size_; }
 [Thuan]: Should this function be named like the other functions, e.g. GetFileSize()?

  private:
   constexpr static const size_t kBufferSize = 128 * size_t{1024};
diff --git a/src/dtm/README b/src/dtm/README
index 430ff19..ff73af7 100644
--- a/src/dtm/README
+++ b/src/dtm/README
@@ -190,6 +190,9 @@ Options:
 --delete  Delete the specified LOGSTREAM(s) by
   removing allocated resources in the log
   server. Does not delete log files from disk.
+--rotate  Rotate the specified LOGSTREAM(s).
+--all Rotate all LOGSTREAM(s).
+  This option only works with '--rotate'.
 --max-file-size=SIZE  Set the maximum size of the log file to
   SIZE bytes. The log file will be rotated
   when it exceeds this size. Suffixes k, M and
@@ -197,4 +200,14 @@ Options:
   gigabytes.
 --max-backups=NUM Set the maximum number of backup files to
   retain during log rotation to NUM.
-
+--extract-trace  
+  If a process produces a core dump file has
+  THREAD_TRACE_BUFFER enabled, this option
+  reads the  to extract the trace
+  strings in all threads and writes them to
+  the  file.
+--max-idle=NUMSet the maximum number of idle time to NUM"
+  minutes. If a stream has not been used for
+  the given time, the stream will be closed.
+  Given zero (default) to max-idle to disable
+  this functionality.
diff --git a/src/dtm/tools/osaflog.cc b/src/dtm/tools/osaflog.cc
index f6fa168..b1fb461 100644
--- a/src/dtm/tools/osaflog.cc
+++ b/src/dtm/tools/osaflog.cc
@@ -55,6 +55,7 @@ uint64_t Random64Bits(uint64_t seed);
 bool PrettyPrint(const std::string& log_stream);
 bool Delete(const std::string& log_stream);
 bool Rotate(const std::string& log_stream);
+bool RotateAll();
 std::list OpenLogFiles(const std::string& log_stream);
 std::string PathName(const std::string& log_stream, int suffix);
 uint64_t GetInode(int fd);
@@ -72,6 +73,7 @@ int main(int argc, char** argv) {
   {"print", no_argument, nullptr, 'p'},
   {"delete", no_argument, nullptr, 'd'},
   {"rotate", no_argument, nullptr, 'r'},
+  {"all", no_argument, nullptr, 'a'},
   {"extract-trace", required_argument, 0, 'e'},
   {"max-idle", required_argument, 0, 'i'},
   {0, 0, 0, 0}};
@@ -93,6 +95,7 @@ int main(int argc, char** argv) {
   bool pretty_print_set = false;
   bool delete_set = false;
   bool rotate_set = false;
+  bool rotate_all = false;
   bool max_file_size_set = false;
   bool max_backups_set = false;
   bool max_idle_set = false;
@@ -105,7 +108,7 @@ int main(int argc, char** argv) {
 exit(EXIT_FAILURE);
   }
 
-  while ((option = getopt_long(argc, argv, "m:b:p:f:e:i:r",
+  while ((option = getopt_long(argc, argv, "m:b:p:f:e:i:ra",
long_options,

Re: [devel] [PATCH 1/1] dtm: rotate all logtraces on demand [#3133]

2020-01-07 Thread Tran Thuan
Hi Phuc,

I have minor comments, marked with [Thuan].

Best Regards,
ThuanTr

-Original Message-
From: phuc.h.chau  
Sent: Tuesday, January 7, 2020 3:11 PM
To: thang.d.ngu...@dektech.com.au; minh.c...@dektech.com.au; 
vu.m.ngu...@dektech.com.au; gary@dektech.com.au
Cc: opensaf-devel@lists.sourceforge.net
Subject: [devel] [PATCH 1/1] dtm: rotate all logtraces on demand [#3133]

Adding a new option '--all' that means to rotate all existing logtrace
if given along with the '--rotate' option.

This patch also corrects wrong indentation in osaflog.cc file.
---
 src/base/log_writer.h   |  1 +
 src/dtm/README  | 15 +-
 src/dtm/tools/osaflog.cc| 44 +++--
 src/dtm/transport/log_server.cc | 12 ++-
 src/dtm/transport/log_server.h  |  8 ++--
 5 files changed, 61 insertions(+), 19 deletions(-)

diff --git a/src/base/log_writer.h b/src/base/log_writer.h
index ab2bf32..abd6d47 100644
--- a/src/base/log_writer.h
+++ b/src/base/log_writer.h
@@ -47,6 +47,7 @@ class LogWriter {
   void Flush();
   void RotateLog();
   void SetLogFile(const std::string& log_file) { log_file_ = log_file; }
+  size_t file_size() const { return current_file_size_; }
 [Thuan]: Should this function be named like the other functions, e.g. GetFileSize()?

  private:
   constexpr static const size_t kBufferSize = 128 * size_t{1024};
diff --git a/src/dtm/README b/src/dtm/README
index 430ff19..ff73af7 100644
--- a/src/dtm/README
+++ b/src/dtm/README
@@ -190,6 +190,9 @@ Options:
 --delete  Delete the specified LOGSTREAM(s) by
   removing allocated resources in the log
   server. Does not delete log files from disk.
+--rotate  Rotate the specified LOGSTREAM(s).
+--all Rotate all LOGSTREAM(s).
+  This option only works with '--rotate'.
 --max-file-size=SIZE  Set the maximum size of the log file to
   SIZE bytes. The log file will be rotated
   when it exceeds this size. Suffixes k, M and
@@ -197,4 +200,14 @@ Options:
   gigabytes.
 --max-backups=NUM Set the maximum number of backup files to
   retain during log rotation to NUM.
-
+--extract-trace  
+  If a process produces a core dump file has
+  THREAD_TRACE_BUFFER enabled, this option
+  reads the  to extract the trace
+  strings in all threads and writes them to
+  the  file.
+--max-idle=NUMSet the maximum number of idle time to NUM"
+  minutes. If a stream has not been used for
+  the given time, the stream will be closed.
+  Given zero (default) to max-idle to disable
+  this functionality.
diff --git a/src/dtm/tools/osaflog.cc b/src/dtm/tools/osaflog.cc
index f6fa168..b1fb461 100644
--- a/src/dtm/tools/osaflog.cc
+++ b/src/dtm/tools/osaflog.cc
@@ -55,6 +55,7 @@ uint64_t Random64Bits(uint64_t seed);
 bool PrettyPrint(const std::string& log_stream);
 bool Delete(const std::string& log_stream);
 bool Rotate(const std::string& log_stream);
+bool RotateAll();
 std::list OpenLogFiles(const std::string& log_stream);
 std::string PathName(const std::string& log_stream, int suffix);
 uint64_t GetInode(int fd);
@@ -72,6 +73,7 @@ int main(int argc, char** argv) {
   {"print", no_argument, nullptr, 'p'},
   {"delete", no_argument, nullptr, 'd'},
   {"rotate", no_argument, nullptr, 'r'},
+  {"all", no_argument, nullptr, 'a'},
   {"extract-trace", required_argument, 0, 'e'},
   {"max-idle", required_argument, 0, 'i'},
   {0, 0, 0, 0}};
@@ -93,6 +95,7 @@ int main(int argc, char** argv) {
   bool pretty_print_set = false;
   bool delete_set = false;
   bool rotate_set = false;
+  bool rotate_all = false;
   bool max_file_size_set = false;
   bool max_backups_set = false;
   bool max_idle_set = false;
@@ -105,7 +108,7 @@ int main(int argc, char** argv) {
 exit(EXIT_FAILURE);
   }
 
-  while ((option = getopt_long(argc, argv, "m:b:p:f:e:i:r",
+  while ((option = getopt_long(argc, argv, "m:b:p:f:e:i:ra",
long_options, _index)) != -1) {
 switch (option) {
   case 'p':
@@ -121,6 +124,9 @@ int main(int argc, char** argv) {
   case 'r':
 rotate_set = true;
 break;
+  case 'a':
+rotate_all = true;
[Thuan] Can we add a check here?
   rotate_set = false;
   if (!rotate_set || optind < argc) {
   PrintUsage(argv[0]);
   exit(EXIT_FAILURE);
   }
+break;
   case 'm':
 max_file_size = base::StrToUint64(optarg,
   

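One possible variant of the check discussed in the '--all' review comment, as a sketch under assumptions: parse_rotate_options() is a hypothetical function that handles only '--rotate' and '--all', and it validates the combination after the option loop rather than inside the case label, so that the order of the two options on the command line does not matter.

```c
#include <getopt.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical option parser: only --rotate/-r and --all/-a are handled.
 * Returns 0 on success and -1 if the combination is invalid. */
int parse_rotate_options(int argc, char **argv, bool *rotate_set,
			 bool *rotate_all)
{
	static const struct option long_options[] = {
	    {"rotate", no_argument, NULL, 'r'},
	    {"all", no_argument, NULL, 'a'},
	    {0, 0, 0, 0}};
	int option;

	*rotate_set = false;
	*rotate_all = false;
	optind = 1; /* restart scanning so the function can be called again */

	while ((option = getopt_long(argc, argv, "ra", long_options, NULL)) !=
	       -1) {
		switch (option) {
		case 'r':
			*rotate_set = true;
			break;
		case 'a':
			*rotate_all = true;
			break;
		default:
			return -1; /* unknown option */
		}
	}

	/* Validate after the loop so that "--all --rotate" and
	 * "--rotate --all" are treated the same way. */
	if (*rotate_all && (!*rotate_set || optind < argc))
		return -1; /* --all needs --rotate and no stream names */

	return 0;
}
```

A caller would print the usage text and exit when the function returns -1.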
Re: [devel] [PATCH 1/1] amfnd: reset restart flag in failover context [#3135]

2020-01-01 Thread Tran Thuan
Hi Thang,

Just one minor comment: the commit message should be revised.

"amfnd: reset SU restart flag in COMP failover context [#3135]

When SU restart is escalated to component failover, reset the SU restart flag."

Best Regards,
ThuanTr

-Original Message-
From: thang.d.nguyen  
Sent: Monday, December 30, 2019 9:30 AM
To: gary@dektech.com.au; thuan.t...@dektech.com.au
Cc: opensaf-devel@lists.sourceforge.net; thang.d.nguyen 

Subject: [PATCH 1/1] amfnd: reset restart flag in failover context [#3135]

When SU reStart is escalated to component failoverReset
the restart flag need resetting.
---
 src/amf/amfnd/err.cc  | 4 +++-
 src/amf/amfnd/susm.cc | 3 ++-
 2 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/src/amf/amfnd/err.cc b/src/amf/amfnd/err.cc
index db3baabc7..65cc3a5c3 100644
--- a/src/amf/amfnd/err.cc
+++ b/src/amf/amfnd/err.cc
@@ -879,8 +879,10 @@ uint32_t avnd_err_rcvr_comp_failover(AVND_CB *cb, 
AVND_COMP *failed_comp) {
 
   /* We are now in the context of failover, forget the reset restart admin op
* id*/
-  if (m_AVND_SU_IS_RESTART(su))
+  if (m_AVND_SU_IS_RESTART(su)) {
+reset_suRestart_flag(su);
 su->admin_op_Id = static_cast(0);
+  }
 
   // TODO: there should be no difference between PI/NPI comps
   if (m_AVND_SU_IS_PREINSTANTIABLE(su)) {
diff --git a/src/amf/amfnd/susm.cc b/src/amf/amfnd/susm.cc
index c1aa9e44b..86811f1e4 100644
--- a/src/amf/amfnd/susm.cc
+++ b/src/amf/amfnd/susm.cc
@@ -947,7 +947,8 @@ static bool susi_operation_in_progress(AVND_SU *su, 
AVND_SU_SI_REC *si) {
   _csi->si_dll_node)) {
 if (m_AVND_COMP_IS_FAILED(t_csi->comp) ||
 (su->pres == SA_AMF_PRESENCE_INSTANTIATION_FAILED) ||
-(su->pres == SA_AMF_PRESENCE_TERMINATION_FAILED))
+(su->pres == SA_AMF_PRESENCE_TERMINATION_FAILED) ||
+!m_AVND_COMP_IS_REG(t_csi->comp))
   continue;
 else if (m_AVND_COMP_CSI_CURR_ASSIGN_STATE_IS_ASSIGNING(t_csi) ||
  m_AVND_COMP_CSI_CURR_ASSIGN_STATE_IS_REMOVING(t_csi) ||
-- 
2.17.1






Re: [devel] [PATCH 1/1] amfd: not allow unlock/lock if su is under restarting [#3132]

2020-01-01 Thread Tran Thuan
Hi Thang,

ACK

Best Regards,
ThuanTr

-Original Message-
From: thang.d.nguyen  
Sent: Thursday, January 2, 2020 10:06 AM
To: gary@dektech.com.au; minh.c...@dektech.com.au; thuan.t...@dektech.com.au
Cc: opensaf-devel@lists.sourceforge.net; thang.d.nguyen 

Subject: [PATCH 1/1] amfd: not allow unlock/lock if su is under restarting 
[#3132]

Not allow unlock/lock if su is under restarting.
---
 src/amf/amfd/su.cc | 9 +
 1 file changed, 9 insertions(+)

diff --git a/src/amf/amfd/su.cc b/src/amf/amfd/su.cc
index e1da8f726..5a6c69c33 100644
--- a/src/amf/amfd/su.cc
+++ b/src/amf/amfd/su.cc
@@ -1479,6 +1479,15 @@ static void su_admin_op_cb(SaImmOiHandleT immoi_handle,
 goto done;
   }
 
+  if ((su->pend_cbk.admin_oper == SA_AMF_ADMIN_RESTART) &&
+  ((op_id == SA_AMF_ADMIN_UNLOCK) || (op_id == SA_AMF_ADMIN_LOCK))) {
+report_admin_op_error(
+immoi_handle, invocation, SA_AIS_ERR_TRY_AGAIN, nullptr,
+"SU'%s', undergoing admin operation'%u'",
+su->name.c_str(), su->pend_cbk.admin_oper);
+goto done;
+  }
+
   /* Validation has passed and admin operation should be done. Proceed with
* it... */
   switch (op_id) {
-- 
2.17.1






Re: [devel] [PATCH 1/1] amfd: not allow unlock if su is under restarting [#3132]

2019-12-31 Thread Tran Thuan
Hi Thang,

Is there any case where the admin op should be rejected with NOT_SUPPORT but is 
rejected with TRY_AGAIN because of your patch?
That's why I think the new check should be placed after that IF block. If you 
are sure that won't happen, I have no further comments.
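
To illustrate why the order of the checks matters, here is a small sketch with made-up names (the enum values, struct request and validate() are hypothetical and not taken from su.cc). Whichever check comes first decides the error code when both conditions hold, so a permanent rejection is tested before a temporary one.

```c
/* Hypothetical error codes and request fields, for illustration only. */
enum result { RESULT_OK, RESULT_NOT_SUPPORTED, RESULT_TRY_AGAIN };

struct request {
	int on_protected_su; /* operation targets an SU that never allows it */
	int restart_pending; /* a restart is still in progress on the SU */
};

/* Permanent rejections are checked first, so a caller is never told to
 * retry an operation that can never succeed. */
enum result validate(const struct request *req)
{
	if (req->on_protected_su)
		return RESULT_NOT_SUPPORTED;
	if (req->restart_pending)
		return RESULT_TRY_AGAIN;
	return RESULT_OK;
}
```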

Best Regards,
ThuanTr

-Original Message-
From: Thang  
Sent: Tuesday, December 31, 2019 8:36 AM
To: 'Tran Thuan' ; gary@dektech.com.au; 
minh.c...@dektech.com.au
Cc: opensaf-devel@lists.sourceforge.net
Subject: RE: [PATCH 1/1] amfd: not allow unlock if su is under restarting 
[#3132]

Hi Thuan,

I think there is no issue.

B.R/Thang

-Original Message-
From: Tran Thuan  
Sent: Monday, December 30, 2019 4:24 PM
To: 'thang.d.nguyen' ;
gary@dektech.com.au; minh.c...@dektech.com.au
Cc: opensaf-devel@lists.sourceforge.net
Subject: RE: [PATCH 1/1] amfd: not allow unlock if su is under restarting
[#3132]

Hi Thang,

One comment inline.

Best Regards,
ThuanTr

-Original Message-
From: thang.d.nguyen 
Sent: Monday, December 30, 2019 2:12 PM
To: gary@dektech.com.au; minh.c...@dektech.com.au;
thuan.t...@dektech.com.au
Cc: opensaf-devel@lists.sourceforge.net; thang.d.nguyen

Subject: [PATCH 1/1] amfd: not allow unlock if su is under restarting
[#3132]

Not allow unlock if su is under restarting.
---
 src/amf/amfd/su.cc | 9 +
 1 file changed, 9 insertions(+)

diff --git a/src/amf/amfd/su.cc b/src/amf/amfd/su.cc index
e1da8f726..82e2b457f 100644
--- a/src/amf/amfd/su.cc
+++ b/src/amf/amfd/su.cc
@@ -1337,6 +1337,15 @@ static void su_admin_op_cb(SaImmOiHandleT
immoi_handle,
 goto done;
   }
 
+  if ((su->pend_cbk.admin_oper == SA_AMF_ADMIN_RESTART) &&
+  (op_id == SA_AMF_ADMIN_UNLOCK)) {
+report_admin_op_error(
+immoi_handle, invocation, SA_AIS_ERR_TRY_AGAIN, nullptr,
+"SU'%s', undergoing admin operation'%u'",
+su->name.c_str(), su->pend_cbk.admin_oper);
+goto done;
+  }
+
[Thuan] I think the new check should be moved to after the IF block below.

   if ((su->sg_of_su->sg_ncs_spec == true) &&
   (cb->node_id_avd == su->su_on_node->node_info.nodeId)) {
 report_admin_op_error(
--
2.17.1








Re: [devel] [PATCH 1/1] amfd: not allow unlock if su is under restarting [#3132]

2019-12-30 Thread Tran Thuan
Hi Thang,

One comment inline.

Best Regards,
ThuanTr

-Original Message-
From: thang.d.nguyen  
Sent: Monday, December 30, 2019 2:12 PM
To: gary@dektech.com.au; minh.c...@dektech.com.au; thuan.t...@dektech.com.au
Cc: opensaf-devel@lists.sourceforge.net; thang.d.nguyen 

Subject: [PATCH 1/1] amfd: not allow unlock if su is under restarting [#3132]

Not allow unlock if su is under restarting.
---
 src/amf/amfd/su.cc | 9 +
 1 file changed, 9 insertions(+)

diff --git a/src/amf/amfd/su.cc b/src/amf/amfd/su.cc
index e1da8f726..82e2b457f 100644
--- a/src/amf/amfd/su.cc
+++ b/src/amf/amfd/su.cc
@@ -1337,6 +1337,15 @@ static void su_admin_op_cb(SaImmOiHandleT immoi_handle,
 goto done;
   }
 
+  if ((su->pend_cbk.admin_oper == SA_AMF_ADMIN_RESTART) &&
+  (op_id == SA_AMF_ADMIN_UNLOCK)) {
+report_admin_op_error(
+immoi_handle, invocation, SA_AIS_ERR_TRY_AGAIN, nullptr,
+"SU'%s', undergoing admin operation'%u'",
+su->name.c_str(), su->pend_cbk.admin_oper);
+goto done;
+  }
+
[Thuan] I think the new check should be moved to after the IF block below.

   if ((su->sg_of_su->sg_ncs_spec == true) &&
   (cb->node_id_avd == su->su_on_node->node_info.nodeId)) {
 report_admin_op_error(
-- 
2.17.1






Re: [devel] [PATCH 1/1] amfd: one adm op activa on SU at a time [#3132]

2019-12-26 Thread Tran Thuan
Hi Thang,

I found an old commit that may be related to your ticket.
Can you check whether your changes bring the old issue back?
--
amfd: allow lock on su after issue of shutdown [#582]
Shutdown operation was rejected because shutdown operation was going on. Amf 
should respect lock
operation as per Specs. Now, if lock is issued on su when shutdown is going
on, then Amf will send response for shutdown first and then start lock
operation. Once lock operation is completed, then Amf will respond to imm
for lock operation.

/* Avoid multiple admin operations on other SUs belonging to the same 
SG. */
for (su_ptr = su->sg_of_su->list_of_su; su_ptr != NULL; su_ptr = 
su_ptr->sg_list_su_next) {
-   if (su_ptr->pend_cbk.invocation != 0) {
+   /* su's sg_fsm_state is checked below, just check other su. */
+   if ((su != su_ptr) && (su_ptr->pend_cbk.invocation != 0)) {
--
Another concern: why did you move your check to before the
check "Admin operation on Active middleware SU is not allowed"?
Why not keep it after this check?

Best Regards,
ThuanTr

-Original Message-
From: thang.d.nguyen  
Sent: Wednesday, December 25, 2019 9:37 AM
To: gary@dektech.com.au; minh.c...@dektech.com.au
Cc: opensaf-devel@lists.sourceforge.net
Subject: [devel] [PATCH 1/1] amfd: one adm op activa on SU at a time [#3132]

Not allow admin op on SU if other is activating.
---
 src/amf/amfd/su.cc | 26 +-
 1 file changed, 17 insertions(+), 9 deletions(-)

diff --git a/src/amf/amfd/su.cc b/src/amf/amfd/su.cc
index e1da8f726..6223cd9e4 100644
--- a/src/amf/amfd/su.cc
+++ b/src/amf/amfd/su.cc
@@ -1337,6 +1337,23 @@ static void su_admin_op_cb(SaImmOiHandleT immoi_handle,
 goto done;
   }
 
+  node = su->get_node_ptr();
+  if (node->admin_node_pend_cbk.admin_oper != 0) {
+report_admin_op_error(
+immoi_handle, invocation, SA_AIS_ERR_TRY_AGAIN, nullptr,
+"Node'%s' hosting SU'%s', undergoing admin operation'%u'",
+node->name.c_str(), su->name.c_str(),
+node->admin_node_pend_cbk.admin_oper);
+goto done;
+  }
+  if (su->pend_cbk.invocation != 0) {
+report_admin_op_error(
+immoi_handle, invocation, SA_AIS_ERR_TRY_AGAIN, nullptr,
+"SU'%s', undergoing admin operation'%u'",
+su->name.c_str(), su->pend_cbk.admin_oper);
+goto done;
+  }
+
   if ((su->sg_of_su->sg_ncs_spec == true) &&
   (cb->node_id_avd == su->su_on_node->node_info.nodeId)) {
 report_admin_op_error(
@@ -1469,15 +1486,6 @@ static void su_admin_op_cb(SaImmOiHandleT immoi_handle,
   goto done;
 }
   }
-  node = su->get_node_ptr();
-  if (node->admin_node_pend_cbk.admin_oper != 0) {
-report_admin_op_error(
-immoi_handle, invocation, SA_AIS_ERR_TRY_AGAIN, nullptr,
-"Node'%s' hosting SU'%s', undergoing admin operation'%u'",
-node->name.c_str(), su->name.c_str(),
-node->admin_node_pend_cbk.admin_oper);
-goto done;
-  }
 
   /* Validation has passed and admin operation should be done. Proceed with
* it... */
-- 
2.17.1








Re: [devel] [PATCH 1/1] imm: fix non-local user cannot access IMM when accessControlMode is in ENFORCED [#3043]

2019-12-22 Thread Tran Thuan
Hi Thien,

I have some comments inline.

Best Regards,
ThuanTr

-Original Message-
From: thien.m.huynh  
Sent: Thursday, December 19, 2019 9:12 AM
To: vu.m.ngu...@dektech.com.au
Cc: opensaf-devel@lists.sourceforge.net
Subject: [devel] [PATCH 1/1] imm: fix non-local user cannot access IMM when 
accessControlMode is in ENFORCED [#3043]

---
 src/base/osaf_secutil.c| 57 +-
 src/base/osaf_secutil.h|  2 +-
 src/imm/immnd/immnd_evt.c  |  4 +--
 src/log/logd/lgs_config.cc |  2 +-
 4 files changed, 60 insertions(+), 5 deletions(-)

diff --git a/src/base/osaf_secutil.c b/src/base/osaf_secutil.c
index 0e175c915..ef27fdded 100644
--- a/src/base/osaf_secutil.c
+++ b/src/base/osaf_secutil.c
@@ -42,6 +42,8 @@
 #include 
 #include 
 #include 
+#include 
+#include 
 #include "base/osaf_poll.h"
 
 #include "base/logtrace.h"
@@ -184,6 +186,53 @@ static void *auth_server_main(void *_fd)
return 0;
 }
 
+bool osaf_pid_is_member_of_group(pid_t pid, gid_t gid_auth)
+{
[Thuan] We can make this function simpler with a system() call:
char str[256];
/* grep -w matches the gid as a whole word, so gid 10 does not match 100 */
snprintf(str, sizeof(str),
         "cat /proc/%d/status | grep \"Groups:\" | grep -w \"%u\"",
         (int)pid, (unsigned)gid_auth);
if (system(str) != 0) {
return false;
}
return true;

+   char path[50];
+   bool state = false;
+   size_t line_buf_size = 0;
+   ssize_t line_size;
+   char *line_buf = NULL;
+   FILE *stream;
+
+   if (!pid)
+   return false;
+   sprintf(path, "/proc/%d/status", pid);
+   stream = fopen(path, "r");
+   if (!stream) {
+   LOG_ER("Error opening file");
+   goto done;
+   }
+
+   while ((line_size = getline(_buf, _buf_size, stream)) != -1) {
+   if (strstr(line_buf, "Groups") != NULL) {
+   char *pch;
+   for (ssize_t i = 0; i < line_size; i++) {
+   if (line_buf[i] == 0x09) {
+   line_buf[i] = 0x20;
+   break;
+   }
+   }
+
+   pch = strtok(line_buf, " ");
+   while (pch != NULL && pch[0] != 0x0a) {
+   if (isdigit(pch[0]) != 0 &&
+   (gid_t)atoi(pch) == gid_auth) {
+   state = true;
+   goto done;
+   }
+   pch = strtok(NULL, " ");
+   }
+   goto done;
+   }
+   }
+done:
+   free(line_buf);
+   line_buf = NULL;
+   fclose(stream);
+   return state;
+}
+
 /*** public interface follows*** */
 
 int osaf_auth_server_create(const char *pathname,
@@ -220,7 +269,7 @@ int osaf_auth_server_create(const char *pathname,
 }
 
 /* used by server, logging is OK */
-bool osaf_user_is_member_of_group(uid_t uid, const char *groupname)
+bool osaf_user_is_member_of_group(uid_t uid, const char *groupname, pid_t pid)
 {
long grpmembufsize = sysconf(_SC_GETGR_R_SIZE_MAX);
if (grpmembufsize < 0)
@@ -263,6 +312,12 @@ bool osaf_user_is_member_of_group(uid_t uid, const char 
*groupname)
return false;
}
 
+   if (osaf_pid_is_member_of_group(pid, client_grp->gr_gid)) {
+   free(pwdmembuf);
+   free(grpmembuf);
+   return true;
+   }
+
[Thuan] Do we need to check pid > 0? Also, I think the new function call should 
be moved into the block below:
if (client_pwd == NULL) {
if (pid > 0 && osaf_pid_is_member_of_group(pid, 
client_grp->gr_gid)) {
free(pwdmembuf);
free(grpmembuf);
return true;
}
LOG_WA("%s: user id %u does not exist", __FUNCTION__,
   (unsigned)uid);
free(pwdmembuf);
free(grpmembuf);
return false;
}
[Thuan] Maybe you can rework the whole function to use "goto done;" so that the 
free() calls and the return happen once, at the end of the function.
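
The "goto done" shape suggested above could look roughly like this. It is a generic sketch, not a rewrite of osaf_user_is_member_of_group(): names_match() and its buffers are hypothetical. Both pointers start as NULL, so the cleanup at the label is safe on every path.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical function with two heap buffers and a single exit point. */
bool names_match(const char *a, const char *b)
{
	bool result = false;
	char *buf_a = NULL;
	char *buf_b = NULL;

	buf_a = malloc(strlen(a) + 1);
	if (buf_a == NULL)
		goto done;
	buf_b = malloc(strlen(b) + 1);
	if (buf_b == NULL)
		goto done;

	strcpy(buf_a, a);
	strcpy(buf_b, b);
	result = (strcmp(buf_a, buf_b) == 0);

done:
	free(buf_b); /* free(NULL) is a no-op, so no extra checks are needed */
	free(buf_a);
	return result;
}
```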

// get password file entry for user
struct passwd pbuf;
struct passwd *client_pwd;
diff --git a/src/base/osaf_secutil.h b/src/base/osaf_secutil.h
index a2389241c..d60cafac7 100644
--- a/src/base/osaf_secutil.h
+++ b/src/base/osaf_secutil.h
@@ -86,7 +86,7 @@ int osaf_auth_server_create(const char *_pathname,
  * @param groupname
  * @return true if member
  */
-bool osaf_user_is_member_of_group(uid_t uid, const char *groupname);
+bool osaf_user_is_member_of_group(uid_t uid, const char *groupname, pid_t pid);
 
 /**
  * Get list of groups that a user belong to
diff --git a/src/imm/immnd/immnd_evt.c b/src/imm/immnd/immnd_evt.c
index 3bd56fe34..5e7c1fe5c 100644
--- 

Re: [devel] [PATCH 1/1] mds: update mdstest 27 4 to use waitpid() [#3130]

2019-12-19 Thread Tran Thuan
Hi,

I will push the patch if there are no comments by the end of today.

Best Regards,
ThuanTr

-Original Message-
From: thuan.tran  
Sent: Tuesday, December 17, 2019 10:49 AM
To: Minh Hon Chau ; thuan . tran 
; thang . d . nguyen
; gary@dektech.com.au
Cc: opensaf-devel@lists.sourceforge.net
Subject: [PATCH 1/1] mds: update mdstest 27 4 to use waitpid() [#3130]

Wait() may stuck forever if other receiver already exited before.
---
 src/mds/apitest/mdstipc_api.c | 9 ++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/src/mds/apitest/mdstipc_api.c b/src/mds/apitest/mdstipc_api.c
index 4a97f99e9..1f16a6a93 100644
--- a/src/mds/apitest/mdstipc_api.c
+++ b/src/mds/apitest/mdstipc_api.c
@@ -13612,10 +13612,13 @@ void tet_overload_tp_4(void)
1, fr_svcids);
mds_shutdown();
if (FAIL == 0) {
+   pid_t rc;
int status;
-   wait();
-   if (WIFEXITED(status) && \
-   (WEXITSTATUS(status) != 0)) {
+   do {
+   rc = waitpid(pid2, , 0);
+   } while ((rc == -1) && (errno == EINTR));
+   if ((rc == -1) || \
+   (WIFEXITED(status) && (WEXITSTATUS(status) 
!= 0))) {
printf("\nThe other receiver FAIL\n");
FAIL = 1;
}
-- 
2.17.1
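
For reference, the waitpid() pattern used by this patch can be written as a small standalone helper. This is a sketch under assumptions: wait_for_child() is a hypothetical name, and the real test keeps the logic inline. Waiting on one specific pid avoids blocking on some other child, and the loop retries when a signal interrupts the call.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>

/* Hypothetical helper: wait for one specific child and return its exit code,
 * or -1 if waitpid() failed or the child did not exit normally. */
int wait_for_child(pid_t pid)
{
	int status = 0;
	pid_t rc;

	do {
		rc = waitpid(pid, &status, 0);
	} while (rc == -1 && errno == EINTR); /* retry after a signal */

	if (rc == -1 || !WIFEXITED(status))
		return -1;
	return WEXITSTATUS(status);
}
```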






Re: [devel] [PATCH 1/1] mds: check svc_cnt greater than zero before decrease [#3129]

2019-12-19 Thread Tran Thuan
Hi,

I will push the patch if there are no comments by the end of today.

Best Regards,
ThuanTr

-Original Message-
From: thuan.tran  
Sent: Monday, December 16, 2019 4:41 PM
To: Minh Hon Chau ; thuan . tran 
; thang . d . nguyen
; gary@dektech.com.au
Cc: opensaf-devel@lists.sourceforge.net
Subject: [PATCH 1/1] mds: check svc_cnt greater than zero before decrease 
[#3129]

- #3102 introduce remote adest list which refer svc_cnt to start/stop
timer to delete adest from list. But in tcp transport, svc down may
come twice then svc_cnt is updated incorrectly. Later cause crash inside
function stop_mds_down_tmr().
- check svc_cnt greater than zero before decrease it then no crash as
consequence of this wrong counter.
---
 src/mds/mds_c_api.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/mds/mds_c_api.c b/src/mds/mds_c_api.c
index a76b2b76a..42e26f91c 100644
--- a/src/mds/mds_c_api.c
+++ b/src/mds/mds_c_api.c
@@ -3661,7 +3661,7 @@ uint32_t mds_mcm_svc_down(PW_ENV_ID pwe_id, MDS_SVC_ID 
svc_id, V_DEST_RL role,
(MDS_ADEST_INFO *)ncs_patricia_tree_get(
_mds_mcm_cb->adest_list,
(uint8_t *));
-   if (adest_info) {
+   if (adest_info && adest_info->svc_cnt > 0) {
adest_info->svc_cnt--;
if (adest_info->svc_cnt == 0) {
m_MDS_LOG_INFO(
-- 
2.17.1






Re: [devel] [PATCH 1/1] mds: fix ckpt 20 11 failure [#3127]

2019-12-09 Thread Tran Thuan



Best Regards,
ThuanTr

-Original Message-
From: Minh Hon Chau  
Sent: Tuesday, December 10, 2019 9:51 AM
To: thuan.tran ; 'Nguyen Minh Vu' 
; 'thang . d . nguyen' 
; gary@dektech.com.au
Cc: opensaf-devel@lists.sourceforge.net
Subject: Re: [PATCH 1/1] mds: fix ckpt 20 11 failure [#3127]

Hi Thuan,

- We could give the patch title a bit more meaning than "fix ckpt 20 
11..", for example something like "Using timer to continue sending queued 
message".
[Thuan] OK, I will update the short commit message.

- And a few comments inline

Thanks

Minh

On 5/12/19 3:05 pm, thuan.tran wrote:
> - In overflow, receive chunk ack may stuck in retrying to send pending
> messages then later chunk ack comming cannot proceed.
> - Instead of retrying to send pending messages, reuse timer send chunk
> ack to trigger send pending messages if any. By this, even no more Nack
> or ChunkAck event comming, pending messages will be sent by timer.
> ---
>   src/mds/mds_dt_tipc.c| 12 ++---
>   src/mds/mds_tipc_fctrl_intf.cc   | 10 
>   src/mds/mds_tipc_fctrl_portid.cc | 88 ++--
>   src/mds/mds_tipc_fctrl_portid.h  |  1 +
>   4 files changed, 56 insertions(+), 55 deletions(-)
>
> diff --git a/src/mds/mds_dt_tipc.c b/src/mds/mds_dt_tipc.c
> index 9b3290833..6b30846a1 100644
> --- a/src/mds/mds_dt_tipc.c
> +++ b/src/mds/mds_dt_tipc.c
> @@ -3183,13 +3183,13 @@ ssize_t mds_retry_sendto(int sockfd, const void *buf, 
> size_t len, int flags,
>   {
>   int retry = 5;
>   ssize_t send_len = 0;
> - while (retry >= 0) {
> + while (retry-- >= 0) {
>   send_len = sendto(sockfd, buf, len, flags, dest_addr, addrlen);
>   if (send_len == len) {
>   return send_len;
> - } else if (retry-- > 0) {
> - if (errno != ENOMEM &&
> - errno != ENOBUFS &&
> + } else if (retry >= 0) {
> + if (errno != EAGAIN && errno != EWOULDBLOCK &&
> + errno != ENOMEM && errno != ENOBUFS &&
>   errno != EINTR)
>   break;
>   osaf_nanosleep();

[Minh] We may need to log strerror and errno in case of 
failure in mds_retry_sendto(). Also,
[Thuan] Error logging is already done by the upper callers.

uint32_t TipcPortId::Send(uint8_t* data, uint16_t length) {

...

   m_MDS_LOG_ERR("FCTRL: sendto() failed, Error[%s]", strerror(errno));
}

this log message should now say "TipcPortId::Send()" instead of "sendto()"
[Thuan] OK, I will update this log message.

> @@ -3242,7 +3242,7 @@ static uint32_t mdtm_sendto(uint8_t *buffer, uint16_t 
> buff_len,
>   if (mds_tipc_fctrl_trysend(id, buffer, buff_len, is_queued)
>   == NCSCC_RC_SUCCESS) {
>   send_len = mds_retry_sendto(
> - tipc_cb.BSRsock, buffer, buff_len, 0,
> + tipc_cb.BSRsock, buffer, buff_len, MSG_DONTWAIT,
>   (struct sockaddr *)_addr, 
> sizeof(server_addr));
[Minh] Is there a particular reason why you want non-blocking sends with 
MSG_DONTWAIT?
[Thuan] Without that flag, ckpt 20 11 hangs in sendto(), and then the fctrl 
control thread cannot handle anything.
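
The combination discussed here (a non-blocking send plus a bounded retry on transient errors) can be sketched as follows. This is an illustration under assumptions, not mds_retry_sendto() itself: send_with_retry(), is_transient_send_error(), the 10 ms pause and the max_attempts parameter (expected to be at least 1) are all hypothetical.

```c
#include <errno.h>
#include <stdbool.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <time.h>

/* Errors after which trying the same send again is reasonable. */
bool is_transient_send_error(int err)
{
	return err == EAGAIN || err == EWOULDBLOCK || err == ENOMEM ||
	       err == ENOBUFS || err == EINTR;
}

/* Hypothetical bounded retry around a non-blocking sendto(). Returns the
 * number of bytes sent, or -1 with errno left as set by the last attempt. */
ssize_t send_with_retry(int fd, const void *buf, size_t len,
			const struct sockaddr *dest, socklen_t dest_len,
			int max_attempts)
{
	const struct timespec pause = {0, 10 * 1000 * 1000}; /* 10 ms */
	ssize_t sent = -1;

	for (int attempt = 0; attempt < max_attempts; attempt++) {
		if (attempt > 0)
			nanosleep(&pause, NULL); /* back off before a retry */
		/* MSG_DONTWAIT: fail with EAGAIN instead of blocking */
		sent = sendto(fd, buf, len, MSG_DONTWAIT, dest, dest_len);
		if (sent >= 0 && (size_t)sent == len)
			return sent;
		if (sent < 0 && !is_transient_send_error(errno))
			break; /* permanent error: give up immediately */
	}
	return sent;
}
```

Because the call never blocks, the calling thread stays free to process other events between attempts, which is the behaviour the reply above asks for.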

>   if (send_len == buff_len) {
>   m_MDS_LOG_INFO("MDTM: Successfully sent message");
> @@ -3289,7 +3289,7 @@ static uint32_t mdtm_mcast_sendto(void *buffer, size_t 
> size,
>   /*This can be scope-down to dest_svc_id  server_inst TBD*/
>   server_addr.addr.nameseq.upper = HTONL(MDS_MDTM_UPPER_INSTANCE);
>   ssize_t send_len =
> - mds_retry_sendto(tipc_cb.BSRsock, buffer, size, 0,
> + mds_retry_sendto(tipc_cb.BSRsock, buffer, size, MSG_DONTWAIT,
>  (struct sockaddr *)&server_addr, sizeof(server_addr));
>   
>   if (send_len == size) {
> diff --git a/src/mds/mds_tipc_fctrl_intf.cc b/src/mds/mds_tipc_fctrl_intf.cc
> index 7d0571e7c..b20205686 100644
> --- a/src/mds/mds_tipc_fctrl_intf.cc
> +++ b/src/mds/mds_tipc_fctrl_intf.cc
> @@ -102,6 +102,8 @@ void tmr_exp_cbk(void* uarg) {
>   
>   void process_timer_event(const Event& evt) {
> bool txprob_restart = false;
> +  m_MDS_LOG_DBG("FCTRL: process timer event start [evt:%d]",
> +static_cast<int>(evt.type_));
> for (auto i : portid_map) {
>   TipcPortId* portid = i.second;
>   
> @@ -113,16 +115,20 @@ void process_timer_event(const Event& evt) {
>   
>   if (evt.type_ == Event::Type::kEvtTmrChunkAck) {
> portid->ReceiveTmrChunkAck();
> +  portid->SendUnsentMsg();
>   }
[Minh] The idea now is to use the ChunkAck timer to continue sending unsent 
messages. This fix comes from a situation where we failed in the middle of 
sending unsent messages due to "Cannot allocate memory...". In the 
scenario without the "Cannot allocate ..." error, the function 
SendUnsentMsg() here will be sending extra messages from the "receiving 
channel" as ChunkAck timer apart from 

Re: [devel] [PATCH 1/5] log: improve the resilience of log service [#3116]

2019-12-09 Thread Tran Thuan
Hi Vu,

Thanks. See my replies inline.

Best Regards,
ThuanTr

-Original Message-
From: Nguyen Minh Vu  
Sent: Monday, December 9, 2019 6:18 PM
To: Tran Thuan ; lennart.l...@ericsson.com; 
gary@dektech.com.au; minh.c...@dektech.com.au
Cc: opensaf-devel@lists.sourceforge.net
Subject: Re: [devel] [PATCH 1/5] log: improve the resilience of log service 
[#3116]

Hi Thuan,

See my responses inline.

Regards, Vu

On 12/9/19 5:32 PM, Tran Thuan wrote:
> Hi Vu,
>
> Some comments from me:
>
> - I think the xid name should be removed from the code.
OK
> - CleanOverdueData() should loop to clean all overdue records instead of just 
> one overdue record.
No. It should only serve one element each time to avoid blocking the 
main thread.
[Thuan] OK. The function names Clean/Flush make me think of MANY (ALL) records.
Please consider renaming these functions, e.g. PopOverdueData().

> - In PeriodicCheck, there is no need to check is_iothread_ready() before 
> Flush() because it is checked inside Flush().
Ok. I will remove the check in `PeriodicCheck`.
> - Flush() implies writing all records, but it actually tries to write just 
> one log record. I think it should be renamed.
Ok. Will rename it to `FlushFrontElement`.
[Thuan] Avoid "Flush"; maybe WriteFrontElement() or PopDataToWrite().
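
The one-element-per-call behaviour discussed above can be sketched as follows. This is only an illustration under stated assumptions and not the code in lgs_cache.cc: the class `PendingWriteQueue`, its members and the millisecond timestamps are hypothetical. The capacity and timeout stand for the `logMaxPendingWriteRequests` and `logResilienceTimeout` attributes, and the method names follow the renaming suggestions in this thread.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <string>

struct PendingWrite {
  std::string record;
  uint64_t queued_at_ms;  // time the request entered the queue
};

// Hypothetical model of the pending-write queue.
class PendingWriteQueue {
 public:
  PendingWriteQueue(size_t capacity, uint64_t timeout_ms)
      : capacity_(capacity), timeout_ms_(timeout_ms) {}

  // Returns false when the queue is full; the caller would then answer the
  // client with TRY_AGAIN right away.
  bool Push(const std::string& record, uint64_t now_ms) {
    if (queue_.size() >= capacity_) return false;
    queue_.push_back({record, now_ms});
    return true;
  }

  // Drops at most ONE overdue element per call so that the main thread is
  // never held up by a long clean-up loop. Returns true if one was dropped.
  bool PopOverdueData(uint64_t now_ms) {
    if (queue_.empty()) return false;
    if (now_ms - queue_.front().queued_at_ms < timeout_ms_) return false;
    queue_.pop_front();  // the client would get TRY_AGAIN for this request
    return true;
  }

  // Writes at most ONE element per call, and only if the I/O thread is ready.
  bool WriteFrontElement(bool io_thread_ready, std::string* written) {
    assert(written != nullptr);
    if (!io_thread_ready || queue_.empty()) return false;
    *written = queue_.front().record;
    queue_.pop_front();
    return true;
  }

  size_t size() const { return queue_.size(); }

 private:
  size_t capacity_;
  uint64_t timeout_ms_;
  std::deque<PendingWrite> queue_;
};
```

A periodic check would call WriteFrontElement() and PopOverdueData() once per tick, so a large backlog is worked off over several ticks rather than in one blocking pass.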

>
> Best Regards,
> ThuanTr
>
> -Original Message-
> From: Vu Minh Nguyen 
> Sent: Thursday, November 28, 2019 3:24 PM
> To: lennart.l...@ericsson.com; gary@dektech.com.au; 
> minh.c...@dektech.com.au
> Cc: opensaf-devel@lists.sourceforge.net
> Subject: [devel] [PATCH 1/5] log: improve the resilience of log service 
> [#3116]
>
> In order to improve the resilience of the OpenSAF LOG service when the
> underlying file system is unresponsive, a queue is introduced to hold async
> write requests for up to a configurable time of around 15 - 30 seconds.
>
> The readiness of the I/O thread is checked periodically, and when it turns
> ready, the front element goes first. SA_AIS_ERR_TRY_AGAIN is returned
> to the client if the element stays in the queue longer than the configured time.
>
> The queue capacity and the resilience time are configurable via the attributes
> `logMaxPendingWriteRequests` and `logResilienceTimeout`.
>
> By default, this feature is disabled to keep the log server backward compatible.
> ---
>   src/log/Makefile.am  |  21 +-
>   src/log/config/logsv_classes.xml |  43 ++-
>   src/log/logd/lgs_cache.cc| 469 +++
>   src/log/logd/lgs_cache.h | 287 +++
>   src/log/logd/lgs_config.cc   |  78 -
>   src/log/logd/lgs_config.h|  10 +-
>   src/log/logd/lgs_evt.cc  | 161 +++
>   src/log/logd/lgs_evt.h   |  10 +
>   src/log/logd/lgs_file.cc |   8 +-
>   src/log/logd/lgs_filehdl.cc  |  58 ++--
>   src/log/logd/lgs_imm.cc  |  40 ++-
>   src/log/logd/lgs_main.cc |  24 +-
>   src/log/logd/lgs_mbcsv.cc| 447 +++--
>   src/log/logd/lgs_mbcsv.h |  19 +-
>   src/log/logd/lgs_mbcsv_cache.cc  | 372 
>   src/log/logd/lgs_mbcsv_cache.h   | 110 
>   src/log/logd/lgs_mbcsv_v1.cc |   1 +
>   src/log/logd/lgs_mbcsv_v2.cc |   2 +
>   18 files changed, 1889 insertions(+), 271 deletions(-)
>   create mode 100644 src/log/logd/lgs_cache.cc
>   create mode 100644 src/log/logd/lgs_cache.h
>   create mode 100644 src/log/logd/lgs_mbcsv_cache.cc
>   create mode 100644 src/log/logd/lgs_mbcsv_cache.h
>
> diff --git a/src/log/Makefile.am b/src/log/Makefile.am
> index f63a4a053..3367ef4f6 100644
> --- a/src/log/Makefile.am
> +++ b/src/log/Makefile.am
> @@ -95,7 +95,9 @@ noinst_HEADERS += \
>   src/log/logd/lgs_nildest.h \
>   src/log/logd/lgs_unixsock_dest.h \
>   src/log/logd/lgs_common.h \
> - src/log/logd/lgs_amf.h
> + src/log/logd/lgs_amf.h \
> + src/log/logd/lgs_cache.h \
> + src/log/logd/lgs_mbcsv_cache.h
>   
>   
>   bin_PROGRAMS += bin/saflogger
> @@ -123,6 +125,15 @@ bin_osaflogd_CPPFLAGS = \
>   -DSA_EXTENDED_NAME_SOURCE \
>   $(AM_CPPFLAGS)
>   
> +# Enable this flag to simulate the case that file system is unresponsive
> +# during write log record. Mainly for testing the following enhancement:
> +# log: improve the resilience of log service [#3116].
> +# When enabled, log handle thread will be suspended 17 seconds every 02 write
> +# requests and only takes effect if the `logMaxPendingWriteRequests` is set
> +# to a non-zero value.
> +bin_osaflogd_CPPFLAGS += -DSIMULATE_NFS_UNRESPONSE
> +
> +
>   bin_osaflogd_SOURCES = \
>   src/log/logd/lgs_amf.cc \
>   src/log/logd/lgs_clm.cc \
> @@ -147,7 +158,9 @@ bin_osaflogd_SOURCES = \
>   src/log/logd/lgs_util.c

Re: [devel] [PATCH 5/5] log: add test cases of improving the log resilience [#3116]

2019-12-09 Thread Tran Thuan
Hi Vu,

Thanks. See my reply inline.

Best Regards,
ThuanTr

-Original Message-
From: Nguyen Minh Vu  
Sent: Monday, December 9, 2019 6:30 PM
To: Tran Thuan ; lennart.l...@ericsson.com; 
gary@dektech.com.au; minh.c...@dektech.com.au
Cc: opensaf-devel@lists.sourceforge.net
Subject: Re: [devel] [PATCH 5/5] log: add test cases of improving the log 
resilience [#3116]

Hi Thuan,

See my responses inline.

Regards, Vu

On 12/9/19 5:35 PM, Tran Thuan wrote:
> Hi Vu,
>
> Some comments from me:
>
> - Remove xid in test code.
OK
> - You defined AIS_EVALUATE() and SYS_EVALUATE(), but you still call 
> test_validate()/rc_validate() directly.
The macros are used to validate the given pre-condition. If the 
condition is not met, the test fails
and is discontinued. These are used for different purposes.
[Thuan] OK, I see. But I don't see SYS_EVALUATE() used anywhere.
> - polling_thread() creates a thread and waits for it to finish; why not call 
> the function directly?
OK. Good point.
> - When rebooting the SCs for headless, can we use the system() command with 
> the following? Then no manual interaction is needed.
> ssh SC-1 "rdegetrole |grep ACTIVE && reboot"
> ssh SC-1 "rdegetrole |grep STANDBY && reboot"
ssh does not work on UML containers.
[Thuan] Can we use "immadm -o 5 -p saClmAction:SA_STRING_T:stop "?
I just hope all tests can run without manual interaction.

>
>
> Best Regards,
> ThuanTr
>
> -Original Message-
> From: Vu Minh Nguyen 
> Sent: Thursday, November 28, 2019 3:25 PM
> To: lennart.l...@ericsson.com; gary@dektech.com.au; 
> minh.c...@dektech.com.au
> Cc: opensaf-devel@lists.sourceforge.net
> Subject: [devel] [PATCH 5/5] log: add test cases of improving the log 
> resilience [#3116]
>
> Adding 08 new test cases into 02 suites:
> 1) Suite 20 with 07 test cases, including:
> - Test changing queue size & resilient timeout;
> - Test if a write async is dropped if its timeout setting is overdue,
> also verify if log server has kept the request in proper time.
> - Test if getting write callback right away if the cache is full.
> - Test if the cache is fully and correctly synced with standby.
>
> 2) Suite 21 with one test case:
> Test if LOG agent notifies all lost invocation to log client.
>
> As the suite 21 requires manual interaction, it is put into
> 'extended' tests. Only run with option '-e'.
> ---
>   src/log/Makefile.am   |   3 +-
>   src/log/apitest/logtest.c |   7 +
>   src/log/apitest/logtest.h |   7 +-
>   src/log/apitest/logutil.c |  14 +-
>   src/log/apitest/tet_log_runtime_cfgobj.c  |   2 +-
>   .../apitest/tet_saLogWriteLogAsync_cache.c| 648 ++
>   6 files changed, 667 insertions(+), 14 deletions(-)
>   create mode 100644 src/log/apitest/tet_saLogWriteLogAsync_cache.c
>
> diff --git a/src/log/Makefile.am b/src/log/Makefile.am
> index 3367ef4f6..3ec03c097 100644
> --- a/src/log/Makefile.am
> +++ b/src/log/Makefile.am
> @@ -224,7 +224,8 @@ bin_logtest_SOURCES = \
>   src/log/apitest/tet_log_longDN.c \
>   src/log/apitest/tet_Log_clm.c \
>   src/log/apitest/tet_cfg_destination.c \
> - src/log/apitest/tet_multiple_thread.c
> + src/log/apitest/tet_multiple_thread.c \
> + src/log/apitest/tet_saLogWriteLogAsync_cache.c
>   
>   bin_logtest_LDADD = \
>   lib/libapitest.la \
> diff --git a/src/log/apitest/logtest.c b/src/log/apitest/logtest.c
> index aabd1e578..149d27d93 100644
> --- a/src/log/apitest/logtest.c
> +++ b/src/log/apitest/logtest.c
> @@ -96,6 +96,7 @@ SaLogCallbacksT logCallbacks = {NULL, NULL, NULL};
>   SaInvocationT invocation = 0;
>   SaSelectionObjectT selectionObject;
>   char log_root_path[PATH_MAX];
> +SaLogAckFlagsT ack_flags = 0;
>   
>   void init_logrootpath(void)
>   {
> @@ -465,6 +466,9 @@ int main(int argc, char **argv)
>   add_suite_14();
>   add_suite_15();
>   add_suite_16();
> +#ifdef SIMULATE_NFS_UNRESPONSE
> + add_suite_21();
> +#endif
>   test_list();
>   exit(0);
>   case 'e':
> @@ -493,6 +497,9 @@ int main(int argc, char **argv)
>   add_suite_14();
>   add_suite_15();
>   add_suite_16();
> +#ifdef SIMULATE_NFS_UNRESPONSE
> + add_suite_21();
> +#endif
>   break;
>   case 'v':
>   if (silent_flg == true) {
> diff --git a/src/log/apitest/logtest.h b/src/log/apitest/logtest.h
> index 68f9df608..e044920

Re: [devel] [PATCH 5/5] log: add test cases of improving the log resilience [#3116]

2019-12-09 Thread Tran Thuan
Hi Vu,

Some comments from me:

- Remove xid in test code.
- You defined AIS_EVALUATE() and SYS_EVALUATE(), but you still call 
test_validate()/rc_validate() directly.
- polling_thread() creates a thread and waits for it to finish; why not call the 
function directly?
- When rebooting the SCs for headless, can we use the system() command with the 
following? Then no manual interaction is needed.
   ssh SC-1 "rdegetrole |grep ACTIVE && reboot"
   ssh SC-1 "rdegetrole |grep STANDBY && reboot"


Best Regards,
ThuanTr

-Original Message-
From: Vu Minh Nguyen  
Sent: Thursday, November 28, 2019 3:25 PM
To: lennart.l...@ericsson.com; gary@dektech.com.au; minh.c...@dektech.com.au
Cc: opensaf-devel@lists.sourceforge.net
Subject: [devel] [PATCH 5/5] log: add test cases of improving the log 
resilience [#3116]

Adding 08 new test cases into 02 suites:
1) Suite 20 with 07 test cases, including:
- Test changing queue size & resilient timeout;
- Test if a write async is dropped if its timeout setting is overdue,
also verify if log server has kept the request in proper time.
- Test if getting write callback right away if the cache is full.
- Test if the cache is fully and correctly synced with standby.

2) Suite 21 with one test case:
Test if LOG agent notifies all lost invocation to log client.

As the suite 21 requires manual interaction, it is put into
'extended' tests. Only run with option '-e'.
---
 src/log/Makefile.am   |   3 +-
 src/log/apitest/logtest.c |   7 +
 src/log/apitest/logtest.h |   7 +-
 src/log/apitest/logutil.c |  14 +-
 src/log/apitest/tet_log_runtime_cfgobj.c  |   2 +-
 .../apitest/tet_saLogWriteLogAsync_cache.c| 648 ++
 6 files changed, 667 insertions(+), 14 deletions(-)
 create mode 100644 src/log/apitest/tet_saLogWriteLogAsync_cache.c

diff --git a/src/log/Makefile.am b/src/log/Makefile.am
index 3367ef4f6..3ec03c097 100644
--- a/src/log/Makefile.am
+++ b/src/log/Makefile.am
@@ -224,7 +224,8 @@ bin_logtest_SOURCES = \
src/log/apitest/tet_log_longDN.c \
src/log/apitest/tet_Log_clm.c \
src/log/apitest/tet_cfg_destination.c \
-   src/log/apitest/tet_multiple_thread.c
+   src/log/apitest/tet_multiple_thread.c \
+   src/log/apitest/tet_saLogWriteLogAsync_cache.c
 
 bin_logtest_LDADD = \
lib/libapitest.la \
diff --git a/src/log/apitest/logtest.c b/src/log/apitest/logtest.c
index aabd1e578..149d27d93 100644
--- a/src/log/apitest/logtest.c
+++ b/src/log/apitest/logtest.c
@@ -96,6 +96,7 @@ SaLogCallbacksT logCallbacks = {NULL, NULL, NULL};
 SaInvocationT invocation = 0;
 SaSelectionObjectT selectionObject;
 char log_root_path[PATH_MAX];
+SaLogAckFlagsT ack_flags = 0;
 
 void init_logrootpath(void)
 {
@@ -465,6 +466,9 @@ int main(int argc, char **argv)
add_suite_14();
add_suite_15();
add_suite_16();
+#ifdef SIMULATE_NFS_UNRESPONSE
+   add_suite_21();
+#endif
test_list();
exit(0);
case 'e':
@@ -493,6 +497,9 @@ int main(int argc, char **argv)
add_suite_14();
add_suite_15();
add_suite_16();
+#ifdef SIMULATE_NFS_UNRESPONSE
+   add_suite_21();
+#endif
break;
case 'v':
if (silent_flg == true) {
diff --git a/src/log/apitest/logtest.h b/src/log/apitest/logtest.h
index 68f9df608..e04492086 100644
--- a/src/log/apitest/logtest.h
+++ b/src/log/apitest/logtest.h
@@ -76,7 +76,7 @@ extern SaSelectionObjectT selectionObject;
 extern SaNameT logSvcUsrName;
 extern SaLogRecordT genLogRecord;
 extern char log_root_path[];
-
+extern SaLogAckFlagsT ack_flags;
 const static SaVersionT kLogVersion = {'A', 0x02, 0x03};
 const static SaVersionT kImmVersion = {'A', 02, 11};
 
@@ -105,6 +105,11 @@ void add_suite_12(void);
 void add_suite_14();
 void add_suite_15();
 void add_suite_16();
+
+#ifdef SIMULATE_NFS_UNRESPONSE
+void add_suite_21();
+#endif
+
 int get_active_sc(void);
 int get_attr_value(SaNameT *inObjName, char *inAttr, void *outValue);
 
diff --git a/src/log/apitest/logutil.c b/src/log/apitest/logutil.c
index 59d255515..d3e0c6297 100644
--- a/src/log/apitest/logutil.c
+++ b/src/log/apitest/logutil.c
@@ -52,15 +52,7 @@ void cond_check(void)
 int systemCall(const char *command)
 {
int rc = system(command);
-   if (rc == -1) {
-   fprintf(stderr, "system() retuned -1 Failed \n");
-   } else {
-   rc = WEXITSTATUS(rc);
-   if (rc != 0)
-   fprintf(stderr, " Failed in command: %s \n", command);
-   }
-
-   return rc;
+   return WEXITSTATUS(rc);
 }
 
 /*
@@ -144,8 +136,8 @@ logAppStreamOpen(const SaNameT *logStreamName,
  */
 SaAisErrorT logWriteAsync(const SaLogRecordT *logRecord)
 {
-   

Re: [devel] [PATCH 1/5] log: improve the resilience of log service [#3116]

2019-12-09 Thread Tran Thuan
Hi Vu,

Some comments from me:

- I think the xid name should be removed from the code.
- CleanOverdueData() should loop to clean all overdue records instead of just one 
overdue record.
- In PeriodicCheck, there is no need to check is_iothread_ready() before Flush() 
because it is checked inside Flush().
- Flush() implies writing all records, but it actually tries to write just one log 
record. I think it should be renamed.

Best Regards,
ThuanTr

-Original Message-
From: Vu Minh Nguyen  
Sent: Thursday, November 28, 2019 3:24 PM
To: lennart.l...@ericsson.com; gary@dektech.com.au; minh.c...@dektech.com.au
Cc: opensaf-devel@lists.sourceforge.net
Subject: [devel] [PATCH 1/5] log: improve the resilience of log service [#3116]

In order to improve the resilience of the OpenSAF LOG service when the
underlying file system is unresponsive, a queue is introduced to hold async
write requests for up to a configurable time of around 15 - 30 seconds.

The readiness of the I/O thread is checked periodically, and when it turns
ready, the front element goes first. SA_AIS_ERR_TRY_AGAIN is returned
to the client if the element stays in the queue longer than the configured time.

The queue capacity and the resilience time are configurable via the attributes
`logMaxPendingWriteRequests` and `logResilienceTimeout`.

By default, this feature is disabled to keep the log server backward compatible.
---
 src/log/Makefile.am  |  21 +-
 src/log/config/logsv_classes.xml |  43 ++-
 src/log/logd/lgs_cache.cc| 469 +++
 src/log/logd/lgs_cache.h | 287 +++
 src/log/logd/lgs_config.cc   |  78 -
 src/log/logd/lgs_config.h|  10 +-
 src/log/logd/lgs_evt.cc  | 161 +++
 src/log/logd/lgs_evt.h   |  10 +
 src/log/logd/lgs_file.cc |   8 +-
 src/log/logd/lgs_filehdl.cc  |  58 ++--
 src/log/logd/lgs_imm.cc  |  40 ++-
 src/log/logd/lgs_main.cc |  24 +-
 src/log/logd/lgs_mbcsv.cc| 447 +++--
 src/log/logd/lgs_mbcsv.h |  19 +-
 src/log/logd/lgs_mbcsv_cache.cc  | 372 
 src/log/logd/lgs_mbcsv_cache.h   | 110 
 src/log/logd/lgs_mbcsv_v1.cc |   1 +
 src/log/logd/lgs_mbcsv_v2.cc |   2 +
 18 files changed, 1889 insertions(+), 271 deletions(-)
 create mode 100644 src/log/logd/lgs_cache.cc
 create mode 100644 src/log/logd/lgs_cache.h
 create mode 100644 src/log/logd/lgs_mbcsv_cache.cc
 create mode 100644 src/log/logd/lgs_mbcsv_cache.h

diff --git a/src/log/Makefile.am b/src/log/Makefile.am
index f63a4a053..3367ef4f6 100644
--- a/src/log/Makefile.am
+++ b/src/log/Makefile.am
@@ -95,7 +95,9 @@ noinst_HEADERS += \
src/log/logd/lgs_nildest.h \
src/log/logd/lgs_unixsock_dest.h \
src/log/logd/lgs_common.h \
-   src/log/logd/lgs_amf.h
+   src/log/logd/lgs_amf.h \
+   src/log/logd/lgs_cache.h \
+   src/log/logd/lgs_mbcsv_cache.h
 
 
 bin_PROGRAMS += bin/saflogger
@@ -123,6 +125,15 @@ bin_osaflogd_CPPFLAGS = \
-DSA_EXTENDED_NAME_SOURCE \
$(AM_CPPFLAGS)
 
+# Enable this flag to simulate the case that file system is unresponsive
+# during write log record. Mainly for testing the following enhancement:
+# log: improve the resilience of log service [#3116].
+# When enabled, log handle thread will be suspended 17 seconds every 02 write
+# requests and only takes effect if the `logMaxPendingWriteRequests` is set
+# to a non-zero value.
+bin_osaflogd_CPPFLAGS += -DSIMULATE_NFS_UNRESPONSE
+
+
 bin_osaflogd_SOURCES = \
src/log/logd/lgs_amf.cc \
src/log/logd/lgs_clm.cc \
@@ -147,7 +158,9 @@ bin_osaflogd_SOURCES = \
src/log/logd/lgs_util.cc \
src/log/logd/lgs_dest.cc \
src/log/logd/lgs_nildest.cc \
-   src/log/logd/lgs_unixsock_dest.cc
+   src/log/logd/lgs_unixsock_dest.cc \
+   src/log/logd/lgs_cache.cc \
+   src/log/logd/lgs_mbcsv_cache.cc
 
 bin_osaflogd_LDADD = \
lib/libosaf_common.la \
@@ -183,6 +196,10 @@ bin_logtest_CPPFLAGS = \
-DSA_EXTENDED_NAME_SOURCE \
$(AM_CPPFLAGS)
 
+# Enable this flag to add test cases for following enhancement:
+# log: improve the resilience of log service [#3116].
+bin_logtest_CPPFLAGS += -DSIMULATE_NFS_UNRESPONSE
+
 bin_logtest_SOURCES = \
src/log/apitest/logtest.c \
src/log/apitest/logutil.c \
diff --git a/src/log/config/logsv_classes.xml b/src/log/config/logsv_classes.xml
index 9359823ff..084e8915d 100644
--- a/src/log/config/logsv_classes.xml
+++ b/src/log/config/logsv_classes.xml
@@ -195,7 +195,7 @@ to ensure that default global values in the implementation 
are also changed acco
SA_UINT32_T
SA_CONFIG
SA_WRITABLE
-1024
+   1024


logStreamFileFormat
@@ -208,42 +208,42 @@ to ensure that default global values in the 
implementation are also changed acco
 

Re: [devel] [PATCH 1/1] mds: Fix mds flow control keep all messages in queue [#3123]

2019-11-27 Thread Tran Thuan
Hi Vu,

Thanks. See my reply inline.

Best Regards,
ThuanTr

-Original Message-
From: Nguyen Minh Vu  
Sent: Thursday, November 28, 2019 10:36 AM
To: thuan.tran ; 'Minh Hon Chau' 
; 'thang . d . nguyen' 
; gary@dektech.com.au
Cc: opensaf-devel@lists.sourceforge.net
Subject: Re: [PATCH 1/1] mds: Fix mds flow control keep all messages in queue 
[#3123]

Hi Thuan,

Ack with comments inline.

Regards, Vu

On 11/27/19 6:33 PM, thuan.tran wrote:
> When overflow happens, mds with flow control enabled may keep
> all messages in queue if it fails to send a message when receiving
> Nack or ChunkAck, since no further trigger comes after that.
> MDS flow control should retry sending the message in this scenario.
> ---
>   src/mds/mds_tipc_fctrl_portid.cc | 39 +++-
>   1 file changed, 23 insertions(+), 16 deletions(-)
>
> diff --git a/src/mds/mds_tipc_fctrl_portid.cc 
> b/src/mds/mds_tipc_fctrl_portid.cc
> index 316e1ba75..8081e8bd4 100644
> --- a/src/mds/mds_tipc_fctrl_portid.cc
> +++ b/src/mds/mds_tipc_fctrl_portid.cc
> @@ -17,6 +17,7 @@
>   
>   #include "mds/mds_tipc_fctrl_portid.h"
>   #include "base/ncssysf_def.h"
> +#include "base/osaf_time.h"
>   
>   #include "mds/mds_dt.h"
>   #include "mds/mds_log.h"
> @@ -149,23 +150,23 @@ void TipcPortId::FlushData() {
>   
>   uint32_t TipcPortId::Send(uint8_t* data, uint16_t length) {
> struct sockaddr_tipc server_addr;
> -  ssize_t send_len = 0;
> -  uint32_t rc = NCSCC_RC_SUCCESS;
> -
> memset(&server_addr, 0, sizeof(server_addr));
> server_addr.family = AF_TIPC;
> server_addr.addrtype = TIPC_ADDR_ID;
> server_addr.addr.id = id_;
> -  send_len = sendto(bsrsock_, data, length, 0,
> -(struct sockaddr *)&server_addr, sizeof(server_addr));
> -
> -  if (send_len == length) {
> -rc = NCSCC_RC_SUCCESS;
> -  } else {
> -m_MDS_LOG_ERR("FCTRL: sendto() failed, Error[%s]", strerror(errno));
> -rc = NCSCC_RC_FAILURE;
> +  int retry = 5;
> +  while (retry >= 0) {
> +ssize_t send_len = sendto(bsrsock_, data, length, 0,
> +  (struct sockaddr *)&server_addr, sizeof(server_addr));
> +
[Vu] Is there any case where sendto() sends only part of the data? If so, the 
retry, if any, should not start from the beginning of the data.
The code below shows what I mean:

ssize_t byte_sent = 0;
while (retry--) {
  ssize_t send_len = sendto(bsrsock_, data + byte_sent, length - byte_sent, 0,
      (struct sockaddr *)&server_addr, sizeof(server_addr));
  if (send_len == -1) {
    // error handling here
    if (errno == EINTR) continue;
    // error, can't continue. should log something here?
    return NCSCC_RC_FAILURE; // or assert?
  }

  // number of bytes sent so far
  byte_sent += send_len;
  if (byte_sent >= length) {
    return NCSCC_RC_SUCCESS;
  }

  // retry but do we need to sleep here?
  osaf_nanosleep();
}
  
[Thuan]: I think there is no case where only part of a message is sent.
Even if there were, the receiver would not accept the incomplete message.
The receiver does not reassemble unfragmented messages.

> +if (send_len == length) {
> +  return NCSCC_RC_SUCCESS;
> +} else if (retry-- > 0) {
> +  osaf_nanosleep();
> +}
> }
> -  return rc;
> +  m_MDS_LOG_ERR("FCTRL: sendto() failed, Error[%s]", strerror(errno));
> +  return NCSCC_RC_FAILURE;
>   }
>   
>   uint32_t TipcPortId::Queue(const uint8_t* data, uint16_t length,
> @@ -440,13 +441,14 @@ void TipcPortId::ReceiveChunkAck(uint16_t fseq, 
> uint16_t chksize) {
>   // try to send a few pending msg
>   DataMessage* msg = nullptr;
>   uint16_t send_msg_cnt = 0;
> -while (send_msg_cnt++ < chunk_size_) {
> +while (send_msg_cnt < chunk_size_) {
> // find the lowest sequence unsent yet
> msg = sndqueue_.FirstUnsent();
> if (msg == nullptr) {
>   break;
> } else {
> if (Send(msg->msg_data_, msg->header_.msg_len_) == 
> NCSCC_RC_SUCCESS) {
> +send_msg_cnt++;
>   msg->is_sent_ = true;
>   m_MDS_LOG_NOTIFY("FCTRL: [me] --> [node:%x, ref:%u], "
>   "SndQData[fseq:%u, len:%u], "
> @@ -455,7 +457,9 @@ void TipcPortId::ReceiveChunkAck(uint16_t fseq, uint16_t 
> chksize) {
>   msg->header_.fseq_, msg->header_.msg_len_,
>   sndwnd_.acked_.v(), sndwnd_.send_.v(), 
> sndwnd_.nacked_space_);
> } else {
> -break;
> +// If not retry, all messages are kept in queue
> +// and no more trigger to send messages
> +continue;
[Vu] If the send fails constantly, does this loop have no way to exit?
[Thuan] Yes
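
One way to keep this loop bounded, sketched under assumptions: stop at the first failed send and leave the remaining messages unsent for a later trigger (a later patch in this archive calls SendUnsentMsg() from the ChunkAck timer for that purpose). The `Msg` struct, the `SendUnsent` function and the injected `send` callback are hypothetical and only model the send queue; they are not the TipcPortId code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <vector>

struct Msg {
  uint16_t fseq;
  bool is_sent;
};

// Hypothetical sketch: sends up to `chunk_size` unsent messages in order and
// stops at the first failure, so the loop always terminates. Returns the
// number of messages sent; the rest stay unsent for a later trigger.
size_t SendUnsent(std::vector<Msg>& queue, size_t chunk_size,
                  const std::function<bool(const Msg&)>& send) {
  size_t sent = 0;
  for (Msg& msg : queue) {
    if (sent >= chunk_size) break;
    if (msg.is_sent) continue;
    if (!send(msg)) break;  // give up for now instead of spinning
    msg.is_sent = true;
    ++sent;
  }
  return sent;
}
```

Because only successful sends are counted, a failure does not consume the chunk budget, which keeps the intent of moving `send_msg_cnt++` into the success branch.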
> }
> }
>   }
> @@ -508,9 +512,12 @@ void TipcPortId::ReceiveNack(uint32_t mseq, uint16_t 
> mfrag,
> DataMessage* msg = sndqueue_.Find(Seq16(fseq));
> if (msg != nullptr) {
>   // Resend the msg found
> -if (Send(msg->msg_data_, msg->header_.msg_len_) == NCSCC_RC_SUCCESS) {
> -  msg->is_sent_ = true;
> +while (Send(msg->msg_data_, 

Re: [devel] [PATCH 2/2] mds: Avoid message reallocation [#3089]

2019-11-26 Thread Tran Thuan
Hi Minh,

Why not free() inside mdtm_sendto() and mdtm_mcast_sendto()?
It would greatly reduce the code changes.
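
A rough sketch of what this suggestion could look like, under stated assumptions. It is not the approach the patch below takes: here the send function owns the buffer on entry and either hands it to the retransmission queue or frees it, so callers need no is_queued out-parameter. `RetransmitQueue` and `SendAndRelease` are hypothetical names, and the `queue_it` flag stands in for the flow-control decision.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>
#include <vector>

// Hypothetical stand-in for the flow-control retransmission queue. It keeps
// the raw buffers it has taken over and frees them when it is destroyed.
struct RetransmitQueue {
  std::vector<uint8_t*> held;
  ~RetransmitQueue() {
    for (uint8_t* p : held) std::free(p);
  }
};

// Hypothetical variant of mdtm_sendto() that owns `buffer` on entry: the
// buffer is either handed to the queue or freed here, so callers never free.
void SendAndRelease(uint8_t* buffer, bool queue_it, RetransmitQueue* queue) {
  assert(buffer != nullptr && queue != nullptr);
  if (queue_it) {
    queue->held.push_back(buffer);  // ownership moves to the queue
    return;
  }
  // ... the actual sendto() call would go here ...
  std::free(buffer);  // not queued: nothing else references the buffer
}
```

The trade-off is that every caller must pass a heap buffer it no longer touches after the call, including on error paths.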

Best Regards,
ThuanTr

-Original Message-
From: Minh Chau  
Sent: Tuesday, November 26, 2019 7:02 PM
To: thuan.t...@dektech.com.au; vu.m.ngu...@dektech.com.au; 
gary@dektech.com.au
Cc: opensaf-devel@lists.sourceforge.net; Minh Chau 
Subject: [PATCH 2/2] mds: Avoid message reallocation [#3089]

The patch avoids message reallocation if the message is in
retransmission queue
---
 src/mds/mds_dt_tipc.c| 42 +++-
 src/mds/mds_tipc_fctrl_intf.cc   |  6 --
 src/mds/mds_tipc_fctrl_intf.h|  4 ++--
 src/mds/mds_tipc_fctrl_msg.cc|  2 +-
 src/mds/mds_tipc_fctrl_portid.cc |  9 +++--
 5 files changed, 39 insertions(+), 24 deletions(-)

diff --git a/src/mds/mds_dt_tipc.c b/src/mds/mds_dt_tipc.c
index 16cf11b..866c370 100644
--- a/src/mds/mds_dt_tipc.c
+++ b/src/mds/mds_dt_tipc.c
@@ -120,7 +120,7 @@ uint32_t mds_mdtm_send_tipc(MDTM_SEND_REQ *req);
 
 /* Tipc actual send, can be made as Macro even*/
 static uint32_t mdtm_sendto(uint8_t *buffer, uint16_t buff_len,
-   struct tipc_portid tipc_id);
+   struct tipc_portid tipc_id, uint8_t *is_queued);
 static uint32_t mdtm_mcast_sendto(void *buffer, size_t size,
  const MDTM_SEND_REQ *req);
 
@@ -2643,7 +2643,8 @@ uint32_t mds_mdtm_send_tipc(MDTM_SEND_REQ *req)
if (req->snd_type == MDS_SENDTYPE_ACK ||
req->snd_type == MDS_SENDTYPE_RACK) {
uint8_t len = mds_and_mdtm_hdr_len;
-   uint8_t buffer_ack[len];
+   uint8_t *buffer_ack = calloc(1, len);
+   uint8_t is_queued = 0;
 
/* Add mds_hdr */
if (mdtm_add_mds_hdr(buffer_ack, req)
@@ -2657,18 +2658,24 @@ uint32_t mds_mdtm_send_tipc(MDTM_SEND_REQ *req)
&fctrl_seq_num) == NCSCC_RC_FAILURE){
m_MDS_LOG_ERR("FCTRL: Failed to send message"
" len :%d", len);
+   free(buffer_ack);
return NCSCC_RC_FAILURE;
}
/* Add frag_hdr */
if (mdtm_add_frag_hdr(buffer_ack, len, frag_seq_num,
0, fctrl_seq_num) != NCSCC_RC_SUCCESS) {
+   free(buffer_ack);
return NCSCC_RC_FAILURE;
}
 
m_MDS_LOG_DBG("MDTM:Sending message with Service"
" Seqno=%d, TO Dest_Tipc_id=<0x%08x:%u> ",
req->svc_seq_num, tipc_id.node, tipc_id.ref);
-   return mdtm_sendto(buffer_ack, len, tipc_id);
+   status = mdtm_sendto(buffer_ack, len, tipc_id,
+   &is_queued);
+   if (is_queued == 0)
+   free(buffer_ack);
+   return status;
}
 
if (req->msg.encoding == MDS_ENC_TYPE_FLAT) {
@@ -2730,6 +2737,7 @@ uint32_t mds_mdtm_send_tipc(MDTM_SEND_REQ *req)
} else {
uint8_t *p8;
uint8_t *body = NULL;
+   uint8_t is_queued = 0;
 
body = calloc(1, len +
mds_and_mdtm_hdr_len);
@@ -2824,7 +2832,7 @@ uint32_t mds_mdtm_send_tipc(MDTM_SEND_REQ *req)
return NCSCC_RC_FAILURE;
}
} else {
-   if (mdtm_sendto(body, len, tipc_id)
+   if (mdtm_sendto(body, len, tipc_id, 
&is_queued)
!= NCSCC_RC_SUCCESS) {
m_MDS_LOG_ERR("MDTM: Unable to"
" send the msg thru"
@@ -2835,7 +2843,8 @@ uint32_t mds_mdtm_send_tipc(MDTM_SEND_REQ *req)
}
}
m_MMGR_FREE_BUFR_LIST(usrbuf);
-   free(body);
+   if (is_queued == 0)
+   free(body);
return NCSCC_RC_SUCCESS;
}
} break;
@@ -2864,6 +2873,7 @@ uint32_t mds_mdtm_send_tipc(MDTM_SEND_REQ *req)
 
body = calloc(1, (req->msg.data.buff_info.len
+ mds_and_mdtm_hdr_len));
+   uint8_t is_queued = 0;
 

Re: [devel] [PATCH 1/1] mds: Fix mds flow control keep all messages in queue [#3123]

2019-11-26 Thread Tran Thuan
Hi Minh,

I think it is good to retry a few times in the normal Send().
Do you have any idea how many retries, and what interval between tries?

Best Regards,
ThuanTr

-Original Message-
From: Minh Hon Chau  
Sent: Wednesday, November 27, 2019 10:30 AM
To: thuan.tran ; thang . d . nguyen 
; 'Nguyen Minh Vu' ; 
gary@dektech.com.au
Cc: opensaf-devel@lists.sourceforge.net
Subject: Re: [PATCH 1/1] mds: Fix mds flow control keep all messages in queue 
[#3123]

Hi Thuan,

TipcPortId::Send() is also called in a few other places. Do you think 
it would be good to make a wrapper around TipcPortId::Send() with a few retries 
on failure, say TipcPortId::TryToSend(), and call TryToSend() instead 
of Send()?
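
A minimal sketch of such a wrapper, assuming a simplified port class. `Port`, `FlakyPort` and the `Rc` enum are hypothetical and only mirror the shape of TipcPortId::Send() and its success/failure return codes; the default retry count and interval are illustrative values, not settled ones.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>
#include <thread>

enum class Rc { kSuccess, kFailure };

class Port {
 public:
  virtual ~Port() = default;
  // Single attempt, standing in for TipcPortId::Send().
  virtual Rc Send(const uint8_t* data, uint16_t length) = 0;

  // Retries Send() up to `max_retries` more times with a short pause between
  // attempts, so every call site gets the same retry behaviour.
  Rc TryToSend(const uint8_t* data, uint16_t length, int max_retries = 5,
               std::chrono::milliseconds interval =
                   std::chrono::milliseconds(1)) {
    assert(max_retries >= 0);
    for (int attempt = 0; attempt <= max_retries; ++attempt) {
      if (Send(data, length) == Rc::kSuccess) return Rc::kSuccess;
      if (attempt < max_retries) std::this_thread::sleep_for(interval);
    }
    return Rc::kFailure;
  }
};

// Test double that fails a fixed number of times before succeeding.
class FlakyPort : public Port {
 public:
  explicit FlakyPort(int failures) : failures_left_(failures) {}
  Rc Send(const uint8_t*, uint16_t) override {
    ++attempts;
    return failures_left_-- > 0 ? Rc::kFailure : Rc::kSuccess;
  }
  int attempts = 0;

 private:
  int failures_left_;
};
```

Call sites such as ReceiveChunkAck() and ReceiveNack() could then call TryToSend() and treat a failure as final for that trigger, instead of each carrying its own retry loop.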

Thanks

Minh

On 27/11/19 1:26 pm, thuan.tran wrote:
> When overflow happens, mds with flow control enabled may keep
> all messages in queue if it fails to send a message when receiving
> Nack or ChunkAck, since no further trigger comes after that.
> MDS flow control should retry sending the message in this scenario.
> ---
>   src/mds/mds_tipc_fctrl_portid.cc | 16 
>   1 file changed, 12 insertions(+), 4 deletions(-)
>
> diff --git a/src/mds/mds_tipc_fctrl_portid.cc 
> b/src/mds/mds_tipc_fctrl_portid.cc
> index 724eb7b7b..e6e179669 100644
> --- a/src/mds/mds_tipc_fctrl_portid.cc
> +++ b/src/mds/mds_tipc_fctrl_portid.cc
> @@ -17,6 +17,7 @@
>   
>   #include "mds/mds_tipc_fctrl_portid.h"
>   #include "base/ncssysf_def.h"
> +#include "base/osaf_time.h"
>   
>   #include "mds/mds_dt.h"
>   #include "mds/mds_log.h"
> @@ -440,13 +441,14 @@ void TipcPortId::ReceiveChunkAck(uint16_t fseq, 
> uint16_t chksize) {
>   // try to send a few pending msg
>   DataMessage* msg = nullptr;
>   uint16_t send_msg_cnt = 0;
> -while (send_msg_cnt++ < chunk_size_) {
> +while (send_msg_cnt < chunk_size_) {
> // find the lowest sequence unsent yet
> msg = sndqueue_.FirstUnsent();
> if (msg == nullptr) {
>   break;
> } else {
> if (Send(msg->msg_data_, msg->header_.msg_len_) == 
> NCSCC_RC_SUCCESS) {
> +send_msg_cnt++;
>   msg->is_sent_ = true;
>   m_MDS_LOG_NOTIFY("FCTRL: [me] --> [node:%x, ref:%u], "
>   "SndQData[fseq:%u, len:%u], "
> @@ -455,7 +457,10 @@ void TipcPortId::ReceiveChunkAck(uint16_t fseq, uint16_t 
> chksize) {
>   msg->header_.fseq_, msg->header_.msg_len_,
>   sndwnd_.acked_.v(), sndwnd_.send_.v(), 
> sndwnd_.nacked_space_);
> } else {
> -break;
> +// If not retry, all messages are kept in queue
> +// and no more trigger to send messages
> +osaf_nanosleep();
> +continue;
> }
> }
>   }
> @@ -508,9 +513,12 @@ void TipcPortId::ReceiveNack(uint32_t mseq, uint16_t 
> mfrag,
> DataMessage* msg = sndqueue_.Find(Seq16(fseq));
> if (msg != nullptr) {
>   // Resend the msg found
> -if (Send(msg->msg_data_, msg->header_.msg_len_) == NCSCC_RC_SUCCESS) {
> -  msg->is_sent_ = true;
> +while (Send(msg->msg_data_, msg->header_.msg_len_) != NCSCC_RC_SUCCESS) {
> +  // If not retry, all messages are kept in queue
> +  // and no more trigger to send messages
> +  osaf_nanosleep();
>   }
> +msg->is_sent_ = true;
>   m_MDS_LOG_NOTIFY("FCTRL: [me] --> [node:%x, ref:%u], "
>   "RsndData[mseq:%u, mfrag:%u, fseq:%u], "
>   "sndwnd[acked:%u, send:%u, nacked:%" PRIu64 "]",



___
Opensaf-devel mailing list
Opensaf-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/opensaf-devel


Re: [devel] [PATCH 1/1] nid: fix unable to start UML cluster with tipc transport [#3122]

2019-11-25 Thread Tran Thuan
Hi Vu,

ACK from me.

Best Regards,
ThuanTr

-Original Message-
From: Nguyen Minh Vu  
Sent: Monday, November 25, 2019 3:04 PM
To: Tran Thuan ; thien.m.hu...@dektech.com.au
Cc: opensaf-devel@lists.sourceforge.net
Subject: Re: [PATCH 1/1] nid: fix unable to start UML cluster with tipc 
transport [#3122]

Hi Thuan,

Thanks. Here is a new version. Please help to review this one.

diff --git a/src/nid/configure_tipc.in b/src/nid/configure_tipc.in
index a63c97046..4573389d5 100644
--- a/src/nid/configure_tipc.in
+++ b/src/nid/configure_tipc.in
@@ -221,19 +221,17 @@ function tipc_duplicate_node_detect ()
  function tipc_configure ()
  {
  echo "Inserting TIPC mdoule..."
-
-if ! test -f "$TIPC_MODULE"  ; then
-  modprobe tipc
+
+# Prefer using modprobe to insmod as modprobe takes care of
+# loading all dependencies if any. If any dependent module
+# has not yet loaded, insmod will get failed.
+if modprobe tipc ; then
RM_TIPC_MODULE="modprobe -r tipc"
-else
-  insmod "$TIPC_MODULE"
+elif insmod "$TIPC_MODULE" ; then
RM_TIPC_MODULE="rmmod $TIPC_MODULE"
-fi
-
-ret_val=$?
-if [ $ret_val -ne 0 ] ; then
-logger -p user.err " TIPC Module could not be loaded "
-exit 1
+else
+  logger -p user.err " TIPC Module could not be loaded "
+  exit 1
  fi

  # max_nodes is not supported in TIPC 2.0

Regards, Vu

On 11/25/19 2:30 PM, Tran Thuan wrote:
> Hi Vu,
>
> Sorry, I have comments inline.
>
> Best Regards,
> ThuanTr
>
> -Original Message-
> From: Tran Thuan 
> Sent: Monday, November 25, 2019 2:27 PM
> To: 'Vu Minh Nguyen' ; 
> 'thien.m.hu...@dektech.com.au' 
> Cc: 'opensaf-devel@lists.sourceforge.net' 
> 
> Subject: RE: [PATCH 1/1] nid: fix unable to start UML cluster with tipc 
> transport [#3122]
>
> Hi Vu,
>
> ACK from me (code review).
>
> Best Regards,
> ThuanTr
>
> -Original Message-
> From: Vu Minh Nguyen 
> Sent: Monday, November 25, 2019 1:45 PM
> To: thuan.t...@dektech.com.au; thien.m.hu...@dektech.com.au
> Cc: opensaf-devel@lists.sourceforge.net; Vu Minh Nguyen 
> 
> Subject: [PATCH 1/1] nid: fix unable to start UML cluster with tipc transport 
> [#3122]
>
> ---
>   src/nid/configure_tipc.in | 10 ++
>   1 file changed, 6 insertions(+), 4 deletions(-)
>
> diff --git a/src/nid/configure_tipc.in b/src/nid/configure_tipc.in
> index a63c97046..43ddb06e1 100644
> --- a/src/nid/configure_tipc.in
> +++ b/src/nid/configure_tipc.in
> @@ -221,11 +221,13 @@ function tipc_duplicate_node_detect ()
>   function tipc_configure ()
>   {
>   echo "Inserting TIPC mdoule..."
> -
> -if ! test -f "$TIPC_MODULE"  ; then
> -  modprobe tipc
> +
> +# Prefer using modprobe to insmod as modprobe takes care of
> +# loading all dependencies if any. If any dependent module
> +# has not yet loaded, insmod will get failed.
> +if modprobe tipc ; then
> [Thuan] ret_val=$?
> RM_TIPC_MODULE="modprobe -r tipc"
> -else
> +else
> insmod "$TIPC_MODULE"
> [Thuan] ret_val=$?
> RM_TIPC_MODULE="rmmod $TIPC_MODULE"
>   fi
>  ret_val=$?
> [Thuan] Remove ret_val=$? here
>  if [ $ret_val -ne 0 ] ; then
>  logger -p user.err " TIPC Module could not be loaded "
>  exit 1
>  fi






Re: [devel] [PATCH 1/1] nid: fix unable to start UML cluster with tipc transport [#3122]

2019-11-24 Thread Tran Thuan
Hi Vu,

Sorry, I have comments inline.

Best Regards,
ThuanTr

-Original Message-
From: Tran Thuan  
Sent: Monday, November 25, 2019 2:27 PM
To: 'Vu Minh Nguyen' ; 
'thien.m.hu...@dektech.com.au' 
Cc: 'opensaf-devel@lists.sourceforge.net' 
Subject: RE: [PATCH 1/1] nid: fix unable to start UML cluster with tipc 
transport [#3122]

Hi Vu,

ACK from me (code review).

Best Regards,
ThuanTr

-Original Message-
From: Vu Minh Nguyen  
Sent: Monday, November 25, 2019 1:45 PM
To: thuan.t...@dektech.com.au; thien.m.hu...@dektech.com.au
Cc: opensaf-devel@lists.sourceforge.net; Vu Minh Nguyen 

Subject: [PATCH 1/1] nid: fix unable to start UML cluster with tipc transport 
[#3122]

---
 src/nid/configure_tipc.in | 10 ++
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/src/nid/configure_tipc.in b/src/nid/configure_tipc.in
index a63c97046..43ddb06e1 100644
--- a/src/nid/configure_tipc.in
+++ b/src/nid/configure_tipc.in
@@ -221,11 +221,13 @@ function tipc_duplicate_node_detect ()
 function tipc_configure ()
 {
 echo "Inserting TIPC mdoule..."
-
-if ! test -f "$TIPC_MODULE"  ; then
-  modprobe tipc
+
+# Prefer using modprobe to insmod as modprobe takes care of
+# loading all dependencies if any. If any dependent module
+# has not yet loaded, insmod will get failed.
+if modprobe tipc ; then
[Thuan] ret_val=$?
   RM_TIPC_MODULE="modprobe -r tipc"
-else 
+else
   insmod "$TIPC_MODULE"
[Thuan] ret_val=$?
   RM_TIPC_MODULE="rmmod $TIPC_MODULE"
 fi
ret_val=$?
[Thuan] Remove ret_val=$? here
if [ $ret_val -ne 0 ] ; then
logger -p user.err " TIPC Module could not be loaded "
exit 1
fi
-- 
2.17.1






Re: [devel] [PATCH 1/1] nid: fix unable to start UML cluster with tipc transport [#3122]

2019-11-24 Thread Tran Thuan
Hi Vu,

ACK from me (code review).

Best Regards,
ThuanTr

-Original Message-
From: Vu Minh Nguyen  
Sent: Monday, November 25, 2019 1:45 PM
To: thuan.t...@dektech.com.au; thien.m.hu...@dektech.com.au
Cc: opensaf-devel@lists.sourceforge.net; Vu Minh Nguyen 

Subject: [PATCH 1/1] nid: fix unable to start UML cluster with tipc transport 
[#3122]

---
 src/nid/configure_tipc.in | 10 ++
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/src/nid/configure_tipc.in b/src/nid/configure_tipc.in
index a63c97046..43ddb06e1 100644
--- a/src/nid/configure_tipc.in
+++ b/src/nid/configure_tipc.in
@@ -221,11 +221,13 @@ function tipc_duplicate_node_detect ()
 function tipc_configure ()
 {
 echo "Inserting TIPC mdoule..."
-
-if ! test -f "$TIPC_MODULE"  ; then
-  modprobe tipc
+
+# Prefer using modprobe to insmod as modprobe takes care of
+# loading all dependencies if any. If any dependent module
+# has not yet loaded, insmod will get failed.
+if modprobe tipc ; then
   RM_TIPC_MODULE="modprobe -r tipc"
-else 
+else
   insmod "$TIPC_MODULE"
   RM_TIPC_MODULE="rmmod $TIPC_MODULE"
 fi
-- 
2.17.1






Re: [devel] [PATCH 1/1] mds: Avoid message re-allocation [#3089]

2019-11-24 Thread Tran Thuan
Hi Minh,

ACK from me.

Best Regards,
ThuanTr

-Original Message-
From: Minh Chau  
Sent: Monday, November 25, 2019 1:13 PM
To: thuan.t...@dektech.com.au; gary@dektech.com.au; 
vu.m.ngu...@dektech.com.au
Cc: opensaf-devel@lists.sourceforge.net; Minh Chau 
Subject: [PATCH 1/1] mds: Avoid message re-allocation [#3089]

The patch avoids message reallocation when
MDS_TIPC_FCTRL_ENABLED is enabled.
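
A rough sketch of the ownership idea behind this patch, as read from the diff: when flow control is enabled the send path keeps the caller's heap buffer instead of copying it, so the caller frees the buffer only when flow control is off. SendQueue and SendBuffer are hypothetical stand-ins, not the MDS functions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <vector>

// Hypothetical stand-in for the flow-control send queue. It keeps the
// caller's buffer instead of copying it, and frees it on destruction.
struct SendQueue {
  std::vector<uint8_t*> kept;
  ~SendQueue() {
    for (uint8_t* b : kept) std::free(b);
  }
};

// Sends `len` bytes. With flow control enabled the queue adopts `buf`
// (no reallocation); otherwise the buffer is released right after sending.
// Returns true when the queue now owns the buffer.
inline bool SendBuffer(uint8_t* buf, size_t len, bool fctrl_enabled,
                       SendQueue* queue) {
  (void)len;  // a real transport would write buf[0..len) to the socket here
  if (fctrl_enabled) {
    queue->kept.push_back(buf);
    return true;
  }
  std::free(buf);
  return false;
}
```

This also suggests why a stack array such as `uint8_t buffer_ack[len]` would have to become a heap allocation: a buffer that outlives the call cannot live on the caller's stack.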
---
 src/mds/mds_dt_tipc.c| 27 ---
 src/mds/mds_tipc_fctrl_msg.cc|  2 +-
 src/mds/mds_tipc_fctrl_portid.cc |  9 +++--
 3 files changed, 24 insertions(+), 14 deletions(-)

diff --git a/src/mds/mds_dt_tipc.c b/src/mds/mds_dt_tipc.c
index fdf0da7..aa8d5c2 100644
--- a/src/mds/mds_dt_tipc.c
+++ b/src/mds/mds_dt_tipc.c
@@ -2644,7 +2644,7 @@ uint32_t mds_mdtm_send_tipc(MDTM_SEND_REQ *req)
if (req->snd_type == MDS_SENDTYPE_ACK ||
req->snd_type == MDS_SENDTYPE_RACK) {
uint8_t len = sum_mds_hdr_plus_mdtm_hdr_plus_len;
-   uint8_t buffer_ack[len];
+   uint8_t* buffer_ack = calloc(1, len);
 
/* Add mds_hdr */
if (NCSCC_RC_SUCCESS !=
@@ -2667,7 +2667,11 @@ uint32_t mds_mdtm_send_tipc(MDTM_SEND_REQ *req)
m_MDS_LOG_DBG(
"MDTM:Sending message with Service Seqno=%d, TO 
Dest_Tipc_id=<0x%08x:%u> ",
req->svc_seq_num, tipc_id.node, tipc_id.ref);
-   return mdtm_sendto(buffer_ack, len, tipc_id);
+   status = mdtm_sendto(buffer_ack, len, tipc_id);
+   if (gl_mds_pro_ver != MDS_PROT_FCTRL) {
+   free(buffer_ack);
+   }
+   return status;
}
 
if (MDS_ENC_TYPE_FLAT == req->msg.encoding) {
@@ -2815,6 +2819,8 @@ uint32_t mds_mdtm_send_tipc(MDTM_SEND_REQ *req)
free(body);
return NCSCC_RC_FAILURE;
}
+   m_MMGR_FREE_BUFR_LIST(usrbuf);
+   free(body);
} else {
if (NCSCC_RC_SUCCESS !=
mdtm_sendto(body, len, tipc_id)) {
@@ -2824,9 +2830,12 @@ uint32_t mds_mdtm_send_tipc(MDTM_SEND_REQ *req)
free(body);
return NCSCC_RC_FAILURE;
}
+   if (gl_mds_pro_ver != MDS_PROT_FCTRL) {
+   m_MMGR_FREE_BUFR_LIST(usrbuf);
+   free(body);
+   }
}
-   m_MMGR_FREE_BUFR_LIST(usrbuf);
-   free(body);
+
return NCSCC_RC_SUCCESS;
}
} break;
@@ -2909,7 +2918,9 @@ uint32_t mds_mdtm_send_tipc(MDTM_SEND_REQ *req)
mds_free_direct_buff(
req->msg.data.buff_info.buff);
}
-   free(body);
+   if (gl_mds_pro_ver != MDS_PROT_FCTRL) {
+   free(body);
+   }
return NCSCC_RC_SUCCESS;
} break;
 
@@ -3059,21 +3070,23 @@ uint32_t mdtm_frag_and_send(MDTM_SEND_REQ *req, 
uint32_t seq_num,
get_svc_names(req->src_svc_id), 
req->src_svc_id,
get_svc_names(req->dest_svc_id), 
req->dest_svc_id);
ret = mdtm_mcast_sendto(body, len_buf, req);
+   free(body);
} else {
m_MDS_LOG_DBG(
"MDTM:Sending message with Service 
Seqno=%d, Fragment Seqnum=%d, frag_num=%d, TO
Dest_Tipc_id=<0x%08x:%u>",
req->svc_seq_num, seq_num, frag_val,
id.node, id.ref);
ret = mdtm_sendto(body, len_buf, id);
+   if (gl_mds_pro_ver != MDS_PROT_FCTRL) {
+   free(body);
+   }
}
if (ret != NCSCC_RC_SUCCESS) {
// Failed to send a fragmented msg, stop sending
m_MMGR_FREE_BUFR_LIST(usrbuf);
-   free(body);
break;
}
 

Re: [devel] [PATCH 1/1] mds: Reduce mds logging [#3120]

2019-11-24 Thread Tran Thuan
Hi Minh,

ACK from me.

Best Regards,
ThuanTr

-Original Message-
From: Minh Chau  
Sent: Monday, November 25, 2019 7:53 AM
To: thuan.t...@dektech.com.au; vu.m.ngu...@dektech.com.au; 
gary@dektech.com.au
Cc: opensaf-devel@lists.sourceforge.net; Minh Chau 
Subject: [PATCH 1/1] mds: Reduce mds logging [#3120]

Broadcast/multicast messages are currently logged at NOTIFY
level, because mds does not support broadcast/multicast messages
and the logging is helpful in some cases. However, mds.log may
be located on an NFS file system, and this logging may cause
a high rate of traffic towards that file system.

This patch moves the logging to DEBUG for broadcast/multicast
messages, and for the addition/removal of mds services.
---
 src/mds/mds_tipc_fctrl_intf.cc   | 4 ++--
 src/mds/mds_tipc_fctrl_portid.cc | 2 +-
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/src/mds/mds_tipc_fctrl_intf.cc b/src/mds/mds_tipc_fctrl_intf.cc
index dd8d80d..0e3230a 100644
--- a/src/mds/mds_tipc_fctrl_intf.cc
+++ b/src/mds/mds_tipc_fctrl_intf.cc
@@ -390,7 +390,7 @@ uint32_t mds_tipc_fctrl_portid_up(struct tipc_portid id, 
uint32_t type) {
 id.node, id.ref, svc_id, portid->svc_cnt_);
   } else {
 portid->svc_cnt_++;
-m_MDS_LOG_NOTIFY("FCTRL: Add svc[node:%x, ref:%u svc_id:%u], svc_cnt:%u",
+m_MDS_LOG_DBG("FCTRL: Add svc[node:%x, ref:%u svc_id:%u], svc_cnt:%u",
 id.node, id.ref, svc_id, portid->svc_cnt_);
   }
 
@@ -410,7 +410,7 @@ uint32_t mds_tipc_fctrl_portid_down(struct tipc_portid id, 
uint32_t type) {
   TipcPortId *portid = portid_lookup(id);
   if (portid != nullptr) {
 portid->svc_cnt_--;
-m_MDS_LOG_NOTIFY("FCTRL: Remove svc[node:%x, ref:%u svc_id:%u], 
svc_cnt:%u",
+m_MDS_LOG_DBG("FCTRL: Remove svc[node:%x, ref:%u svc_id:%u], svc_cnt:%u",
 id.node, id.ref, svc_id, portid->svc_cnt_);
   }
   portid_map_mutex.unlock();
diff --git a/src/mds/mds_tipc_fctrl_portid.cc b/src/mds/mds_tipc_fctrl_portid.cc
index 724eb7b..316e1ba 100644
--- a/src/mds/mds_tipc_fctrl_portid.cc
+++ b/src/mds/mds_tipc_fctrl_portid.cc
@@ -298,7 +298,7 @@ uint32_t TipcPortId::ReceiveData(uint32_t mseq, uint16_t 
mfrag,
   }
 }
 if (rcving_mbcast_ == true) {
-  m_MDS_LOG_NOTIFY("FCTRL: [me] <-- [node:%x, ref:%u], "
+  m_MDS_LOG_DBG("FCTRL: [me] <-- [node:%x, ref:%u], "
   "RcvData[mseq:%u, mfrag:%u, fseq:%u], "
   "rcvwnd[acked:%u, rcv:%u, nacked:%" PRIu64 "], "
   "Ignore bcast/mcast ",
-- 
2.7.4






Re: [devel] [PATCH 1/1] imm: Fix coding issues identified by codechecker [#3115]

2019-11-18 Thread Tran Thuan
Hi Vu,

I used osaf_clock_gettime() because it is already used in several places in 
this file (e.g. lines 626 and 638). I think we should keep using it for 
readability; otherwise we would have to change every osaf_clock_gettime() 
call in this file.
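
For reference, a minimal sketch of what initialising a start time from the monotonic clock before a loop looks like. It uses plain POSIX clock_gettime() rather than osaf_clock_gettime() or base::ReadMonotonicClock(), because the signatures of those OpenSAF helpers are not shown in this thread. ReadMonotonic and ElapsedMillis are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <ctime>

// Reads CLOCK_MONOTONIC via POSIX clock_gettime(); returns false on failure.
inline bool ReadMonotonic(struct timespec* ts) {
  return clock_gettime(CLOCK_MONOTONIC, ts) == 0;
}

// Elapsed time from `start` to `end` in whole milliseconds.
// Assumes `end` is not earlier than `start`.
inline int64_t ElapsedMillis(const struct timespec& start,
                             const struct timespec& end) {
  int64_t sec = static_cast<int64_t>(end.tv_sec) - start.tv_sec;
  int64_t nsec = static_cast<int64_t>(end.tv_nsec) - start.tv_nsec;
  int64_t total_ns = sec * 1000000000LL + nsec;
  return total_ns / 1000000;
}
```

Reading the clock once before the loop means the first elapsed-time computation inside the loop works from a defined value instead of an uninitialised struct.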

Best Regards,
ThuanTr

-Original Message-
From: Nguyen Minh Vu  
Sent: Tuesday, November 19, 2019 10:01 AM
To: thuan.tran ; 'Minh Hon Chau' 
; gary@dektech.com.au
Cc: opensaf-devel@lists.sourceforge.net
Subject: Re: [PATCH 1/1] imm: Fix coding issues identified by codechecker 
[#3115]

Hi Thuan,

Ack with a minor comment.

Regards,

On 11/4/19 2:57 PM, thuan.tran wrote:
> ---
>   src/imm/agent/imma_db.cc   | 2 +-
>   src/imm/immnd/immnd_main.c | 1 +
>   2 files changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/src/imm/agent/imma_db.cc b/src/imm/agent/imma_db.cc
> index 071edbe74..80637e55f 100644
> --- a/src/imm/agent/imma_db.cc
> +++ b/src/imm/agent/imma_db.cc
> @@ -621,7 +621,7 @@ int imma_oi_ccb_record_note_callback(IMMA_CLIENT_NODE 
> *cl_node,
> rs = 1;
>   }
> }
> -  if (callback) {
> +  if (tmp && callback) {
>   if (callback->type == IMMA_CALLBACK_OI_CCB_CREATE && 
> !(tmp->adminOwner)) {
> SaImmAttrValuesT_2 **attributes =
> (SaImmAttrValuesT_2 **)callback->attrValsForCreateUc;
> diff --git a/src/imm/immnd/immnd_main.c b/src/imm/immnd/immnd_main.c
> index 5280f0599..62c7b2478 100644
> --- a/src/imm/immnd/immnd_main.c
> +++ b/src/imm/immnd/immnd_main.c
> @@ -489,6 +489,7 @@ int main(int argc, char *argv[])
>   fds[FD_CLM_INIT].fd = immnd_cb->clm_init_sel_obj.rmv_obj;
>   fds[FD_CLM_INIT].events = POLLIN;
>   
> + osaf_clock_gettime(CLOCK_MONOTONIC, &start_time);
[Vu]  Is it better to use the one provided by base as below?
struct timespec start_time = base::ReadMonotonicClock()


>   while (1) {
>   /* Watch out for performance bug. Possibly change from
>  event-count to recalculated timer. */






Re: [devel] [PATCH 1/1] log: Fix coding issues identified by codechecker [#3113]

2019-11-18 Thread Tran Thuan
Hi Vu,

Agreed. Please update NULL to nullptr before pushing.
Thanks.
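
A small sketch of why nullptr is usually preferred over NULL in C++ code: nullptr has its own type and always selects a pointer overload, while an integer-like NULL may select an integer overload or be ambiguous. Kind and SafeLength are hypothetical helpers; SafeLength only mirrors the guarded strlen() in the patch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Two overloads: nullptr always selects the pointer one, whereas a
// literal 0 selects the int one.
inline std::string Kind(const char*) { return "pointer"; }
inline std::string Kind(int) { return "integer"; }

// Mirrors the guarded strlen() in the patch: returns 0 for a null record
// instead of dereferencing it.
inline size_t SafeLength(const char* record) {
  return record != nullptr ? std::strlen(record) : 0;
}
```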

Best Regards,
ThuanTr

-Original Message-
From: Nguyen Minh Vu  
Sent: Tuesday, November 19, 2019 9:58 AM
To: thuan.tran ; 'Minh Hon Chau' 
; gary@dektech.com.au
Cc: opensaf-devel@lists.sourceforge.net
Subject: Re: [PATCH 1/1] log: Fix coding issues identified by codechecker 
[#3113]

Hi Thuan,

Ack with a minor comment.

Regards, Vu

On 11/4/19 2:17 PM, thuan.tran wrote:
> ---
>   src/log/logd/lgs_mbcsv.cc | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/src/log/logd/lgs_mbcsv.cc b/src/log/logd/lgs_mbcsv.cc
> index cd3d70009..ebc659ea1 100644
> --- a/src/log/logd/lgs_mbcsv.cc
> +++ b/src/log/logd/lgs_mbcsv.cc
> @@ -1931,7 +1931,7 @@ static uint32_t ckpt_proc_log_write(lgs_cb_t *cb, void 
> *data) {
> /* If configured for split file system log records shall be written also 
> if
>  * we are standby.
>  */
> -  if (lgs_is_split_file_system()) {
> +  if (lgs_is_split_file_system() && (logRecord != NULL)) {
[Vu] Prefer using nullptr to NULL
>   size_t rec_len = strlen(logRecord);
>   stream->act_last_close_timestamp = c_file_close_time_stamp;
>   






Re: [devel] [PATCH 1/3] mds: Distinguish protocol version of fragment [#3111]

2019-11-14 Thread Tran Thuan
Hi Minh,

 

Thanks for the explanation.

It’s clear now, so no more comments from me.

 

Best Regards,

ThuanTr

 

From: Minh Hon Chau  
Sent: Thursday, November 14, 2019 6:29 PM
To: Tran Thuan ; hans.nordeb...@ericsson.com; 
gary@dektech.com.au; vu.m.ngu...@dektech.com.au
Cc: opensaf-devel@lists.sourceforge.net
Subject: Re: [PATCH 1/3] mds: Distinguish protocol version of fragment [#3111]

 

Hi Thuan,

I add one comment inline for explanation.

Thanks

Minh

On 14/11/19 8:33 pm, Tran Thuan wrote:

Hi Minh,

 

I thought you would update the code to check the state of the port id to tell 
FCTRL from LEGACY,

since a fragment with (msg_len_ - fseq_ - 2 == MDTM_FRAG_HDR_LEN) may not be 
LEGACY protocol.

[Minh] Yes, in this case we cannot tell whether it is FCTRL or LEGACY, so 
pro_ver_ remains UNDEFINED. In mds_tipc_fctrl_rcv_data(), a fragment with 
UNDEFINED pro_ver_ is forwarded to the portid under the 
"if (header.IsFlowMessage() || header.IsUndefinedMessage())" branch. The portid 
skips the fragment if its state is kDisabled. In short, the fragment is 
forwarded to the portid, which checks it internally as part of the normal data 
flow, instead of checking the portid state during message decoding, which would 
require referring to the portid in mds_tipc_fctrl_msg.cc.



I agree that if (msg_len_ - fseq_ - 2 != MDTM_FRAG_HDR_LEN), it is definitely 
FCTRL protocol.
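
The decision discussed in this exchange can be sketched roughly as below, under stated assumptions: kFragHdrLen is a placeholder for MDTM_FRAG_HDR_LEN (its real value is defined in the MDS headers), and ClassifyFragment, SkipFragment, ProtVer and PortState are hypothetical names that only model the logic described above.

```cpp
#include <cassert>
#include <cstdint>

// Placeholder for MDTM_FRAG_HDR_LEN; the value here is an assumption.
constexpr int kFragHdrLen = 8;

enum class ProtVer { kUndefined, kFctrl };
enum class PortState { kEnabled, kDisabled };

// Classifies a subsequent fragment from its total length and the 16-bit
// field that follows the fragment header. If the field passes the legacy
// length check, the sender could be either legacy or flow control, so the
// version stays undefined; if it fails, it must be a flow-control sequence.
inline ProtVer ClassifyFragment(uint16_t msg_len, uint16_t field) {
  int diff = static_cast<int>(msg_len) - static_cast<int>(field) - 2;
  return diff == kFragHdrLen ? ProtVer::kUndefined : ProtVer::kFctrl;
}

// Port-level decision: an undefined fragment is skipped by flow control
// only when the port is in the disabled state.
inline bool SkipFragment(ProtVer ver, PortState state) {
  return ver == ProtVer::kUndefined && state == PortState::kDisabled;
}
```

Keeping the second check at the port level means the decoder never needs to look up port state, which matches the separation Minh describes.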

 

Best Regards,

ThuanTr

 

From: Minh Hon Chau  
Sent: Thursday, November 14, 2019 4:28 PM
To: Tran Thuan ; hans.nordeb...@ericsson.com; 
gary@dektech.com.au; vu.m.ngu...@dektech.com.au
Cc: opensaf-devel@lists.sourceforge.net
Subject: Re: [PATCH 1/3] mds: Distinguish protocol version of fragment [#3111]

 

Hi Thuan,

Are you happy with my reply?

Thanks

Minh

On 14/11/19 9:35 am, Minh Hon Chau wrote:

Hi Thuan,

Please see my reply inline.

Thanks

Minh

On 13/11/19 9:54 pm, Tran Thuan wrote:

Hi Minh,
 
See my comment inline.
 
Best Regards,
ThuanTr
 
-Original Message-
From: Minh Chau  
Sent: Friday, November 8, 2019 5:33 PM
To: hans.nordeb...@ericsson.com; gary@dektech.com.au; 
vu.m.ngu...@dektech.com.au; thuan.t...@dektech.com.au
Cc: opensaf-devel@lists.sourceforge.net; Minh Chau 
Subject: [PATCH 1/3] mds: Distinguish protocol version of fragment [#3111]
 
Legacy mds encodes the protocol version only in a non-fragmented
message or in the first fragment. Hence, mds cannot determine the
protocol version from any subsequent fragment.

The patch keeps the encoding of the length check the same as in the
legacy mds version. Also, subsequent fragments need to consult the
stateful portid to determine the protocol version, so that a
fragment is skipped if it was sent from legacy mds, or has its
sequence inspected if it was sent from new mds.
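
A sketch of the sender-side encoding this description implies, based on the mdtm_add_frag_hdr() hunk later in the patch: the 2-byte field carries the flow-control sequence number when flow control is on, and the legacy length check otherwise. Encode16 is assumed to match ncs_encode_16bit() in writing network byte order, and kFragHdrLen is a placeholder for MDTM_FRAG_HDR_LEN; both are assumptions, as are the helper names.

```cpp
#include <cassert>
#include <cstdint>

// Placeholder for MDTM_FRAG_HDR_LEN; the value here is an assumption.
constexpr uint16_t kFragHdrLen = 8;

// Writes a 16-bit value in network byte order (big-endian).
inline void Encode16(uint8_t* out, uint16_t v) {
  out[0] = static_cast<uint8_t>(v >> 8);
  out[1] = static_cast<uint8_t>(v & 0xFF);
}

// Fills the 2-byte field after the fragment header: the flow-control
// sequence number when flow control is on, else the legacy length check.
inline void EncodeFragField(uint8_t* out, bool fctrl, uint16_t len,
                            uint16_t fctrl_seq_num) {
  uint16_t legacy_check = static_cast<uint16_t>(len - kFragHdrLen - 2);
  Encode16(out, fctrl ? fctrl_seq_num : legacy_check);
}
```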
---
 src/mds/mds_dt.h |   6 ++
 src/mds/mds_dt_tipc.c|  11 ++-
 src/mds/mds_tipc_fctrl_intf.cc   | 154 ++-
 src/mds/mds_tipc_fctrl_msg.cc|  86 +++---
 src/mds/mds_tipc_fctrl_msg.h |   5 ++
 src/mds/mds_tipc_fctrl_portid.cc |  23 ++
 src/mds/mds_tipc_fctrl_portid.h  |   1 +
 7 files changed, 193 insertions(+), 93 deletions(-)
 
diff --git a/src/mds/mds_dt.h b/src/mds/mds_dt.h
index 64da600..007ff98 100644
--- a/src/mds/mds_dt.h
+++ b/src/mds/mds_dt.h
@@ -243,6 +243,12 @@ bool mdtm_mailbox_mbx_cleanup(NCSCONTEXT arg, NCSCONTEXT 
msg);
 #define MDS_PROT_VER_MASK 0xFC
 #define MDTM_PRI_MASK 0x3
 
+/* Unknown or undefined MDS protocol/version */
+#define MDS_PROT_UNDEFINED 0x00
+
+/* MDS protocol/version for non flow control (legacy) */
+#define MDS_PROT_LEGACY (MDS_PROT | MDS_VERSION)
+
 /* MDS protocol/version for flow control */
 #define MDS_PROT_FCTRL (0xB0 | MDS_VERSION)
 #define MDS_PROT_FCTRL_ID 0xFDAC13F5
diff --git a/src/mds/mds_dt_tipc.c b/src/mds/mds_dt_tipc.c
index e085de7..fdf0da7 100644
--- a/src/mds/mds_dt_tipc.c
+++ b/src/mds/mds_dt_tipc.c
@@ -166,7 +166,7 @@ NCS_PATRICIA_TREE mdtm_reassembly_list;
 uint32_t mdtm_global_frag_num;
 
 const unsigned int MAX_RECV_THRESHOLD = 30;
-static uint8_t gl_mds_pro_ver = MDS_PROT | MDS_VERSION;
+static uint8_t gl_mds_pro_ver = MDS_PROT_LEGACY;
 static int gl_mds_fctrl_acksize = -1;
 static int gl_mds_fctrl_ackto = -1;
 
@@ -381,7 +381,7 @@ uint32_t mdtm_tipc_init(NODE_ID nodeid, uint32_t 
*mds_tipc_ref)
  "MDTM:TIPC Failed to unset 
MDS_TIPC_FCTRL_ACKSIZE");
 

Re: [devel] [PATCH 1/3] mds: Distinguish protocol version of fragment [#3111]

2019-11-14 Thread Tran Thuan
Hi Minh,

 

I thought you would update the code to check the state of the port id to tell 
FCTRL from LEGACY,

since a fragment with (msg_len_ - fseq_ - 2 == MDTM_FRAG_HDR_LEN) may not be 
LEGACY protocol.

I agree that if (msg_len_ - fseq_ - 2 != MDTM_FRAG_HDR_LEN), it is definitely 
FCTRL protocol.

 

Best Regards,

ThuanTr

 

From: Minh Hon Chau  
Sent: Thursday, November 14, 2019 4:28 PM
To: Tran Thuan ; hans.nordeb...@ericsson.com; 
gary@dektech.com.au; vu.m.ngu...@dektech.com.au
Cc: opensaf-devel@lists.sourceforge.net
Subject: Re: [PATCH 1/3] mds: Distinguish protocol version of fragment [#3111]

 

Hi Thuan,

Are you happy with my reply?

Thanks

Minh

On 14/11/19 9:35 am, Minh Hon Chau wrote:

Hi Thuan,

Please see my reply inline.

Thanks

Minh

On 13/11/19 9:54 pm, Tran Thuan wrote:

Hi Minh,
 
See my comment inline.
 
Best Regards,
ThuanTr
 
-Original Message-
From: Minh Chau  
Sent: Friday, November 8, 2019 5:33 PM
To: hans.nordeb...@ericsson.com; gary@dektech.com.au; 
vu.m.ngu...@dektech.com.au; thuan.t...@dektech.com.au
Cc: opensaf-devel@lists.sourceforge.net; Minh Chau 
Subject: [PATCH 1/3] mds: Distinguish protocol version of fragment [#3111]
 
Legacy mds encodes the protocol version only in a non-fragmented
message or in the first fragment. Hence, mds cannot determine the
protocol version from any subsequent fragment.

The patch keeps the encoding of the length check the same as in the
legacy mds version. Also, subsequent fragments need to consult the
stateful portid to determine the protocol version, so that a
fragment is skipped if it was sent from legacy mds, or has its
sequence inspected if it was sent from new mds.
---
 src/mds/mds_dt.h |   6 ++
 src/mds/mds_dt_tipc.c|  11 ++-
 src/mds/mds_tipc_fctrl_intf.cc   | 154 ++-
 src/mds/mds_tipc_fctrl_msg.cc|  86 +++---
 src/mds/mds_tipc_fctrl_msg.h |   5 ++
 src/mds/mds_tipc_fctrl_portid.cc |  23 ++
 src/mds/mds_tipc_fctrl_portid.h  |   1 +
 7 files changed, 193 insertions(+), 93 deletions(-)
 
diff --git a/src/mds/mds_dt.h b/src/mds/mds_dt.h
index 64da600..007ff98 100644
--- a/src/mds/mds_dt.h
+++ b/src/mds/mds_dt.h
@@ -243,6 +243,12 @@ bool mdtm_mailbox_mbx_cleanup(NCSCONTEXT arg, NCSCONTEXT 
msg);
 #define MDS_PROT_VER_MASK 0xFC
 #define MDTM_PRI_MASK 0x3
 
+/* Unknown or undefined MDS protocol/version */
+#define MDS_PROT_UNDEFINED 0x00
+
+/* MDS protocol/version for non flow control (legacy) */
+#define MDS_PROT_LEGACY (MDS_PROT | MDS_VERSION)
+
 /* MDS protocol/version for flow control */
 #define MDS_PROT_FCTRL (0xB0 | MDS_VERSION)
 #define MDS_PROT_FCTRL_ID 0xFDAC13F5
diff --git a/src/mds/mds_dt_tipc.c b/src/mds/mds_dt_tipc.c
index e085de7..fdf0da7 100644
--- a/src/mds/mds_dt_tipc.c
+++ b/src/mds/mds_dt_tipc.c
@@ -166,7 +166,7 @@ NCS_PATRICIA_TREE mdtm_reassembly_list;
 uint32_t mdtm_global_frag_num;
 
 const unsigned int MAX_RECV_THRESHOLD = 30;
-static uint8_t gl_mds_pro_ver = MDS_PROT | MDS_VERSION;
+static uint8_t gl_mds_pro_ver = MDS_PROT_LEGACY;
 static int gl_mds_fctrl_acksize = -1;
 static int gl_mds_fctrl_ackto = -1;
 
@@ -381,7 +381,7 @@ uint32_t mdtm_tipc_init(NODE_ID nodeid, uint32_t 
*mds_tipc_ref)
  "MDTM:TIPC Failed to unset 
MDS_TIPC_FCTRL_ACKSIZE");
   }
   } else {
-  gl_mds_pro_ver = MDS_PROT | MDS_VERSION;
+  gl_mds_pro_ver = MDS_PROT_LEGACY;
   syslog(LOG_ERR, "MDTM:TIPC Invalid value of"
  "MDS_TIPC_FCTRL_ENABLED");
   }
@@ -3125,7 +3125,12 @@ uint32_t mdtm_add_frag_hdr(uint8_t *buf_ptr, uint16_t 
len, uint32_t seq_num,
* hereafter, these 2 bytes will be used as sequence number in flow control
* (per tipc portid)
* */
-  ncs_encode_16bit(, fctrl_seq_num);
+  if (gl_mds_pro_ver == MDS_PROT_FCTRL) {
+  ncs_encode_16bit(, fctrl_seq_num);
+  } else {
+  ncs_encode_16bit(, len - MDTM_FRAG_HDR_LEN - 2);
+  }
+
 #endif
   return NCSCC_RC_SUCCESS;
 }
diff --git a/src/mds/mds_tipc_fctrl_intf.cc b/src/mds/mds_tipc_fctrl_intf.cc
index c9073b2..3d92290 100644
--- a/src/mds/mds_tipc_fctrl_intf.cc
+++ b/src/mds/mds_tipc_fctrl_intf.cc
@@ -132,8 +132,16 @@ uint32_t process_flow_event(const Event& evt) {
   portid = new TipcPortId(evt.id_, data_sock_fd,
   chunk_ack_size, sock_buf_size);
   portid_map[TipcPortId::GetUniqueId(evt.id_)] = portid;
-  rc = portid->ReceiveData(evt.mseq_, evt.mfrag_,
-evt.fseq_, evt.svc_id_, evt.snd_type_, is_mcast_enabled);
+  if (evt.legacy_data_ == true) {
+// we create

Re: [devel] [PATCH 1/1] osaf: return a help message if no parameter is specified [#3118]

2019-11-13 Thread Tran Thuan
Hi Gary,

ACK from me.

Best Regards,
ThuanTr

-Original Message-
From: Gary Lee  
Sent: Wednesday, November 13, 2019 1:19 PM
To: minh.c...@dektech.com.au; thuan.t...@dektech.com.au
Cc: opensaf-devel@lists.sourceforge.net; Gary Lee 
Subject: [PATCH 1/1] osaf: return a help message if no parameter is specified 
[#3118]

---
 src/osaf/consensus/plugins/tcp/tcp.plugin | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/src/osaf/consensus/plugins/tcp/tcp.plugin 
b/src/osaf/consensus/plugins/tcp/tcp.plugin
index 1b5ddf5..0be20fc 100755
--- a/src/osaf/consensus/plugins/tcp/tcp.plugin
+++ b/src/osaf/consensus/plugins/tcp/tcp.plugin
@@ -149,7 +149,12 @@ class ArbitratorPlugin(object):
 params = []
 if args:
 params.append(args)
-return getattr(self, command)(*params)
+if command:
+return getattr(self, command)(*params)
+else:
+ret = {'code': 0,
+   'output': parser.format_help()}
+return ret
 
 def get_node_name(self):
 node_file = open(self.node_name_file)
-- 
2.7.4






Re: [devel] [PATCH 2/3] mds: Refactor logging [#3111]

2019-11-13 Thread Tran Thuan
Hi,

ACK from me.

Best Regards,
ThuanTr

-Original Message-
From: Minh Chau  
Sent: Friday, November 8, 2019 5:33 PM
To: hans.nordeb...@ericsson.com; gary@dektech.com.au; 
vu.m.ngu...@dektech.com.au; thuan.t...@dektech.com.au
Cc: opensaf-devel@lists.sourceforge.net; Minh Chau 
Subject: [PATCH 2/3] mds: Refactor logging [#3111]

Now that TipcPortId::ChangeState() has been added, the patch
refactors logging to shorten the code.
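
The refactoring idea can be sketched as follows: one method updates the state and logs the transition, so call sites no longer repeat a log statement per assignment. The state names come from the diff; the PortId class, StateName, the initial state and the log format are hypothetical.

```cpp
#include <cassert>
#include <cstdio>

enum class State { kDisabled, kStartup, kTxProb, kEnabled, kRcvBuffOverflow };

// Human-readable name for log output.
inline const char* StateName(State s) {
  switch (s) {
    case State::kDisabled: return "kDisabled";
    case State::kStartup: return "kStartup";
    case State::kTxProb: return "kTxProb";
    case State::kEnabled: return "kEnabled";
    case State::kRcvBuffOverflow: return "kRcvBuffOverflow";
  }
  return "unknown";
}

class PortId {
 public:
  State state() const { return state_; }

  // Single place that both logs and applies a state transition.
  // Returns true if the state actually changed.
  bool ChangeState(State next) {
    if (next == state_) return false;
    std::printf("FCTRL: ChangeState[%s -> %s]\n", StateName(state_),
                StateName(next));
    state_ = next;
    return true;
  }

 private:
  State state_ = State::kStartup;  // assumed initial state
};
```

With this shape, a call such as `ChangeState(State::kEnabled)` replaces the assignment plus a multi-line log macro at each call site.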
---
 src/mds/mds_tipc_fctrl_portid.cc | 71 
 1 file changed, 21 insertions(+), 50 deletions(-)

diff --git a/src/mds/mds_tipc_fctrl_portid.cc b/src/mds/mds_tipc_fctrl_portid.cc
index 9b87c74..df53d4d 100644
--- a/src/mds/mds_tipc_fctrl_portid.cc
+++ b/src/mds/mds_tipc_fctrl_portid.cc
@@ -208,17 +208,13 @@ bool TipcPortId::ReceiveCapable(uint16_t sending_len) {
 if (state_ == State::kTxProb) {
   // Too many msgs are not acked by receiver while in txprob state
   // disable flow control
-  state_ = State::kDisabled;
-  m_MDS_LOG_ERR("FCTRL: me --> [node:%x, ref:%u], [nacked:%" PRIu64
-  ", len:%u, rcv_buf_size:%" PRIu64 "], Warning[kTxProb -> kDisabled]",
-  id_.node, id_.ref, sndwnd_.nacked_space_,
-  sending_len, rcv_buf_size_);
+  m_MDS_LOG_ERR("FCTRL: me --> [node:%x, ref:%u], "
+  "Warning[Too many nacked in kTxProb]",
+  id_.node, id_.ref);
+  ChangeState(State::kDisabled);
   return true;
 } else if (state_ == State::kEnabled) {
-  state_ = State::kRcvBuffOverflow;
-  m_MDS_LOG_NOTIFY("FCTRL: [node:%x, ref:%u] --> Overflow, %" PRIu64
-  ", %u, %" PRIu64, id_.node, id_.ref, sndwnd_.nacked_space_,
-  sending_len, rcv_buf_size_);
+  ChangeState(State::kRcvBuffOverflow);
 }
 return false;
   }
@@ -271,20 +267,18 @@ uint32_t TipcPortId::ReceiveData(uint32_t mseq, uint16_t 
mfrag,
   uint32_t rc = NCSCC_RC_SUCCESS;
   if (state_ == State::kDisabled) {
 m_MDS_LOG_ERR("FCTRL: [me] <-- [node:%x, ref:%u], "
-"RcvData, TxProb[retries:%u, state:%u], "
-"Error[receive fseq:%u in invalid state]",
+"RcvData[mseq:%u, mfrag:%u, fseq:%u], "
+"rcvwnd[acked:%u, rcv:%u, nacked:%" PRIu64 "], "
+"Warning[Invalid state:%u]",
 id_.node, id_.ref,
-txprob_cnt_, (uint8_t)state_,
-fseq);
+mseq, mfrag, fseq,
+rcvwnd_.acked_.v(), rcvwnd_.rcv_.v(), rcvwnd_.nacked_space_,
+(uint8_t)state_);
 return rc;
   }
   // update state
   if (state_ == State::kTxProb || state_ == State::kStartup) {
-state_ = State::kEnabled;
-m_MDS_LOG_NOTIFY("FCTRL: [me] <-- [node:%x, ref:%u], "
-"RcvData, TxProb[retries:%u, state:%u]",
-id_.node, id_.ref,
-txprob_cnt_, (uint8_t)state_);
+ChangeState(State::kEnabled);
   }
   // if tipc multicast is enabled, receiver does not inspect sequence number
   // for both fragment/unfragment multicast/broadcast message
@@ -398,12 +392,7 @@ void TipcPortId::ReceiveChunkAck(uint16_t fseq, uint16_t 
chksize) {
   }
   // update state
   if (state_ == State::kTxProb) {
-state_ = State::kEnabled;
-m_MDS_LOG_NOTIFY("FCTRL: [me] <-- [node:%x, ref:%u], "
-"RcvChkAck, "
-"TxProb[retries:%u, state:%u]",
-id_.node, id_.ref,
-txprob_cnt_, (uint8_t)state_);
+ChangeState(State::kEnabled);
   }
   // update sender sequence window
   if (sndwnd_.acked_ < Seq16(fseq)) {
@@ -474,9 +463,7 @@ void TipcPortId::ReceiveChunkAck(uint16_t fseq, uint16_t 
chksize) {
 }
 // no more unsent message, back to kEnabled
 if (msg == nullptr && state_ == State::kRcvBuffOverflow) {
-  state_ = State::kEnabled;
-  m_MDS_LOG_NOTIFY("FCTRL: [node:%x, ref:%u] Overflow --> Enabled ",
-  id_.node, id_.ref);
+  ChangeState(State::kEnabled);
 }
   } else {
 m_MDS_LOG_ERR("FCTRL: [me] <-- [node:%x, ref:%u], "
@@ -517,9 +504,7 @@ void TipcPortId::ReceiveNack(uint32_t mseq, uint16_t mfrag,
 }
   }
   if (state_ != State::kRcvBuffOverflow) {
-state_ = State::kRcvBuffOverflow;
-m_MDS_LOG_NOTIFY("FCTRL: [node:%x, ref:%u] --> Overflow ",
-id_.node, id_.ref);
+ChangeState(State::kRcvBuffOverflow);
 sndqueue_.MarkUnsentFrom(Seq16(fseq));
   }
   DataMessage* msg = sndqueue_.Find(Seq16(fseq));
@@ -545,27 +530,15 @@ void TipcPortId::ReceiveNack(uint32_t mseq, uint16_t 
mfrag,
 
 bool TipcPortId::ReceiveTmrTxProb(uint8_t max_txprob) {
   bool restart_txprob = false;
-  if (state_ == State::kDisabled ||
-  sndwnd_.acked_ > Seq16(1) ||
-  rcvwnd_.rcv_ > Seq16(1)) return restart_txprob;
+  if (state_ == State::kDisabled) return restart_txprob;
   if (state_ == State::kTxProb || state_ == State::kRcvBuffOverflow) {
 txprob_cnt_++;
 if (txprob_cnt_ >= max_txprob) {
-  state_ = State::kDisabled;
+  ChangeState(State::kDisabled);
   restart_txprob = false;
 } else {
   restart_txprob = true;
 }
-
-// at kDisabled state, 

Re: [devel] [PATCH 3/3] mds: Add backward compatibility mdstest for fragment [#3111]

2019-11-13 Thread Tran Thuan
Hi,

ACK from me.

Best Regards,
ThuanTr

-Original Message-
From: Minh Chau  
Sent: Friday, November 8, 2019 5:33 PM
To: hans.nordeb...@ericsson.com; gary@dektech.com.au; 
vu.m.ngu...@dektech.com.au; thuan.t...@dektech.com.au
Cc: opensaf-devel@lists.sourceforge.net; Minh Chau 
Subject: [PATCH 3/3] mds: Add backward compatibility mdstest for fragment 
[#3111]

---
 src/mds/apitest/mdstipc_api.c | 83 ---
 1 file changed, 78 insertions(+), 5 deletions(-)

diff --git a/src/mds/apitest/mdstipc_api.c b/src/mds/apitest/mdstipc_api.c
index 5c0e28a..651365e 100644
--- a/src/mds/apitest/mdstipc_api.c
+++ b/src/mds/apitest/mdstipc_api.c
@@ -13512,8 +13512,8 @@ void tet_mds_fctrl_compatibility_tp1(void)
uint32_t msg_num = 1000;
uint32_t msg_size = 500;
 
-   printf("\nTest Case 5: Sender enable MDS FCTRL but Receiver disable\n");
-   /**/
+   printf("\nTest Case 5: Sender enable MDS FCTRL, Receiver disable\n");
+   /*-*/
pid_t pid = fork();
if (pid == 0) {
/* child as sender */
@@ -13545,8 +13545,8 @@ void tet_mds_fctrl_compatibility_tp2(void)
uint32_t msg_num = 1000;
uint32_t msg_size = 500;
 
-   printf("\nTest Case 5: Sender diable MDS FCTRL but Receiver enable\n");
-   /**/
+   printf("\nTest Case 6: Sender disable MDS FCTRL, Receiver enable\n");
+   /*-*/
pid_t pid = fork();
if (pid == 0) {
/* child as sender */
@@ -13644,6 +13644,73 @@ void tet_mds_fctrl_with_sna_tp2(void)
test_validate(FAIL, 0);
 }
 
+
+void tet_mds_fctrl_compatibility_tp3(void)
+{
+   int FAIL = 1;
+   uint32_t msg_num = 5;
+   uint32_t msg_size = 13;
+
+   printf("\nTest Case 9: Sender enable MDS FCTRL, Receiver disable\n");
+   /*-*/
+   pid_t pid = fork();
+   if (pid == 0) {
+   /* child as sender */
+   setenv("MDS_TIPC_FCTRL_ENABLED", "1", 1);
+   mds_startup();
+   MDS_SVC_ID to_svcids[] = {NCSMDS_SVC_ID_EXTERNAL_MIN};
+   MDS_SVC_ID svc_id = NCSMDS_SVC_ID_INTERNAL_MIN;
+   tet_sender(svc_id, msg_num, msg_size, 1, to_svcids);
+   mds_shutdown();
+   } else if (pid > 0) {
+   /* parent as receiver */
+   mds_startup();
+   MDS_SVC_ID fr_svcids[] = {NCSMDS_SVC_ID_INTERNAL_MIN};
+   MDS_SVC_ID svc_id = NCSMDS_SVC_ID_EXTERNAL_MIN;
+   FAIL = tet_receiver(svc_id, msg_num, msg_size, 1, fr_svcids);
+   printf("\nReceiver finish, kill Sender\n");
+   kill(pid, SIGKILL);
+   mds_shutdown();
+   } else {
+   printf("\nFAIL to fork()\n");
+   }
+
+   test_validate(FAIL, 0);
+}
+
+void tet_mds_fctrl_compatibility_tp4(void)
+{
+   int FAIL = 1;
+   uint32_t msg_num = 10;
+   uint32_t msg_size = 13;
+
+   printf("\nTest Case 10: Sender disable MDS FCTRL, Receiver enable\n");
+   /*--*/
+   pid_t pid = fork();
+   if (pid == 0) {
+   /* child as sender */
+   mds_startup();
+   MDS_SVC_ID to_svcids[] = {NCSMDS_SVC_ID_EXTERNAL_MIN};
+   MDS_SVC_ID svc_id = NCSMDS_SVC_ID_INTERNAL_MIN;
+   tet_sender(svc_id, msg_num, msg_size, 1, to_svcids);
+   mds_shutdown();
+   } else if (pid > 0) {
+   /* parent as receiver */
+   setenv("MDS_TIPC_FCTRL_ENABLED", "1", 1);
+   mds_startup();
+   MDS_SVC_ID fr_svcids[] = {NCSMDS_SVC_ID_INTERNAL_MIN};
+   MDS_SVC_ID svc_id = NCSMDS_SVC_ID_EXTERNAL_MIN;
+   FAIL = tet_receiver(svc_id, msg_num, msg_size, 1, fr_svcids);
+   printf("\nReceiver finish, kill Sender\n");
+   kill(pid, SIGKILL);
+   mds_shutdown();
+   } else {
+   printf("\nFAIL to fork()\n");
+   }
+   test_validate(FAIL, 0);
+}
+
+
 void Print_return_status(uint32_t rs)
 {
switch (rs) {
@@ -14384,7 +14451,7 @@ __attribute__((constructor)) static void 
mdsTipcAPI_constructor(void)
"Sender enable MDS FCTRL but Receiver disable");
test_case_add(
27, tet_mds_fctrl_compatibility_tp2,
-   "Sender diable MDS FCTRL but Receiver enable");
+   "Sender disable MDS FCTRL but Receiver enable");
test_case_add(
27, tet_mds_fctrl_with_sna_tp1,
"Sender gradually sends more than 65535"
@@ -14395,4 +14462,10 @@ 
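
Note on the test pattern in the patch above: tp3 and tp4 fork() so that
MDS_TIPC_FCTRL_ENABLED is set in only one of the two processes. A minimal
sketch of that idea follows. It assumes nothing about MDS itself, and
run_mixed_pair() and fctrl_enabled_in_this_process() are hypothetical names,
not functions from mdstipc_api.c.

```cpp
#include <cassert>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical stand-in for "would this process start with flow control on?".
static bool fctrl_enabled_in_this_process() {
  const char* v = getenv("MDS_TIPC_FCTRL_ENABLED");
  return v != nullptr && strcmp(v, "1") == 0;
}

// Forks a child and optionally enables the variable in the child only.
// Returns true if the child saw the expected setting and the parent's
// environment stayed untouched. Note: clears the variable in the caller.
bool run_mixed_pair(bool child_enabled) {
  unsetenv("MDS_TIPC_FCTRL_ENABLED");
  pid_t pid = fork();
  if (pid < 0) return false;  // fork failed
  if (pid == 0) {
    // Child: a setenv() after fork() changes only the child's environment.
    if (child_enabled) setenv("MDS_TIPC_FCTRL_ENABLED", "1", 1);
    _exit(fctrl_enabled_in_this_process() == child_enabled ? 0 : 1);
  }
  int status = 0;
  if (waitpid(pid, &status, 0) != pid) return false;
  bool child_ok = WIFEXITED(status) && WEXITSTATUS(status) == 0;
  return child_ok && !fctrl_enabled_in_this_process();
}
```

Calling run_mixed_pair(true) mirrors the "sender enabled, receiver disabled"
layout; the reverse case would set the variable in the parent after fork()
instead, as tp4 does.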

Re: [devel] [PATCH 1/3] mds: Distinguish protocol version of fragment [#3111]

2019-11-13 Thread Tran Thuan
Hi Minh,

See my comment inline.

Best Regards,
ThuanTr

-Original Message-
From: Minh Chau  
Sent: Friday, November 8, 2019 5:33 PM
To: hans.nordeb...@ericsson.com; gary@dektech.com.au; 
vu.m.ngu...@dektech.com.au; thuan.t...@dektech.com.au
Cc: opensaf-devel@lists.sourceforge.net; Minh Chau 
Subject: [PATCH 1/3] mds: Distinguish protocol version of fragment [#3111]

Legacy mds encodes the protocol version only in a non-fragmented
message or in the first fragment of a fragmented one. Hence, mds cannot
determine the protocol version from any fragment after the first.

The patch keeps the encoding of the length check the same as in legacy
mds. For subsequent fragments, mds consults the stateful portid to
determine the protocol version: the fragment is skipped if it was sent
by legacy mds, or its sequence number is inspected if it was sent by
new mds.
---
 src/mds/mds_dt.h |   6 ++
 src/mds/mds_dt_tipc.c|  11 ++-
 src/mds/mds_tipc_fctrl_intf.cc   | 154 ++-
 src/mds/mds_tipc_fctrl_msg.cc|  86 +++---
 src/mds/mds_tipc_fctrl_msg.h |   5 ++
 src/mds/mds_tipc_fctrl_portid.cc |  23 ++
 src/mds/mds_tipc_fctrl_portid.h  |   1 +
 7 files changed, 193 insertions(+), 93 deletions(-)

diff --git a/src/mds/mds_dt.h b/src/mds/mds_dt.h
index 64da600..007ff98 100644
--- a/src/mds/mds_dt.h
+++ b/src/mds/mds_dt.h
@@ -243,6 +243,12 @@ bool mdtm_mailbox_mbx_cleanup(NCSCONTEXT arg, NCSCONTEXT 
msg);
 #define MDS_PROT_VER_MASK 0xFC
 #define MDTM_PRI_MASK 0x3
 
+/* Unknown or undefined MDS protocol/version */
+#define MDS_PROT_UNDEFINED 0x00
+
+/* MDS protocol/version for non flow control (legacy) */
+#define MDS_PROT_LEGACY (MDS_PROT | MDS_VERSION)
+
 /* MDS protocol/version for flow control */
 #define MDS_PROT_FCTRL (0xB0 | MDS_VERSION)
 #define MDS_PROT_FCTRL_ID 0xFDAC13F5
diff --git a/src/mds/mds_dt_tipc.c b/src/mds/mds_dt_tipc.c
index e085de7..fdf0da7 100644
--- a/src/mds/mds_dt_tipc.c
+++ b/src/mds/mds_dt_tipc.c
@@ -166,7 +166,7 @@ NCS_PATRICIA_TREE mdtm_reassembly_list;
 uint32_t mdtm_global_frag_num;
 
 const unsigned int MAX_RECV_THRESHOLD = 30;
-static uint8_t gl_mds_pro_ver = MDS_PROT | MDS_VERSION;
+static uint8_t gl_mds_pro_ver = MDS_PROT_LEGACY;
 static int gl_mds_fctrl_acksize = -1;
 static int gl_mds_fctrl_ackto = -1;
 
@@ -381,7 +381,7 @@ uint32_t mdtm_tipc_init(NODE_ID nodeid, uint32_t 
*mds_tipc_ref)
"MDTM:TIPC Failed to unset 
MDS_TIPC_FCTRL_ACKSIZE");
}
} else {
-   gl_mds_pro_ver = MDS_PROT | MDS_VERSION;
+   gl_mds_pro_ver = MDS_PROT_LEGACY;
syslog(LOG_ERR, "MDTM:TIPC Invalid value of"
"MDS_TIPC_FCTRL_ENABLED");
}
@@ -3125,7 +3125,12 @@ uint32_t mdtm_add_frag_hdr(uint8_t *buf_ptr, uint16_t 
len, uint32_t seq_num,
 * hereafter, these 2 bytes will be used as sequence number in flow 
control
 * (per tipc portid)
 * */
-   ncs_encode_16bit(, fctrl_seq_num);
+   if (gl_mds_pro_ver == MDS_PROT_FCTRL) {
+   ncs_encode_16bit(, fctrl_seq_num);
+   } else {
+   ncs_encode_16bit(, len - MDTM_FRAG_HDR_LEN - 2);
+   }
+
 #endif
return NCSCC_RC_SUCCESS;
 }
diff --git a/src/mds/mds_tipc_fctrl_intf.cc b/src/mds/mds_tipc_fctrl_intf.cc
index c9073b2..3d92290 100644
--- a/src/mds/mds_tipc_fctrl_intf.cc
+++ b/src/mds/mds_tipc_fctrl_intf.cc
@@ -132,8 +132,16 @@ uint32_t process_flow_event(const Event& evt) {
   portid = new TipcPortId(evt.id_, data_sock_fd,
   chunk_ack_size, sock_buf_size);
   portid_map[TipcPortId::GetUniqueId(evt.id_)] = portid;
-  rc = portid->ReceiveData(evt.mseq_, evt.mfrag_,
-evt.fseq_, evt.svc_id_, evt.snd_type_, is_mcast_enabled);
+  if (evt.legacy_data_ == true) {
+// we create portid and set state kDisabled even though we know
+// this portid has no flow control. It is because the 2nd, 3rd fragment
+// could not reflect the protocol version, so need to keep this portid
+// remained stateful
+portid->ChangeState(TipcPortId::State::kDisabled);
+  } else {
+rc = portid->ReceiveData(evt.mseq_, evt.mfrag_,
+  evt.fseq_, evt.svc_id_, evt.snd_type_, is_mcast_enabled);
+  }
 } else if (evt.type_ == Event::Type::kEvtRcvIntro) {
   portid = new TipcPortId(evt.id_, data_sock_fd,
   chunk_ack_size, sock_buf_size);
@@ -146,8 +154,12 @@ uint32_t process_flow_event(const Event& evt) {
 }
   } else {
 if (evt.type_ == Event::Type::kEvtRcvData) {
-  rc = portid->ReceiveData(evt.mseq_, evt.mfrag_,
-  evt.fseq_, evt.svc_id_, evt.snd_type_, is_mcast_enabled);
+  if (evt.legacy_data_ == true) {
+portid->ChangeState(TipcPortId::State::kDisabled);
+  } else {
+rc = 
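
Note on the commit message above: only the first fragment (or an
unfragmented message) carries the protocol version, so the receiver has to
remember it per portid. A rough sketch of that bookkeeping follows. It is
not the code from mds_tipc_fctrl_intf.cc. FragmentClassifier, ProtVer and
Action are hypothetical names, and treating an unknown portid as "skip" is
an assumption made for this sketch.

```cpp
#include <cassert>
#include <cstdint>
#include <map>

enum class ProtVer : uint8_t { kUndefined, kLegacy, kFctrl };
enum class Action { kSkip, kCheckSequence };

class FragmentClassifier {
 public:
  // carries_version is true for an unfragmented message or a first
  // fragment; ver_in_msg is ignored for later fragments.
  Action OnMessage(uint64_t portid, bool carries_version, ProtVer ver_in_msg) {
    if (carries_version) peers_[portid] = ver_in_msg;  // remember per portid
    auto it = peers_.find(portid);
    ProtVer ver = (it == peers_.end()) ? ProtVer::kUndefined : it->second;
    // Only peers known to speak the flow-control protocol get their
    // sequence numbers inspected; everything else is skipped (assumption).
    return ver == ProtVer::kFctrl ? Action::kCheckSequence : Action::kSkip;
  }

 private:
  std::map<uint64_t, ProtVer> peers_;  // stateful view of each sender
};
```

The point of the sketch is that the decision for the 2nd, 3rd, ... fragment
comes from stored state, not from the fragment header.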

Re: [devel] [PATCH 1/1] mds: fix sender take very long time to send all messages [#3119]

2019-11-13 Thread Tran Thuan
Hi Minh,

Agree with your latest comment. I will send out V2.

Best Regards,
ThuanTr

-Original Message-
From: Minh Hon Chau  
Sent: Wednesday, November 13, 2019 11:15 AM
To: Tran Thuan ; 'Nguyen Minh Vu' 
; gary@dektech.com.au
Cc: opensaf-devel@lists.sourceforge.net
Subject: Re: [PATCH 1/1] mds: fix sender take very long time to send all 
messages [#3119]

Hi Thuan,

Please see comment inline

Thanks

Minh

On 13/11/19 2:24 pm, Tran Thuan wrote:
> Hi Minh,
>
> Please check replies inline. Thanks.
>
> Best Regards,
> ThuanTr
>
> -Original Message-
> From: Minh Hon Chau 
> Sent: Wednesday, November 13, 2019 10:05 AM
> To: Tran Thuan ; 'Nguyen Minh Vu' 
> ; gary@dektech.com.au
> Cc: opensaf-devel@lists.sourceforge.net
> Subject: Re: [PATCH 1/1] mds: fix sender take very long time to send all 
> messages [#3119]
>
> Hi Thuan,
>
> Please see comment inline.
>
> Thanks
>
> Minh
>
> On 13/11/19 1:11 pm, Tran Thuan wrote:
>> Hi Minh,
>>
>> Thanks for comments, please check my replies inline.
>>
>> Best Regards,
>> ThuanTr
>>
>> -Original Message-
>> From: Minh Hon Chau 
>> Sent: Wednesday, November 13, 2019 7:47 AM
>> To: thuan.tran ; 'Nguyen Minh Vu' 
>> ; gary@dektech.com.au
>> Cc: opensaf-devel@lists.sourceforge.net
>> Subject: Re: [PATCH 1/1] mds: fix sender take very long time to send all 
>> messages [#3119]
>>
>> Hi Thuan,
>>
>> Some comments inline.
>>
>> Thanks
>>
>> Minh
>>
>> On 12/11/19 5:04 pm, thuan.tran wrote:
>>> When overload happens, the sender waits for a chunkAck before sending
>>> more messages. It should then send as many messages as the receiver's
>>> chunkAck size. Otherwise the receiver does not get enough messages to
>>> send a chunkAck and waits for its timer to expire before sending one.
>>> This loop makes the sender take a very long time to send all messages.
>>> ---
>>> src/mds/mds_tipc_fctrl_portid.cc | 30 +++---
>>> 1 file changed, 7 insertions(+), 23 deletions(-)
>>>
>>> diff --git a/src/mds/mds_tipc_fctrl_portid.cc 
>>> b/src/mds/mds_tipc_fctrl_portid.cc
>>> index 3704baddb..1fff4c855 100644
>>> --- a/src/mds/mds_tipc_fctrl_portid.cc
>>> +++ b/src/mds/mds_tipc_fctrl_portid.cc
>>> @@ -190,6 +190,7 @@ uint32_t TipcPortId::Queue(const uint8_t* data, 
>>> uint16_t length,
>>> sndwnd_.acked_.v(), sndwnd_.send_.v(), sndwnd_.nacked_space_);
>>>   } else {
>>> ++sndwnd_.send_;
>>> +sndwnd_.nacked_space_ += length;
>> [Minh] We haven't sent the msg out to wait for ack, thus nacked_space_
>> should not be increased
>>> m_MDS_LOG_NOTIFY("FCTRL: [me] --> [node:%x, ref:%u], "
>>> "QueData[mseq:%u, mfrag:%u, fseq:%u, len:%u], "
>>> "sndwnd[acked:%u, send:%u, nacked:%" PRIu64 "]",
>>> @@ -444,32 +445,29 @@ void TipcPortId::ReceiveChunkAck(uint16_t fseq, 
>>> uint16_t chksize) {
>>> // the nacked_space_ of sender
>>> uint64_t acked_bytes = sndqueue_.Erase(Seq16(fseq) - (chksize-1),
>>> Seq16(fseq));
>>> +assert(sndwnd_.nacked_space_ >= acked_bytes);
>>> sndwnd_.nacked_space_ -= acked_bytes;
>>> 
>>> // try to send a few pending msg
>>> DataMessage* msg = nullptr;
>>> -uint64_t resend_bytes = 0;
>>> -while (resend_bytes < acked_bytes) {
>>> +uint16_t send_msg_cnt = 0;
>>> +while (send_msg_cnt++ < chunk_size_) {
>>>   // find the lowest sequence unsent yet
>>>   msg = sndqueue_.FirstUnsent();
>>>   if (msg == nullptr) {
>>> break;
>>>   } else {
>>> -if (resend_bytes < acked_bytes) {
>>>   if (Send(msg->msg_data_, msg->header_.msg_len_) == 
>>> NCSCC_RC_SUCCESS) {
>>> -sndwnd_.nacked_space_ += msg->header_.msg_len_;
>> [Minh] We now send it out and wait for acked, thus the nacked_space_ is
>> increased here, so any reason moving the nacked_space_ from Queue() to here?
>> [Thuan] Because the message could be either in sndwnd (resend) or in sndqueue 
>> (send).
>> We cannot increase nacked_space for a resent message.
>> I have tried another way, increasing/decreasing nacked_space dynamically,
>> but it becomes complex with markUnsent() since the sender may receive a Nack for the 
>> same msg > 2 ti

Re: [devel] [PATCH 1/1] mds: fix sender take very long time to send all messages [#3119]

2019-11-12 Thread Tran Thuan
Hi Minh,

Please check replies inline. Thanks.

Best Regards,
ThuanTr

-Original Message-
From: Minh Hon Chau  
Sent: Wednesday, November 13, 2019 10:05 AM
To: Tran Thuan ; 'Nguyen Minh Vu' 
; gary@dektech.com.au
Cc: opensaf-devel@lists.sourceforge.net
Subject: Re: [PATCH 1/1] mds: fix sender take very long time to send all 
messages [#3119]

Hi Thuan,

Please see comment inline.

Thanks

Minh

On 13/11/19 1:11 pm, Tran Thuan wrote:
> Hi Minh,
>
> Thanks for comments, please check my replies inline.
>
> Best Regards,
> ThuanTr
>
> -Original Message-
> From: Minh Hon Chau 
> Sent: Wednesday, November 13, 2019 7:47 AM
> To: thuan.tran ; 'Nguyen Minh Vu' 
> ; gary@dektech.com.au
> Cc: opensaf-devel@lists.sourceforge.net
> Subject: Re: [PATCH 1/1] mds: fix sender take very long time to send all 
> messages [#3119]
>
> Hi Thuan,
>
> Some comments inline.
>
> Thanks
>
> Minh
>
> On 12/11/19 5:04 pm, thuan.tran wrote:
>> When overload happens, the sender waits for a chunkAck before sending
>> more messages. It should then send as many messages as the receiver's
>> chunkAck size. Otherwise the receiver does not get enough messages to
>> send a chunkAck and waits for its timer to expire before sending one.
>> This loop makes the sender take a very long time to send all messages.
>> ---
>>src/mds/mds_tipc_fctrl_portid.cc | 30 +++---
>>1 file changed, 7 insertions(+), 23 deletions(-)
>>
>> diff --git a/src/mds/mds_tipc_fctrl_portid.cc 
>> b/src/mds/mds_tipc_fctrl_portid.cc
>> index 3704baddb..1fff4c855 100644
>> --- a/src/mds/mds_tipc_fctrl_portid.cc
>> +++ b/src/mds/mds_tipc_fctrl_portid.cc
>> @@ -190,6 +190,7 @@ uint32_t TipcPortId::Queue(const uint8_t* data, uint16_t 
>> length,
>>sndwnd_.acked_.v(), sndwnd_.send_.v(), sndwnd_.nacked_space_);
>>  } else {
>>++sndwnd_.send_;
>> +sndwnd_.nacked_space_ += length;
> [Minh] We haven't sent the msg out to wait for ack, thus nacked_space_
> should not be increased
>>m_MDS_LOG_NOTIFY("FCTRL: [me] --> [node:%x, ref:%u], "
>>"QueData[mseq:%u, mfrag:%u, fseq:%u, len:%u], "
>>"sndwnd[acked:%u, send:%u, nacked:%" PRIu64 "]",
>> @@ -444,32 +445,29 @@ void TipcPortId::ReceiveChunkAck(uint16_t fseq, 
>> uint16_t chksize) {
>>// the nacked_space_ of sender
>>uint64_t acked_bytes = sndqueue_.Erase(Seq16(fseq) - (chksize-1),
>>Seq16(fseq));
>> +assert(sndwnd_.nacked_space_ >= acked_bytes);
>>sndwnd_.nacked_space_ -= acked_bytes;
>>
>>// try to send a few pending msg
>>DataMessage* msg = nullptr;
>> -uint64_t resend_bytes = 0;
>> -while (resend_bytes < acked_bytes) {
>> +uint16_t send_msg_cnt = 0;
>> +while (send_msg_cnt++ < chunk_size_) {
>>  // find the lowest sequence unsent yet
>>  msg = sndqueue_.FirstUnsent();
>>  if (msg == nullptr) {
>>break;
>>  } else {
>> -if (resend_bytes < acked_bytes) {
>>  if (Send(msg->msg_data_, msg->header_.msg_len_) == 
>> NCSCC_RC_SUCCESS) {
>> -sndwnd_.nacked_space_ += msg->header_.msg_len_;
> [Minh] We now send it out and wait for acked, thus the nacked_space_ is
> increased here, so any reason moving the nacked_space_ from Queue() to here?
> [Thuan] Because the message could be either in sndwnd (resend) or in sndqueue 
> (send).
> We cannot increase nacked_space for a resent message.
> I have tried another way, increasing/decreasing nacked_space dynamically,
> but it becomes complex with markUnsent() since the sender may receive a Nack for the 
> same msg > 2 times.
[Minh] OK.
>>msg->is_sent_ = true;
>> -resend_bytes += msg->header_.msg_len_;
>>m_MDS_LOG_NOTIFY("FCTRL: [me] --> [node:%x, ref:%u], "
>>"SndQData[fseq:%u, len:%u], "
>>"sndwnd[acked:%u, send:%u, nacked:%" PRIu64 "]",
>>id_.node, id_.ref,
>>msg->header_.fseq_, msg->header_.msg_len_,
>>sndwnd_.acked_.v(), sndwnd_.send_.v(), 
>> sndwnd_.nacked_space_);
>> +  } else {
>> +break;
>>  }
>> -} else {
>> -  break;
>> -}
>>  }
>>}
>>// no more unsent message, back to kEnabled
> [Minh] Agree, the new strategy to resend

Re: [devel] [PATCH 1/1] mds: fix sender take very long time to send all messages [#3119]

2019-11-12 Thread Tran Thuan
Hi Minh,

Thanks for comments, please check my replies inline.

Best Regards,
ThuanTr

-Original Message-
From: Minh Hon Chau  
Sent: Wednesday, November 13, 2019 7:47 AM
To: thuan.tran ; 'Nguyen Minh Vu' 
; gary@dektech.com.au
Cc: opensaf-devel@lists.sourceforge.net
Subject: Re: [PATCH 1/1] mds: fix sender take very long time to send all 
messages [#3119]

Hi Thuan,

Some comments inline.

Thanks

Minh

On 12/11/19 5:04 pm, thuan.tran wrote:
> When overload happens, the sender waits for a chunkAck before sending
> more messages. It should then send as many messages as the receiver's
> chunkAck size. Otherwise the receiver does not get enough messages to
> send a chunkAck and waits for its timer to expire before sending one.
> This loop makes the sender take a very long time to send all messages.
> ---
>   src/mds/mds_tipc_fctrl_portid.cc | 30 +++---
>   1 file changed, 7 insertions(+), 23 deletions(-)
>
> diff --git a/src/mds/mds_tipc_fctrl_portid.cc 
> b/src/mds/mds_tipc_fctrl_portid.cc
> index 3704baddb..1fff4c855 100644
> --- a/src/mds/mds_tipc_fctrl_portid.cc
> +++ b/src/mds/mds_tipc_fctrl_portid.cc
> @@ -190,6 +190,7 @@ uint32_t TipcPortId::Queue(const uint8_t* data, uint16_t 
> length,
>   sndwnd_.acked_.v(), sndwnd_.send_.v(), sndwnd_.nacked_space_);
> } else {
>   ++sndwnd_.send_;
> +sndwnd_.nacked_space_ += length;
[Minh] We haven't sent the msg out to wait for ack, thus nacked_space_ 
should not be increased
>   m_MDS_LOG_NOTIFY("FCTRL: [me] --> [node:%x, ref:%u], "
>   "QueData[mseq:%u, mfrag:%u, fseq:%u, len:%u], "
>   "sndwnd[acked:%u, send:%u, nacked:%" PRIu64 "]",
> @@ -444,32 +445,29 @@ void TipcPortId::ReceiveChunkAck(uint16_t fseq, 
> uint16_t chksize) {
>   // the nacked_space_ of sender
>   uint64_t acked_bytes = sndqueue_.Erase(Seq16(fseq) - (chksize-1),
>   Seq16(fseq));
> +assert(sndwnd_.nacked_space_ >= acked_bytes);
>   sndwnd_.nacked_space_ -= acked_bytes;
>   
>   // try to send a few pending msg
>   DataMessage* msg = nullptr;
> -uint64_t resend_bytes = 0;
> -while (resend_bytes < acked_bytes) {
> +uint16_t send_msg_cnt = 0;
> +while (send_msg_cnt++ < chunk_size_) {
> // find the lowest sequence unsent yet
> msg = sndqueue_.FirstUnsent();
> if (msg == nullptr) {
>   break;
> } else {
> -if (resend_bytes < acked_bytes) {
> if (Send(msg->msg_data_, msg->header_.msg_len_) == 
> NCSCC_RC_SUCCESS) {
> -sndwnd_.nacked_space_ += msg->header_.msg_len_;
[Minh] We now send it out and wait for acked, thus the nacked_space_ is 
increased here, so any reason moving the nacked_space_ from Queue() to here?
[Thuan] Because the message could be either in sndwnd (resend) or in sndqueue 
(send).
We cannot increase nacked_space for a resent message.
I have tried another way, increasing/decreasing nacked_space dynamically,
but it becomes complex with markUnsent() since the sender may receive a Nack for the same 
msg > 2 times.
>   msg->is_sent_ = true;
> -resend_bytes += msg->header_.msg_len_;
>   m_MDS_LOG_NOTIFY("FCTRL: [me] --> [node:%x, ref:%u], "
>   "SndQData[fseq:%u, len:%u], "
>   "sndwnd[acked:%u, send:%u, nacked:%" PRIu64 "]",
>   id_.node, id_.ref,
>   msg->header_.fseq_, msg->header_.msg_len_,
>   sndwnd_.acked_.v(), sndwnd_.send_.v(), 
> sndwnd_.nacked_space_);
> +  } else {
> +break;
> }
> -} else {
> -  break;
> -}
> }
>   }
>   // no more unsent message, back to kEnabled
[Minh] Agreed, the new strategy of resending by chunk_size_ is better than 
by acked_bytes; it increases the transmission rate and does not depend 
on the timer
[Thuan] Thanks
> @@ -502,26 +500,12 @@ void TipcPortId::ReceiveNack(uint32_t mseq, uint16_t 
> mfrag,
>   fseq);
>   return;
> }
> -  if (state_ == State::kRcvBuffOverflow) {
> -sndqueue_.MarkUnsentFrom(Seq16(fseq));
> -if (Seq16(fseq) - sndwnd_.acked_ > 1) {
> -  m_MDS_LOG_ERR("FCTRL: [me] <-- [node:%x, ref:%u], "
> -  "RcvNack[fseq:%u], "
> -  "sndwnd[acked:%u, send:%u, nacked:%" PRIu64 "], "
> -  "queue[size:%" PRIu64 "], "
> -  "Warning[Ignore Nack]",
> -  id_.node, id_.ref, fseq,
> -  sndwnd_.acked_.v(), sndwnd_.send_.v(), sndwnd_.nacked_space_,
> -  sndqueue_.Size());
> -  return;
> -}
> -  }
> if (state_ != State::kRcvBuffOverflow) {
>   state_ = State::kRcvBuffOverflow;
>   m_MDS_LOG_NOTIFY("FCTRL: [node:%x, ref:%u] --> Overflow ",
>   id_.node, id_.ref);
> -sndqueue_.MarkUnsentFrom(Seq16(fseq));
> }
> +  sndqueue_.MarkUnsentFrom(Seq16(fseq));
[Minh] I have a doubt with this change in ReceiveNack(), so every Nack 
will trigger a retransmission on the Nacked sequence even though we are 
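
Note on the resend strategy discussed in this thread: after a ChunkAck the
sender pushes out up to chunk_size_ unsent messages and stops at the first
send failure, instead of working against a byte budget. A simplified sketch
follows. PendingMsg and SendPending are hypothetical names, and the send
callback stands in for the real TipcPortId::Send(), so this is not the
patch itself.

```cpp
#include <cassert>
#include <cstdint>
#include <deque>

struct PendingMsg {
  uint16_t len;   // payload length in bytes
  bool is_sent;   // false while the message is still waiting in the queue
};

// Sends at most chunk_size unsent messages in queue order and returns how
// many were sent. Stops early when the callback reports a failure.
template <typename SendFn>
uint16_t SendPending(std::deque<PendingMsg>& queue, uint16_t chunk_size,
                     SendFn send) {
  uint16_t sent = 0;
  for (auto& msg : queue) {
    if (sent >= chunk_size) break;  // receiver can now produce a ChunkAck
    if (msg.is_sent) continue;      // already in flight, nothing to do
    if (!send(msg)) break;          // e.g. socket buffer still full
    msg.is_sent = true;
    ++sent;
  }
  return sent;
}
```

Counting messages rather than bytes means the receiver gets enough messages
to reach its own chunkAck threshold without waiting for its timer.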

Re: [devel] [PATCH 1/1] nid: Change the path of TIPC_MODULE [#3110]

2019-11-07 Thread Tran Thuan
Hi Thien,

OK, I see. ShellCheck does not allow using it. No more comments from me.

configure_tipc.in:25:6: note: Check exit code directly with e.g. 'if mycmd;', 
not indirectly with $?. [SC2181]

Best Regards,
ThuanTr

-Original Message-
From: Thien Minh Huynh  
Sent: Friday, November 8, 2019 9:28 AM
To: 'Tran Thuan' ; vu.m.ngu...@dektech.com.au
Cc: opensaf-devel@lists.sourceforge.net
Subject: RE: [PATCH 1/1] nid: Change the path of TIPC_MODULE [#3110]

Hi Thuan,

When I run with "if [ $? -ne 0 ]; then", test_shellcheck fails, so
I use ret_val.

Best Regards,
ThienHuynh

-Original Message-
From: Tran Thuan  
Sent: Friday, November 8, 2019 9:11 AM
To: 'thien.m.huynh' ;
vu.m.ngu...@dektech.com.au
Cc: opensaf-devel@lists.sourceforge.net
Subject: RE: [PATCH 1/1] nid: Change the path of TIPC_MODULE [#3110]

Hi Thien,

ACK with minor comment inline.

Best Regards,
ThuanTr

-Original Message-
From: thien.m.huynh 
Sent: Thursday, November 7, 2019 2:26 PM
To: thuan.t...@dektech.com.au; vu.m.ngu...@dektech.com.au
Cc: opensaf-devel@lists.sourceforge.net; thien.m.huynh

Subject: [PATCH 1/1] nid: Change the path of TIPC_MODULE [#3110]

---
 src/nid/configure_tipc.in | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/src/nid/configure_tipc.in b/src/nid/configure_tipc.in
index 73dd1cb..218de65 100644
--- a/src/nid/configure_tipc.in
+++ b/src/nid/configure_tipc.in
@@ -21,7 +21,11 @@
 . $pkgsysconfdir/nid.conf
 
 MANAGE_TIPC=${OPENSAF_MANAGE_TIPC:-"yes"}
-TIPC_MODULE=/lib/modules/$(uname -r)/kernel/net/tipc.ko
+TIPC_MODULE=$(modinfo tipc -n 2> /dev/null)
+ret_val=$?
[Thuan] ret_val is not really needed; you can use $? in the IF below, e.g.:
if [ $? -ne 0 ]; then
+if [ $ret_val -ne 0 ] ; then
+TIPC_MODULE=/lib/modules/$(uname -r)/kernel/net/tipc/tipc.ko
+fi
 CHASSIS_ID_FILE=$pkgsysconfdir/chassis_id
 SLOT_ID_FILE=$pkgsysconfdir/slot_id
 SUBSLOT_ID_FILE=$pkgsysconfdir/subslot_id
--
2.7.4






___
Opensaf-devel mailing list
Opensaf-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/opensaf-devel


Re: [devel] [PATCH 1/1] nid: Change the path of TIPC_MODULE [#3110]

2019-11-07 Thread Tran Thuan
Hi Thien,

ACK with minor comment inline.

Best Regards,
ThuanTr

-Original Message-
From: thien.m.huynh  
Sent: Thursday, November 7, 2019 2:26 PM
To: thuan.t...@dektech.com.au; vu.m.ngu...@dektech.com.au
Cc: opensaf-devel@lists.sourceforge.net; thien.m.huynh 

Subject: [PATCH 1/1] nid: Change the path of TIPC_MODULE [#3110]

---
 src/nid/configure_tipc.in | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/src/nid/configure_tipc.in b/src/nid/configure_tipc.in
index 73dd1cb..218de65 100644
--- a/src/nid/configure_tipc.in
+++ b/src/nid/configure_tipc.in
@@ -21,7 +21,11 @@
 . $pkgsysconfdir/nid.conf
 
 MANAGE_TIPC=${OPENSAF_MANAGE_TIPC:-"yes"}
-TIPC_MODULE=/lib/modules/$(uname -r)/kernel/net/tipc.ko
+TIPC_MODULE=$(modinfo tipc -n 2> /dev/null)
+ret_val=$?
[Thuan] ret_val is not really needed; you can use $? in the IF below, e.g.:
if [ $? -ne 0 ]; then
+if [ $ret_val -ne 0 ] ; then
+TIPC_MODULE=/lib/modules/$(uname -r)/kernel/net/tipc/tipc.ko
+fi
 CHASSIS_ID_FILE=$pkgsysconfdir/chassis_id
 SLOT_ID_FILE=$pkgsysconfdir/slot_id
 SUBSLOT_ID_FILE=$pkgsysconfdir/subslot_id
-- 
2.7.4






Re: [devel] [PATCH 1/1] smf: Improve SmfAdminStateHandler() Return false if Fail [#3104]

2019-10-24 Thread Tran Thuan
Hi Phuc,

ACK from me.

Best Regards,
ThuanTr

-Original Message-
From: phuc.h.chau  
Sent: Thursday, October 24, 2019 10:28 AM
To: lennart.l...@ericsson.com; thuan.t...@dektech.com.au;
thang.d.ngu...@dektech.com.au; minh.c...@dektech.com.au;
gary@dektech.com.au
Cc: opensaf-devel@lists.sourceforge.net; phuc.h.chau

Subject: [PATCH 1/1] smf: Improve SmfAdminStateHandler() Return false if
Fail [#3104]

During SW upgrade testing, it was found that if a service unit is in
INSTANTIATION_FAILED,
the one_step upgrade does not continue with the software installation.
---
 src/smf/smfd/SmfAdminState.cc | 7 ++-
 1 file changed, 2 insertions(+), 5 deletions(-)
 mode change 100644 => 100755 src/smf/smfd/SmfAdminState.cc

diff --git a/src/smf/smfd/SmfAdminState.cc b/src/smf/smfd/SmfAdminState.cc
old mode 100644
new mode 100755
index 076f9f0..4730215
--- a/src/smf/smfd/SmfAdminState.cc
+++ b/src/smf/smfd/SmfAdminState.cc
@@ -858,7 +858,7 @@ bool SmfAdminStateHandler::deleteNodeGroup() {
 bool SmfAdminStateHandler::nodeGroupAdminOperation(
 SaAmfAdminOperationIdT adminOp) {
 
-  bool method_rc = true;
+  bool method_rc = false;
 
   TRACE_ENTER();
 
@@ -920,20 +920,17 @@ bool SmfAdminStateHandler::nodeGroupAdminOperation(
   } else if (imm_rc != SA_AIS_OK) {
 LOG_NO("%s adminOpTimeout Fail %s", __FUNCTION__,
saf_error(imm_rc));
 errno_ = imm_rc;
-method_rc = false;
   } else {
 LOG_NO("%s adminOpTimeout Fail %s", __FUNCTION__,
saf_error(oi_rc));
 errno_ = oi_rc;
-method_rc = false;
   }
 }
   } else {
 LOG_NO("%s: becomeAdminOwnerOf(%s) Fail", __FUNCTION__,
nodeGroupName_s.c_str());
-method_rc = false;
   }
 
-  if (method_rc == true) {
+  if (admset_rc == true) {
 TRACE("%s Admin operation is done. Release ownership if nodegroup",
   __FUNCTION__);
 if (releaseAdminOwnerOf(nodeGroupName_s) == false) {
-- 
2.7.4






Re: [devel] [PATCH 1/1] mds: not waste 1.5s in waiting Adest already down to send response message [#3102]

2019-10-23 Thread Tran Thuan
Hi Minh,

 

Please see my responses inline.

In general, I am trying to implement a new version that reuses the existing database.

 

Best Regards,

ThuanTr

 

From: Minh Hon Chau  
Sent: Wednesday, October 23, 2019 12:23 PM
To: Tran Thuan ; 'Nguyen Minh Vu' 
; hans.nordeb...@ericsson.com; 
gary@dektech.com.au
Cc: opensaf-devel@lists.sourceforge.net
Subject: Re: [PATCH 1/1] mds: not waste 1.5s in waiting Adest already down to 
send response message [#3102]

 

Hi Thuan,

Please see comment inline

Thanks

Minh

On 23/10/19 3:32 pm, Tran Thuan wrote:

Hi Minh,

 

Thanks for comments. See my response inline.

Btw, I am preparing to send out a new patch; I think I found an issue in the current 
patch.

 

Best Regards,

ThuanTr

 

-Original Message-
From: Minh Hon Chau  
Sent: Wednesday, October 23, 2019 5:52 AM
To: Tran Thuan ; 'Nguyen Minh Vu' 
; hans.nordeb...@ericsson.com; 
gary@dektech.com.au
Cc: opensaf-devel@lists.sourceforge.net
Subject: Re: [PATCH 1/1] mds: not waste 1.5s in waiting Adest already down to 
send response message [#3102]

 

Hi Thuan,

 

I wonder whether the patch would work with the same reproduction steps if
both adests have subscribed to more than 2 services of each other. The
svc_cnt will be greater than 1 until the last service down event. I think
that's why mds has the database @subtn_results, in which each item is an
adest associated with a separate service id.

[Thuan] Then we can treat the adest as still alive and go with the original flow 
(wait 1.5s).

But can a process send SNDRSP and then unregister from mds? I think it cannot, because 
it is blocked in SNDRSP.

[M] Not unregister, but it can unsubscribe. Or do you mean a process cannot 
send two SNDRSP at the same time on 2 different subscribed services? 
[T] I think the process cannot do anything related to the MDS API because it is 
blocked in SNDRSP.

The scenario of this ticket happens when the process is terminated or crashes.

[M] Yes, my doubt is in the context of this ticket (terminated/crash): you 
would get 2 service down events, I think.

[T] Yes, we will get 2 service down events.

[M] I don't think adding a new database at the global scope for this specific 
case is a good idea if we can reuse the existing database. Can you try to use 
MDS_SUBSCRIPTION_INFO and add a flag or something similar to indicate in which case 
mds should wait 1.5 sec? It would keep the bug fix within the scope of this 
problem.

[T] I don't want to mess up the current database and its logic. But anyway, I am 
trying to reuse that database. Please wait for the next version.

 

The problem originally resides in the service code (e.g. ntf, imm...),
where the thread structure between the mds receiving thread and the main
thread causes a race condition. Thus the service sends a message to a dead
adest that has been removed from the mds database, which confuses mds and
hits the 1.5 sec wait time.

 

If I read the code correctly, the 1.5 sec wait time is for another case: it
gives another chance by waiting 1.5 sec when the subscription result is
empty in @subtn_results because the service up has not arrived yet.

[Thuan] Yes, it gives a chance if the adest has not yet had any service UP.

My patch still keeps that chance, as in the original code.

But I think I need to reduce the timeout of the adest down timer; I am verifying this 
change.

 

mds  subscribe >

 

mds  sends message A x

 

mds wait 1.5 sec

 

mds <--- service up 

 

mds  sends message A >

 

So the 1.5 sec wait is for the early phase of waiting for service up.

 

} else if (sub_info->tmr_flag != true) {
	if ((MDS_SENDTYPE_RSP == req->i_sendtype) ||
	    (MDS_SENDTYPE_RRSP == req->i_sendtype)) {
		time_wait = true;
		m_MDS_LOG_INFO(
		    "MDS_SND_RCV:Disc queue: Subscr exists no timer running: Waiting for some time\n");

 

-> I think this line means: the SUBSCRIPTION_TMR has timed out, and
mds is sending RSP/RRSP, which means mds should have received the
*request* message (?), so mds wants to wait another 1.5 seconds for
service_up to create the subscription result in the database.

 

The problem in this ticket hits the 1.5 sec wait because the service down has
already arrived and mds has removed the subscription result item. Now ntf,
imm... send a msg to a dead adest, and mds thinks it is waiting for a
service up to come, as in the early phase, so it waits. I think the two
scenarios can be distinguished, to avoid waiting 1.5 secs in the latter
case.

 

Thanks

 

Minh

 

On 22/10/19 9:50 pm, Tran Thuan wrote:

> Hi Vu,

> 

> Thanks for additional comments.

> I reply your concerns inline below.

> 

> Best Regards,

> ThuanTr

> 
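
Note on the two cases discussed above: a response to an adest that has
never come up may deserve the 1.5 sec grace period, while a response to an
adest whose services have all gone down does not. One way to tell them
apart is to keep a small per-adest record. The sketch below is only an
illustration under that assumption. AdestTracker, RspAction and Decide()
are hypothetical names, and real MDS keeps per-service subscription results
in @subtn_results rather than a single counter.

```cpp
#include <cassert>
#include <cstdint>
#include <map>

enum class RspAction { kSendNow, kWaitForUp, kFailNoRoute };

class AdestTracker {
 public:
  void OnServiceUp(uint64_t adest) { ++up_count_[adest]; }

  void OnServiceDown(uint64_t adest) {
    uint32_t& count = up_count_[adest];  // creates a "seen" entry if missing
    if (count > 0) --count;
  }

  RspAction Decide(uint64_t adest) const {
    auto it = up_count_.find(adest);
    if (it == up_count_.end()) return RspAction::kWaitForUp;  // never seen
    if (it->second > 0) return RspAction::kSendNow;           // still alive
    return RspAction::kFailNoRoute;  // seen before, all services are down
  }

 private:
  std::map<uint64_t, uint32_t> up_count_;  // services currently up per adest
};
```

With two subscribed services the adest counts as alive until the second
service down arrives, which matches the svc_cnt concern raised in the
thread.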

Re: [devel] [PATCH 1/1] mds: not waste 1.5s in waiting Adest already down to send response message [#3102]

2019-10-22 Thread Tran Thuan
Hi Minh,

 

Thanks for comments. See my response inline.

Btw, I am preparing to send out a new patch; I think I found an issue in the current 
patch.

 

Best Regards,

ThuanTr

 

-Original Message-
From: Minh Hon Chau  
Sent: Wednesday, October 23, 2019 5:52 AM
To: Tran Thuan ; 'Nguyen Minh Vu' 
; hans.nordeb...@ericsson.com; 
gary@dektech.com.au
Cc: opensaf-devel@lists.sourceforge.net
Subject: Re: [PATCH 1/1] mds: not waste 1.5s in waiting Adest already down to 
send response message [#3102]

 

Hi Thuan,

 

I wonder whether the patch would work with the same reproduction steps if
both adests have subscribed to more than 2 services of each other. The
svc_cnt will be greater than 1 until the last service down event. I think
that's why mds has the database @subtn_results, in which each item is an
adest associated with a separate service id.

[Thuan] Then we can treat the adest as still alive and go with the original flow 
(wait 1.5s).

But can a process send SNDRSP and then unregister from mds? I think it cannot, because 
it is blocked in SNDRSP.

The scenario of this ticket happens when the process is terminated or crashes.

 

The problem originally resides in the service code (e.g. ntf, imm...),
where the thread structure between the mds receiving thread and the main
thread causes a race condition. Thus the service sends a message to a dead
adest that has been removed from the mds database, which confuses mds and
hits the 1.5 sec wait time.

 

If I read the code correctly, the 1.5 sec wait time is for another case: it
gives another chance by waiting 1.5 sec when the subscription result is
empty in @subtn_results because the service up has not arrived yet.

[Thuan] Yes, it gives a chance if the adest has not yet had any service UP.

My patch still keeps that chance, as in the original code.

But I think I need to reduce the timeout of the adest down timer; I am verifying this 
change.

 

mds  subscribe >

 

mds  sends message A x

 

mds wait 1.5 sec

 

mds <--- service up 

 

mds  sends message A >

 

So the 1.5 sec wait is for the early phase of waiting for service up.

 

} else if (sub_info->tmr_flag != true) {
	if ((MDS_SENDTYPE_RSP == req->i_sendtype) ||
	    (MDS_SENDTYPE_RRSP == req->i_sendtype)) {
		time_wait = true;
		m_MDS_LOG_INFO(
		    "MDS_SND_RCV:Disc queue: Subscr exists no timer running: Waiting for some time\n");

 

-> I think this line means: the SUBSCRIPTION_TMR has timed out, and
mds is sending RSP/RRSP, which means mds should have received the
*request* message (?), so mds wants to wait another 1.5 seconds for
service_up to create the subscription result in the database.

 

The problem in this ticket hits the 1.5 sec wait because the service down has
already come and mds has removed the subscription result item. Now ntf, imm...
send a msg to a dead adest, and mds thinks it is waiting for a
service up to come, as in the early phase, so it waits. I think the two
scenarios can be distinguished from each other, to avoid waiting 1.5 secs in
the latter case.
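One way to tell the two scenarios apart is to remember whether an adest has ever been up. The sketch below is only an illustration under that assumption; AdestState, AdestHistory and DecideResponseSend are hypothetical names, not existing mds code, and a real implementation would also need to age out kDown entries.

```cpp
#include <cstdint>
#include <map>

enum class AdestState { kNeverSeen, kUp, kDown };
enum class SendDecision { kSendNow, kWaitForServiceUp, kFailNoRoute };

// Hypothetical record of the last event seen per adest.
class AdestHistory {
 public:
  void OnServiceUp(uint64_t adest) { state_[adest] = AdestState::kUp; }
  void OnServiceDown(uint64_t adest) { state_[adest] = AdestState::kDown; }

  AdestState Get(uint64_t adest) const {
    auto it = state_.find(adest);
    return it == state_.end() ? AdestState::kNeverSeen : it->second;
  }

 private:
  std::map<uint64_t, AdestState> state_;
};

// Decide what to do with a response whose destination has no
// subscription result entry at the moment.
SendDecision DecideResponseSend(const AdestHistory& history, uint64_t adest) {
  switch (history.Get(adest)) {
    case AdestState::kUp:
      return SendDecision::kSendNow;
    case AdestState::kDown:
      return SendDecision::kFailNoRoute;  // already went down: do not wait
    case AdestState::kNeverSeen:
      return SendDecision::kWaitForServiceUp;  // early phase: give it 1.5s
  }
  return SendDecision::kWaitForServiceUp;
}
```

Only the kNeverSeen case keeps the 1.5 sec grace period; an adest that was up and then went down fails immediately.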

 

Thanks

 

Minh

 

On 22/10/19 9:50 pm, Tran Thuan wrote:

> Hi Vu,

> 

> Thanks for the additional comments.
> I reply to your concerns inline below.

> 

> Best Regards,

> ThuanTr

> 

> -Original Message-
> From: Nguyen Minh Vu <vu.m.ngu...@dektech.com.au>
> Sent: Tuesday, October 22, 2019 5:28 PM
> To: thuan.tran <thuan.t...@dektech.com.au>; 'Minh Chau' <minh.c...@dektech.com.au>; hans.nordeb...@ericsson.com; gary@dektech.com.au
> Cc: opensaf-devel@lists.sourceforge.net
> Subject: Re: [PATCH 1/1] mds: not waste 1.5s in waiting Adest already down to send response message [#3102]

> 

> Hi Thuan,

> 

> I have additional comments below.

> 

> Regards, Vu

> 

> On 10/22/19 7:14 AM, thuan.tran wrote:

>> - When sending a response message to an Adest that does not exist (already down),
>> MDS currently waits 1.5 seconds before concluding there is no route to
>> send the response message.
>>
>> - There are 2 possible scenarios:
>> UP -> DOWN -> receive SNDRSP -> response timeout after 1.5s
>> UP -> receive SNDRSP -> DOWN -> response timeout after 1.5s
>>
>> - With this change, MDS will not waste 1.5s, which can cause trouble
>> for higher layer services, e.g. ntf, imm, etc.

>> ---

>>src/mds/mds_c_api.c | 70 -

>>src/mds/mds_c_sndrcv.c  | 52 --

>>src/mds/mds_core.h  | 25 +--

>

Re: [devel] [PATCH 1/1] mds: not waste 1.5s in waiting Adest already down to send response message [#3102]

2019-10-22 Thread Tran Thuan
Hi Vu,

Thanks for the additional comments.
I reply to your concerns inline below.

Best Regards,
ThuanTr

-Original Message-
From: Nguyen Minh Vu  
Sent: Tuesday, October 22, 2019 5:28 PM
To: thuan.tran ; 'Minh Chau' 
; hans.nordeb...@ericsson.com; gary@dektech.com.au
Cc: opensaf-devel@lists.sourceforge.net
Subject: Re: [PATCH 1/1] mds: not waste 1.5s in waiting Adest already down to 
send response message [#3102]

Hi Thuan,

I have additional comments below.

Regards, Vu

On 10/22/19 7:14 AM, thuan.tran wrote:
> - When sending a response message to an Adest that does not exist (already down),
> MDS currently waits 1.5 seconds before concluding there is no route to
> send the response message.
>
> - There are 2 possible scenarios:
> UP -> DOWN -> receive SNDRSP -> response timeout after 1.5s
> UP -> receive SNDRSP -> DOWN -> response timeout after 1.5s
>
> - With this change, MDS will not waste 1.5s, which can cause trouble
> for higher layer services, e.g. ntf, imm, etc.
> ---
>   src/mds/mds_c_api.c | 70 -
>   src/mds/mds_c_sndrcv.c  | 52 --
>   src/mds/mds_core.h  | 25 +--
>   src/mds/mds_dt2c.h  |  2 +-
>   src/mds/mds_dt_common.c | 22 -
>   5 files changed, 162 insertions(+), 9 deletions(-)
>
> diff --git a/src/mds/mds_c_api.c b/src/mds/mds_c_api.c
> index 132555b8e..5dd30c536 100644
> --- a/src/mds/mds_c_api.c
> +++ b/src/mds/mds_c_api.c
> @@ -1900,6 +1900,32 @@ uint32_t mds_mcm_svc_up(PW_ENV_ID pwe_id, MDS_SVC_ID 
> svc_id, V_DEST_RL role,
>   
>   /*** Validation for SCOPE **/
>   
> + if (adest != m_MDS_GET_ADEST) {
> + MDS_ADEST_INFO *adest_info =
> + (MDS_ADEST_INFO *)ncs_patricia_tree_get(
> + _mds_mcm_cb->adest_list,
> + (uint8_t *));
> + if (!adest_info) {
> + /* Add adest to adest list */
> + adest_info = m_MMGR_ALLOC_ADEST_INFO;
> + memset(adest_info, 0, sizeof(MDS_ADEST_INFO));
> + adest_info->adest = adest;
> + adest_info->node.key_info =
> + (uint8_t *)_info->adest;
> + adest_info->svc_cnt = 1;
> + adest_info->tmr_start = false;
> + ncs_patricia_tree_add(
> + _mds_mcm_cb->adest_list,
> + (NCS_PATRICIA_NODE *)adest_info);
> + m_MDS_LOG_DBG("MCM:API: Adest=%" PRIu64
> + " svc_cnt=%u", adest, adest_info->svc_cnt);
> + } else {
> + adest_info->svc_cnt++;
> + m_MDS_LOG_DBG("MCM:API: Adest=%" PRIu64
> + " svc_cnt=%u", adest, adest_info->svc_cnt);
> + }
> + }
> +
>   status = mds_get_subtn_res_tbl_by_adest(local_svc_hdl, svc_id, vdest_id,
>   adest, _subtn_result_info);
>   
> @@ -3571,6 +3597,24 @@ uint32_t mds_mcm_svc_down(PW_ENV_ID pwe_id, MDS_SVC_ID 
> svc_id, V_DEST_RL role,
>   /* Discard : Getting down before getting up */
>   } else { /* Entry exist in subscription result table */
>   
> + /* If adest exist and no sndrsp, start a timer */
> + MDS_ADEST_INFO *adest_info =
> + (MDS_ADEST_INFO *)ncs_patricia_tree_get(
> + _mds_mcm_cb->adest_list,
> + (uint8_t *));
> + if (adest_info) {
> + adest_info->svc_cnt--;
> + if (adest_info->svc_cnt == 0 &&
> + adest_info->sndrsp_cnt == 0) {
> + m_MDS_LOG_INFO("MCM:API: Adest=%" PRIu64
> + " down timer start", adest);
> + if (adest_info->tmr_start == false) {
> + adest_info->tmr_start = true;
> + start_mds_down_tmr(adest, svc_id);
> + }
> + }
> + }
> +
>   if (vdest_id == m_VDEST_ID_FOR_ADEST_ENTRY) {
>   status = mds_subtn_res_tbl_del(
>   local_svc_hdl, svc_id, vdest_id, adest,
> @@ -4956,6 +5000,17 @@ uint32_t mds_mcm_init(void)
>   return NCSCC_RC_FAILURE;
>   }
>   
> + /* ADEST TREE */
> + memset(_tree_params, 0, sizeof(NCS_PATRICIA_PARAMS));
> + pat_tree_params.key_size = sizeof(MDS_DEST);
> + if (NCSCC_RC_SUCCESS !=
> + ncs_patricia_tree_init(_mds_mcm_cb->adest_list,
> +_tree_params)) {
> + m_MDS_LOG_ERR(
> + "MCM:API: patricia_tree_init: adest :failure, L 
> mds_mcm_init");
> + return NCSCC_RC_FAILURE;
> + }
> +

Re: [devel] [PATCH 1/1] mds: not waste 1.5s in waiting Adest already down to send response message [#3102]

2019-10-22 Thread Tran Thuan
Hi Vu,

Thanks for the comments. My answers are inline.

Best Regards,
ThuanTr

-Original Message-
From: Nguyen Minh Vu  
Sent: Tuesday, October 22, 2019 4:42 PM
To: thuan.tran ; 'Minh Chau' 
; hans.nordeb...@ericsson.com; gary@dektech.com.au
Cc: opensaf-devel@lists.sourceforge.net
Subject: Re: [PATCH 1/1] mds: not waste 1.5s in waiting Adest already down to 
send response message [#3102]

Hi Thuan,

I have few comments below.

Regards, Vu

On 10/22/19 7:14 AM, thuan.tran wrote:
> - When sending a response message to an Adest that does not exist (already down),
> MDS currently waits 1.5 seconds before concluding there is no route to
> send the response message.
>
> - There are 2 possible scenarios:
> UP -> DOWN -> receive SNDRSP -> response timeout after 1.5s
> UP -> receive SNDRSP -> DOWN -> response timeout after 1.5s
>
> - With this change, MDS will not waste 1.5s, which can cause trouble
> for higher layer services, e.g. ntf, imm, etc.
> ---
>   src/mds/mds_c_api.c | 70 -
>   src/mds/mds_c_sndrcv.c  | 52 --
>   src/mds/mds_core.h  | 25 +--
>   src/mds/mds_dt2c.h  |  2 +-
>   src/mds/mds_dt_common.c | 22 -
>   5 files changed, 162 insertions(+), 9 deletions(-)
>
> diff --git a/src/mds/mds_c_api.c b/src/mds/mds_c_api.c
> index 132555b8e..5dd30c536 100644
> --- a/src/mds/mds_c_api.c
> +++ b/src/mds/mds_c_api.c
> @@ -1900,6 +1900,32 @@ uint32_t mds_mcm_svc_up(PW_ENV_ID pwe_id, MDS_SVC_ID 
> svc_id, V_DEST_RL role,
>   
>   /*** Validation for SCOPE **/
>   
> + if (adest != m_MDS_GET_ADEST) {
> + MDS_ADEST_INFO *adest_info =
> + (MDS_ADEST_INFO *)ncs_patricia_tree_get(
> + _mds_mcm_cb->adest_list,
> + (uint8_t *));
> + if (!adest_info) {
> + /* Add adest to adest list */
> + adest_info = m_MMGR_ALLOC_ADEST_INFO;
> + memset(adest_info, 0, sizeof(MDS_ADEST_INFO));
> + adest_info->adest = adest;
> + adest_info->node.key_info =
> + (uint8_t *)_info->adest;
> + adest_info->svc_cnt = 1;
> + adest_info->tmr_start = false;
> + ncs_patricia_tree_add(
> + _mds_mcm_cb->adest_list,
> + (NCS_PATRICIA_NODE *)adest_info);
> + m_MDS_LOG_DBG("MCM:API: Adest=%" PRIu64
> + " svc_cnt=%u", adest, adest_info->svc_cnt);
> + } else {
> + adest_info->svc_cnt++;
> + m_MDS_LOG_DBG("MCM:API: Adest=%" PRIu64
> + " svc_cnt=%u", adest, adest_info->svc_cnt);
> + }
> + }
[Vu] This new database, adest_list, is shared between the internal osaf_mds 
thread and the mds user thread, hence it should be protected by a mutex.
[Thuan] It's protected by gl_mds_library_mutex
> +
>   status = mds_get_subtn_res_tbl_by_adest(local_svc_hdl, svc_id, vdest_id,
>   adest, _subtn_result_info);
>   
> @@ -3571,6 +3597,24 @@ uint32_t mds_mcm_svc_down(PW_ENV_ID pwe_id, MDS_SVC_ID 
> svc_id, V_DEST_RL role,
>   /* Discard : Getting down before getting up */
>   } else { /* Entry exist in subscription result table */
>   
> + /* If adest exist and no sndrsp, start a timer */
> + MDS_ADEST_INFO *adest_info =
> + (MDS_ADEST_INFO *)ncs_patricia_tree_get(
> + _mds_mcm_cb->adest_list,
> + (uint8_t *));
> + if (adest_info) {
> + adest_info->svc_cnt--;
> + if (adest_info->svc_cnt == 0 &&
> + adest_info->sndrsp_cnt == 0) {
> + m_MDS_LOG_INFO("MCM:API: Adest=%" PRIu64
> + " down timer start", adest);
> + if (adest_info->tmr_start == false) {
> + adest_info->tmr_start = true;
> + start_mds_down_tmr(adest, svc_id);
[Vu] It seems the mds_down tmr is started twice. The first start is at the 
beginning of the function.
[Thuan] That timer is not the same as this one: that timer is for the current process, 
this timer is for the remote adest.

[Vu] But what is the reason to start the mds down timer here? What if we 
don't start the tmr?
[Thuan] The timer is there to handle UP -> DOWN -> receive SNDRSP -> send RSP

 /* potentially clean up the process info database */
 MDS_PROCESS_INFO *info = mds_process_info_get(adest, svc_id);
 if (info != NULL) {
 /* Process info exist, delay the cleanup with a timeout to avoid
  * race */
 start_mds_down_tmr(adest, svc_id);
 }
> +   

Re: [devel] [PATCH 1/1] mds: not waste 1.5s in waiting Adest already down to send response message [#3102]

2019-10-22 Thread Tran Thuan
Hi Minh,

Thanks for your comments:

1- The 1.5 sec wait time comes from this flow:
mds_mcm_process_disc_queue_checks_redundant()
mds_subtn_tbl_add_disc_queue()
if (true == time_wait) {
timeout_val = 150; /* This may need a tuning */
}

2- We should not touch the current adest db because it is used
everywhere with the current logic; if we used it, we would have to change the logic
in many places in the code, which becomes more complex and risky.
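Regarding point 1: the value 150 corresponds to 1.5 s only if the timeout is counted in 10 ms ticks. That unit is an assumption here (it is not stated in this thread), and the helper names below are hypothetical; the sketch only shows the conversion.

```cpp
#include <chrono>
#include <cstdint>

// Assumption: MDS timeout values count ticks of 10 ms each.
constexpr std::chrono::milliseconds kMdsTick{10};

// Convert a tick count (e.g. timeout_val) to milliseconds.
constexpr std::chrono::milliseconds TicksToMs(uint32_t ticks) {
  return kMdsTick * ticks;
}

// Convert milliseconds to ticks, truncating any partial tick.
constexpr uint32_t MsToTicks(std::chrono::milliseconds ms) {
  return static_cast<uint32_t>(ms / kMdsTick);
}
```

Under that assumption TicksToMs(150) is 1500 ms, so tuning the wait to, say, 500 ms would mean a timeout_val of 50.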

Best Regards,
ThuanTr

-Original Message-
From: Minh Hon Chau  
Sent: Tuesday, October 22, 2019 11:31 AM
To: thuan.tran ; hans.nordeb...@ericsson.com; 
gary@dektech.com.au; vu.m.ngu...@dektech.com.au
Cc: opensaf-devel@lists.sourceforge.net
Subject: Re: [PATCH 1/1] mds: not waste 1.5s in waiting Adest already down to 
send response message [#3102]

Hi Thuan,

1- Can you point out where the mds code waits for 1.5 seconds? 
Is the 1.5 secs hard coded?

2- Is the existing db (mds_c_db.c) in mds not enough, so that we need to introduce 
adest_list? I think mds must already have a memory of adests, perhaps in another 
implicit form, so that mds can validate an adest and a svc_id associated with it.

thanks

Minh

On 22/10/19 11:14 am, thuan.tran wrote:
> - When sending a response message to an Adest that does not exist (already down),
> MDS currently waits 1.5 seconds before concluding there is no route to
> send the response message.
>
> - There are 2 possible scenarios:
> UP -> DOWN -> receive SNDRSP -> response timeout after 1.5s
> UP -> receive SNDRSP -> DOWN -> response timeout after 1.5s
>
> - With this change, MDS will not waste 1.5s, which can cause trouble
> for higher layer services, e.g. ntf, imm, etc.
> ---
>   src/mds/mds_c_api.c | 70 -
>   src/mds/mds_c_sndrcv.c  | 52 --
>   src/mds/mds_core.h  | 25 +--
>   src/mds/mds_dt2c.h  |  2 +-
>   src/mds/mds_dt_common.c | 22 -
>   5 files changed, 162 insertions(+), 9 deletions(-)
>
> diff --git a/src/mds/mds_c_api.c b/src/mds/mds_c_api.c
> index 132555b8e..5dd30c536 100644
> --- a/src/mds/mds_c_api.c
> +++ b/src/mds/mds_c_api.c
> @@ -1900,6 +1900,32 @@ uint32_t mds_mcm_svc_up(PW_ENV_ID pwe_id, MDS_SVC_ID 
> svc_id, V_DEST_RL role,
>   
>   /*** Validation for SCOPE **/
>   
> + if (adest != m_MDS_GET_ADEST) {
> + MDS_ADEST_INFO *adest_info =
> + (MDS_ADEST_INFO *)ncs_patricia_tree_get(
> + _mds_mcm_cb->adest_list,
> + (uint8_t *));
> + if (!adest_info) {
> + /* Add adest to adest list */
> + adest_info = m_MMGR_ALLOC_ADEST_INFO;
> + memset(adest_info, 0, sizeof(MDS_ADEST_INFO));
> + adest_info->adest = adest;
> + adest_info->node.key_info =
> + (uint8_t *)_info->adest;
> + adest_info->svc_cnt = 1;
> + adest_info->tmr_start = false;
> + ncs_patricia_tree_add(
> + _mds_mcm_cb->adest_list,
> + (NCS_PATRICIA_NODE *)adest_info);
> + m_MDS_LOG_DBG("MCM:API: Adest=%" PRIu64
> + " svc_cnt=%u", adest, adest_info->svc_cnt);
> + } else {
> + adest_info->svc_cnt++;
> + m_MDS_LOG_DBG("MCM:API: Adest=%" PRIu64
> + " svc_cnt=%u", adest, adest_info->svc_cnt);
> + }
> + }
> +
>   status = mds_get_subtn_res_tbl_by_adest(local_svc_hdl, svc_id, vdest_id,
>   adest, _subtn_result_info);
>   
> @@ -3571,6 +3597,24 @@ uint32_t mds_mcm_svc_down(PW_ENV_ID pwe_id, MDS_SVC_ID 
> svc_id, V_DEST_RL role,
>   /* Discard : Getting down before getting up */
>   } else { /* Entry exist in subscription result table */
>   
> + /* If adest exist and no sndrsp, start a timer */
> + MDS_ADEST_INFO *adest_info =
> + (MDS_ADEST_INFO *)ncs_patricia_tree_get(
> + _mds_mcm_cb->adest_list,
> + (uint8_t *));
> + if (adest_info) {
> + adest_info->svc_cnt--;
> + if (adest_info->svc_cnt == 0 &&
> + adest_info->sndrsp_cnt == 0) {
> + m_MDS_LOG_INFO("MCM:API: Adest=%" PRIu64
> + " down timer start", adest);
> + if (adest_info->tmr_start == false) {
> + adest_info->tmr_start = true;
> + start_mds_down_tmr(adest, svc_id);
> + }
> + }
> + }
> +
>   if (vdest_id == m_VDEST_ID_FOR_ADEST_ENTRY) {
> 

Re: [devel] [PATCH 1/1] smf: Improve SmfAdminStateHandler() Return false if Fail [#3104]

2019-10-21 Thread Tran Thuan
Hi Phuc,

Instead of adding more code, you can change the default value to reduce code.
And releaseAdminOwnerOf() should be called based on the "admset_rc" check, I
think.

Example:

--- a/src/smf/smfd/SmfAdminState.cc
+++ b/src/smf/smfd/SmfAdminState.cc
@@ -858,7 +858,7 @@ bool SmfAdminStateHandler::deleteNodeGroup() {
 bool SmfAdminStateHandler::nodeGroupAdminOperation(
 SaAmfAdminOperationIdT adminOp) {
 
-  bool method_rc = true;
+  bool method_rc = false;
 
   TRACE_ENTER();
 
@@ -920,20 +920,17 @@ bool SmfAdminStateHandler::nodeGroupAdminOperation(
   } else if (imm_rc != SA_AIS_OK) {
 LOG_NO("%s adminOpTimeout Fail %s", __FUNCTION__,
saf_error(imm_rc));
 errno_ = imm_rc;
-method_rc = false;
   } else {
 LOG_NO("%s adminOpTimeout Fail %s", __FUNCTION__,
saf_error(oi_rc));
 errno_ = oi_rc;
-method_rc = false;
   }
 }
   } else {
 LOG_NO("%s: becomeAdminOwnerOf(%s) Fail", __FUNCTION__,
nodeGroupName_s.c_str());
-method_rc = false;
   }
 
-  if (method_rc == true) {
+  if (admset_rc == true) {
 TRACE("%s Admin operation is done. Release ownership if nodegroup",
   __FUNCTION__);
 if (releaseAdminOwnerOf(nodeGroupName_s) == false) {
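
The suggestion above follows a "fail by default" pattern: the result starts as false and only becomes true on the success path, while the release step depends on whether ownership was actually taken. A minimal sketch follows; RunAdminOperation and its callback parameters are hypothetical stand-ins for becomeAdminOwnerOf(), the admin operation and releaseAdminOwnerOf(), and treating a failed release as an overall failure is an assumption, since that branch is not shown in the diff.

```cpp
#include <functional>

// Hypothetical helper illustrating the control flow only.
bool RunAdminOperation(const std::function<bool()>& become_owner,
                       const std::function<bool()>& admin_op,
                       const std::function<bool()>& release_owner) {
  bool method_rc = false;  // pessimistic default: no failure branch must set it
  const bool admset_rc = become_owner();

  if (admset_rc) {
    method_rc = admin_op();  // true only when the operation succeeded
  }

  // Release ownership whenever it was taken, even if the operation failed.
  if (admset_rc && !release_owner()) {
    method_rc = false;  // assumption: a failed release counts as failure
  }
  return method_rc;
}
```

Keying the release on admset_rc rather than method_rc avoids leaving the admin owner set when the operation itself fails.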


Best Regards,
ThuanTr

-Original Message-
From: phuc.h.chau  
Sent: Monday, October 21, 2019 6:23 PM
To: thuan.t...@dektech.com.au; thang.d.ngu...@dektech.com.au
Cc: opensaf-devel@lists.sourceforge.net; phuc.h.chau

Subject: [PATCH 1/1] smf: Improve SmfAdminStateHandler() Return false if
Fail [#3104]

During SW upgrade testing, it was found that if a service unit is in
INSTANTIATION_FAILED, the one_step upgrade will not continue with the software
installation.
---
 src/smf/smfd/SmfAdminState.cc | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/src/smf/smfd/SmfAdminState.cc b/src/smf/smfd/SmfAdminState.cc
index 076f9f0..a54f47f 100644
--- a/src/smf/smfd/SmfAdminState.cc
+++ b/src/smf/smfd/SmfAdminState.cc
@@ -900,11 +900,13 @@ bool SmfAdminStateHandler::nodeGroupAdminOperation(
 LOG_NO(
 "%s: saImmOmAdminOperationInvoke_2 Fail %s",
 __FUNCTION__, saf_error(imm_rc));
+method_rc = false;
 errno_ = imm_rc;
 break;
   } else if (oi_rc != SA_AIS_OK) {
 LOG_NO("%s: SaAmfAdminOperationId %d Fail %s", __FUNCTION__,
adminOp,
saf_error(oi_rc));
+method_rc = false;
 errno_ = oi_rc;
 break;
   } else {
-- 
2.7.4




___
Opensaf-devel mailing list
Opensaf-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/opensaf-devel


Re: [devel] [PATCH 1/1] mds: Disable mds flow control for mds broadcast/multicast message [#3101]

2019-10-20 Thread Tran Thuan
Hi Minh,

I suggest the following commit message:
mds: skip flow control for bcast/mcast message if tipc multicast
enabled.
because "disable mds flow control" can be misunderstood as overriding the
MDS_TIPC_FCTRL_ENABLED configuration.

There is another comment inline, marked with [Thuan]. Thanks.

Best Regards,
ThuanTr

-Original Message-
From: Minh Chau  
Sent: Thursday, October 17, 2019 10:00 AM
To: hans.nordeb...@ericsson.com; thuan.t...@dektech.com.au;
gary@dektech.com.au; vu.m.ngu...@dektech.com.au
Cc: opensaf-devel@lists.sourceforge.net; Minh Chau

Subject: [PATCH 1/1] mds: Disable mds flow control for mds
broadcast/multicast message [#3101]

The mds flow control has already been disabled for unfragmented broadcast/multicast
messages if tipc multicast is enabled. This patch revisits that change and extends it to
fragmented messages.
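
The rule described above can be sketched as a single predicate. This is only an illustration: SendType and NeedsFlowControl are hypothetical names, and the real MDS send-type enum has more members than shown.

```cpp
#include <cstdint>

// Hypothetical send types for illustration only.
enum class SendType : uint8_t { kUnicast, kBroadcast, kMulticast };

// Flow control is skipped for broadcast/multicast traffic when TIPC
// multicast is enabled. The same answer applies to every fragment of a
// message, since all fragments share one send type.
bool NeedsFlowControl(SendType type, bool is_mcast_enabled) {
  const bool is_bcast_or_mcast =
      type == SendType::kBroadcast || type == SendType::kMulticast;
  return !(is_mcast_enabled && is_bcast_or_mcast);
}
```

Unicast traffic is unaffected, and with TIPC multicast disabled every send type stays under flow control.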
---
 src/mds/mds_tipc_fctrl_intf.cc   | 47
 src/mds/mds_tipc_fctrl_msg.h     | 11 +++---
 src/mds/mds_tipc_fctrl_portid.cc | 47 ++--
 src/mds/mds_tipc_fctrl_portid.h  |  3 ++-
 4 files changed, 69 insertions(+), 39 deletions(-)

diff --git a/src/mds/mds_tipc_fctrl_intf.cc b/src/mds/mds_tipc_fctrl_intf.cc
index b803bfe..fe3dbd5 100644
--- a/src/mds/mds_tipc_fctrl_intf.cc
+++ b/src/mds/mds_tipc_fctrl_intf.cc
@@ -133,7 +133,7 @@ uint32_t process_flow_event(const Event& evt) {
   kChunkAckSize, sock_buf_size);
   portid_map[TipcPortId::GetUniqueId(evt.id_)] = portid;
   rc = portid->ReceiveData(evt.mseq_, evt.mfrag_,
-evt.fseq_, evt.svc_id_);
+evt.fseq_, evt.svc_id_, evt.snd_type_, is_mcast_enabled);
 } else if (evt.type_ == Event::Type::kEvtRcvIntro) {
   portid = new TipcPortId(evt.id_, data_sock_fd,
   kChunkAckSize, sock_buf_size);
@@ -147,7 +147,7 @@ uint32_t process_flow_event(const Event& evt) {
   } else {
 if (evt.type_ == Event::Type::kEvtRcvData) {
   rc = portid->ReceiveData(evt.mseq_, evt.mfrag_,
-  evt.fseq_, evt.svc_id_);
+  evt.fseq_, evt.svc_id_, evt.snd_type_, is_mcast_enabled);
 }
 if (evt.type_ == Event::Type::kEvtRcvChunkAck) {
   portid->ReceiveChunkAck(evt.fseq_, evt.chunk_size_);
@@ -430,6 +430,7 @@ uint32_t mds_tipc_fctrl_drop_data(uint8_t *buffer, uint16_t len,
 
   HeaderMessage header;
   header.Decode(buffer);
+  Event* pevt = nullptr;
   // if mds support flow control
   if ((header.pro_ver_ & MDS_PROT_VER_MASK) == MDS_PROT_FCTRL) {
 if (header.pro_id_ == MDS_PROT_FCTRL_ID) {
@@ -438,9 +439,10 @@ uint32_t mds_tipc_fctrl_drop_data(uint8_t *buffer, uint16_t len,
 ChunkAck ack;
 ack.Decode(buffer);
 // send to the event thread
-if (m_NCS_IPC_SEND(_events,
-new Event(Event::Type::kEvtSendChunkAck, id, ack.svc_id_,
-header.mseq_, header.mfrag_, ack.acked_fseq_, ack.chunk_size_),
+pevt = new Event(Event::Type::kEvtSendChunkAck, id, ack.svc_id_,
+header.mseq_, header.mfrag_, ack.acked_fseq_);
+pevt->chunk_size_ = ack.chunk_size_;
+if (m_NCS_IPC_SEND(_events, pevt,
 NCS_IPC_PRIORITY_HIGH) != NCSCC_RC_SUCCESS) {
   m_MDS_LOG_ERR("FCTRL: Failed to send msg to mbx_events, Error[%s]",
   strerror(errno));
@@ -453,9 +455,9 @@ uint32_t mds_tipc_fctrl_drop_data(uint8_t *buffer, uint16_t len,
   DataMessage data;
   data.Decode(buffer);
   // send to the event thread
-  if (m_NCS_IPC_SEND(_events,
-  new Event(Event::Type::kEvtDropData, id, data.svc_id_,
-  header.mseq_, header.mfrag_, header.fseq_),
+  pevt = new Event(Event::Type::kEvtDropData, id, data.svc_id_,
+  header.mseq_, header.mfrag_, header.fseq_);
+  if (m_NCS_IPC_SEND(_events, pevt,
   NCS_IPC_PRIORITY_HIGH) != NCSCC_RC_SUCCESS) {
 m_MDS_LOG_ERR("FCTRL: Failed to send msg to mbx_events, Error[%s]",
 strerror(errno));
@@ -474,6 +476,7 @@ uint32_t mds_tipc_fctrl_rcv_data(uint8_t *buffer, uint16_t len,
 
   HeaderMessage header;
   header.Decode(buffer);
+  Event* pevt = nullptr;
   // if mds support flow control
   if ((header.pro_ver_ & MDS_PROT_VER_MASK) == MDS_PROT_FCTRL) {
 if (header.pro_id_ == MDS_PROT_FCTRL_ID) {
@@ -482,9 +485,10 @@ uint32_t mds_tipc_fctrl_rcv_data(uint8_t *buffer, uint16_t len,
 ChunkAck ack;
 ack.Decode(buffer);
 // send to the event thread
-if (m_NCS_IPC_SEND(_events,
-new Event(Event::Type::kEvtRcvChunkAck, id, ack.svc_id_,
-header.mseq_, header.mfrag_, ack.acked_fseq_, ack.chunk_size_),
+pevt = new Event(Event::Type::kEvtRcvChunkAck, id, ack.svc_id_,
+header.mseq_, header.mfrag_, ack.acked_fseq_);
+pevt->chunk_size_ = ack.chunk_size_;
+if (m_NCS_IPC_SEND(_events, pevt,
 NCS_IPC_PRIORITY_HIGH) != NCSCC_RC_SUCCESS) {
   m_MDS_LOG_ERR("FCTRL: Failed to send msg to mbx_events, Error[%s]",
   

Re: [devel] [PATCH 1/1] mds: not waste 1.5s in waiting Adest already down to send response message [#3102]

2019-10-17 Thread Tran Thuan
Hi,

There is an issue with this change; I need to consider it further.
Please ignore this review for now.

Best Regards,
ThuanTr

-Original Message-
From: thuan.tran  
Sent: Thursday, October 17, 2019 4:56 PM
To: 'Minh Chau' ; hans.nordeb...@ericsson.com;
gary@dektech.com.au; vu.m.ngu...@dektech.com.au
Cc: opensaf-devel@lists.sourceforge.net; thuan.tran

Subject: [PATCH 1/1] mds: not waste 1.5s in waiting Adest already down to
send response message [#3102]

- When sending a response message to an Adest that does not exist (already down), MDS
currently waits 1.5 seconds before concluding there is no route to send the response
message.
- With this change, MDS will not waste 1.5s, which can cause trouble for
higher layer services, e.g. ntf, imm, etc.
---
 src/mds/mds_c_sndrcv.c | 16 
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/src/mds/mds_c_sndrcv.c b/src/mds/mds_c_sndrcv.c
index 7850ac714..f0c60a8b7 100644
--- a/src/mds/mds_c_sndrcv.c
+++ b/src/mds/mds_c_sndrcv.c
@@ -2713,17 +2713,17 @@ static uint32_t mds_mcm_process_disc_queue_checks_redundant(
 				"MDS_SND_RCV: Subscription made but no pointer available\n");
 			return NCSCC_RC_FAILURE;
 		}
-	} else if (sub_info->tmr_flag != true) {
-		if ((MDS_SENDTYPE_RSP == req->i_sendtype) ||
-		    (MDS_SENDTYPE_RRSP == req->i_sendtype)) {
+	} else if (sub_info->tmr_flag == true) {
+		if (((MDS_SENDTYPE_RSP == req->i_sendtype) ||
+		    (MDS_SENDTYPE_RRSP == req->i_sendtype))) {
 			time_wait = true;
 			m_MDS_LOG_INFO(
-				"MDS_SND_RCV:Disc queue red: Subscr exists no timer running: Waiting for some time\n");
-		} else {
-			m_MDS_LOG_INFO(
-				"MDS_SND_RCV: Subscription exists but Timer has expired\n");
-			return NCSCC_RC_FAILURE;
+				"MDS_SND_RCV:Disc queue red: Subscr exists timer running: Waiting for some time\n");
 		}
+	} else {
+		m_MDS_LOG_INFO(
+			"MDS_SND_RCV: Subscription exists but Timer has expired\n");
+		return NCSCC_RC_FAILURE;
 	}
 
 	/* Add this message to the DISC Queue, One function call */
--
2.17.1






Re: [devel] [PATCH 1/1] mds: Do not check upper limit of window size [#3100]

2019-10-16 Thread Tran Thuan
Hi Minh,

ACK from me.

Best Regards,
ThuanTr

-Original Message-
From: Minh Chau  
Sent: Wednesday, October 16, 2019 6:53 PM
To: thuan.t...@dektech.com.au; hans.nordeb...@ericsson.com;
gary@dektech.com.au; vu.m.ngu...@dektech.com.au
Cc: opensaf-devel@lists.sourceforge.net; Minh Chau

Subject: [PATCH 1/1] mds: Do not check upper limit of window size [#3100]

According to RFC1982: "Addition of a value outside the range
[0 .. (2^(SERIAL_BITS - 1) - 1)] is undefined.". Mds uses 16 bits for mds
flow control, thus the maximum allowed range of window size is 2^15 - 1 =
32767.
The 'mdstest 27 8' test has randomly hit this limitation, with counter errors
detected in mds as shown in the log below:

FCTRL: [me] <-- [node:1001001, ref:2784751213], RcvChkAck[fseq:31067,
chunk:3], sndwnd[acked:31064, send:63850, nacked:1901634],
queue[size:32785], Error[msg disordered]

The fseq should always be less than sndwnd_.send_, hence mds should consider
the sender capable of sending more messages only if D = sndwnd_.send_ -
sndwnd_.acked_ < 2^15 - 1 = 32767. If a burst of messages is sent, D could
exceed 32767; in this case mds should tell the sender to try again later,
which however could lead to a backward compatibility problem. For now mds
weakens the window size verification: it only logs a warning and lets the
transmission continue.
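
The numbers in the log above can be checked with a small sketch of 16-bit serial number comparison in the style of RFC 1982. SerialLess and InFlight are hypothetical helpers written for this illustration; they are not the Seq16 class from the patch.

```cpp
#include <cstdint>

constexpr uint16_t kHalfRange = 0x8000;  // 2^(SERIAL_BITS - 1) for 16 bits

// a precedes b when the forward distance from a to b lies in (0, 2^15).
// A distance of exactly 2^15 is undefined in RFC 1982; here it yields false.
constexpr bool SerialLess(uint16_t a, uint16_t b) {
  return a != b && static_cast<uint16_t>(b - a) < kHalfRange;
}

// Number of sequence numbers between the last acked and the last sent.
constexpr uint16_t InFlight(uint16_t acked, uint16_t send) {
  return static_cast<uint16_t>(send - acked);
}
```

With acked = 31064, send = 63850 and fseq = 31067 from the log, InFlight() gives 32786, which is above 32767. SerialLess(31064, 31067) holds, but SerialLess(31067, 63850) does not, so the old two-sided window check rejects a valid ack.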
---
 src/mds/mds_tipc_fctrl_portid.cc | 23 ++-
 1 file changed, 22 insertions(+), 1 deletion(-)

diff --git a/src/mds/mds_tipc_fctrl_portid.cc b/src/mds/mds_tipc_fctrl_portid.cc
index a9fa7d3..6eae7d4 100644
--- a/src/mds/mds_tipc_fctrl_portid.cc
+++ b/src/mds/mds_tipc_fctrl_portid.cc
@@ -378,7 +378,28 @@ void TipcPortId::ReceiveChunkAck(uint16_t fseq, uint16_t chksize) {
 txprob_cnt_, (uint8_t)state_);
   }
   // update sender sequence window
-  if (sndwnd_.acked_ < Seq16(fseq) && Seq16(fseq) < sndwnd_.send_) {
+  if (sndwnd_.acked_ < Seq16(fseq)) {
+// The fseq_ should always be less than sndwnd_.send_, hence
+// mds should check the sender being capable of sending more
+// message only if D = sndwnd_.send_ - sndwnd_.acked_ < 2^15 - 1 = 32767
+// If a burst of message is sent, D could be > 32767
+// mds in this case should notify the sender try to send again
+// later; which however could lead to a backward compatibility problem
+// For now mds logs a warning and let the transmission continue
+// (mds could be changed to return try again if it is not a backward
+// compatibility problem to a specific client).
+if (Seq16(fseq) >= sndwnd_.send_) {
+  m_MDS_LOG_ERR("FCTRL: [me] <-- [node:%x, ref:%u], "
+  "RcvChkAck[fseq:%u, chunk:%u], "
+  "sndwnd[acked:%u, send:%u, nacked:%" PRIu64 "], "
+  "queue[size:%" PRIu64 "], "
+  "Warning[ack sequence out of window]",
+  id_.node, id_.ref,
+  fseq, chksize,
+  sndwnd_.acked_.v(), sndwnd_.send_.v(), sndwnd_.nacked_space_,
+  sndqueue_.Size());
+}
+
 m_MDS_LOG_DBG("FCTRL: [me] <-- [node:%x, ref:%u], "
 "RcvChkAck[fseq:%u, chunk:%u], "
 "sndwnd[acked:%u, send:%u, nacked:%" PRIu64 "], "
--
2.7.4






Re: [devel] [PATCH 1/1] mds: Add Intro message [#3090]

2019-10-14 Thread Tran Thuan
Hi bro.Minh,

ACK from me.

Best Regards,
ThuanTr

-Original Message-
From: Minh Hon Chau  
Sent: Tuesday, October 15, 2019 8:54 AM
To: hans.nordeb...@ericsson.com; vu.m.ngu...@dektech.com.au; 
gary@dektech.com.au; thuan.t...@dektech.com.au
Cc: opensaf-devel@lists.sourceforge.net
Subject: Re: [PATCH 1/1] mds: Add Intro message [#3090]

Hi,

The counters reset will be removed in ReceiveIntro().

Thanks

Minh


On 15/10/19 12:50 pm, Minh Chau wrote:
> mds relies on a data message sent from the peer to determine whether 
> MDS_TIPC_FCTRL_ENABLED is set at the peer. The data message may not be sent right 
> after the TIPC_PUBLISHED event, which can cause the tx probation timer 
> to time out.
>
> This patch adds an Intro message, which is sent right after 
> TIPC_PUBLISHED to help mds determine earlier whether the peer supports 
> flow control.
> ---
>   src/mds/mds_main.c   |  2 +-
>   src/mds/mds_tipc_fctrl_intf.cc   | 27 ++
>   src/mds/mds_tipc_fctrl_msg.cc| 11 +
>   src/mds/mds_tipc_fctrl_msg.h | 18 +++
>   src/mds/mds_tipc_fctrl_portid.cc | 49 
> ++--
>   src/mds/mds_tipc_fctrl_portid.h  |  2 ++
>   6 files changed, 96 insertions(+), 13 deletions(-)
>
> diff --git a/src/mds/mds_main.c b/src/mds/mds_main.c index 
> 8c9b1f1..c7d2f7b 100644
> --- a/src/mds/mds_main.c
> +++ b/src/mds/mds_main.c
> @@ -408,7 +408,7 @@ uint32_t mds_lib_req(NCS_LIB_REQ_INFO *req)
>   if (tipc_mcast_enabled != false)
>   tipc_mcast_enabled = true;
>   
> - m_MDS_LOG_DBG(
> + m_MDS_LOG_NOTIFY(
>   "MDS: TIPC_MCAST_ENABLED: %d  Set argument 
> \n",
>   tipc_mcast_enabled);
>   }
> diff --git a/src/mds/mds_tipc_fctrl_intf.cc 
> b/src/mds/mds_tipc_fctrl_intf.cc index 6271890..b803bfe 100644
> --- a/src/mds/mds_tipc_fctrl_intf.cc
> +++ b/src/mds/mds_tipc_fctrl_intf.cc
> @@ -39,6 +39,7 @@ using mds::DataMessage;
>   using mds::ChunkAck;
>   using mds::HeaderMessage;
>   using mds::Nack;
> +using mds::Intro;
>   
>   namespace {
>   // flow control enabled/disabled
> @@ -124,12 +125,20 @@ uint32_t process_flow_event(const Event& evt) {
> uint32_t rc = NCSCC_RC_SUCCESS;
> TipcPortId *portid = portid_lookup(evt.id_);
> if (portid == nullptr) {
> +// the null portid normally should not happen; however because the
> +// tipc_cb.Dsock and tipc_cb.BSRsock are separated; the data message
> +// sent from BSRsock may come before reception of TIPC_PUBLISHED
>   if (evt.type_ == Event::Type::kEvtRcvData) {
> portid = new TipcPortId(evt.id_, data_sock_fd,
> kChunkAckSize, sock_buf_size);
> portid_map[TipcPortId::GetUniqueId(evt.id_)] = portid;
> rc = portid->ReceiveData(evt.mseq_, evt.mfrag_,
>   evt.fseq_, evt.svc_id_);
> +} else if (evt.type_ == Event::Type::kEvtRcvIntro) {
> +  portid = new TipcPortId(evt.id_, data_sock_fd,
> +  kChunkAckSize, sock_buf_size);
> +  portid_map[TipcPortId::GetUniqueId(evt.id_)] = portid;
> +  portid->ReceiveIntro();
>   } else {
> m_MDS_LOG_ERR("FCTRL: [me] <-- [node:%x, ref:%u], "
> "RcvEvt[evt:%d], Error[PortId not found]", @@ -151,6 
> +160,9 @@ uint32_t process_flow_event(const Event& evt) {
> portid->ReceiveNack(evt.mseq_, evt.mfrag_,
> evt.fseq_);
>   }
> +if (evt.type_ == Event::Type::kEvtRcvIntro) {
> +  portid->ReceiveIntro();
> +}
> }
> return rc;
>   }
> @@ -489,6 +501,16 @@ uint32_t mds_tipc_fctrl_rcv_data(uint8_t *buffer, 
> uint16_t len,
> m_MDS_LOG_ERR("FCTRL: Failed to send msg to mbx_events, 
> Error[%s]",
> strerror(errno));
>   }
> +  } else if (header.msg_type_ == Intro::kIntroMsgType) {
> +// no need to decode intro message
> +// the decoding intro message type is done in header decoding
> +// send to the event thread
> +if (m_NCS_IPC_SEND(_events,
> +new Event(Event::Type::kEvtRcvIntro, id, 0, 0, 0, 0),
> +NCS_IPC_PRIORITY_HIGH) != NCSCC_RC_SUCCESS) {
> +  m_MDS_LOG_ERR("FCTRL: Failed to send msg to mbx_events, Error[%s]",
> +  strerror(errno));
> +}
> } else {
>   m_MDS_LOG_ERR("FCTRL: [me] <-- [node:%x, ref:%u], "
>   "[msg_type:%u], Error[not supported message type]", @@ 
> -516,6 +538,11 @@ uint32_t mds_tipc_fctrl_rcv_data(uint8_t *buffer, uint16_t 
> len,
> portid_map_mutex.unlock();
> return rc;
>   }
> +  } else {
> +m_MDS_LOG_DBG("FCTRL: [me] <-- [node:%x, ref:%u], "
> +"Receive non-flow-control data message, "
> +"header.pro_ver:%u",
> +id.node, id.ref, header.pro_ver_);
> }
> return NCSCC_RC_SUCCESS;
>   }
> diff --git 

Re: [devel] [PATCH 1/1] mds: Add Reset message [#3090]

2019-10-14 Thread Tran Thuan
Hi bro.Minh,

Thanks, no more comment from me.

Best Regards,
ThuanTr

-Original Message-
From: Minh Hon Chau  
Sent: Monday, October 14, 2019 7:28 PM
To: Tran Thuan ; hans.nordeb...@ericsson.com; 
gary@dektech.com.au; vu.m.ngu...@dektech.com.au
Cc: opensaf-devel@lists.sourceforge.net
Subject: Re: [PATCH 1/1] mds: Add Reset message [#3090]

Hi Thuan,

I can rename it to "Intro" message; the rcvwnd counter reset shall then be removed.

This new message cannot replace the tx prob timer. The new message speeds up 
the determination of flow control support at the peer side, instead of waiting for 
an mds data message. The timer is still needed when a flow control sender 'talks' 
to a non-flow-control receiver, which will not send any ack back.
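
The division of labour between the Intro message and the tx probation timer can be sketched as a small state machine. The state and event names below are hypothetical, and treating the decision as final once made is a simplification for illustration, not a description of the actual TipcPortId code.

```cpp
// Hypothetical view of how a sender learns whether its peer supports
// flow control.
enum class PeerFctrl { kProbing, kEnabled, kDisabled };
enum class PeerEvent { kIntroReceived, kChunkAckReceived, kTxProbTimeout };

PeerFctrl NextState(PeerFctrl state, PeerEvent event) {
  if (state != PeerFctrl::kProbing) return state;  // simplification: final
  switch (event) {
    case PeerEvent::kIntroReceived:     // early answer, right after publish
    case PeerEvent::kChunkAckReceived:  // later answer, after data was sent
      return PeerFctrl::kEnabled;
    case PeerEvent::kTxProbTimeout:     // peer never answered: no flow control
      return PeerFctrl::kDisabled;
  }
  return state;
}
```

The Intro message only shortens the time spent in kProbing; the timeout remains the sole way to reach kDisabled for a peer that never acknowledges.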

Thanks,

Minh

On 14/10/19 7:06 pm, Tran Thuan wrote:
> Hi bro.Minh,
>
> Thanks for the explanation.
> I think the "reset" message should be renamed to "introduce" message.
> Another question: with this fix, will the tx probation timer become redundant or 
> is it still useful somehow?
>
> Best Regards,
> ThuanTr
>
> -Original Message-
> From: Minh Hon Chau 
> Sent: Monday, October 14, 2019 1:01 PM
> To: Tran Thuan ; 
> hans.nordeb...@ericsson.com; gary@dektech.com.au; 
> vu.m.ngu...@dektech.com.au
> Cc: opensaf-devel@lists.sourceforge.net
> Subject: Re: [PATCH 1/1] mds: Add Reset message [#3090]
>
> Hi Thuan,
>
> If the chunkack is configured to send after a few data messages, then the 
> sender is not getting any chunkack for the first message from receiver until 
> chunkack timeout (which is also configurable to be a bit larger value). Then, 
> the probation timer would be timeout at sender.
>
> The rcvwnd.acked_ will be fixed.
>
> Thanks
>
> Minh
>
> On 14/10/19 4:39 pm, Tran Thuan wrote:
>> Hi bro.Minh,
>>
>> - In my understanding, tx probation timer only start when sender send 
>> first message.
>> Then sender relies on chunkAck to know receiver support MDS FCTRL or 
>> timeout as not support.
>> But from what you describe, sender got tx probation timer timeout 
>> before sending first message?
>> Or after sending first message but sender cannot get any chunkAck somehow?
>> I am confused this point. Could you help explain?
>>
>> - About the code, mistake set '0' twice for .acked_ in
>> TipcPortId::ReceiveReset()
>>
>> Best Regards,
>> ThuanTr
>>
>> -Original Message-
>> From: Minh Chau 
>> Sent: Friday, October 11, 2019 10:52 AM
>> To: hans.nordeb...@ericsson.com; gary@dektech.com.au; 
>> vu.m.ngu...@dektech.com.au; thuan.t...@dektech.com.au
>> Cc: opensaf-devel@lists.sourceforge.net; Minh Chau 
>> 
>> Subject: [PATCH 1/1] mds: Add Reset message [#3090]
>>
>> mds relies on data message sent from the peer to determine whether 
>> the MDS_TIPC_FCTRL_ENABLED is set. The data message may not be sent 
>> right after TIPC_PUBLISHED event, which can cause the tx probation timer 
>> timeout.
>>
>> This patch add Reset message, which is sent right after the 
>> TIPC_PUBLISHED to help mds determine the flow control supported at the peer 
>> earlier.
>> ---
>>src/mds/mds_main.c   |  2 +-
>>src/mds/mds_tipc_fctrl_intf.cc   | 27 ++
>>src/mds/mds_tipc_fctrl_msg.cc| 11 +
>>src/mds/mds_tipc_fctrl_msg.h | 18 +++
>>src/mds/mds_tipc_fctrl_portid.cc | 49
>> ++--
>>src/mds/mds_tipc_fctrl_portid.h  |  2 ++
>>6 files changed, 96 insertions(+), 13 deletions(-)
>>
>> diff --git a/src/mds/mds_main.c b/src/mds/mds_main.c index 
>> 8c9b1f1..c7d2f7b
>> 100644
>> --- a/src/mds/mds_main.c
>> +++ b/src/mds/mds_main.c
>> @@ -408,7 +408,7 @@ uint32_t mds_lib_req(NCS_LIB_REQ_INFO *req)
>>  if (tipc_mcast_enabled != false)
>>  tipc_mcast_enabled = true;
>>
>> -m_MDS_LOG_DBG(
>> +m_MDS_LOG_NOTIFY(
>>  "MDS: TIPC_MCAST_ENABLED: %d  Set argument 
>> \n",
>>  tipc_mcast_enabled);
>>  }
>> diff --git a/src/mds/mds_tipc_fctrl_intf.cc 
>> b/src/mds/mds_tipc_fctrl_intf.cc index 6271890..e8c9121 100644
>> --- a/src/mds/mds_tipc_fctrl_intf.cc
>> +++ b/src/mds/mds_tipc_fctrl_intf.cc
>> @@ -39,6 +39,7 @@ using mds::DataMessage;  using mds::ChunkAck;  
>> using mds::HeaderMessage;  using mds::Nack;
>> +using mds::Reset;
>>
>>namespace {
>>//

Re: [devel] [PATCH 1/1] mds: Add Reset message [#3090]

2019-10-14 Thread Tran Thuan
Hi bro.Minh,

Thanks for explanation.
I think the "reset" message should be renamed to "introduce" message.
Another question: with this fix, will the tx probation timer become redundant, or 
is it still useful somehow?

Best Regards,
ThuanTr

-Original Message-
From: Minh Hon Chau  
Sent: Monday, October 14, 2019 1:01 PM
To: Tran Thuan ; hans.nordeb...@ericsson.com; 
gary@dektech.com.au; vu.m.ngu...@dektech.com.au
Cc: opensaf-devel@lists.sourceforge.net
Subject: Re: [PATCH 1/1] mds: Add Reset message [#3090]

Hi Thuan,

If the chunkack is configured to be sent only after a few data messages, the 
sender does not get any chunkack for the first message from the receiver until the 
chunkack timeout (which is also configurable, possibly to a larger value). The 
probation timer would then time out at the sender.
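
The timing argument above can be checked with a small calculation. The Python sketch below is an illustration only: the function names and the millisecond values are hypothetical, network delay is ignored, and the probation timer is assumed to start when the first message is sent.

```python
def first_chunk_ack_time(chunk_ack_size, chunk_ack_timeout_ms, send_times_ms):
    """Time at which the receiver emits its first ChunkAck.

    The receiver acks when chunk_ack_size messages have arrived, or when
    chunk_ack_timeout_ms has passed since the first message, whichever
    comes first. send_times_ms must be non-empty and sorted.
    """
    timeout_at = send_times_ms[0] + chunk_ack_timeout_ms
    if len(send_times_ms) >= chunk_ack_size:
        return min(send_times_ms[chunk_ack_size - 1], timeout_at)
    return timeout_at


def probation_expires_first(probation_ms, chunk_ack_size,
                            chunk_ack_timeout_ms, send_times_ms):
    """True if the tx probation timer fires before the first ChunkAck."""
    ack_at = first_chunk_ack_time(chunk_ack_size, chunk_ack_timeout_ms,
                                  send_times_ms)
    probation_at = send_times_ms[0] + probation_ms
    return probation_at < ack_at
```

For example, with a chunk size of 3, a chunkack timeout of 1000 ms and a probation of 200 ms, a sender that sends a single message at t=0 sees the probation timer expire first; if it sends three messages at t=0, 10 and 20, the ChunkAck arrives first.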

The rcvwnd.acked_ will be fixed.

Thanks

Minh

On 14/10/19 4:39 pm, Tran Thuan wrote:
> Hi bro.Minh,
>
> - In my understanding, tx probation timer only start when sender send 
> first message.
> Then sender relies on chunkAck to know receiver support MDS FCTRL or 
> timeout as not support.
> But from what you describe, sender got tx probation timer timeout 
> before sending first message?
> Or after sending first message but sender cannot get any chunkAck somehow?
> I am confused this point. Could you help explain?
>
> - About the code, mistake set '0' twice for .acked_ in
> TipcPortId::ReceiveReset()
>
> Best Regards,
> ThuanTr
>
> -Original Message-
> From: Minh Chau 
> Sent: Friday, October 11, 2019 10:52 AM
> To: hans.nordeb...@ericsson.com; gary@dektech.com.au; 
> vu.m.ngu...@dektech.com.au; thuan.t...@dektech.com.au
> Cc: opensaf-devel@lists.sourceforge.net; Minh Chau 
> 
> Subject: [PATCH 1/1] mds: Add Reset message [#3090]
>
> mds relies on data message sent from the peer to determine whether the 
> MDS_TIPC_FCTRL_ENABLED is set. The data message may not be sent right 
> after TIPC_PUBLISHED event, which can cause the tx probation timer timeout.
>
> This patch add Reset message, which is sent right after the 
> TIPC_PUBLISHED to help mds determine the flow control supported at the peer 
> earlier.
> ---
>   src/mds/mds_main.c   |  2 +-
>   src/mds/mds_tipc_fctrl_intf.cc   | 27 ++
>   src/mds/mds_tipc_fctrl_msg.cc| 11 +
>   src/mds/mds_tipc_fctrl_msg.h | 18 +++
>   src/mds/mds_tipc_fctrl_portid.cc | 49
> ++--
>   src/mds/mds_tipc_fctrl_portid.h  |  2 ++
>   6 files changed, 96 insertions(+), 13 deletions(-)
>
> diff --git a/src/mds/mds_main.c b/src/mds/mds_main.c index 
> 8c9b1f1..c7d2f7b
> 100644
> --- a/src/mds/mds_main.c
> +++ b/src/mds/mds_main.c
> @@ -408,7 +408,7 @@ uint32_t mds_lib_req(NCS_LIB_REQ_INFO *req)
>   if (tipc_mcast_enabled != false)
>   tipc_mcast_enabled = true;
>   
> - m_MDS_LOG_DBG(
> + m_MDS_LOG_NOTIFY(
>   "MDS: TIPC_MCAST_ENABLED: %d  Set argument 
> \n",
>   tipc_mcast_enabled);
>   }
> diff --git a/src/mds/mds_tipc_fctrl_intf.cc 
> b/src/mds/mds_tipc_fctrl_intf.cc index 6271890..e8c9121 100644
> --- a/src/mds/mds_tipc_fctrl_intf.cc
> +++ b/src/mds/mds_tipc_fctrl_intf.cc
> @@ -39,6 +39,7 @@ using mds::DataMessage;  using mds::ChunkAck;  using 
> mds::HeaderMessage;  using mds::Nack;
> +using mds::Reset;
>   
>   namespace {
>   // flow control enabled/disabled
> @@ -124,12 +125,20 @@ uint32_t process_flow_event(const Event& evt) {
> uint32_t rc = NCSCC_RC_SUCCESS;
> TipcPortId *portid = portid_lookup(evt.id_);
> if (portid == nullptr) {
> +// the null portid normally should not happen; however because the
> +// tipc_cb.Dsock and tipc_cb.BSRsock are separated; the data message
> +// sent from BSRsock may come before reception of TIPC_PUBLISHED
>   if (evt.type_ == Event::Type::kEvtRcvData) {
> portid = new TipcPortId(evt.id_, data_sock_fd,
> kChunkAckSize, sock_buf_size);
> portid_map[TipcPortId::GetUniqueId(evt.id_)] = portid;
> rc = portid->ReceiveData(evt.mseq_, evt.mfrag_,
>   evt.fseq_, evt.svc_id_);
> +} else if (evt.type_ == Event::Type::kEvtRcvReset) {
> +  portid = new TipcPortId(evt.id_, data_sock_fd,
> +  kChunkAckSize, sock_buf_size);
> +  portid_map[TipcPortId::GetUniqueId(evt.id_)] = portid;
> +  portid->ReceiveReset();
>   } else {
> m_MDS_LOG_ERR("FCTRL: [me] <-- [node:%x, ref:%u], "
> "RcvEvt[evt:%d], Error[

Re: [devel] [PATCH 1/1] mds: Add Reset message [#3090]

2019-10-13 Thread Tran Thuan
Hi bro.Minh,

- In my understanding, the tx probation timer only starts when the sender sends
its first message.
The sender then relies on a chunkAck to learn that the receiver supports MDS FCTRL,
or on the timeout to conclude that it does not.
But from what you describe, the sender gets a tx probation timer timeout before
sending the first message?
Or after sending the first message, but without getting any chunkAck for some reason?
I am confused on this point. Could you help explain?

- About the code: '0' is set twice for .acked_ by mistake in
TipcPortId::ReceiveReset()

Best Regards,
ThuanTr

-Original Message-
From: Minh Chau  
Sent: Friday, October 11, 2019 10:52 AM
To: hans.nordeb...@ericsson.com; gary@dektech.com.au;
vu.m.ngu...@dektech.com.au; thuan.t...@dektech.com.au
Cc: opensaf-devel@lists.sourceforge.net; Minh Chau

Subject: [PATCH 1/1] mds: Add Reset message [#3090]

mds relies on data message sent from the peer to determine whether the
MDS_TIPC_FCTRL_ENABLED is set. The data message may not be sent right after
TIPC_PUBLISHED event, which can cause the tx probation timer timeout.

This patch adds a Reset message, which is sent right after TIPC_PUBLISHED
to help mds determine earlier whether flow control is supported at the peer.
---
 src/mds/mds_main.c   |  2 +-
 src/mds/mds_tipc_fctrl_intf.cc   | 27 ++
 src/mds/mds_tipc_fctrl_msg.cc| 11 +
 src/mds/mds_tipc_fctrl_msg.h | 18 +++
 src/mds/mds_tipc_fctrl_portid.cc | 49
++--
 src/mds/mds_tipc_fctrl_portid.h  |  2 ++
 6 files changed, 96 insertions(+), 13 deletions(-)

diff --git a/src/mds/mds_main.c b/src/mds/mds_main.c
index 8c9b1f1..c7d2f7b 100644
--- a/src/mds/mds_main.c
+++ b/src/mds/mds_main.c
@@ -408,7 +408,7 @@ uint32_t mds_lib_req(NCS_LIB_REQ_INFO *req)
if (tipc_mcast_enabled != false)
tipc_mcast_enabled = true;
 
-   m_MDS_LOG_DBG(
+   m_MDS_LOG_NOTIFY(
"MDS: TIPC_MCAST_ENABLED: %d  Set
argument \n",
tipc_mcast_enabled);
}
diff --git a/src/mds/mds_tipc_fctrl_intf.cc b/src/mds/mds_tipc_fctrl_intf.cc
index 6271890..e8c9121 100644
--- a/src/mds/mds_tipc_fctrl_intf.cc
+++ b/src/mds/mds_tipc_fctrl_intf.cc
@@ -39,6 +39,7 @@ using mds::DataMessage;
 using mds::ChunkAck;
 using mds::HeaderMessage;
 using mds::Nack;
+using mds::Reset;
 
 namespace {
 // flow control enabled/disabled
@@ -124,12 +125,20 @@ uint32_t process_flow_event(const Event& evt) {
   uint32_t rc = NCSCC_RC_SUCCESS;
   TipcPortId *portid = portid_lookup(evt.id_);
   if (portid == nullptr) {
+// the null portid normally should not happen; however because the
+// tipc_cb.Dsock and tipc_cb.BSRsock are separated; the data message
+// sent from BSRsock may come before reception of TIPC_PUBLISHED
 if (evt.type_ == Event::Type::kEvtRcvData) {
   portid = new TipcPortId(evt.id_, data_sock_fd,
   kChunkAckSize, sock_buf_size);
   portid_map[TipcPortId::GetUniqueId(evt.id_)] = portid;
   rc = portid->ReceiveData(evt.mseq_, evt.mfrag_,
 evt.fseq_, evt.svc_id_);
+} else if (evt.type_ == Event::Type::kEvtRcvReset) {
+  portid = new TipcPortId(evt.id_, data_sock_fd,
+  kChunkAckSize, sock_buf_size);
+  portid_map[TipcPortId::GetUniqueId(evt.id_)] = portid;
+  portid->ReceiveReset();
 } else {
       m_MDS_LOG_ERR("FCTRL: [me] <-- [node:%x, ref:%u], "
       "RcvEvt[evt:%d], Error[PortId not found]",
@@ -151,6 +160,9 @@ uint32_t process_flow_event(const Event& evt) {
   portid->ReceiveNack(evt.mseq_, evt.mfrag_,
   evt.fseq_);
 }
+if (evt.type_ == Event::Type::kEvtRcvReset) {
+  portid->ReceiveReset();
+}
   }
   return rc;
 }
@@ -489,6 +501,16 @@ uint32_t mds_tipc_fctrl_rcv_data(uint8_t *buffer, uint16_t len,
       m_MDS_LOG_ERR("FCTRL: Failed to send msg to mbx_events, Error[%s]",
       strerror(errno));
 }
+  } else if (header.msg_type_ == Reset::kResetMsgType) {
+// no need to decode reset message
+// the decoding reset message type is done in header decoding
+// send to the event thread
+if (m_NCS_IPC_SEND(&mbx_events,
+new Event(Event::Type::kEvtRcvReset, id, 0, 0, 0, 0),
+NCS_IPC_PRIORITY_HIGH) != NCSCC_RC_SUCCESS) {
+  m_MDS_LOG_ERR("FCTRL: Failed to send msg to mbx_events, Error[%s]",
+  strerror(errno));
+}
   } else {
     m_MDS_LOG_ERR("FCTRL: [me] <-- [node:%x, ref:%u], "
     "[msg_type:%u], Error[not supported message type]",
@@ -516,6 +538,11 @@ uint32_t mds_tipc_fctrl_rcv_data(uint8_t *buffer, uint16_t len,
   portid_map_mutex.unlock();
   return rc;
 }
+  } else {
+m_MDS_LOG_DBG("FCTRL: [me] <-- [node:%x, ref:%u], "
+"Receive non-flow-control data message, "
+

Re: [devel] [PATCH 1/1] osaf: perform handshake in tcp_server in new thread [#3099]

2019-10-11 Thread Tran Thuan
Hi Gary,

ACK from me.

Best Regards,
ThuanTr

-Original Message-
From: Gary Lee  
Sent: Friday, October 11, 2019 10:33 AM
To: hans.nordeb...@ericsson.com; minh.c...@dektech.com.au; 
thuan.t...@dektech.com.au
Cc: opensaf-devel@lists.sourceforge.net
Subject: Re: [PATCH 1/1] osaf: perform handshake in tcp_server in new thread 
[#3099]

Hi

I should have put one more comment in.

Currently, the handshake is done in the equivalent of accept(), which runs 
in the 'main thread'. If a client is malicious or faulty and stalls the 
handshake, no one else can connect. finish_request(), however, is run from 
the thread created for each client, so a handshake done there can only 
block that one client.
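
The pattern can be shown without real TLS. The Python sketch below is only an illustration: FakeTlsSocket, BaseServer and DeferredHandshakeServer are hypothetical stand-ins, not classes from tcp_server.py. It assumes the socket was wrapped with do_handshake_on_connect=False, so the handshake has not happened yet when finish_request() is called.

```python
class FakeTlsSocket:
    """Stand-in for an SSL socket wrapped with do_handshake_on_connect=False."""

    def __init__(self):
        self.handshake_done = False

    def do_handshake(self):
        # A real SSL socket would block here until the peer completes
        # the handshake; a stalled peer blocks only the calling thread.
        self.handshake_done = True


class BaseServer:
    """Stand-in for the server class whose finish_request() handles a client."""

    def finish_request(self, request, client_address):
        return ("handled", request.handshake_done)


class DeferredHandshakeMixIn:
    """Run the handshake in finish_request(), i.e. in the per-client thread."""

    def finish_request(self, request, client_address):
        request.do_handshake()
        return super().finish_request(request, client_address)


class DeferredHandshakeServer(DeferredHandshakeMixIn, BaseServer):
    pass
```

Calling DeferredHandshakeServer().finish_request(FakeTlsSocket(), ("127.0.0.1", 0)) completes the handshake first and then hands the request to the base class.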

Gary

On 11/10/19 2:22 pm, Gary Lee wrote:
> ---
>   src/osaf/consensus/plugins/tcp/tcp_server.py | 7 ++-
>   1 file changed, 6 insertions(+), 1 deletion(-)
>
> diff --git a/src/osaf/consensus/plugins/tcp/tcp_server.py 
> b/src/osaf/consensus/plugins/tcp/tcp_server.py
> index a7f22f2..c10859c 100755
> --- a/src/osaf/consensus/plugins/tcp/tcp_server.py
> +++ b/src/osaf/consensus/plugins/tcp/tcp_server.py
> @@ -73,10 +73,15 @@ class ThreadedRPCServer(ThreadingMixIn,
>   certfile=CERTFILE,
>   keyfile=KEYFILE,
>   cert_reqs=ssl.CERT_NONE,
> -ssl_version=ssl.PROTOCOL_TLSv1_2)
> +ssl_version=ssl.PROTOCOL_TLSv1_2,
> +do_handshake_on_connect=False)
>   self.server_bind()
>   self.server_activate()
>   
> +    def finish_request(self, request, client_address):
> +        request.do_handshake()
> +        return SimpleXMLRPCServer.finish_request(self, request, client_address)
> +
>   
>   class Arbitrator(object):
>   """ Implementation of a simple arbitrator """




___
Opensaf-devel mailing list
Opensaf-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/opensaf-devel


Re: [devel] [PATCH 1/1] osaf: return new takeover_request immediately [#3098]

2019-10-10 Thread Tran Thuan
Hi Gary,

ACK from me.

Best Regards,
ThuanTr

-Original Message-
From: Gary Lee  
Sent: Thursday, October 10, 2019 11:01 AM
To: hans.nordeb...@ericsson.com; minh.c...@dektech.com.au;
thang.d.ngu...@dektech.com.au; thuan.t...@dektech.com.au
Cc: opensaf-devel@lists.sourceforge.net; Gary Lee 
Subject: [PATCH 1/1] osaf: return new takeover_request immediately [#3098]

If a takeover_request is created just before the active controller calls
'watch takeover_request', then it's possible that the active rded instance
is not informed of the request.

When 'watch takeover_request' is called, check if there's already a
takeover_request in 'NEW' state and return immediately.
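
The same check can be written as a minimal Python sketch (the plugin itself is a shell script). The function and parameter names are hypothetical, the sample values are made up, and the polling loop is simplified to "return when the value changes", which is an assumption about the part of watch() not shown in this patch. Only the position of the state (fourth whitespace-separated field) is taken from the patch.

```python
def watch(get_value, wait_one_interval, is_takeover_request):
    """Poll until the watched value changes.

    If the key is the takeover request and it is already in NEW state,
    return at once instead of waiting for a change that already happened.
    """
    orig_value = get_value()

    if is_takeover_request and orig_value is not None:
        fields = orig_value.split()
        # Same field the plugin extracts with awk '{print $4}'.
        if len(fields) >= 4 and fields[3] == "NEW":
            return orig_value

    while True:
        wait_one_interval()
        current_value = get_value()
        if current_value != orig_value:
            return current_value
```

Without the early return, a request written just before the watch starts would become the baseline value, and the loop would wait for a change that never comes.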
---
 src/osaf/consensus/plugins/etcd3.plugin | 13 +++--
 1 file changed, 11 insertions(+), 2 deletions(-)

diff --git a/src/osaf/consensus/plugins/etcd3.plugin
b/src/osaf/consensus/plugins/etcd3.plugin
index d926885..4e09ef6 100644
--- a/src/osaf/consensus/plugins/etcd3.plugin
+++ b/src/osaf/consensus/plugins/etcd3.plugin
@@ -337,13 +337,22 @@ watch() {
   orig_value=$(get "$watch_key")
   result=$?
 
-  if [ "$result" -le "1" ]; then
+  if [ "$result" -le 1 ]; then
+  if [ "$result" -eq 0 ] && [ "$watch_key" == "$takeover_request" ]; then
+state=$(echo $orig_value | awk '{print $4}')
+if [ "$state" == "NEW" ]; then
+  # takeover_request already exists; maybe it was written created
+  # while this node was being promoted
+  echo $orig_value
+  return 0
+fi
+  fi
 while true
 do
   sleep $heartbeat_interval
   current_value=$(get "$watch_key")
   result=$?
-  if [ "$result" -gt "1" ]; then
+  if [ "$result" -gt 1 ]; then
 # etcd down?
 if [ "$watch_key" == "$takeover_request" ]; then
   hostname=`cat $node_name_file`
--
2.7.4






Re: [devel] [PATCH 1/1] ntfd: Do not send response to client if client down [#3084]

2019-10-07 Thread Tran Thuan
Hi Thien,

My idea from yesterday does not seem right: if the agent goes down before
clientAdded(), it does not work.

Another idea is:
   mds_svc_event()
- Handle Agent Down: call a new function markAgentDown() to add the
mds_dest to a down list
- Handle Agent Up: call a new function markAgentUp() to remove the
mds_dest if it is in the down list
  ntfs_mds_msg_send()
- Check whether the Agent is in the down list, and skip sending the message if so.

This is actually similar to your current solution, but we need to "handle Agent Up"
and check for Agent down before sending any msg to the agent (more general).
Also, I think markAgentDown() should be a new function, not new code in
SearchAndSetClientsDownFlag().
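
A minimal sketch of this idea in Python, just to show the bookkeeping. The names AgentDownList, mark_agent_down, mark_agent_up and send_to_agent are hypothetical stand-ins for markAgentDown(), markAgentUp() and the check in ntfs_mds_msg_send(); this is not ntfd code.

```python
class AgentDownList:
    """Tracks the MDS destinations of agents that are currently down."""

    def __init__(self):
        self._down = set()

    def mark_agent_down(self, mds_dest):
        # Called from the "agent down" branch of the MDS service event handler.
        self._down.add(mds_dest)

    def mark_agent_up(self, mds_dest):
        # Called from the "agent up" branch; a no-op if the dest is not listed.
        self._down.discard(mds_dest)

    def is_down(self, mds_dest):
        return mds_dest in self._down


def send_to_agent(down_list, mds_dest, msg, transport_send):
    """Send msg unless the agent is known to be down; return True if sent."""
    if down_list.is_down(mds_dest):
        return False
    transport_send(mds_dest, msg)
    return True
```

Because the check sits in the send path, it covers every message type sent to the agent, not only the client-added response.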

Best Regards,
ThuanTr

-Original Message-
From: Tran Thuan  
Sent: Monday, October 7, 2019 5:55 PM
To: 'thien.m.huynh' ;
vu.m.ngu...@dektech.com.au; minh.c...@dektech.com.au
Cc: opensaf-devel@lists.sourceforge.net
Subject: Re: [devel] [PATCH 1/1] ntfd: Do not send response to client if
client down [#3084]

Hi Thien,

I think you just need to create a new function,
SearchAndGetClientsDownFlag(MDS_DEST mds_dest). Then, in
client_added_res_lib(), call SearchAndGetClientsDownFlag() to
decide whether to send the response or skip it.

Best Regards,
ThuanTr

-Original Message-
From: thien.m.huynh 
Sent: Monday, October 7, 2019 10:41 AM
To: vu.m.ngu...@dektech.com.au; minh.c...@dektech.com.au
Cc: opensaf-devel@lists.sourceforge.net
Subject: [devel] [PATCH 1/1] ntfd: Do not send response to client if client
down [#3084]

Ntfd does not send a response to a client that is already down.
This avoids a timeout when ntfd sends via mds.
---
 src/ntf/ntfd/NtfAdmin.cc | 61

 src/ntf/ntfd/NtfAdmin.h  |  1 +
 src/ntf/ntfd/ntfs_cb.h   |  6 +
 src/ntf/ntfd/ntfs_com.c  |  4 
 src/ntf/ntfd/ntfs_com.h  |  1 +
 src/ntf/ntfd/ntfs_evt.c  |  1 +
 src/ntf/ntfd/ntfs_mds.c  |  2 +-
 7 files changed, 75 insertions(+), 1 deletion(-)

diff --git a/src/ntf/ntfd/NtfAdmin.cc b/src/ntf/ntfd/NtfAdmin.cc
index 8bbee69..3a0e9f6 100644
--- a/src/ntf/ntfd/NtfAdmin.cc
+++ b/src/ntf/ntfd/NtfAdmin.cc
@@ -555,6 +555,28 @@ void NtfAdmin::SearchAndSetClientsDownFlag(MDS_DEST mds_dest) {
   client->SetClientDownFlag();
 }
   }
+
+  CLIENT_DOWN_LIST *client_down_rec = NULL;
+  if ((client_down_rec = reinterpret_cast<CLIENT_DOWN_LIST *>(
+           malloc(sizeof(CLIENT_DOWN_LIST)))) == NULL) {
+    LOG_ER("memory allocation for the CLIENT_DOWN_LIST failed");
+    return;
+  }
+  memset(client_down_rec, 0, sizeof(CLIENT_DOWN_LIST));
+  client_down_rec->mds_dest = mds_dest;
+  client_down_rec->next = NULL;
+
+  if (ntfs_cb->client_down_list_head == NULL) {
+    ntfs_cb->client_down_list_head = client_down_rec;
+  } else {
+    CLIENT_DOWN_LIST *p = ntfs_cb->client_down_list_head;
+    while (p->next != NULL) {
+      p = p->next;
+    }
+    p->next = client_down_rec;
+  }
+  TRACE_1("MDS dest added: %" PRIx64, client_down_rec->mds_dest);
+
   osaf_mutex_unlock_ordie(_map_mutex);
   TRACE_LEAVE();
 }
@@ -1096,6 +1118,41 @@ bool NtfAdmin::is_stale_client(unsigned int client_id) {
 return false;
 }
 
+/**
+ * @brief  Checks client added to CLIENT_DOWN_LIST.
+ * Remove client out of list if existed.
+ * @param  mds_dest
+ * @return true/false.
+ */
+bool NtfAdmin::checkAddedAndRemove(MDS_DEST mds_dest) {
+  bool found = false;
+  CLIENT_DOWN_LIST *client_down_rec = ntfs_cb->client_down_list_head;
+  CLIENT_DOWN_LIST *prev = NULL;
+  TRACE_ENTER();
+  while (client_down_rec != NULL) {
+if (mds_dest == client_down_rec->mds_dest) {
+  if (client_down_rec == ntfs_cb->client_down_list_head) {
+if (client_down_rec->next == NULL) {
+  ntfs_cb->client_down_list_head = NULL;
+} else {
+  ntfs_cb->client_down_list_head = client_down_rec->next;
+}
+  } else if (prev) {
+prev->next = client_down_rec->next;
+  }
+  TRACE("MDS dest %" PRIx64 " already delete",
client_down_rec->mds_dest);
+  free(client_down_rec);
+  client_down_rec = NULL;
+  found = true;
+  break;
+}
+prev = client_down_rec;
+client_down_rec = client_down_rec->next;
+  }
+  TRACE_LEAVE();
+  return found;
+}
+
 // C wrapper funcions start here
 
 /**
@@ -1300,6 +1357,10 @@ uint32_t send_clm_node_status_change(SaClmClusterChangesT cluster_change,
   cluster_change, node_id));
 }
 
+bool checkAddedAndRemove(MDS_DEST mds_dest) {
+  return (NtfAdmin::theNtfAdmin->checkAddedAndRemove(mds_dest));
+}
+
 /**
  * @brief  Checks CLM membership status of a client.
  * A0101 clients are always CLM member.
diff --git a/src/ntf/ntfd/NtfAdmin.h b/src/ntf/ntfd/NtfAdmin.h
index 4808ca9..d0c5528 100644
--- a/src/ntf/ntfd/NtfAdmin.h
+++ b/src/ntf/ntfd/NtfAdmin.h
@@ -109,6 +109,7 @@ class NtfAdmin {
   uint32_t send_cluster_membership_msg_to_clients(
   SaClmClusterChangesT cluster_cha

Re: [devel] [PATCH 1/1] ntfd: Do not send response to client if client down [#3084]

2019-10-07 Thread Tran Thuan
Hi Thien,

I think you just need to create a new function,
SearchAndGetClientsDownFlag(MDS_DEST mds_dest).
Then, in client_added_res_lib(), call SearchAndGetClientsDownFlag()
to decide whether to send the response or skip it.

Best Regards,
ThuanTr

-Original Message-
From: thien.m.huynh  
Sent: Monday, October 7, 2019 10:41 AM
To: vu.m.ngu...@dektech.com.au; minh.c...@dektech.com.au
Cc: opensaf-devel@lists.sourceforge.net
Subject: [devel] [PATCH 1/1] ntfd: Do not send response to client if client
down [#3084]

Ntfd does not send a response to a client that is already down.
This avoids a timeout when ntfd sends via mds.
---
 src/ntf/ntfd/NtfAdmin.cc | 61

 src/ntf/ntfd/NtfAdmin.h  |  1 +
 src/ntf/ntfd/ntfs_cb.h   |  6 +
 src/ntf/ntfd/ntfs_com.c  |  4 
 src/ntf/ntfd/ntfs_com.h  |  1 +
 src/ntf/ntfd/ntfs_evt.c  |  1 +
 src/ntf/ntfd/ntfs_mds.c  |  2 +-
 7 files changed, 75 insertions(+), 1 deletion(-)

diff --git a/src/ntf/ntfd/NtfAdmin.cc b/src/ntf/ntfd/NtfAdmin.cc
index 8bbee69..3a0e9f6 100644
--- a/src/ntf/ntfd/NtfAdmin.cc
+++ b/src/ntf/ntfd/NtfAdmin.cc
@@ -555,6 +555,28 @@ void NtfAdmin::SearchAndSetClientsDownFlag(MDS_DEST mds_dest) {
   client->SetClientDownFlag();
 }
   }
+
+  CLIENT_DOWN_LIST *client_down_rec = NULL;
+  if ((client_down_rec = reinterpret_cast<CLIENT_DOWN_LIST *>(
+           malloc(sizeof(CLIENT_DOWN_LIST)))) == NULL) {
+    LOG_ER("memory allocation for the CLIENT_DOWN_LIST failed");
+    return;
+  }
+  memset(client_down_rec, 0, sizeof(CLIENT_DOWN_LIST));
+  client_down_rec->mds_dest = mds_dest;
+  client_down_rec->next = NULL;
+
+  if (ntfs_cb->client_down_list_head == NULL) {
+    ntfs_cb->client_down_list_head = client_down_rec;
+  } else {
+    CLIENT_DOWN_LIST *p = ntfs_cb->client_down_list_head;
+    while (p->next != NULL) {
+      p = p->next;
+    }
+    p->next = client_down_rec;
+  }
+  TRACE_1("MDS dest added: %" PRIx64, client_down_rec->mds_dest);
+
   osaf_mutex_unlock_ordie(_map_mutex);
   TRACE_LEAVE();
 }
@@ -1096,6 +1118,41 @@ bool NtfAdmin::is_stale_client(unsigned int client_id) {
 return false;
 }
 
+/**
+ * @brief  Checks client added to CLIENT_DOWN_LIST.
+ * Remove client out of list if existed.
+ * @param  mds_dest
+ * @return true/false.
+ */
+bool NtfAdmin::checkAddedAndRemove(MDS_DEST mds_dest) {
+  bool found = false;
+  CLIENT_DOWN_LIST *client_down_rec = ntfs_cb->client_down_list_head;
+  CLIENT_DOWN_LIST *prev = NULL;
+  TRACE_ENTER();
+  while (client_down_rec != NULL) {
+if (mds_dest == client_down_rec->mds_dest) {
+  if (client_down_rec == ntfs_cb->client_down_list_head) {
+if (client_down_rec->next == NULL) {
+  ntfs_cb->client_down_list_head = NULL;
+} else {
+  ntfs_cb->client_down_list_head = client_down_rec->next;
+}
+  } else if (prev) {
+prev->next = client_down_rec->next;
+  }
+  TRACE("MDS dest %" PRIx64 " already delete",
client_down_rec->mds_dest);
+  free(client_down_rec);
+  client_down_rec = NULL;
+  found = true;
+  break;
+}
+prev = client_down_rec;
+client_down_rec = client_down_rec->next;
+  }
+  TRACE_LEAVE();
+  return found;
+}
+
 // C wrapper funcions start here
 
 /**
@@ -1300,6 +1357,10 @@ uint32_t send_clm_node_status_change(SaClmClusterChangesT cluster_change,
   cluster_change, node_id));
 }
 
+bool checkAddedAndRemove(MDS_DEST mds_dest) {
+  return (NtfAdmin::theNtfAdmin->checkAddedAndRemove(mds_dest));
+}
+
 /**
  * @brief  Checks CLM membership status of a client.
  * A0101 clients are always CLM member.
diff --git a/src/ntf/ntfd/NtfAdmin.h b/src/ntf/ntfd/NtfAdmin.h
index 4808ca9..d0c5528 100644
--- a/src/ntf/ntfd/NtfAdmin.h
+++ b/src/ntf/ntfd/NtfAdmin.h
@@ -109,6 +109,7 @@ class NtfAdmin {
   uint32_t send_cluster_membership_msg_to_clients(
   SaClmClusterChangesT cluster_change, NODE_ID node_id);
   bool is_stale_client(unsigned int clientId);
+  bool checkAddedAndRemove(MDS_DEST mds_dest);
 
  private:
   void processNotification(unsigned int clientId,
diff --git a/src/ntf/ntfd/ntfs_cb.h b/src/ntf/ntfd/ntfs_cb.h
index 96eedc1..e09f8fb 100644
--- a/src/ntf/ntfd/ntfs_cb.h
+++ b/src/ntf/ntfd/ntfs_cb.h
@@ -38,6 +38,11 @@ typedef struct {
   MDS_DEST mds_dest;
 } ntf_client_t;
 
+typedef struct client_down_list {
+  MDS_DEST mds_dest;
+  struct client_down_list *next;
+} CLIENT_DOWN_LIST;
+
 typedef struct ntfs_cb {
   SYSF_MBX mbx;   /* NTFS's mailbox */
   MDS_HDL mds_hdl;/* PWE Handle for interacting with NTFAs  */
@@ -71,6 +76,7 @@ typedef struct ntfs_cb {
   NCS_SEL_OBJ usr2_sel_obj; /* Selection object for CLM initialization.*/
   uint16_t peer_mbcsv_version; /*Remeber peer NTFS MBCSV version.*/
   bool clm_initialized;// For CLM init status;
+  CLIENT_DOWN_LIST *client_down_list_head; /* client down reccords */
 } ntfs_cb_t;
 
 extern uint32_t ntfs_cb_init(ntfs_cb_t *); diff --git

Re: [devel] [PATCH 0/2] Review Request for mds: Add Nack message for MDS_TIPC_FCTRL_ENABLED [#3095] V2

2019-10-06 Thread Tran Thuan
Hi Minh,

ACK from me.

Best Regards,
ThuanTr

-Original Message-
From: Minh Hon Chau  
Sent: Monday, October 7, 2019 7:12 AM
To: hans.nordeb...@ericsson.com; vu.m.ngu...@dektech.com.au; 
gary@dektech.com.au; thuan.t...@dektech.com.au
Cc: opensaf-devel@lists.sourceforge.net
Subject: Re: [PATCH 0/2] Review Request for mds: Add Nack message for 
MDS_TIPC_FCTRL_ENABLED [#3095] V2

Hi,

I would like to push the patches today if no more comment for them.

Thanks

Minh

On 4/10/19 3:20 pm, Minh Chau wrote:
> Summary: mds: Add Nack message for MDS_TIPC_FCTRL_ENABLED [#3095] V2
> Review request for Ticket(s): 3095
> Peer Reviewer(s): Hans, Vu, Gary, Thuan
> Pull request to: *** LIST THE PERSON WITH PUSH ACCESS HERE ***
> Affected branch(es): develop
> Development branch: ticket-3095
> Base revision: 05064a1cfd0aeaf824dce7602d535654b3457e30
> Personal repository: git://git.code.sf.net/u/minh-chau/review
>
> 
> Impacted area           Impact y/n
> ----------------------------------
>   Docs                    n
>   Build system            n
>   RPM/packaging           n
>   Configuration files     n
>   Startup scripts         n
>   SAF services            n
>   OpenSAF services        n
>   Core libraries          y
>   Samples                 n
>   Tests                   n
>   Other                   n
>
>
> Comments (indicate scope for each "y" above):
> -
> *** EXPLAIN/COMMENT THE PATCH SERIES HERE ***
>
> revision cbbeab8f2299620aa3eb9b0e29710a2b159b5a45
> Author:   Minh Chau 
> Date: Fri, 4 Oct 2019 12:59:27 +1000
>
> mds: Improve error log for MDS_TIPC_FCTRL_ENABLED [#3095]
>
> This commit as part of #3095 updates the error string with pattern 
> "FCTRL:*Error[*]", in order to help grep-ing the error in mds debug 
> log.
>
>
>
> revision cc666586717fa82df70471748d8766e8fe901460
> Author:   Minh Chau 
> Date: Fri, 4 Oct 2019 12:59:16 +1000
>
> mds: Add Nack message for MDS_TIPC_FCTRL_ENABLED [#3095]
>
> In the scenario of recovery from split-brain, where both active 
> director services may suffer mds message loss due to lost-contact tipc 
> link. If MDS_TIPC_FCTRL_ENABLED is set, the out-of-order message will 
> be dropped, and there is no mechanism to trigger the retransmission 
> from receiver side at this moment (the retransmission is only 
> triggered from sender as result of TIPC_ERR_OVERLOAD).
>
> On reception of a disordered message, the receiver can send a 
> not-acknowledgement (Nack) to notify the sender that retransmission is needed.
> The sender can then trigger retransmission in the same way as when 
> receiving TIPC_ERR_OVERLOAD.
>
> This patch adds Nack message for retransmission of disordered message 
> detected from receiver side, and adds a missing call to 
> portid_map_mutex.unlock() in process_all_events().
>
>
>
> Complete diffstat:
> --
>   src/mds/mds_c_api.c  |  2 +-
>   src/mds/mds_dt_common.c  |  2 +-
>   src/mds/mds_tipc_fctrl_intf.cc   | 72 
> +---
>   src/mds/mds_tipc_fctrl_msg.cc| 35 ++-
>   src/mds/mds_tipc_fctrl_msg.h | 22 
>   src/mds/mds_tipc_fctrl_portid.cc | 42 ---
>   src/mds/mds_tipc_fctrl_portid.h  |  3 +-
>   7 files changed, 143 insertions(+), 35 deletions(-)
>
>
> Testing Commands:
> -
> *** LIST THE COMMAND LINE TOOLS/STEPS TO TEST YOUR CHANGES ***
>
>
> Testing, Expected Results:
> --
> *** PASTE COMMAND OUTPUTS / TEST RESULTS ***
>
>
> Conditions of Submission:
> -
> *** HOW MANY DAYS BEFORE PUSHING, CONSENSUS ETC ***
>
>
> Arch        Built  Started  Linux distro
> ---
> mips        n      n
> mips64      n      n
> x86         n      n
> x86_64      n      n
> powerpc     n      n
> powerpc64   n      n
>
>
> Reviewer Checklist:
> ---
> [Submitters: make sure that your review doesn't trigger any 
> checkmarks!]
>
>
> Your checkin has not passed review because (see checked entries):
>
> ___ Your RR template is generally incomplete; it has too many blank entries
>  that need proper data filled in.
>
> ___ You have failed to nominate the proper persons for review and push.
>
> ___ Your patches do not have proper short+long header
>
> ___ You have grammar/spelling in your header that is unacceptable.
>
> ___ You have exceeded a sensible line length in your headers/comments/text.
>
> ___ You have failed to put in a proper Trac Ticket # into your commits.
>
> ___ You have incorrectly put/left internal data in your comments/files
>  (i.e. internal bug tracking tool IDs, product names etc)
>
> ___ You have not given any evidence of testing beyond basic build tests.
>  Demonstrate some level of runtime or other sanity testing.
>
> ___ You have ^M present in some of your files. These have to be removed.
>
> ___ You have 

Re: [devel] [PATCH 1/1] mds: Enhance decoding for mds flow control message [#3097]

2019-10-06 Thread Tran Thuan
Hi Minh,

Some minor comments from me, check [Thuan] inline.
Thanks.

Best Regards,
ThuanTr

-Original Message-
From: Minh Chau  
Sent: Monday, October 7, 2019 7:12 AM
To: hans.nordeb...@ericsson.com; vu.m.ngu...@dektech.com.au;
gary@dektech.com.au; thuan.t...@dektech.com.au
Cc: opensaf-devel@lists.sourceforge.net; Minh Chau

Subject: [PATCH 1/1] mds: Enhance decoding for mds flow control message
[#3097]

mds currently uses MDS_PROT_FCTRL_ID 4 bytes value (0x00AC13F5) from octet11
to octet14 to identify the flow control message e.g., chunkack message. In
case of fragmentation from big message, the second fragment onwards will
start from the octet11, which may have arbitrary value and cause mds to
incorrectly decode as a flow control message if the fragment starts with
value of 0x00AC13F5.

This patch fixes this rare case by decoding flow control message only if the
oct2-5 (mds global sequence number) and oct6-7 (mds fragment number) are
non-zero. Change MDS_PROT_FCTRL_ID:0xFDAC13F5
[Thuan]: typo "non-zero" -> "zero"?
[Thuan] Can you give info in commit message about why change
MDS_PROT_FCTRL_ID to FDAC13F5?
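
To make the decoding rule concrete, here is a minimal Python sketch of the check that the diff below implements (the protocol identifier is looked at only when both mseq and mfrag are zero). It is an illustration, not mds code: the function name is hypothetical, and the byte offsets (2-5, 6-7, 11-14, counted from 0) and big-endian byte order are assumptions based on the octet numbers in the commit message.

```python
import struct

MDS_PROT_FCTRL_ID = 0xFDAC13F5  # new identifier value from this patch


def is_flow_control_message(buf):
    """Return True if buf looks like a flow control message.

    Assumed layout: bytes 2-5 hold the mds sequence number (mseq),
    bytes 6-7 the fragment number (mfrag), bytes 11-14 the protocol
    identifier.
    """
    mseq, mfrag = struct.unpack_from(">IH", buf, 2)
    if mseq != 0 or mfrag != 0:
        # A data message or fragment: bytes 11-14 are payload and may
        # hold any value, so they must not be read as an identifier.
        return False
    (pro_id,) = struct.unpack_from(">I", buf, 11)
    return pro_id == MDS_PROT_FCTRL_ID
```

A fragment whose payload happens to contain the identifier at bytes 11-14 is then still treated as data, because its fragment number is non-zero.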
---
 src/mds/mds_dt.h  |  2 +-
 src/mds/mds_tipc_fctrl_msg.cc | 20 +---
 2 files changed, 14 insertions(+), 8 deletions(-)

diff --git a/src/mds/mds_dt.h b/src/mds/mds_dt.h
index d9e8633..64da600 100644
--- a/src/mds/mds_dt.h
+++ b/src/mds/mds_dt.h
@@ -245,7 +245,7 @@ bool mdtm_mailbox_mbx_cleanup(NCSCONTEXT arg, NCSCONTEXT msg);
 
 /* MDS protocol/version for flow control */
 #define MDS_PROT_FCTRL (0xB0 | MDS_VERSION)
-#define MDS_PROT_FCTRL_ID 0x00AC13F5
+#define MDS_PROT_FCTRL_ID 0xFDAC13F5
 
 /* Added for the subscription changes */
 #define MDS_NCS_CHASSIS_ID (m_NCS_GET_NODE_ID & 0x00ff)
diff --git a/src/mds/mds_tipc_fctrl_msg.cc b/src/mds/mds_tipc_fctrl_msg.cc
index 064d977..8375673 100644
--- a/src/mds/mds_tipc_fctrl_msg.cc
+++ b/src/mds/mds_tipc_fctrl_msg.cc
@@ -64,13 +64,19 @@ void HeaderMessage::Decode(uint8_t *msg) {
 // decode flow control sequence number
 ptr = [HeaderMessage::FieldIndex::kFlowControlSequenceNumber];
 fseq_ = ncs_decode_16bit();
-// decode protocol identifier
-ptr = [ChunkAck::FieldIndex::kProtocolIdentifier];
-pro_id_ = ncs_decode_32bit();
-if (pro_id_ == MDS_PROT_FCTRL_ID) {
-  // decode message type
-  ptr = [ChunkAck::FieldIndex::kFlowControlMessageType];
-  msg_type_ = ncs_decode_8bit();
+// decode protocol identifier if the mfrag_ and mseq_ are 0
+// otherwise it is always DataMessage within non-zero mseq_ and mfrag_
+if (mfrag_ == 0 && mseq_ == 0) {
+  ptr = [ChunkAck::FieldIndex::kProtocolIdentifier];
+  pro_id_ = ncs_decode_32bit();
+  if (pro_id_ == MDS_PROT_FCTRL_ID) {
+// decode message type
+ptr = [ChunkAck::FieldIndex::kFlowControlMessageType];
+msg_type_ = ncs_decode_8bit();
+  }
+} else {
+  pro_id_ = 0;
+  msg_type_ = 0;
[Thuan] Is the ELSE branch needed, since the values are already 0?
 }
   } else {
 if (mfrag_ != 0) {
--
2.7.4




___
Opensaf-devel mailing list
Opensaf-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/opensaf-devel
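
Editor's note: the decoding rule in the diff above (look at the protocol identifier only when both the sequence number and the fragment number are zero) can be sketched as a standalone function. This is a minimal illustration and not the MDS implementation: the helper names, the 0-based octet offsets and the big-endian byte order are assumptions read from the commit message, and only the 0xFDAC13F5 value comes from the patch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Protocol identifier value introduced by the patch.
constexpr uint32_t kFctrlProtId = 0xFDAC13F5;

// Assumed 0-based octet offsets, following the commit message wording.
constexpr size_t kSeqOffset = 2;      // oct2-5: MDS global sequence number
constexpr size_t kFragOffset = 6;     // oct6-7: MDS fragment number
constexpr size_t kProtIdOffset = 11;  // oct11-14: protocol identifier

// Read big-endian integers from a byte buffer (byte order is an assumption).
static uint16_t ReadBe16(const uint8_t* p) {
  return static_cast<uint16_t>((p[0] << 8) | p[1]);
}
static uint32_t ReadBe32(const uint8_t* p) {
  return (static_cast<uint32_t>(p[0]) << 24) |
         (static_cast<uint32_t>(p[1]) << 16) |
         (static_cast<uint32_t>(p[2]) << 8) | static_cast<uint32_t>(p[3]);
}

// Hypothetical helper: true only if the buffer can be a flow control message.
bool IsFlowControlMessage(const uint8_t* msg, size_t len) {
  assert(msg != nullptr);
  if (len < kProtIdOffset + 4) return false;
  const uint32_t mseq = ReadBe32(msg + kSeqOffset);
  const uint16_t mfrag = ReadBe16(msg + kFragOffset);
  // Per the patch, a non-zero sequence or fragment number marks a data
  // message, so its payload is never compared with the identifier.
  if (mseq != 0 || mfrag != 0) return false;
  return ReadBe32(msg + kProtIdOffset) == kFctrlProtId;
}
```

With this check, a later fragment whose payload happens to start with the identifier bytes is still treated as data, because its fragment number is non-zero.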


Re: [devel] [PATCH 1/1] amf: add asserts to problematic areas identified by codechecker [#3077]

2019-10-06 Thread Tran Thuan
Hi Gary,

ACK from me.

Best Regards,
ThuanTr

-Original Message-
From: Gary Lee  
Sent: Thursday, October 3, 2019 12:11 PM
To: thuan.t...@dektech.com.au; minh.c...@dektech.com.au;
hans.nordeb...@ericsson.com
Cc: opensaf-devel@lists.sourceforge.net; Gary Lee 
Subject: [PATCH 1/1] amf: add asserts to problematic areas identified by
codechecker [#3077]

---
 src/amf/amfd/sg_nway_fsm.cc | 2 ++
 src/amf/amfd/sgtype.cc  | 1 +
 src/amf/amfnd/comp.cc   | 2 ++
 src/amf/amfnd/susm.cc   | 1 +
 4 files changed, 6 insertions(+)

diff --git a/src/amf/amfd/sg_nway_fsm.cc b/src/amf/amfd/sg_nway_fsm.cc index
f7ddc57..2c17b5a 100644
--- a/src/amf/amfd/sg_nway_fsm.cc
+++ b/src/amf/amfd/sg_nway_fsm.cc
@@ -2589,6 +2589,8 @@ static AVD_SU_SI_REL
*find_pref_standby_susi(AVD_SU_SI_REL *sisu) {
 
   TRACE_ENTER();
 
+  osafassert(sisu != nullptr);
+  osafassert(sisu->si != nullptr);
   curr_sisu = sisu->si->list_of_sisu;
   while (curr_sisu) {
 if ((SA_AMF_READINESS_IN_SERVICE ==
curr_sisu->su->saAmfSuReadinessState) && diff --git a/src/amf/amfd/sgtype.cc
b/src/amf/amfd/sgtype.cc index 15fae9c..64ebbd7 100644
--- a/src/amf/amfd/sgtype.cc
+++ b/src/amf/amfd/sgtype.cc
@@ -439,6 +439,7 @@ static void sgtype_ccb_apply_modify_hdlr(struct
CcbUtilOperationData *opdata) {
 LOG_WA("SGT modify apply (STDBY): sgt does not exist");
 return;
   }
+  osafassert(sgt != nullptr);
 
   while ((attr_mod = opdata->param.modify.attrMods[i++]) != nullptr) {
 bool value_is_deleted;
diff --git a/src/amf/amfnd/comp.cc b/src/amf/amfnd/comp.cc index
b520550..a12171c 100644
--- a/src/amf/amfnd/comp.cc
+++ b/src/amf/amfnd/comp.cc
@@ -2785,6 +2785,7 @@ uint32_t comp_restart_initiate(AVND_COMP *comp) {
 // reset contained comps for this container
 AVND_COMP_CSI_REC *curr_csi(m_AVND_CSI_REC_FROM_COMP_DLL_NODE_GET(
   m_NCS_DBLIST_FIND_FIRST(>csi_list)));
+osafassert(curr_csi != nullptr);
 const std::string& containerCsi(curr_csi->name);
 
 for (auto  : cb->compdb) {
@@ -2837,6 +2838,7 @@ uint32_t comp_restart_initiate(AVND_COMP *comp) {
   if (!m_AVND_COMP_TYPE_IS_PREINSTANTIABLE(comp)) {
 AVND_COMP_CSI_REC *csi = m_AVND_CSI_REC_FROM_COMP_DLL_NODE_GET(
 m_NCS_DBLIST_FIND_FIRST(>csi_list));
+osafassert(csi != nullptr);
 if (m_AVND_COMP_CSI_CURR_ASSIGN_STATE_IS_ASSIGNED(csi) ||
 m_AVND_COMP_CSI_CURR_ASSIGN_STATE_IS_RESTARTING(csi)) {
   m_AVND_COMP_CSI_CURR_ASSIGN_STATE_SET(
diff --git a/src/amf/amfnd/susm.cc b/src/amf/amfnd/susm.cc index
6376327..c1aa9e4 100644
--- a/src/amf/amfnd/susm.cc
+++ b/src/amf/amfnd/susm.cc
@@ -392,6 +392,7 @@ uint32_t avnd_su_si_msg_prc(AVND_CB *cb, AVND_SU *su,
AVND_SU_SI_PARAM *info) {
   if (true == info->single_csi) {
 AVND_COMP_CSI_PARAM *csi_param;
 AVND_COMP_CSI_REC *csi_rec;
+osafassert(si != nullptr);
 si->single_csi_add_rem_in_si = AVSV_SUSI_ACT_DEL;
 osafassert((info->num_assigns == 1));
 csi_param = info->list;
--
2.7.4






Re: [devel] [PATCH 1/1] mds: optimize mdstest suite 27 [#3087]

2019-09-24 Thread Tran Thuan
Hi bro.Minh,

I will update and send out V3.

Best Regards,
ThuanTr

-Original Message-
From: Minh Hon Chau  
Sent: Wednesday, September 25, 2019 9:00 AM
To: thuan.tran ; gary@dektech.com.au
Cc: opensaf-devel@lists.sourceforge.net
Subject: Re: [PATCH 1/1] mds: optimize mdstest suite 27 [#3087]

Hi Thuan,

Some comments:

- a few warnings for >80 chars line

- Need to free(msg) that is returned from each MDS callback

- Another minor comment below

Thanks

Minh

On 24/9/19 1:10 pm, thuan.tran wrote:
> - Just allocate a small buffer instead of huge buffer
> ---
>   src/mds/apitest/mdstipc_api.c | 116 +++---
>   1 file changed, 52 insertions(+), 64 deletions(-)
>
> diff --git a/src/mds/apitest/mdstipc_api.c 
> b/src/mds/apitest/mdstipc_api.c index 805728464..33e7d6c12 100644
> --- a/src/mds/apitest/mdstipc_api.c
> +++ b/src/mds/apitest/mdstipc_api.c
> @@ -13105,10 +13105,14 @@ void tet_create_default_PWE_VDEST_tp()
>   test_validate(FAIL, 0);
>   }
>   
> -void tet_sender(char *send_buff, uint32_t buff_len, int msg_count)
> +void tet_sender(uint32_t msg_count, uint32_t msg_size)
>   {
>   int live = 100; // sender live max 100s
>   TET_MDS_MSG *mesg;
> + if (msg_size > TET_MSG_SIZE_MIN) {
> + printf("\nSender: msg_size cannot bigger than 
> TET_MSG_SIZE_MIN\n");
> + exit(1);
> + }
>   mesg = (TET_MDS_MSG *)malloc(sizeof(TET_MDS_MSG));
>   memset(mesg, 0, sizeof(TET_MDS_MSG));
>   
> @@ -13134,7 +13138,7 @@ void tet_sender(char *send_buff, uint32_t buff_len, 
> int msg_count)
>   exit(1);
>   }
>   
> - while(!gl_tet_adest.svc[0].svcevt[0].dest && live-- > 0) {
> + while (!gl_tet_adest.svc[0].svcevt[0].dest && live-- > 0) {
>   printf("\nSender is waiting for receiver UP\n");
>   sleep(1);
>   }
> @@ -13147,11 +13151,11 @@ void tet_sender(char *send_buff, uint32_t buff_len, 
> int msg_count)
>   // otherwise, receiver won't detect loss message
>   sleep(1);
>   
> - uint32_t offset = 0;
> - uint32_t msg_len = buff_len / msg_count;
> - for (int i = 1; i <= msg_count; i++) {
> - memcpy(mesg->send_data, _buff[offset], msg_len);
> - mesg->send_len = msg_len;
> + for (uint32_t i = 1; i <= msg_count; i++) {
> + /* to verify received correct order */
> + memset(mesg->send_data, 'X', msg_size);
> + sprintf(mesg->send_data, "%u", i);
> + mesg->send_len = msg_size;
>   if (mds_just_send(gl_tet_adest.mds_pwe1_hdl,
> NCSMDS_SVC_ID_INTERNAL_MIN,
> NCSMDS_SVC_ID_EXTERNAL_MIN,
> @@ -13163,23 +13167,25 @@ void tet_sender(char *send_buff, uint32_t buff_len, 
> int msg_count)
>   } else {
>   printf("\nSender SENT message %d successfully\n", i);
>   }
> - offset += msg_len;
>   }
>   free(mesg);
> - while(live-- > 0) {
> + while (live-- > 0) {
>   // Keep sender alive for retransmission
>   sleep(1);
>   }
>   }
>   
> -bool tet_receiver(char *expected_buff, uint32_t buff_len, int 
> msg_count)
> +bool tet_receiver(uint32_t msg_count, uint32_t msg_size)
>   {
> - int ret = 1;
> + if (msg_size > TET_MSG_SIZE_MIN) {
> + printf("\nReceiver: msg_size cannot bigger than 
> TET_MSG_SIZE_MIN\n");
> + return 1;
> + }
>   printf("\nStarted Receiver (pid:%d) svc_id=%d\n",
>   (int)getpid(), NCSMDS_SVC_ID_EXTERNAL_MIN);
>   if (adest_get_handle() != NCSCC_RC_SUCCESS) {
>   printf("\nReceiver FAIL to get adest handle\n");
> - return ret;
> + return 1;
>   }
>   
>   sleep(1); //Let sender subscribe before receiver install @@ 
> -13197,14 +13203,12 @@ bool tet_receiver(char *expected_buff, uint32_t 
> buff_len, int msg_count)
>   exit(1);
>   }
>   
> - char *received_buff = malloc(buff_len);
> - memset(received_buff, 0, buff_len);
> - uint32_t offset = 0;
> + char *expected_buff = malloc(msg_size);
>   struct pollfd sel;
> - int counter = 0;
> + uint32_t counter = 0;
>   sel.fd = m_GET_FD_FROM_SEL_OBJ(gl_tet_adest.svc[0].sel_obj);
>   sel.events = POLLIN;
> - while(counter < msg_count) {
> + while (counter < msg_count) {
>   int ret = osaf_poll(, 1, 1);
>   if (ret > 0) {
>   gl_rcvdmsginfo.msg = NULL;
> @@ -13214,11 +13218,23 @@ bool tet_receiver(char *expected_buff, uint32_t 
> buff_len, int msg_count)
>   printf("\nReceiver FAIL to retrieve message\n");
>   break;
>   }
> - TET_MDS_MSG *msg = (TET_MDS_MSG*)gl_rcvdmsginfo.msg;
> + TET_MDS_MSG *msg = (TET_MDS_MSG *)gl_rcvdmsginfo.msg;
>   
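
Editor's note: the quoted patch is cut off above. Its idea of filling each message with 'X' and writing the message number at the start, so that the receiver can check ordering without a huge shared buffer, can be sketched as below. MakePayload and MatchesExpected are hypothetical names, and the sketch uses snprintf rather than the sprintf call in the patch so that a very small msg_size cannot overflow the buffer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <cstring>
#include <vector>

// Build the payload for message number `index`: filler bytes with the
// decimal index (and its terminating NUL) written at the start.
std::vector<char> MakePayload(uint32_t index, size_t msg_size) {
  std::vector<char> buf(msg_size, 'X');
  // snprintf writes at most buf.size() bytes, including the NUL.
  std::snprintf(buf.data(), buf.size(), "%u", index);
  return buf;
}

// Receiver side: rebuild the expected payload and compare byte by byte.
bool MatchesExpected(const char* received, size_t len,
                     uint32_t expected_index) {
  assert(received != nullptr);
  if (len == 0) return true;  // nothing to compare
  const std::vector<char> expected = MakePayload(expected_index, len);
  return std::memcmp(received, expected.data(), len) == 0;
}
```

A receiver that keeps a counter of messages seen so far can call MatchesExpected(data, len, counter + 1) to detect both loss and reordering.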

Re: [devel] [PATCH 1/1] smf: allow to commit merged camp after a manual cluster reboot [#3063]

2019-08-26 Thread Tran Thuan
Hi Khanh,

ACK from me (code review only, not tested)

Best Regards,
ThuanTr

-Original Message-
From: khanh.h.dang  
Sent: Monday, August 26, 2019 6:02 PM
To: lennart.l...@ericsson.com; thuan.t...@dektech.com.au
Cc: opensaf-devel@lists.sourceforge.net; khanh.h.dang

Subject: [PATCH 1/1] smf: allow to commit merged camp after a manual cluster
reboot [#3063]

Return OK to immSteps if step is already completed in order to continue
executing after a manual cluster reboot.
---
 src/smf/smfd/SmfUpgradeProcedure.cc | 9 +
 1 file changed, 9 insertions(+)

diff --git a/src/smf/smfd/SmfUpgradeProcedure.cc
b/src/smf/smfd/SmfUpgradeProcedure.cc
index fd99e88..dfe9853 100644
--- a/src/smf/smfd/SmfUpgradeProcedure.cc
+++ b/src/smf/smfd/SmfUpgradeProcedure.cc
@@ -3382,6 +3382,15 @@ SaAisErrorT
SmfUpgradeProcedure::getImmStepsMergedSingleStep() {
 return SA_AIS_ERR_INIT;
   }
 
+  if (newStep->getState() == SA_SMF_STEP_COMPLETED) {
+SmfCampaignThread::instance()->campaign()->setError("");
+LOG_NO("SmfUpgradeProcedure::getImmStepsMergedSingleStep: State %d",
+   SA_SMF_STEP_COMPLETED);
+delete newStep;
+TRACE_LEAVE();
+return SA_AIS_OK;
+  }
+
   if ((newStep->getState() != SA_SMF_STEP_INITIAL) &&
   (newStep->getState() != SA_SMF_STEP_EXECUTING)) {
 LOG_NO("SmfUpgradeProcedure::getImmStepsMergedSingleStep: Invalid state
%d",
--
2.7.4
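
Editor's note: the early return added by this patch has to release newStep on the new exit path, which is what the "delete newStep;" line does. A minimal sketch of the same control flow with std::unique_ptr is shown below, so that every exit path frees the step automatically. The types, enum values and the error returned for an invalid state are hypothetical stand-ins, not the SMF classes.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Hypothetical stand-ins for the SMF step state and return codes.
enum class StepState { kInitial, kExecuting, kCompleted, kFailed };
enum class Result { kOk, kErrInit };

struct Step {
  StepState state;
};

// Takes ownership of new_step. On kOk, *out holds the step only if it still
// has to be executed; a completed step is dropped and nothing is queued.
Result LoadMergedStep(std::unique_ptr<Step> new_step,
                      std::unique_ptr<Step>* out) {
  assert(new_step != nullptr && out != nullptr);
  if (new_step->state == StepState::kCompleted) {
    // Already completed before the reboot: report success, execute nothing.
    // new_step is released when it goes out of scope.
    return Result::kOk;
  }
  if (new_step->state != StepState::kInitial &&
      new_step->state != StepState::kExecuting) {
    return Result::kErrInit;  // assumed error code for an invalid state
  }
  *out = std::move(new_step);
  return Result::kOk;
}
```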






Re: [devel] [PATCH 1/1] smf: allow to commit merged camp after a manual cluster reboot [#3063]

2019-08-26 Thread Tran Thuan
Hi Khanh,

See my comments inline.

Best Regards,
ThuanTr

-Original Message-
From: Lennart Lund  
Sent: Monday, August 26, 2019 2:01 PM
To: Khanh Hoang Dang ; Thuan Tran

Cc: opensaf-devel@lists.sourceforge.net; Khanh Hoang Dang
; Lennart Lund 
Subject: RE: [PATCH 1/1] smf: allow to commit merged camp after a manual
cluster reboot [#3063]

 

Hi Khanh,

 

I have not done a thorough review or tested but the reasoning and the
solution seems ok. Not contradicting anything in AIS and not NBC, also the
new rules are documented (sort of).

Let's trust the other reviewers!

 

Thanks

Lennart

 

-Original Message-
From: khanh.h.dang <khanh.h.d...@dektech.com.au>
Sent: 23 August 2019 13:04
To: Lennart Lund <lennart.l...@ericsson.com>; Thuan Tran <thuan.t...@dektech.com.au>
Cc: opensaf-devel@lists.sourceforge.net; Khanh Hoang Dang <khanh.h.d...@dektech.com.au>
Subject: [PATCH 1/1] smf: allow to commit merged camp after a manual cluster
reboot [#3063]

 

Return OK to immSteps if step is already completed in order to continue
executing after a manual cluster reboot.

---
src/smf/smfd/SmfUpgradeCampaign.cc  | 1 +
src/smf/smfd/SmfUpgradeProcedure.cc | 7 +++
2 files changed, 8 insertions(+)

diff --git a/src/smf/smfd/SmfUpgradeCampaign.cc
b/src/smf/smfd/SmfUpgradeCampaign.cc
index 3c50bf7..a50f4da 100644
--- a/src/smf/smfd/SmfUpgradeCampaign.cc
+++ b/src/smf/smfd/SmfUpgradeCampaign.cc
@@ -903,6 +903,7 @@ void SmfUpgradeCampaign::procResult(SmfUpgradeProcedure
*i_procedure,  void SmfUpgradeCampaign::continueExec() {
   TRACE_ENTER();
   SaSmfCmpgStateT currentState = m_state->getState();
+  SmfCampaignThread::instance()->campaign()->setError("");
[Thuan]: Can we reset the error only in the new code block (see the comment below)?
   // Check if the campaign execution continues after a campaign restart
   // resulting from a SMF ordered cluster reboot or SI_SWAP
diff --git a/src/smf/smfd/SmfUpgradeProcedure.cc b/src/smf/smfd/SmfUpgradeProcedure.cc
index fd99e88..92ae35d 100644
--- a/src/smf/smfd/SmfUpgradeProcedure.cc
+++ b/src/smf/smfd/SmfUpgradeProcedure.cc
@@ -3382,6 +3382,13 @@ SaAisErrorT
SmfUpgradeProcedure::getImmStepsMergedSingleStep() {
 return SA_AIS_ERR_INIT;
   }
+  if (newStep->getState() == SA_SMF_STEP_COMPLETED) {
[Thuan]: Can we reset the error only when the code enters this condition?
SmfCampaignThread::instance()->campaign()->setError(error);
+LOG_NO("SmfUpgradeProcedure::getImmStepsMergedSingleStep: state %d",
+   SA_SMF_STEP_COMPLETED);
[Thuan] Do we need "delete newStep;" here?
+TRACE_LEAVE();
+return SA_AIS_OK;
+  }
+
   if ((newStep->getState() != SA_SMF_STEP_INITIAL) &&
   (newStep->getState() != SA_SMF_STEP_EXECUTING)) {
 LOG_NO("SmfUpgradeProcedure::getImmStepsMergedSingleStep: Invalid state
%d",
--
2.7.4

 




Re: [devel] [PATCH 1/1] smfd: set campaign state failed when error msg is introduced [#3063]

2019-08-12 Thread Tran Thuan
Hi,

Khanh and I discussed what he found in his basic test (it still crashes
later). I think the solution needs more investigation.

Best Regards,
ThuanTr

-Original Message-
From: Tran Thuan  
Sent: Monday, August 12, 2019 9:59 AM
To: 'khanh.h.dang' ; lennart.l...@ericsson.com
Cc: opensaf-devel@lists.sourceforge.net
Subject: Re: [devel] [PATCH 1/1] smfd: set campaign state failed when error
msg is introduced [#3063]

Hi Khanh,

ACK from me (code review).

Best Regards,
ThuanTr

-Original Message-
From: khanh.h.dang 
Sent: Tuesday, August 6, 2019 10:16 AM
To: lennart.l...@ericsson.com; thuan.t...@dektech.com.au
Cc: opensaf-devel@lists.sourceforge.net; khanh.h.dang

Subject: [PATCH 1/1] smfd: set campaign state failed when error msg is
introduced [#3063]

With single step upgrade method, after a manual cluster reboot, campaign
status gets error but state remains EXECUTION_COMPLETED.
This change corrects the state of campaign in such case.
---
 src/smf/smfd/SmfCampaign.cc | 17 +
 1 file changed, 17 insertions(+)

diff --git a/src/smf/smfd/SmfCampaign.cc b/src/smf/smfd/SmfCampaign.cc index
6f51483..0dc2f0f 100644
--- a/src/smf/smfd/SmfCampaign.cc
+++ b/src/smf/smfd/SmfCampaign.cc
@@ -729,6 +729,23 @@ bool SmfCampaign::startProcedure(SmfUpgradeProcedure
*procedure) {
 "Start of procedure thread failed for " + procedure->getDn();
 LOG_ER("%s", error.c_str());
 SmfCampaignThread::instance()->campaign()->setError(error);
+
+// When a campaign (merged to single step) reaches EXECUTION_COMPLETED and
+// an unexpected cluster reboot occurs, change the state to EXECUTION_FAILED
+// so that committing the campaign does not crash SMFD.
+SmfUpgradeCampaign *p_uc = getUpgradeCampaign();
+if ((p_uc->getProcExecutionMode() == SMF_MERGE_TO_SINGLE_STEP) &&
+(SmfCampaignThread::instance()->campaign()->getState() ==
+SA_SMF_CMPG_EXECUTION_COMPLETED)) {
+  std::string error =
+  "CAMP: campaign=" +
+  SmfCampaignThread::instance()->campaign()->getDn() +
+  " state: EXECUTION_COMPLETED => EXECUTION_FAILED " +
+  "due to unexpected cluster reboot.";
+  LOG_ER("%s", error.c_str());
+  SmfCampaignThread::instance()->campaign()->setState(
+  SA_SMF_CMPG_EXECUTION_FAILED);
+}
 return false;
   }
   return true;
--
2.7.4
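
Editor's note: the state correction proposed in this patch reduces to a small pure function. In the sketch below the enum names are hypothetical stand-ins for SMF_MERGE_TO_SINGLE_STEP and the SA_SMF_CMPG_* states. It mirrors the condition in the diff only. The follow-up at the top of this thread reports that a crash still occurs later, so this is not a validated fix.

```cpp
// Hypothetical stand-ins for the SMF execution mode and campaign state.
enum class ExecMode { kStandard, kMergeToSingleStep };
enum class CampaignState { kExecuting, kExecutionCompleted, kExecutionFailed };

// State the campaign should have after a procedure thread failed to start.
// Only a merged single-step campaign that already reported
// EXECUTION_COMPLETED is moved to EXECUTION_FAILED; otherwise unchanged.
constexpr CampaignState StateAfterStartFailure(ExecMode mode,
                                               CampaignState current) {
  return (mode == ExecMode::kMergeToSingleStep &&
          current == CampaignState::kExecutionCompleted)
             ? CampaignState::kExecutionFailed
             : current;
}
```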






Re: [devel] [PATCH 1/1] smf: retry if ccb is aborted at init proc [#3061]

2019-08-12 Thread Tran Thuan
Hi Thang,

ACK from me (code review)

Best Regards,
ThuanTr

-Original Message-
From: thang.d.nguyen  
Sent: Monday, August 12, 2019 3:44 PM
To: lennart.l...@ericsson.com; thuan.t...@dektech.com.au
Cc: opensaf-devel@lists.sourceforge.net; thang.d.nguyen

Subject: [PATCH 1/1] smf: retry if ccb is aborted at init proc [#3061]

Retry if ccb is aborted at init proc. Retry when imm init action returned
TRY_AGAIN or NOT_EXIST.
---
 src/smf/smfd/SmfUpgradeAction.cc | 85 ++--
 src/smf/smfd/SmfUpgradeStep.cc   |  2 +-
 2 files changed, 71 insertions(+), 16 deletions(-)

diff --git a/src/smf/smfd/SmfUpgradeAction.cc
b/src/smf/smfd/SmfUpgradeAction.cc
index d05eea4d7..b0919c2b8 100644
--- a/src/smf/smfd/SmfUpgradeAction.cc
+++ b/src/smf/smfd/SmfUpgradeAction.cc
@@ -461,12 +461,12 @@ SaAisErrorT SmfImmCcbAction::execute(SaImmOiHandleT
i_oiHandle,
  const std::string* i_rollbackDn) {
   SaAisErrorT result = SA_AIS_OK;
   SmfRollbackCcb* rollbackCcb = NULL;
+  std::string immRollbackCcbDn;
 
   TRACE_ENTER();
 
   TRACE("Imm ccb actions id %d, size %zu", m_id, m_operations.size());
   if (i_rollbackDn != NULL) {
-std::string immRollbackCcbDn;
 char idStr[16];
 snprintf(idStr, 16, "%08d", m_id);
 immRollbackCcbDn = "smfRollbackElement=ccb_"; @@ -481,27 +481,82 @@
SaAisErrorT SmfImmCcbAction::execute(SaImmOiHandleT i_oiHandle,
   immRollbackCcbDn.c_str(), saf_error(result));
   return result;
 }
-
-rollbackCcb =
-new (std::nothrow) SmfRollbackCcb(immRollbackCcbDn, i_oiHandle);
-if (rollbackCcb == NULL) {
-  LOG_ER("SmfImmCcbAction::execute failed to create SmfRollbackCcb");
-  return SA_AIS_ERR_NO_MEMORY;
-}
   }
 
   if (m_operations.size() > 0) {
-  SmfImmUtils immUtil;
-if ((result = immUtil.doImmOperations(m_operations, rollbackCcb)) !=
-SA_AIS_OK) {
- delete rollbackCcb;
- rollbackCcb = NULL;
-}
+TRACE("Imm Ccb Action");
+SmfImmUtils immUtil;
+uint32_t retry_count = 0;
+while (1) {
+  if (i_rollbackDn != NULL) {
+rollbackCcb =
+new (std::nothrow) SmfRollbackCcb(immRollbackCcbDn,
i_oiHandle);
+if (rollbackCcb == NULL) {
+  LOG_ER("SmfImmCcbAction::execute failed to create
SmfRollbackCcb");
+  return SA_AIS_ERR_NO_MEMORY;
+}
+  }
+  result = immUtil.doImmOperations(m_operations, rollbackCcb);
+  if (((result == SA_AIS_ERR_TIMEOUT) || (result ==
SA_AIS_ERR_NOT_EXIST))
+  && (retry_count <= 6)) {
+int interval = 5;  // seconds
+// When IMM aborts a CCB because of synch request from a payload,
then
+// the next call of CCBInitialize() will return TRY_AGAIN till the
time
+// the synch is complete.
+// There is no direct information available to the OM that can
indicate
+// that the CCB or the Adminownerset failed because of an abort and
also
+// there is no notification that can indicate that IMM is ready
now.
+// That leaves SMF with the option to correlate error codes
returned.
+//
+// Notes on treatment of SA_AIS_ERR_TIMEOUT and
SA_AIS_ERR_NOT_EXIST
+// error codes:
+//
+// 1) CCB abort when it is not the first
operation(create/modify/delete)
+// in the CCB
+//and if there is dependency between objects in the CCB:-
+//
+// An abort of a CCB and if the objects(Create/Modify/delete) had
+// some dependency(parent-child) among them, then an API call of
+// AdminOwnerSet() or the CCBCreate/Delete/Modify() on a dependant
+// object will return SA_AIS_ERR_NOT_EXIST, because the CCB
aborted.
+//
+// 2) CCB abort when it is a first operation and/or there is no
+// intra-ccb objects-dependency:-
+//
+// When an ongoing CCB is aborted because of a synch request
originating
+// from a payload, then the AdminOwnerSet() or the
+// CCBCreate/Delete/Modify() will return timeout.
+
+++retry_count;
+LOG_NO("SmfImmCcbAction::execute failed with error: %s",
+   saf_error(result));
+LOG_NO("CCB was aborted!?, Retrying: %u", retry_count);
+// Total retry time of 2.5 minutes for a worst case IMM loaded with
say
+// < 300k objects. Retry every 25 seconds. i.e. (nanosleep for 5
+// seconds)  + (immutil_ccbInitialize will worstcase wait till 20
+// seconds).
+struct timespec sleepTime = {interval, 0};
+osaf_nanosleep();
+if (rollbackCcb != NULL) {
+  delete rollbackCcb;
+  rollbackCcb = NULL;
+}
+continue;
+  } else if (result != SA_AIS_OK) {
+LOG_ER("SmfImmCcbAction::execute failed, result=%s",
+   saf_error(result));
+if (rollbackCcb != NULL) {
+  delete rollbackCcb;
+  rollbackCcb = NULL;
+}
+ 
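
Editor's note: the quoted patch is cut off above. The retry pattern it describes (create a fresh rollback object per attempt, retry on TIMEOUT or NOT_EXIST a bounded number of times, wait between attempts) can be sketched as below. All names are hypothetical: Rc stands in for SaAisErrorT, RollbackCcb for SmfRollbackCcb, and the wait callback for the 5 second osaf_nanosleep call. The retry limit is a parameter here, whereas the patch hard-codes its own bound.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <utility>

enum class Rc { kOk, kTimeout, kNotExist, kFailed };

struct RollbackCcb {};  // stand-in for the rollback bookkeeping object

// Runs `op` until it succeeds, fails permanently, or the retry budget is
// used up. On success *rollback_out receives the rollback object of the
// successful attempt (or stays empty if none was requested).
Rc RunWithRetry(const std::function<Rc(RollbackCcb*)>& op,
                const std::function<void()>& wait, bool want_rollback,
                unsigned max_retries,
                std::unique_ptr<RollbackCcb>* rollback_out) {
  assert(static_cast<bool>(op));
  assert(static_cast<bool>(wait));
  assert(rollback_out != nullptr);
  unsigned retry_count = 0;  // must live outside the loop to be effective
  while (true) {
    // Fresh rollback object per attempt; freed automatically on retry.
    std::unique_ptr<RollbackCcb> rollback;
    if (want_rollback) rollback.reset(new RollbackCcb());
    const Rc rc = op(rollback.get());
    const bool transient = (rc == Rc::kTimeout || rc == Rc::kNotExist);
    if (transient && retry_count < max_retries) {
      ++retry_count;
      wait();
      continue;
    }
    if (rc == Rc::kOk) *rollback_out = std::move(rollback);
    return rc;
  }
}
```

Declaring the counter before the loop matters: a counter declared inside the loop body is reset on every iteration, so the bound would never be reached.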

Re: [devel] [PATCH 1/1] smf: retry if ccb is aborted at init proc [#3061]

2019-08-11 Thread Tran Thuan
Hi Thang,

I have 2 comments:

+ TRACE("Modifying information model");
=> This message is copied from modifyInformationModel(). Please update it
(e.g. "IMM CCB Actions") or remove it.

+ rollbackCcb = new (std::nothrow) SmfRollbackCcb(immRollbackCcbDn, i_oiHandle);
=> Could you merge L485 into this loop to reduce code?

Best Regards,
ThuanTr

-Original Message-
From: Tran Thuan  
Sent: Tuesday, July 23, 2019 9:37 AM
To: 'Thang Duc NGUYEN' 
Cc: opensaf-devel@lists.sourceforge.net
Subject: Re: [devel] [PATCH 1/1] smf: retry if ccb is aborted at init proc 
[#3061]

Hi Thang,

See my response inline.

Best Regards,
ThuanTr

From: Thang Duc NGUYEN  
Sent: Tuesday, July 23, 2019 8:42 AM
To: Tran Thuan 
Cc: lennart.l...@ericsson.com; opensaf-devel@lists.sourceforge.net
Subject: Re: [devel] [PATCH 1/1] smf: retry if ccb is aborted at init proc 
[#3061]

 

Hi Thuan,

Thanks for your comment.

See my reply.

Quoting Tran Thuan <thuan.t...@dektech.com.au>:

Hi Thang,

See my comments inline.

Best Regards,

ThuanTr

 

-Original Message-
From: thang.d.nguyen <thang.d.ngu...@dektech.com.au>
Sent: Monday, July 22, 2019 5:14 PM
To: lennart.l...@ericsson.com
Cc: opensaf-devel@lists.sourceforge.net
Subject: [devel] [PATCH 1/1] smf: retry if ccb is aborted at init proc [#3061]

 

Retry if ccb is aborted at init proc. Retry when the model modification will 
return TRY_AGAIN or NOT_EXIST.

---

src/smf/smfd/SmfUpgradeAction.cc | 73 +++-

1 file changed, 65 insertions(+), 8 deletions(-)

 

diff --git a/src/smf/smfd/SmfUpgradeAction.cc b/src/smf/smfd/SmfUpgradeAction.cc

 

index d05eea4..4f2696a 100644

 

--- a/src/smf/smfd/SmfUpgradeAction.cc

 

+++ b/src/smf/smfd/SmfUpgradeAction.cc

 

@@ -461,12 +461,12 @@ SaAisErrorT SmfImmCcbAction::execute(SaImmOiHandleT 
i_oiHandle,

 

  const std::string* i_rollbackDn) {

 

   SaAisErrorT result = SA_AIS_OK;

 

   SmfRollbackCcb* rollbackCcb = NULL;

 

+  std::string immRollbackCcbDn;

 

   TRACE_ENTER();

   TRACE("Imm ccb actions id %d, size %zu", m_id, m_operations.size());

 

   if (i_rollbackDn != NULL) {

 

-std::string immRollbackCcbDn;

 

 char idStr[16];

 

 snprintf(idStr, 16, "%08d", m_id);

 

 immRollbackCcbDn = "smfRollbackElement=ccb_"; @@ -491,17 +491,74 @@ 
SaAisErrorT SmfImmCcbAction::execute(SaImmOiHandleT i_oiHandle,

 

   }

 

 

 

   if (m_operations.size() > 0) {

 

-  SmfImmUtils immUtil;

 

-if ((result = immUtil.doImmOperations(m_operations, rollbackCcb)) !=

 

-SA_AIS_OK) {

 

- delete rollbackCcb;

 

- rollbackCcb = NULL;

 

-}

 

+TRACE("Modifying information model");

 

+SmfImmUtils immUtil;

 

+while (1) {

 

+  uint32_t retry_count = 0;

 

+  result = immUtil.doImmOperations(m_operations, rollbackCcb);

 

+  if (((result == SA_AIS_ERR_TIMEOUT) || (result == SA_AIS_ERR_NOT_EXIST))

 

+  && (retry_count <= 6)) {

 

[Thuan] retry_count is always 0 since you declare it in while loop

[Thang]: OK

+int interval = 5;  // seconds

 

+// When IMM aborts a CCB because of synch request from a payload, then

 

+// the next call of CCBInitialize() will return TRY_AGAIN till the time

 

+// the synch is complete.

 

+// There is no direct information available to the OM that can indicate

 

+// that the CCB or the Adminownerset failed because of an abort and 
also

 

+// there is no notification that can indicate that IMM is ready now.

 

+// That leaves SMF with the option to correlate error codes returned.

 

+//

 

+// Notes on treatment of SA_AIS_ERR_TIMEOUT and SA_AIS_ERR_NOT_EXIST

 

+// error codes:

 

+//

 

+// 1) CCB abort when it is not the first 
operation(create/modify/delete)

 

+// in the CCB

 

+//and if there is dependency between objects in the CCB:-

 

+//

 

+// An abort of a CCB and if the objects(Create/Modify/delete) had

 

+// some dependency(parent-child) among them, then an API call of

 

+// AdminOwnerSet() or the CCBCreate/Delete/Modify() on a dependant

 

+// object will return SA_AIS_ERR_NOT_EXIST, because the CCB aborted.

 

+//

 

+// 2) CCB abort when it is a first operation and/or there is no

 

+// intra-ccb objects-dependency:-

 

+//

 

+// When an ongoing CCB is aborted because of a synch request 
originating

 

+// from a payload, then the AdminOwnerSet() or the

 

+// CCBCreate/Delete/Modify() will return timeout.

 

+

 

+++retry_count;

 

+L

Re: [devel] [PATCH 1/1] smfd: set campaign state failed when error msg is introduced [#3063]

2019-08-11 Thread Tran Thuan
Hi Khanh,

ACK from me (code review).

Best Regards,
ThuanTr

-Original Message-
From: khanh.h.dang  
Sent: Tuesday, August 6, 2019 10:16 AM
To: lennart.l...@ericsson.com; thuan.t...@dektech.com.au
Cc: opensaf-devel@lists.sourceforge.net; khanh.h.dang

Subject: [PATCH 1/1] smfd: set campaign state failed when error msg is
introduced [#3063]

With single step upgrade method, after a manual cluster reboot, campaign
status gets error but state remains EXECUTION_COMPLETED.
This change corrects the state of campaign in such case.
---
 src/smf/smfd/SmfCampaign.cc | 17 +
 1 file changed, 17 insertions(+)

diff --git a/src/smf/smfd/SmfCampaign.cc b/src/smf/smfd/SmfCampaign.cc index
6f51483..0dc2f0f 100644
--- a/src/smf/smfd/SmfCampaign.cc
+++ b/src/smf/smfd/SmfCampaign.cc
@@ -729,6 +729,23 @@ bool SmfCampaign::startProcedure(SmfUpgradeProcedure
*procedure) {
 "Start of procedure thread failed for " + procedure->getDn();
 LOG_ER("%s", error.c_str());
 SmfCampaignThread::instance()->campaign()->setError(error);
+
+// When a campaign (merged to single step) reaches EXECUTION_COMPLETED and
+// an unexpected cluster reboot occurs, change the state to EXECUTION_FAILED
+// so that committing the campaign does not crash SMFD.
+SmfUpgradeCampaign *p_uc = getUpgradeCampaign();
+if ((p_uc->getProcExecutionMode() == SMF_MERGE_TO_SINGLE_STEP) &&
+(SmfCampaignThread::instance()->campaign()->getState() ==
+SA_SMF_CMPG_EXECUTION_COMPLETED)) {
+  std::string error =
+  "CAMP: campaign=" +
+  SmfCampaignThread::instance()->campaign()->getDn() +
+  " state: EXECUTION_COMPLETED => EXECUTION_FAILED " +
+  "due to unexpected cluster reboot.";
+  LOG_ER("%s", error.c_str());
+  SmfCampaignThread::instance()->campaign()->setState(
+  SA_SMF_CMPG_EXECUTION_FAILED);
+}
 return false;
   }
   return true;
--
2.7.4






Re: [devel] [PATCH 1/1] amf: fix no active assignment even one in-service SU can be assigned [#3020]

2019-08-08 Thread Tran Thuan
Hi bro.Minh,

I am fine with renaming the original function.
Please update the patch and help push it.
Thank you.

Best Regards,
ThuanTr

-Original Message-
From: Minh Hon Chau  
Sent: Friday, August 9, 2019 10:13 AM
To: thuan.tran ; gary@dektech.com.au
Cc: opensaf-devel@lists.sourceforge.net
Subject: Re: [PATCH 1/1] amf: fix no active assignment even one in-service SU 
can be assigned [#3020]

Hi Thuan,

ack with minor comments.

Thanks

Minh

On 18/3/19 7:04 pm, thuan.tran wrote:
> AMFD should try assign SI active for other in-service SUs if fail to 
> assign for current in-service SU
> ---
>   src/amf/amfd/sg_2n_fsm.cc | 75 
> +--
>   1 file changed, 46 insertions(+), 29 deletions(-)
>
> diff --git a/src/amf/amfd/sg_2n_fsm.cc b/src/amf/amfd/sg_2n_fsm.cc 
> index 91ffc63..ba0f72e 100644
> --- a/src/amf/amfd/sg_2n_fsm.cc
> +++ b/src/amf/amfd/sg_2n_fsm.cc
> @@ -630,6 +630,43 @@ done:
>   }
>   
>   
> /*
> 
> + * Function: avd_sg_2n_assign_si
> + *
> + * Purpose:  This function choose and assign SIs in the SG that dont have
> + *   active assignment.
> + *
> + * Input: cb - the AVD control block
> + *sg - The pointer to the service group.
> + *su - The pointer to the service unit to be assigned ACTIVE.
> + *
> + * Returns: True if assign succeed, otherwise return false
> + *
> + 
> +*
> +*/ static bool avd_sg_2n_assign_si(AVD_CL_CB *cb, AVD_SG *sg, 
> +AVD_SU *su) {
[M]: This function only creates active assignment, the name could be 
avd_sg_2n_assign_act_si (or you can come up another name) to suggest what it is 
actually doing inside. And add TRACE_ENTER()/LEAVE().
> +  bool l_flag = false;
> +  AVD_SU_SI_REL *tmp_susi;
> +  /* choose and assign SIs in the SG that dont have active assignment 
> +*/
> +  for (const auto _si : sg->list_of_si) {
> +if ((i_si->saAmfSIAdminState == SA_AMF_ADMIN_UNLOCKED) &&
> +(i_si->list_of_csi != nullptr) &&
> +(i_si->si_dep_state != AVD_SI_SPONSOR_UNASSIGNED) &&
> +(i_si->si_dep_state != AVD_SI_UNASSIGNING_DUE_TO_DEP) &&
> +(i_si->si_dep_state != AVD_SI_READY_TO_UNASSIGN) &&
> +(i_si->list_of_sisu == AVD_SU_SI_REL_NULL) &&
> +(su->saAmfSUNumCurrActiveSIs < sg->saAmfSGMaxActiveSIsperSU)) {
> +  /* found a SI that needs active assignment. */
> +  if (avd_new_assgn_susi(cb, su, i_si, SA_AMF_HA_ACTIVE, false,
> + _susi) == NCSCC_RC_SUCCESS) {
> +l_flag = true;
> +  } else {
> +LOG_ER("%s:%u: %s", __FILE__, __LINE__, i_si->name.c_str());
> +  }
> +}
> +  }
> +  return l_flag;
> +}
> +
> +/
> +*
>* Function: avd_sg_2n_su_chose_asgn
>*
>* Purpose:  This function will identify the current active SU.
> @@ -675,7 +712,10 @@ static AVD_SU *avd_sg_2n_su_chose_asgn(AVD_CL_CB *cb, 
> AVD_SG *sg) {
>   for (const auto  : sg->list_of_su) {
> if (iter->saAmfSuReadinessState == SA_AMF_READINESS_IN_SERVICE) {
>   a_su = iter;
> -break;
> +l_flag = avd_sg_2n_assign_si(cb, sg, a_su);
> +if (l_flag == true) {
> +  break;
> +}
> }
>   }
>   
> @@ -683,36 +723,13 @@ static AVD_SU *avd_sg_2n_su_chose_asgn(AVD_CL_CB *cb, 
> AVD_SG *sg) {
> TRACE("No in service SUs available in the SG");
> goto done;
>   }
> -  } else { /* if (a_susi == AVD_SU_SI_REL_NULL) */
> -
> +  } else { /* if (a_susi != AVD_SU_SI_REL_NULL) */
>   a_su = a_susi->su;
> -  }
> -
> -  if (a_su->saAmfSuReadinessState != SA_AMF_READINESS_IN_SERVICE) {
> -TRACE("The current active SU is OOS so return");
> -goto done;
> -  }
> -
> -  /* check if any more active SIs can be assigned to this SU */
> -  l_flag = false;
> -
> -  /* choose and assign SIs in the SG that dont have active assignment 
> */
> -  for (const auto _si : sg->list_of_si) {
> -if ((i_si->saAmfSIAdminState == SA_AMF_ADMIN_UNLOCKED) &&
> -(i_si->list_of_csi != nullptr) &&
> -(i_si->si_dep_state != AVD_SI_SPONSOR_UNASSIGNED) &&
> -(i_si->si_dep_state != AVD_SI_UNASSIGNING_DUE_TO_DEP) &&
> -(i_si->si_dep_state != AVD_SI_READY_TO_UNASSIGN) &&
> -(i_si->list_of_sisu == AVD_SU_SI_REL_NULL) &&
> -(a_su->saAmfSUNumCurrActiveSIs < sg->saAmfSGMaxActiveSIsperSU)) {
> -  /* found a SI that needs active assignment. */
> -  if (avd_new_assgn_susi(cb, a_su, i_si, SA_AMF_HA_ACTIVE, false,
> - _susi) == NCSCC_RC_SUCCESS) {
> -l_flag = true;
> -  } else {
> -LOG_ER("%s:%u: %s", __FILE__, __LINE__, i_si->name.c_str());
> -  }
> +if (a_su->saAmfSuReadinessState != SA_AMF_READINESS_IN_SERVICE) {
> +  TRACE("The current active SU is OOS so return");
> +  
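
Editor's note: the quoted patch is cut off above. The behaviour it aims for (keep trying the remaining in-service SUs when the active assignment fails on the current one) can be sketched as below. Su and ChooseActiveSu are hypothetical names, and the try_assign callback stands in for avd_sg_2n_assign_si(). As a simplification, the sketch returns nullptr when no SU could be assigned, which may differ from what the patch leaves in a_su.

```cpp
#include <cassert>
#include <functional>
#include <vector>

// Hypothetical stand-in for a service unit.
struct Su {
  int id;
  bool in_service;
};

// Returns the first in-service SU for which the active assignment succeeds,
// or nullptr if no SU could take the assignment.
const Su* ChooseActiveSu(const std::vector<Su>& sus,
                         const std::function<bool(const Su&)>& try_assign) {
  assert(static_cast<bool>(try_assign));
  for (const Su& su : sus) {
    if (!su.in_service) continue;    // skip out-of-service SUs
    if (try_assign(su)) return &su;  // stop at the first success
    // Assignment failed on this SU: fall through and try the next one
    // instead of giving up after the first in-service SU.
  }
  return nullptr;
}
```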

Re: [devel] [PATCH 1/1] smfd: set campaign state failed when error msg is introduced [#3063]

2019-08-05 Thread Tran Thuan
Hi Khanh,

It would be better to add a comment to the code.
For example:
// This happens when an unexpected cluster reboot occurs after the
// campaign (merged to single step) reaches EXECUTION_COMPLETED.
// If the state stays EXECUTION_COMPLETED, a later commit attempt crashes
// SMFD and the campaign cannot be committed at all.

Best Regards,
ThuanTr

-Original Message-
From: khanh.h.dang  
Sent: Monday, August 5, 2019 11:52 AM
To: lennart.l...@ericsson.com; thuan.t...@dektech.com.au
Cc: opensaf-devel@lists.sourceforge.net; khanh.h.dang

Subject: [PATCH 1/1] smfd: set campaign state failed when error msg is
introduced [#3063]

With single step upgrade method, after a manual cluster reboot, campaign
status gets error but state remains EXECUTION_COMPLETED.
This change corrects the state of campaign in such case.
---
 src/smf/smfd/SmfCampaign.cc | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/src/smf/smfd/SmfCampaign.cc b/src/smf/smfd/SmfCampaign.cc
index 6f514831b..d87afbe4d 100644
--- a/src/smf/smfd/SmfCampaign.cc
+++ b/src/smf/smfd/SmfCampaign.cc
@@ -729,6 +729,13 @@ bool SmfCampaign::startProcedure(SmfUpgradeProcedure *procedure) {
 "Start of procedure thread failed for " + procedure->getDn();
 LOG_ER("%s", error.c_str());
 SmfCampaignThread::instance()->campaign()->setError(error);
+SmfUpgradeCampaign *p_uc = getUpgradeCampaign();
+if ((p_uc->getProcExecutionMode() == SMF_MERGE_TO_SINGLE_STEP) &&
+(SmfCampaignThread::instance()->campaign()->getState() ==
+SA_SMF_CMPG_EXECUTION_COMPLETED)) {
+  SmfCampaignThread::instance()->campaign()->setState(
+  SA_SMF_CMPG_EXECUTION_FAILED);
+}
 return false;
   }
   return true;
--
2.22.0




___
Opensaf-devel mailing list
Opensaf-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/opensaf-devel


Re: [devel] [PATCH 1/1] smf: retry if ccb is aborted at init proc [#3061]

2019-07-22 Thread Tran Thuan
Hi Thang,

See my response inline.

Best Regards,
ThuanTr

From: Thang Duc NGUYEN
Sent: Tuesday, July 23, 2019 8:42 AM
To: Tran Thuan
Cc: lennart.l...@ericsson.com; opensaf-devel@lists.sourceforge.net
Subject: Re: [devel] [PATCH 1/1] smf: retry if ccb is aborted at init proc [#3061]

Hi Thuan,

Thanks for your comment.
See my reply.

Quoting Tran Thuan <thuan.t...@dektech.com.au>:

Hi Thang,

See my comments inline.

Best Regards,
ThuanTr

-Original Message-
From: thang.d.nguyen <thang.d.ngu...@dektech.com.au>
Sent: Monday, July 22, 2019 5:14 PM
To: lennart.l...@ericsson.com
Cc: opensaf-devel@lists.sourceforge.net
Subject: [devel] [PATCH 1/1] smf: retry if ccb is aborted at init proc [#3061]

 

Retry if ccb is aborted at init proc. Retry when the model modification will 
return TRY_AGAIN or NOT_EXIST.

---
src/smf/smfd/SmfUpgradeAction.cc | 73 +++-
1 file changed, 65 insertions(+), 8 deletions(-)

diff --git a/src/smf/smfd/SmfUpgradeAction.cc b/src/smf/smfd/SmfUpgradeAction.cc
index d05eea4..4f2696a 100644
--- a/src/smf/smfd/SmfUpgradeAction.cc
+++ b/src/smf/smfd/SmfUpgradeAction.cc
@@ -461,12 +461,12 @@ SaAisErrorT SmfImmCcbAction::execute(SaImmOiHandleT i_oiHandle,
  const std::string* i_rollbackDn) {
   SaAisErrorT result = SA_AIS_OK;
   SmfRollbackCcb* rollbackCcb = NULL;
+  std::string immRollbackCcbDn;
   TRACE_ENTER();
   TRACE("Imm ccb actions id %d, size %zu", m_id, m_operations.size());
   if (i_rollbackDn != NULL) {
-std::string immRollbackCcbDn;
 char idStr[16];
 snprintf(idStr, 16, "%08d", m_id);
 immRollbackCcbDn = "smfRollbackElement=ccb_";
@@ -491,17 +491,74 @@ SaAisErrorT SmfImmCcbAction::execute(SaImmOiHandleT i_oiHandle,
   }
 
   if (m_operations.size() > 0) {
-  SmfImmUtils immUtil;
-if ((result = immUtil.doImmOperations(m_operations, rollbackCcb)) !=
-SA_AIS_OK) {
- delete rollbackCcb;
- rollbackCcb = NULL;
-}
+TRACE("Modifying information model");
+SmfImmUtils immUtil;
+while (1) {
+  uint32_t retry_count = 0;
+  result = immUtil.doImmOperations(m_operations, rollbackCcb);
+  if (((result == SA_AIS_ERR_TIMEOUT) || (result == SA_AIS_ERR_NOT_EXIST))
+  && (retry_count <= 6)) {

[Thuan] retry_count is always 0 since you declare it in while loop

[Thang]: OK

+int interval = 5;  // seconds
+// When IMM aborts a CCB because of synch request from a payload, then
+// the next call of CCBInitialize() will return TRY_AGAIN till the time
+// the synch is complete.
+// There is no direct information available to the OM that can indicate
+// that the CCB or the Adminownerset failed because of an abort and also
+// there is no notification that can indicate that IMM is ready now.
+// That leaves SMF with the option to correlate error codes returned.
+//
+// Notes on treatment of SA_AIS_ERR_TIMEOUT and SA_AIS_ERR_NOT_EXIST
+// error codes:
+//
+// 1) CCB abort when it is not the first operation(create/modify/delete)
+// in the CCB
+//and if there is dependency between objects in the CCB:-
+//
+// An abort of a CCB and if the objects(Create/Modify/delete) had
+// some dependency(parent-child) among them, then an API call of
+// AdminOwnerSet() or the CCBCreate/Delete/Modify() on a dependant
+// object will return SA_AIS_ERR_NOT_EXIST, because the CCB aborted.
+//
+// 2) CCB abort when it is a first operation and/or there is no
+// intra-ccb objects-dependency:-
+//
+// When an ongoing CCB is aborted because of a synch request originating
+// from a payload, then the AdminOwnerSet() or the
+// CCBCreate/Delete/Modify() will return timeout.
+
+++retry_count;
+LOG_NO("xngthan: SmfUpgradeAction modify IMM failed with error: %s",
+   saf_error(result));

[Thuan] remove "xngthan:", log message should be “SmfImmCcbAction::execute failed with error: %s”

[Thang]: OK

+LOG_NO("CCB was aborted!?, Retrying: %u", retry_count);
+// Total retry time of 2.5 minutes for a worst case IMM loaded with say
+// < 300k objects. Retry every 25 seconds. i.e. (nanosleep for 5
+// seconds)  + (immutil_ccbInitialize will worstcase wait till 20
+// seconds).
+struct time

Re: [devel] [PATCH 1/1] smf: retry if ccb is aborted at init proc [#3061]

2019-07-22 Thread Tran Thuan
Hi Thang,

See my comments inline.

Best Regards,
ThuanTr

 

-Original Message-
From: thang.d.nguyen  
Sent: Monday, July 22, 2019 5:14 PM
To: lennart.l...@ericsson.com
Cc: opensaf-devel@lists.sourceforge.net
Subject: [devel] [PATCH 1/1] smf: retry if ccb is aborted at init proc
[#3061]

 

Retry if ccb is aborted at init proc. Retry when the model modification will
return TRY_AGAIN or NOT_EXIST.

---
src/smf/smfd/SmfUpgradeAction.cc | 73 +++-
1 file changed, 65 insertions(+), 8 deletions(-)

diff --git a/src/smf/smfd/SmfUpgradeAction.cc b/src/smf/smfd/SmfUpgradeAction.cc
index d05eea4..4f2696a 100644
--- a/src/smf/smfd/SmfUpgradeAction.cc
+++ b/src/smf/smfd/SmfUpgradeAction.cc
@@ -461,12 +461,12 @@ SaAisErrorT SmfImmCcbAction::execute(SaImmOiHandleT i_oiHandle,
  const std::string* i_rollbackDn) {
   SaAisErrorT result = SA_AIS_OK;
   SmfRollbackCcb* rollbackCcb = NULL;
+  std::string immRollbackCcbDn;
   TRACE_ENTER();
   TRACE("Imm ccb actions id %d, size %zu", m_id, m_operations.size());
   if (i_rollbackDn != NULL) {
-std::string immRollbackCcbDn;
 char idStr[16];
 snprintf(idStr, 16, "%08d", m_id);
 immRollbackCcbDn = "smfRollbackElement=ccb_";
@@ -491,17 +491,74 @@ SaAisErrorT SmfImmCcbAction::execute(SaImmOiHandleT i_oiHandle,
   }
   if (m_operations.size() > 0) {
-  SmfImmUtils immUtil;
-if ((result = immUtil.doImmOperations(m_operations, rollbackCcb)) !=
-SA_AIS_OK) {
- delete rollbackCcb;
- rollbackCcb = NULL;
-}
+TRACE("Modifying information model");
+SmfImmUtils immUtil;
+while (1) {
+  uint32_t retry_count = 0;
+  result = immUtil.doImmOperations(m_operations, rollbackCcb);
+  if (((result == SA_AIS_ERR_TIMEOUT) || (result == SA_AIS_ERR_NOT_EXIST))
+  && (retry_count <= 6)) {

[Thuan] retry_count is always 0 since you declare it in while loop

+int interval = 5;  // seconds
+// When IMM aborts a CCB because of synch request from a payload, then
+// the next call of CCBInitialize() will return TRY_AGAIN till the time
+// the synch is complete.
+// There is no direct information available to the OM that can indicate
+// that the CCB or the Adminownerset failed because of an abort and also
+// there is no notification that can indicate that IMM is ready now.
+// That leaves SMF with the option to correlate error codes returned.
+//
+// Notes on treatment of SA_AIS_ERR_TIMEOUT and SA_AIS_ERR_NOT_EXIST
+// error codes:
+//
+// 1) CCB abort when it is not the first operation(create/modify/delete)
+// in the CCB
+//and if there is dependency between objects in the CCB:-
+//
+// An abort of a CCB and if the objects(Create/Modify/delete) had
+// some dependency(parent-child) among them, then an API call of
+// AdminOwnerSet() or the CCBCreate/Delete/Modify() on a dependant
+// object will return SA_AIS_ERR_NOT_EXIST, because the CCB aborted.
+//
+// 2) CCB abort when it is a first operation and/or there is no
+// intra-ccb objects-dependency:-
+//
+// When an ongoing CCB is aborted because of a synch request originating
+// from a payload, then the AdminOwnerSet() or the
+// CCBCreate/Delete/Modify() will return timeout.
+
+++retry_count;
+LOG_NO("xngthan: SmfUpgradeAction modify IMM failed with error: %s",
+   saf_error(result));

[Thuan] Remove "xngthan:"; the log message should be "SmfImmCcbAction::execute failed with error: %s".

+LOG_NO("CCB was aborted!?, Retrying: %u", retry_count);
+// Total retry time of 2.5 minutes for a worst case IMM loaded with say
+// < 300k objects. Retry every 25 seconds. i.e. (nanosleep for 5
+// seconds)  + (immutil_ccbInitialize will worstcase wait till 20
+// seconds).
+struct timespec sleepTime = {interval, 0};
+osaf_nanosleep(&sleepTime);

[Thuan] Why not use "sleep(5);"?

+// Note: Alternatively Make rollbackCcb unique by adding a method for
+// this to the rollbackCcb.

[Thuan] Need to check whether rollbackCcb != NULL.

+delete rollbackCcb;
+rollbackCcb =
+new (std::nothrow) SmfRollbackCcb(immRollbackCcbDn, i_oiHandle);
+if (rollbackCcb == NULL) {
+  LOG_ER("SmfImmCcbAction::execute failed to create SmfRollbackCcb");
+  return SA_AIS_ERR_NO_MEMORY;
+}
+continue;
+  } else if (result != SA_AIS_OK) {
+LOG_ER("Giving up, SmfUpgradeAction modify IMM failed, result=%s",

[Thuan] The log message should be "SmfImmCcbAction::execute failed, result=%s"; "giving up" may not be correct if there has been no retry yet.

+   saf_error(result));

+delete rollbackCcb;
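
The retry_count remark in this review can be illustrated with a small stand-alone sketch. It is not the SMF code: Result, run_with_retry and the parameters below are hypothetical stand-ins for SaAisErrorT and doImmOperations(), and the wait between attempts is passed in so it can be shortened. The only point it tries to show is that the counter has to live outside the loop for the retry limit to take effect.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>
#include <functional>
#include <thread>

// Hypothetical result codes standing in for SaAisErrorT values.
enum class Result { kOk, kTimeout, kNotExist, kFailed };

// Calls `op` until it stops reporting a retryable error, retrying at most
// `max_retries` times. The counter is declared before the loop; declaring
// it inside the loop body would reset it to 0 on every iteration, so the
// limit would never be reached.
Result run_with_retry(const std::function<Result()>& op, uint32_t max_retries,
                      std::chrono::milliseconds interval,
                      uint32_t* attempts_out) {
  assert(op != nullptr);
  uint32_t retry_count = 0;
  uint32_t attempts = 0;
  Result result = Result::kFailed;
  while (true) {
    result = op();
    ++attempts;
    const bool retryable =
        (result == Result::kTimeout) || (result == Result::kNotExist);
    if (!retryable || retry_count >= max_retries) break;
    ++retry_count;
    std::this_thread::sleep_for(interval);  // wait before the next attempt
  }
  if (attempts_out != nullptr) *attempts_out = attempts;
  return result;
}
```

With max_retries set to 2 and an operation that always times out, the function gives up after three attempts and returns the last error, instead of looping forever.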


Re: [devel] [PATCH 1/1] amf: realign sg if si_dep_state of si in FAILOVER_UNDER_PROGRESS [#3054]

2019-07-11 Thread Tran Thuan
Hi Thang,

OK. No more comment from me.
Thanks.

Best Regards,
ThuanTr

-Original Message-
From: Thang Nguyen  
Sent: Thursday, July 11, 2019 2:09 PM
To: 'Tran Thuan' ; gary@dektech.com.au;
minh.c...@dektech.com.au; hans.nordeb...@ericsson.com
Cc: opensaf-devel@lists.sourceforge.net
Subject: RE: [devel] [PATCH 1/1] amf: realign sg if si_dep_state of si in
FAILOVER_UNDER_PROGRESS [#3054]

Hi Thuan,

Thanks for your review. I tested with your suggestion but it failed for
another test case.


B.R
/Thang

-Original Message-
From: Tran Thuan 
Sent: Thursday, July 11, 2019 11:06 AM
To: 'thang.d.nguyen' ;
gary@dektech.com.au; minh.c...@dektech.com.au;
hans.nordeb...@ericsson.com
Cc: opensaf-devel@lists.sourceforge.net
Subject: RE: [devel] [PATCH 1/1] amf: realign sg if si_dep_state of si in
FAILOVER_UNDER_PROGRESS [#3054]

Hi Thang,

See my comment [Thuan] inline.

Best Regards,
ThuanTr

-Original Message-
From: thang.d.nguyen 
Sent: Tuesday, July 2, 2019 10:21 PM
To: gary@dektech.com.au; minh.c...@dektech.com.au;
hans.nordeb...@ericsson.com
Cc: opensaf-devel@lists.sourceforge.net
Subject: [devel] [PATCH 1/1] amf: realign sg if si_dep_state of si in
FAILOVER_UNDER_PROGRESS [#3054]

With SI dependency, when a failover happens it is processed first on the SU
that has the sponsor SI. This also updates the si_dep_state of the dependent
SI to FAILOVER_UNDER_PROGRESS. The failover is then processed on the SU that
has the dependent SI, but it is deferred because the failover of the
sponsor's role is still ongoing.

Amfd crashes when the SU that has the dependent SI is instantiated on the
standby node while the ACTIVE assignment of the sponsor SI to the SU on the
ACTIVE node is still in progress.
---
 src/amf/amfd/sg.cc | 13 +
 src/amf/amfd/sg.h  |  1 +
 src/amf/amfd/sgproc.cc |  6 +-
 3 files changed, 19 insertions(+), 1 deletion(-)

diff --git a/src/amf/amfd/sg.cc b/src/amf/amfd/sg.cc
index 09f8f31..c815489 100644
--- a/src/amf/amfd/sg.cc
+++ b/src/amf/amfd/sg.cc
@@ -2345,6 +2345,19 @@ bool AVD_SG::any_assignment_excessive() {
   return pending;
 }
 
+bool AVD_SG::any_failover_under_progress() {
+  bool pending = false;
+  TRACE_ENTER2("SG:'%s'", name.c_str());
+  for (const auto &si : list_of_si) {
+if (si->si_dep_state == AVD_SI_FAILOVER_UNDER_PROGRESS) {
+  pending = true;
+  break;
+}
+  }
+  TRACE_LEAVE();
+  return pending;
+}
+
 /*
  * Going through all SU of this SG, if any SU has over assigned,
  * reboot the node that hosts the SU.
diff --git a/src/amf/amfd/sg.h b/src/amf/amfd/sg.h
index 55e7dbe..147429b 100644
--- a/src/amf/amfd/sg.h
+++ b/src/amf/amfd/sg.h
@@ -441,6 +441,7 @@ class AVD_SG {
   bool any_assignment_absent();
   bool any_assignment_assigned();
   bool any_assignment_excessive();
+  bool any_failover_under_progress();
   void failover_absent_assignment();
   bool ng_using_saAmfSGAdminState;
   bool headless_validation;
diff --git a/src/amf/amfd/sgproc.cc b/src/amf/amfd/sgproc.cc
index 7c8d9a5..ebf11c8 100644
--- a/src/amf/amfd/sgproc.cc
+++ b/src/amf/amfd/sgproc.cc
@@ -1007,7 +1007,11 @@ void avd_su_oper_state_evh(AVD_CL_CB *cb, AVD_EVT *evt) {
  */
 if (su->sg_of_su->any_assignment_in_progress() == false &&
 su->sg_of_su->any_assignment_absent() == false) {
-  su->sg_of_su->set_fsm_state(AVD_SG_FSM_STABLE);
+  if (su->sg_of_su->any_failover_under_progress() == false) {
+su->sg_of_su->set_fsm_state(AVD_SG_FSM_STABLE);
+  } else {
+su->sg_of_su->set_fsm_state(AVD_SG_FSM_SG_REALIGN);
+  }

[Thuan] You just need to add one more condition for setting the FSM state to STABLE.
 if (su->sg_of_su->any_assignment_in_progress() == false &&
 su->sg_of_su->any_assignment_absent() == false &&
 su->sg_of_su->any_failover_under_progress() == false) {
 su->sg_of_su->set_fsm_state(AVD_SG_FSM_STABLE);
 }
[Thuan] I have tested this with your TC, and it works.

 if (su->sg_of_su->sg_ncs_spec == true) {
   if (su->saAmfSUAdminState == SA_AMF_ADMIN_UNLOCKED) {
--
2.7.4





Re: [devel] [PATCH 1/1] amf: realign sg if si_dep_state of si in FAILOVER_UNDER_PROGRESS [#3054]

2019-07-10 Thread Tran Thuan
Hi Thang,

See my comment [Thuan] inline.

Best Regards,
ThuanTr

-Original Message-
From: thang.d.nguyen  
Sent: Tuesday, July 2, 2019 10:21 PM
To: gary@dektech.com.au; minh.c...@dektech.com.au;
hans.nordeb...@ericsson.com
Cc: opensaf-devel@lists.sourceforge.net
Subject: [devel] [PATCH 1/1] amf: realign sg if si_dep_state of si in
FAILOVER_UNDER_PROGRESS [#3054]

With SI dependency, when a failover happens it is processed first on the SU
that has the sponsor SI. This also updates the si_dep_state of the dependent
SI to FAILOVER_UNDER_PROGRESS. The failover is then processed on the SU that
has the dependent SI, but it is deferred because the failover of the
sponsor's role is still ongoing.

Amfd crashes when the SU that has the dependent SI is instantiated on the
standby node while the ACTIVE assignment of the sponsor SI to the SU on the
ACTIVE node is still in progress.
---
 src/amf/amfd/sg.cc | 13 +
 src/amf/amfd/sg.h  |  1 +
 src/amf/amfd/sgproc.cc |  6 +-
 3 files changed, 19 insertions(+), 1 deletion(-)

diff --git a/src/amf/amfd/sg.cc b/src/amf/amfd/sg.cc
index 09f8f31..c815489 100644
--- a/src/amf/amfd/sg.cc
+++ b/src/amf/amfd/sg.cc
@@ -2345,6 +2345,19 @@ bool AVD_SG::any_assignment_excessive() {
   return pending;
 }
 
+bool AVD_SG::any_failover_under_progress() {
+  bool pending = false;
+  TRACE_ENTER2("SG:'%s'", name.c_str());
+  for (const auto &si : list_of_si) {
+if (si->si_dep_state == AVD_SI_FAILOVER_UNDER_PROGRESS) {
+  pending = true;
+  break;
+}
+  }
+  TRACE_LEAVE();
+  return pending;
+}
+
 /*
  * Going through all SU of this SG, if any SU has over assigned,
  * reboot the node that hosts the SU.
diff --git a/src/amf/amfd/sg.h b/src/amf/amfd/sg.h
index 55e7dbe..147429b 100644
--- a/src/amf/amfd/sg.h
+++ b/src/amf/amfd/sg.h
@@ -441,6 +441,7 @@ class AVD_SG {
   bool any_assignment_absent();
   bool any_assignment_assigned();
   bool any_assignment_excessive();
+  bool any_failover_under_progress();
   void failover_absent_assignment();
   bool ng_using_saAmfSGAdminState;
   bool headless_validation;
diff --git a/src/amf/amfd/sgproc.cc b/src/amf/amfd/sgproc.cc
index 7c8d9a5..ebf11c8 100644
--- a/src/amf/amfd/sgproc.cc
+++ b/src/amf/amfd/sgproc.cc
@@ -1007,7 +1007,11 @@ void avd_su_oper_state_evh(AVD_CL_CB *cb, AVD_EVT *evt) {
  */
 if (su->sg_of_su->any_assignment_in_progress() == false &&
 su->sg_of_su->any_assignment_absent() == false) {
-  su->sg_of_su->set_fsm_state(AVD_SG_FSM_STABLE);
+  if (su->sg_of_su->any_failover_under_progress() == false) {
+su->sg_of_su->set_fsm_state(AVD_SG_FSM_STABLE);
+  } else {
+su->sg_of_su->set_fsm_state(AVD_SG_FSM_SG_REALIGN);
+  }

[Thuan] You just need to add one more condition for setting the FSM state to STABLE.
 if (su->sg_of_su->any_assignment_in_progress() == false &&
 su->sg_of_su->any_assignment_absent() == false &&
 su->sg_of_su->any_failover_under_progress() == false) {
 su->sg_of_su->set_fsm_state(AVD_SG_FSM_STABLE);
 }
[Thuan] I have tested this with your TC, and it works.

 if (su->sg_of_su->sg_ncs_spec == true) {
   if (su->saAmfSUAdminState == SA_AMF_ADMIN_UNLOCKED) {
--
2.7.4
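
For readers comparing the two variants discussed in this review, the following stand-alone sketch may help. It is only a model: SgStatus, SgFsmState and the two functions are hypothetical names, and the three booleans stand for the results of any_assignment_in_progress(), any_assignment_absent() and any_failover_under_progress(). Under these assumptions the variants differ in exactly one case, namely when no assignment is in progress or absent but a failover is still under progress: the patch moves the SG to REALIGN, while the single combined condition leaves the state untouched.

```cpp
// Hypothetical outcome of the decision; kUnchanged means "do not call
// set_fsm_state() at all".
enum class SgFsmState { kStable, kRealign, kUnchanged };

// Hypothetical snapshot of the three checks used in the patch.
struct SgStatus {
  bool assignment_in_progress;
  bool assignment_absent;
  bool failover_under_progress;
};

// Variant from the patch: nested check with an explicit REALIGN branch.
SgFsmState next_state_patch(const SgStatus& s) {
  if (!s.assignment_in_progress && !s.assignment_absent) {
    return s.failover_under_progress ? SgFsmState::kRealign
                                     : SgFsmState::kStable;
  }
  return SgFsmState::kUnchanged;
}

// Variant from the review comment: one combined condition, STABLE only.
SgFsmState next_state_review(const SgStatus& s) {
  if (!s.assignment_in_progress && !s.assignment_absent &&
      !s.failover_under_progress) {
    return SgFsmState::kStable;
  }
  return SgFsmState::kUnchanged;
}
```

Whether the explicit REALIGN transition is needed depends on what state the SG is in when this code runs, which the sketch does not model.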





Re: [devel] [PATCH 1/1] amf: fix SU get stuck in INSTANTIATING presence state [#3047]

2019-06-13 Thread Tran Thuan
Hi bro.Minh,

Thanks for your comment.
I have updated the patch to make it clearer and sent out V2.

Best Regards,
ThuanTr

-Original Message-
From: Minh Hon Chau  
Sent: Thursday, June 13, 2019 6:31 PM
To: thuan.tran ; gary@dektech.com.au
Cc: opensaf-devel@lists.sourceforge.net; Hans Nordeback 

Subject: Re: [PATCH 1/1] amf: fix SU get stuck in INSTANTIATING presence state 
[#3047]

Hi Thuan,

ack with minor comment.

Thanks

Minh

On 3/6/19 5:10 pm, thuan.tran wrote:
> COMP restart recovery during SU restart recovery can leave the SU stuck
> in INSTANTIATING without further action, because the COMP instantiated
> event in RESTARTING does not trigger avnd_su_pres_fsm_run().
> ---
>   src/amf/amfnd/clc.cc  | 4 
>   src/amf/amfnd/susm.cc | 4 +++-
>   2 files changed, 7 insertions(+), 1 deletion(-)
>
> diff --git a/src/amf/amfnd/clc.cc b/src/amf/amfnd/clc.cc
> index 675ca49..9b1b3a7 100644
> --- a/src/amf/amfnd/clc.cc
> +++ b/src/amf/amfnd/clc.cc
> @@ -926,6 +926,7 @@ uint32_t avnd_comp_clc_st_chng_prc(AVND_CB *cb, AVND_COMP *comp,
> AVND_SU_PRES_FSM_EV ev = AVND_SU_PRES_FSM_EV_MAX;
> AVND_COMP_CSI_REC *csi = 0;
> bool is_en;
> +  bool pi_comp_recover = false;
> uint32_t rc = NCSCC_RC_SUCCESS;
> TRACE_ENTER2("Comp '%s', Prv_state '%s', Final_state '%s'",
>  comp->name.c_str(), presence_state[prv_st],
> @@ -953,6 +954,8 @@ uint32_t avnd_comp_clc_st_chng_prc(AVND_CB *cb, AVND_COMP *comp,
>   TRACE_1(
>   "Component restart is through admin opration, admin oper flag reset");
>   comp->admin_oper = false;
> +  } else if (m_AVND_COMP_TYPE_IS_PREINSTANTIABLE(comp)) {
> +pi_comp_recover = true;

[M]: It looks doubtful, the check itself only wants to know if the @comp is pi,
it does not relate to the first *if* (@admin_oper and @final_st)?

> }
>
> if ((SA_AMF_PRESENCE_INSTANTIATED == prv_st) &&
> @@ -1487,6 +1490,7 @@ uint32_t avnd_comp_clc_st_chng_prc(AVND_CB *cb, AVND_COMP *comp,
>(SA_AMF_PRESENCE_ORPHANED != prv_st) &&
>((prv_st == SA_AMF_PRESENCE_INSTANTIATING) ||
> (prv_st == SA_AMF_PRESENCE_TERMINATING) ||
> +  (prv_st == SA_AMF_PRESENCE_RESTARTING && pi_comp_recover) ||
> (comp->su->admin_op_Id == SA_AMF_ADMIN_RESTART)))
> ev = AVND_SU_PRES_FSM_EV_COMP_INSTANTIATED;
>   else if (SA_AMF_PRESENCE_INSTANTIATION_FAILED == final_st)
> diff --git a/src/amf/amfnd/susm.cc b/src/amf/amfnd/susm.cc
> index c023c8d..62e2db9 100644
> --- a/src/amf/amfnd/susm.cc
> +++ b/src/amf/amfnd/susm.cc
> @@ -2282,7 +2282,9 @@ uint32_t avnd_su_pres_insting_compinst_hdler(AVND_CB *cb, AVND_SU *su,
>  curr_comp; curr_comp = m_AVND_COMP_FROM_SU_DLL_NODE_GET(
> m_NCS_DBLIST_FIND_NEXT(&curr_comp->su_dll_node))) {
>   /* instantiate the pi comp */
> -if (m_AVND_COMP_TYPE_IS_PREINSTANTIABLE(curr_comp)) {
> +if (m_AVND_COMP_TYPE_IS_PREINSTANTIABLE(curr_comp) &&
> +   (!m_AVND_COMP_IS_FAILED(curr_comp) ||
> +curr_comp->pres != SA_AMF_PRESENCE_RESTARTING)) {
> TRACE("Running the component clc FSM");
> rc = avnd_comp_clc_fsm_run(cb, curr_comp,
> AVND_COMP_CLC_PRES_FSM_EV_INST);





Re: [devel] [PATCH 1/1] amfnd: reboot to recovery if msg id received by amfd mismatch with msg id sent by amfnd [#3040]

2019-05-20 Thread Tran Thuan
Hi,

I have minor comments, marked [Thuan], inline.
Feel free to update or not.

Best Regards,
ThuanTr

-Original Message-
From: Gary Lee  
Sent: Monday, May 20, 2019 1:06 PM
To: thang.d.nguyen ;
hans.nordeb...@ericsson.com; minh.c...@dektech.com.au
Cc: opensaf-devel@lists.sourceforge.net
Subject: Re: [devel] [PATCH 1/1] amfnd: reboot to recovery if msg id
received by amfd mismatch with msg id sent by amfnd [#3040]

Hi Thang

Looks good to me. Nagu, any comments?

Thanks

Gary

On 15/5/19 12:14 am, thang.d.nguyen wrote:
> During SC failover, a message received by the ACTIVE AMFD may not be
> checkpointed to the AMFD on the STANDBY SC.
> AMFND nevertheless processes the ack for that message and removes it
> from its queue.
> When the STANDBY SC becomes ACTIVE, the message IDs of AMFD and AMFND
> on the new ACTIVE no longer match. As a consequence, CLM track start
> cannot be invoked to update the cluster member nodes if those nodes
> were rebooted.
>
> In this case, amfnd needs to reboot automatically to recover.
> ---
>   src/amf/amfnd/verify.cc | 15 ++-
>   1 file changed, 10 insertions(+), 5 deletions(-)
>
> diff --git a/src/amf/amfnd/verify.cc b/src/amf/amfnd/verify.cc
> index 5726ad9..ddb1d15 100644
> --- a/src/amf/amfnd/verify.cc
> +++ b/src/amf/amfnd/verify.cc
> @@ -116,12 +116,14 @@ uint32_t avnd_evt_avd_verify_evh(AVND_CB *cb, AVND_EVT *evt) {
> avnd_diq_rec_del(cb, rec);
> continue;
>   } else {
> +  if ((rcv_id + 1) == (*((uint32_t *)(&rec->msg.info.avd->msg_info))) &&
> +  (msg_found == false)) {
[Thuan] I think there is no need to check msg_found == false, since it is
declared as false.
> +msg_found = true;
> +  }
> avnd_diq_rec_send(cb, rec);
>
> TRACE_1("AVND record %u sent, upon fail-over",
> *((uint32_t *)(&rec->msg.info.avd->msg_info)));
> -
> -  msg_found = true;
>   }
>   ++iter;
> }
> @@ -129,9 +131,12 @@ uint32_t avnd_evt_avd_verify_evh(AVND_CB *cb, AVND_EVT *evt) {
> if ((cb->snd_msg_id != info->rcv_id_cnt) && (msg_found == false)) {
>   /* Log error, seems to be some problem.*/
>   LOG_EM(
> -"AVND record not found, after failover, snd_msg_id = %u, receive id = %u",
> -cb->snd_msg_id, info->rcv_id_cnt);
[Thuan] It's better to keep this log error message (more info).
> -return NCSCC_RC_FAILURE;
> +"AVND record not found for msg id = %u", (rcv_id + 1));
> +opensaf_reboot(
> +avnd_cb->node_info.nodeId,
> +osaf_extended_name_borrow(&avnd_cb->node_info.executionEnvironment),
> +"AVND record not found, after failover");
> +exit(0);
> }
>   
> /*
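
The check that this patch changes can be modelled in isolation. The sketch below is not the amfnd code: the queue is reduced to a std::deque of message ids, all names are hypothetical, and it assumes (from the visible part of the diff only) that records with an id up to rcv_id are dropped as already received while later ones are resent. A mismatch is reported when the director has not received everything that was sent and the record with id rcv_id + 1 is no longer in the queue.

```cpp
#include <cstdint>
#include <deque>
#include <vector>

enum class VerifyOutcome { kInSync, kMismatch };

struct VerifyReport {
  VerifyOutcome outcome;
  std::vector<uint32_t> resent;  // ids that would be sent again
};

// pending_ids: ids of messages sent but not yet acknowledged (hypothetical
// stand-in for the node director's queue).
// snd_msg_id:  id of the last message sent by the node director.
// rcv_id:      id of the last message the new active director has seen.
VerifyReport verify_after_failover(std::deque<uint32_t>& pending_ids,
                                   uint32_t snd_msg_id, uint32_t rcv_id) {
  VerifyReport report{VerifyOutcome::kInSync, {}};
  bool next_found = false;
  for (auto it = pending_ids.begin(); it != pending_ids.end();) {
    if (*it <= rcv_id) {
      it = pending_ids.erase(it);  // already received, drop the record
      continue;
    }
    if (*it == rcv_id + 1) next_found = true;  // the first missing message
    report.resent.push_back(*it);
    ++it;
  }
  // Messages are missing on the director side and cannot be replayed.
  if (snd_msg_id != rcv_id && !next_found) {
    report.outcome = VerifyOutcome::kMismatch;
  }
  return report;
}
```

In the patch the mismatch case leads to a node reboot; the sketch only reports it.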




Re: [devel] [PATCH 0/1] Review Request for mds: support multicast fragmented messages [#3033] V3

2019-05-03 Thread Tran Thuan
Hi Hans,

static uint32_t mdtm_sendto(uint8_t *buffer, uint16_t buff_len,
                            struct tipc_portid tipc_id);
static uint32_t mdtm_mcast_sendto(void *buffer, size_t size,
                                  const MDTM_SEND_REQ *req);

Before the fix, fragmented packets were sent via mdtm_sendto(), which by MDS
design sends to a single destination.

After the fix, fragmented packets are sent via mdtm_mcast_sendto(), which by
MDS design sends to all destinations.

Both functions call the TIPC sendto(), just with different parameters.
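
As a rough illustration of the difference, here is a toy model. It is not MDS or TIPC code: Network, frag_and_send and the callback type are hypothetical, and a "send" just appends the fragment to an in-memory inbox. It only shows that when the fragmenting routine always uses a unicast send, a large broadcast message reaches one receiver, whereas passing a multicast send delivers every fragment to all receivers.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Hypothetical callback that puts one fragment on the wire.
using SendFn = std::function<void(const std::string& fragment)>;

// Splits `payload` into fragments of at most `max_frag` bytes and hands
// each one to `send`. Returns the number of fragments sent.
std::size_t frag_and_send(const std::string& payload, std::size_t max_frag,
                          const SendFn& send) {
  assert(max_frag > 0);
  std::size_t count = 0;
  for (std::size_t off = 0; off < payload.size(); off += max_frag) {
    send(payload.substr(off, max_frag));
    ++count;
  }
  return count;
}

// Toy network with one inbox per receiver.
struct Network {
  std::vector<std::string> inbox;
  explicit Network(std::size_t receivers) : inbox(receivers) {}
  // Delivers a fragment to a single receiver.
  void unicast(std::size_t dest, const std::string& frag) {
    inbox.at(dest) += frag;
  }
  // Delivers a fragment to every receiver.
  void multicast(const std::string& frag) {
    for (auto& box : inbox) box += frag;
  }
};
```

For example, sending "abcdefghij" with a fragment size of 4 through unicast(0, ...) fills only inbox[0], while the same call with multicast(...) fills all inboxes.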

 

Best Regards,

ThuanTr

 

From: Hans Nordebäck  
Sent: Friday, May 3, 2019 3:50 PM
To: Thuan Tran ; Minh Hon Chau 
; Vu Minh Nguyen 
Cc: opensaf-devel@lists.sourceforge.net; Anders Widell 

Subject: Re: [PATCH 0/1] Review Request for mds: support multicast fragmented 
messages [#3033] V3

 

Hi Thuan,

A question: the old code uses unicast for the fragments, and now multicast is
used. But the TIPC documentation for 'sendto' says:

"If the destination is a service range, the message is a multicast to all 
matching sockets."

So before, when unicast was used for the fragments, did TIPC multicast the
fragments? What problem does this patch solve, can you clarify? /Thanks HansN

On 2019-05-03 10:20, Tran Thuan wrote:

Hi Hans,
 
Yes, we tried that kind of basic test; IMMD can deliver a big message via multicast.
 
Best Regards,
ThuanTr
 
-Original Message-
From: Hans Nordebäck <hans.nordeb...@ericsson.com>
Sent: Friday, May 3, 2019 3:11 PM
To: Thuan Tran <thuan.t...@dektech.com.au>; Minh Hon Chau <minh.c...@dektech.com.au>; Vu Minh Nguyen <vu.m.ngu...@dektech.com.au>
Cc: opensaf-devel@lists.sourceforge.net
Subject: Re: [PATCH 0/1] Review Request for mds: support multicast fragmented messages [#3033] V3
 
Hi Thuan,

That sounds good; is this how you test now? Looking at the MDS code before
this change, it seems that large multicast messages are fragmented and sent
to only one receiver using unicast, but with this change the fragments are
multicast to all receivers, which seems more correct. /Thanks HansN
 
On 2019-05-03 10:04, Tran Thuan wrote:

Hi Hans,
 
The current MDS apitest is only a binary executed on one node.
It would be easier to create an IMM test case that makes IMMD broadcast a big message.
I think we can create a new ticket for this additional test.
 
Best Regards,
ThuanTr
 
-Original Message-
From: Hans Nordebäck  <mailto:hans.nordeb...@ericsson.com> 

Sent: Friday, May 3, 2019 2:45 PM
To: Thuan Tran  <mailto:thuan.t...@dektech.com.au> ; 
Minh Hon Chau 
 <mailto:minh.c...@dektech.com.au> ; Vu Minh Nguyen 
 <mailto:vu.m.ngu...@dektech.com.au> 
Cc: opensaf-devel@lists.sourceforge.net 
<mailto:opensaf-devel@lists.sourceforge.net> 
Subject: RE: [PATCH 0/1] Review Request for mds: support multicast 
fragmented messages [#3033] V3
 
Hi Thuan,
OK, it would be good if we can add additional tests to the mds api test
suite. /Thanks HansN
 
-Original Message-
From: Tran Thuan  <mailto:thuan.t...@dektech.com.au> 
Sent: den 3 maj 2019 09:41
To: Hans Nordebäck  <mailto:hans.nordeb...@ericsson.com> 
; Minh Hon Chau 
 <mailto:minh.c...@dektech.com.au> ; Vu Minh Nguyen 
 <mailto:vu.m.ngu...@dektech.com.au> 
Cc: opensaf-devel@lists.sourceforge.net 
<mailto:opensaf-devel@lists.sourceforge.net> 
Subject: RE: [PATCH 0/1] Review Request for mds: support multicast 
fragmented messages [#3033] V3
 
Hi Hans,
 
I don't see this kind of test in mds apitests.
 
Best Regards,
ThuanTr
 
-Original Message-
From: Hans Nordebäck  <mailto:hans.nordeb...@ericsson.com> 

Sent: Friday, May 3, 2019 2:31 PM
To: Thuan Tran  <mailto:thuan.t...@dektech.com.au> ; 
Minh Hon Chau 
 <mailto:minh.c...@dektech.com.au> ; Vu Minh Nguyen 
 <mailto:vu.m.ngu...@dektech.com.au> 
Cc: opensaf-devel@lists.sourceforge.net 
<mailto:opensaf-devel@lists.sourceforge.net> 
Subject: RE: [PATCH 0/1] Review Request for mds: support multicast 
fragmented messages [#3033] V3
 
Hi Thuan,
I'm reviewing the patch now. I haven't checked yet but do you know if 
the mds apitests cover this case sending large multicast messages? 
/Thanks HansN
 
-Original Message-
From: Tran Thuan  <mailto:thuan.t...@dektech.com.au> 
Sent: den 2 maj 2019 05:56
To: Minh Hon Chau  <mailto:minh.c...@dektech.com.au> 
; Vu Minh Nguyen 
 <mailto:vu.m.ngu...@dektech.com.au> ; Hans 
Nordebäck 
 <mailto:hans.nordeb...@ericsson.com> 
Cc: opensaf-devel@lists.sourceforge.net 
<mailto:opensaf-devel@lists.sourceforge.net> 
Subject: RE: [PATCH 0/1] Review Request for mds: support multicast 
fragmented messages [#3033] V3
 
Hi Hans,
 
Do you have any further comment?
Can we push the patch?
 
Bes

Re: [devel] [PATCH 0/1] Review Request for mds: support multicast fragmented messages [#3033] V3

2019-05-03 Thread Tran Thuan
Hi Hans,

Yes, we tried that kind of basic test; IMMD can deliver a big message via multicast.

Best Regards,
ThuanTr

-Original Message-
From: Hans Nordebäck  
Sent: Friday, May 3, 2019 3:11 PM
To: Thuan Tran ; Minh Hon Chau 
; Vu Minh Nguyen 
Cc: opensaf-devel@lists.sourceforge.net
Subject: Re: [PATCH 0/1] Review Request for mds: support multicast fragmented 
messages [#3033] V3

Hi Thuan,

That sounds good; is this how you test now? Looking at the MDS code before
this change, it seems that large multicast messages are fragmented and sent
to only one receiver using unicast, but with this change the fragments are
multicast to all receivers, which seems more correct. /Thanks HansN

On 2019-05-03 10:04, Tran Thuan wrote:
> Hi Hans,
>
> The current MDS apitest is only a binary executed on one node.
> It would be easier to create an IMM test case that makes IMMD broadcast a big message.
> I think we can create a new ticket for this additional test.
>
> Best Regards,
> ThuanTr
>
> -Original Message-
> From: Hans Nordebäck 
> Sent: Friday, May 3, 2019 2:45 PM
> To: Thuan Tran ; Minh Hon Chau 
> ; Vu Minh Nguyen 
> 
> Cc: opensaf-devel@lists.sourceforge.net
> Subject: RE: [PATCH 0/1] Review Request for mds: support multicast 
> fragmented messages [#3033] V3
>
> Hi Thuan,
> OK, it would be good if we can add additional tests to the mds api test
> suite. /Thanks HansN
>
> -Original Message-
> From: Tran Thuan 
> Sent: den 3 maj 2019 09:41
> To: Hans Nordebäck ; Minh Hon Chau 
> ; Vu Minh Nguyen 
> 
> Cc: opensaf-devel@lists.sourceforge.net
> Subject: RE: [PATCH 0/1] Review Request for mds: support multicast 
> fragmented messages [#3033] V3
>
> Hi Hans,
>
> I don't see this kind of test in mds apitests.
>
> Best Regards,
> ThuanTr
>
> -Original Message-
> From: Hans Nordebäck 
> Sent: Friday, May 3, 2019 2:31 PM
> To: Thuan Tran ; Minh Hon Chau 
> ; Vu Minh Nguyen 
> 
> Cc: opensaf-devel@lists.sourceforge.net
> Subject: RE: [PATCH 0/1] Review Request for mds: support multicast 
> fragmented messages [#3033] V3
>
> Hi Thuan,
> I'm reviewing the patch now. I haven't checked yet but do you know if 
> the mds apitests cover this case sending large multicast messages? 
> /Thanks HansN
>
> -Original Message-
> From: Tran Thuan 
> Sent: den 2 maj 2019 05:56
> To: Minh Hon Chau ; Vu Minh Nguyen 
> ; Hans Nordebäck 
> 
> Cc: opensaf-devel@lists.sourceforge.net
> Subject: RE: [PATCH 0/1] Review Request for mds: support multicast 
> fragmented messages [#3033] V3
>
> Hi Hans,
>
> Do you have any further comment?
> Can we push the patch?
>
> Best Regards,
> ThuanTr
>
> -Original Message-
> From: Minh Hon Chau 
> Sent: Friday, April 26, 2019 4:11 PM
> To: Vu Minh Nguyen ; 'Hans Nordebäck' 
> ; 'Thuan Tran' 
> 
> Cc: opensaf-devel@lists.sourceforge.net
> Subject: Re: [PATCH 0/1] Review Request for mds: support multicast 
> fragmented messages [#3033] V3
>
> Hi,
>
> ack from me (code review)
>
> Thanks
>
> Minh
>
> On 25/4/19 9:33 pm, Vu Minh Nguyen wrote:
>> Hi Hans,
>>
>> You were probably looking at code that already included Thuan's patch.
>>
>> In legacy code, only mdtm_sendto() is called inside the function 
>> mdtm_frag_and_send().
>>
>> Regards, Vu
>>
>>> -Original Message-
>>> From: Hans Nordebäck 
>>> Sent: Thursday, April 25, 2019 6:10 PM
>>> To: Vu Minh Nguyen ; Thuan Tran 
>>> ; Minh Hon Chau 
>>> 
>>> Cc: opensaf-devel@lists.sourceforge.net
>>> Subject: RE: [PATCH 0/1] Review Request for mds: support multicast 
>>> fragmented messages [#3033] V3
>>>
>>>
>>> Hi Vu,
>>> It seems mdtm_mcast_sendto is used in mdtm_frag_and_send, at
>>> MDS_SENDTYPE_BCAST
>>> /BR Hans
>>>
>>> -Original Message-
>>> From: Vu Minh Nguyen 
>>> Sent: den 25 april 2019 12:20
>>> To: Hans Nordebäck ; Thuan Tran 
>>> ; Minh Hon Chau 
>>> 
>>> Cc: opensaf-devel@lists.sourceforge.net
>>> Subject: RE: [PATCH 0/1] Review Request for mds: support multicast 
>>> fragmented messages [#3033] V3
>>>
>>> Hi Hans,
>>>
>>> See my responses inline.
>>>
>>> Regards, Vu
>>>
>>>> -Original Message-
>>>> From: Hans Nordebäck 
>>>> Sent: Thursday, April 25, 2019 4:28 PM
>>>> To: Thuan Tran ; Vu Minh Nguyen 
>>>> ; Minh Hon Chau
>>> 
>>>> Cc: opensaf-devel@lists.sourceforge.net
>>>> Subject: Re: [PATCH 0/1] Review Request for mds: support multic

Re: [devel] [PATCH 0/1] Review Request for mds: support multicast fragmented messages [#3033] V3

2019-05-03 Thread Tran Thuan
Hi Hans,

The current MDS apitest only runs as a binary on a single node.
It would be easier to create an IMM test case that makes IMMD send a large broadcast message.
I think we can create a new ticket for this additional test.

Best Regards,
ThuanTr

-Original Message-
From: Hans Nordebäck  
Sent: Friday, May 3, 2019 2:45 PM
To: Thuan Tran ; Minh Hon Chau 
; Vu Minh Nguyen 
Cc: opensaf-devel@lists.sourceforge.net
Subject: RE: [PATCH 0/1] Review Request for mds: support multicast fragmented 
messages [#3033] V3

Hi Thuan,
ok, if we can add additional tests to the mds api test suite would be 
good/Thanks HansN

-Original Message-
From: Tran Thuan 
Sent: den 3 maj 2019 09:41
To: Hans Nordebäck ; Minh Hon Chau 
; Vu Minh Nguyen 
Cc: opensaf-devel@lists.sourceforge.net
Subject: RE: [PATCH 0/1] Review Request for mds: support multicast fragmented 
messages [#3033] V3

Hi Hans,

I don't see this kind of test in mds apitests.

Best Regards,
ThuanTr

-Original Message-
From: Hans Nordebäck 
Sent: Friday, May 3, 2019 2:31 PM
To: Thuan Tran ; Minh Hon Chau 
; Vu Minh Nguyen 
Cc: opensaf-devel@lists.sourceforge.net
Subject: RE: [PATCH 0/1] Review Request for mds: support multicast fragmented 
messages [#3033] V3

Hi Thuan,
I'm reviewing the patch now. I haven't checked yet but do you know if the mds 
apitests cover this case sending large multicast messages? /Thanks HansN 

-Original Message-
From: Tran Thuan 
Sent: den 2 maj 2019 05:56
To: Minh Hon Chau ; Vu Minh Nguyen 
; Hans Nordebäck 
Cc: opensaf-devel@lists.sourceforge.net
Subject: RE: [PATCH 0/1] Review Request for mds: support multicast fragmented 
messages [#3033] V3

Hi Hans,

Do you have any further comment?
Can we push the patch?

Best Regards,
ThuanTr

-Original Message-
From: Minh Hon Chau 
Sent: Friday, April 26, 2019 4:11 PM
To: Vu Minh Nguyen ; 'Hans Nordebäck' 
; 'Thuan Tran' 
Cc: opensaf-devel@lists.sourceforge.net
Subject: Re: [PATCH 0/1] Review Request for mds: support multicast fragmented 
messages [#3033] V3

Hi,

ack from me (code review)

Thanks

Minh

On 25/4/19 9:33 pm, Vu Minh Nguyen wrote:
> Hi Hans,
>
> Probably you were looking at code that included this Thuan's patch.
>
> In legacy code, only mdtm_sendto() is called inside the function 
> mdtm_frag_and_send().
>
> Regards, Vu
>
>> -Original Message-
>> From: Hans Nordebäck 
>> Sent: Thursday, April 25, 2019 6:10 PM
>> To: Vu Minh Nguyen ; Thuan Tran 
>> ; Minh Hon Chau 
>> Cc: opensaf-devel@lists.sourceforge.net
>> Subject: RE: [PATCH 0/1] Review Request for mds: support multicast 
>> fragmented messages [#3033] V3
>>
>>
>> Hi Vu,
>> It seems mdtm_mcast_sendto is used in mdtm_frag_and_send, at 
>> MDS_SENDTYPE_BCAST/BR Hans -Original Message-
>> From: Vu Minh Nguyen 
>> Sent: den 25 april 2019 12:20
>> To: Hans Nordebäck ; Thuan Tran 
>> ; Minh Hon Chau 
>> Cc: opensaf-devel@lists.sourceforge.net
>> Subject: RE: [PATCH 0/1] Review Request for mds: support multicast 
>> fragmented messages [#3033] V3
>>
>> Hi Hans,
>>
>> See my responses inline.
>>
>> Regards, Vu
>>
>>> -Original Message-
>>> From: Hans Nordebäck 
>>> Sent: Thursday, April 25, 2019 4:28 PM
>>> To: Thuan Tran ; Vu Minh Nguyen 
>>> ; Minh Hon Chau
>> 
>>> Cc: opensaf-devel@lists.sourceforge.net
>>> Subject: Re: [PATCH 0/1] Review Request for mds: support multicast 
>>> fragmented messages [#3033] V3
>>>
>>> Hi Vu and Thuan,
>>>
>>> a few question, is the text in the ticket description correct? E.g 
>>> it says unicast is used if a multicast message is fragmented, (I 
>>> think multicast still is used
>>>
>>> to send the fragments), this is what you mean with 2 different channels?
>>> (only one socket is used, BSRsock),
>> [Vu] Yes. Unicast is used to send fragmented messages. Here is the 
>> current logic in case of sending a large package:
>> Iterate over destinations { // mcm_pvt_process_svc_bcast_common() @ 
>> mds_c_sndrcv.c
>>  1) Fragment the package // mdtm_frag_and_send() @ mds_dt_tipc.c
>>  2) Unicast to a specific adest  // mdtm_sendto() @
>> mds_dt_tipc.c
>>  4) Continue with next adest
>> }
>>
>>> The problem stated is sending one large multicast message and then 
>>> several smaller multicast messages, have you checked the
>>>
>>> fragment re-assembly part of the common code?
>> [Vu] Yes. At the receive side, if msg is fragmented, mds will not 
>> forward to upper layer until all fragmented msgs are collected.
>> If the message is not fragmented, mds will transfer the msg to upper 
>> right away.

Re: [devel] [PATCH 0/1] Review Request for mds: support multicast fragmented messages [#3033] V3

2019-05-03 Thread Tran Thuan
Hi Hans,

I don't see this kind of test in mds apitests.

Best Regards,
ThuanTr

-Original Message-
From: Hans Nordebäck  
Sent: Friday, May 3, 2019 2:31 PM
To: Thuan Tran ; Minh Hon Chau 
; Vu Minh Nguyen 
Cc: opensaf-devel@lists.sourceforge.net
Subject: RE: [PATCH 0/1] Review Request for mds: support multicast fragmented 
messages [#3033] V3

Hi Thuan,
I'm reviewing the patch now. I haven't checked yet but do you know if the mds 
apitests cover this case sending large multicast messages? /Thanks HansN 

-Original Message-
From: Tran Thuan 
Sent: den 2 maj 2019 05:56
To: Minh Hon Chau ; Vu Minh Nguyen 
; Hans Nordebäck 
Cc: opensaf-devel@lists.sourceforge.net
Subject: RE: [PATCH 0/1] Review Request for mds: support multicast fragmented 
messages [#3033] V3

Hi Hans,

Do you have any further comment?
Can we push the patch?

Best Regards,
ThuanTr

-Original Message-
From: Minh Hon Chau 
Sent: Friday, April 26, 2019 4:11 PM
To: Vu Minh Nguyen ; 'Hans Nordebäck' 
; 'Thuan Tran' 
Cc: opensaf-devel@lists.sourceforge.net
Subject: Re: [PATCH 0/1] Review Request for mds: support multicast fragmented 
messages [#3033] V3

Hi,

ack from me (code review)

Thanks

Minh

On 25/4/19 9:33 pm, Vu Minh Nguyen wrote:
> Hi Hans,
>
> Probably you were looking at code that included this Thuan's patch.
>
> In legacy code, only mdtm_sendto() is called inside the function 
> mdtm_frag_and_send().
>
> Regards, Vu
>
>> -Original Message-
>> From: Hans Nordebäck 
>> Sent: Thursday, April 25, 2019 6:10 PM
>> To: Vu Minh Nguyen ; Thuan Tran 
>> ; Minh Hon Chau 
>> Cc: opensaf-devel@lists.sourceforge.net
>> Subject: RE: [PATCH 0/1] Review Request for mds: support multicast 
>> fragmented messages [#3033] V3
>>
>>
>> Hi Vu,
>> It seems mdtm_mcast_sendto is used in mdtm_frag_and_send, at 
>> MDS_SENDTYPE_BCAST/BR Hans -Original Message-
>> From: Vu Minh Nguyen 
>> Sent: den 25 april 2019 12:20
>> To: Hans Nordebäck ; Thuan Tran 
>> ; Minh Hon Chau 
>> Cc: opensaf-devel@lists.sourceforge.net
>> Subject: RE: [PATCH 0/1] Review Request for mds: support multicast 
>> fragmented messages [#3033] V3
>>
>> Hi Hans,
>>
>> See my responses inline.
>>
>> Regards, Vu
>>
>>> -Original Message-
>>> From: Hans Nordebäck 
>>> Sent: Thursday, April 25, 2019 4:28 PM
>>> To: Thuan Tran ; Vu Minh Nguyen 
>>> ; Minh Hon Chau
>> 
>>> Cc: opensaf-devel@lists.sourceforge.net
>>> Subject: Re: [PATCH 0/1] Review Request for mds: support multicast 
>>> fragmented messages [#3033] V3
>>>
>>> Hi Vu and Thuan,
>>>
>>> a few question, is the text in the ticket description correct? E.g 
>>> it says unicast is used if a multicast message is fragmented, (I 
>>> think multicast still is used
>>>
>>> to send the fragments), this is what you mean with 2 different channels?
>>> (only one socket is used, BSRsock),
>> [Vu] Yes. Unicast is used to send fragmented messages. Here is the 
>> current logic in case of sending a large package:
>> Iterate over destinations { // mcm_pvt_process_svc_bcast_common() @ 
>> mds_c_sndrcv.c
>>  1) Fragment the package // mdtm_frag_and_send() @ mds_dt_tipc.c
>>  2) Unicast to a specific adest  // mdtm_sendto() @
>> mds_dt_tipc.c
>>  4) Continue with next adest
>> }
>>
>>> The problem stated is sending one large multicast message and then 
>>> several smaller multicast messages, have you checked the
>>>
>>> fragment re-assembly part of the common code?
>> [Vu] Yes. At the receive side, if msg is fragmented, mds will not 
>> forward to upper layer until all fragmented msgs are collected.
>> If the message is not fragmented, mds will transfer the msg to upper 
>> right away.
>>
>> I checked with TIPC guys here, and he said that TIPC does not 
>> guarantee the order if we send msgs in different channels (unicast vs mcast).
>>
>>> /BR Hans
>>>
>>>
>>> On 2019-04-24 13:06, thuan.tran wrote:
>>>> Summary: mds: support multicast fragmented messages [#3033] Review 
>>>> request for Ticket(s): 3033 Peer Reviewer(s): Hans, Minh, Vu Pull 
>>>> request to: *** LIST THE PERSON WITH PUSH ACCESS HERE *** Affected
>>>> branch(es): develop Development branch: ticket-3033 Base revision:
>>>> 7916ac316e86478c621c8359cf2aca4886288a38
>>>> Personal repository: git://git.code.sf.net/u/thuantr/review
>>>>
>>>> 
>>>> Impact

Re: [devel] [PATCH 0/1] Review Request for mds: support multicast fragmented messages [#3033] V3

2019-05-01 Thread Tran Thuan
Hi Hans,

Do you have any further comment?
Can we push the patch?

Best Regards,
ThuanTr

-Original Message-
From: Minh Hon Chau  
Sent: Friday, April 26, 2019 4:11 PM
To: Vu Minh Nguyen ; 'Hans Nordebäck' 
; 'Thuan Tran' 
Cc: opensaf-devel@lists.sourceforge.net
Subject: Re: [PATCH 0/1] Review Request for mds: support multicast fragmented 
messages [#3033] V3

Hi,

ack from me (code review)

Thanks

Minh

On 25/4/19 9:33 pm, Vu Minh Nguyen wrote:
> Hi Hans,
>
> Probably you were looking at code that included this Thuan's patch.
>
> In legacy code, only mdtm_sendto() is called inside the function 
> mdtm_frag_and_send().
>
> Regards, Vu
>
>> -Original Message-
>> From: Hans Nordebäck 
>> Sent: Thursday, April 25, 2019 6:10 PM
>> To: Vu Minh Nguyen ; Thuan Tran 
>> ; Minh Hon Chau 
>> Cc: opensaf-devel@lists.sourceforge.net
>> Subject: RE: [PATCH 0/1] Review Request for mds: support multicast 
>> fragmented messages [#3033] V3
>>
>>
>> Hi Vu,
>> It seems mdtm_mcast_sendto is used in mdtm_frag_and_send, at 
>> MDS_SENDTYPE_BCAST/BR Hans -Original Message-
>> From: Vu Minh Nguyen 
>> Sent: den 25 april 2019 12:20
>> To: Hans Nordebäck ; Thuan Tran 
>> ; Minh Hon Chau 
>> Cc: opensaf-devel@lists.sourceforge.net
>> Subject: RE: [PATCH 0/1] Review Request for mds: support multicast 
>> fragmented messages [#3033] V3
>>
>> Hi Hans,
>>
>> See my responses inline.
>>
>> Regards, Vu
>>
>>> -Original Message-
>>> From: Hans Nordebäck 
>>> Sent: Thursday, April 25, 2019 4:28 PM
>>> To: Thuan Tran ; Vu Minh Nguyen 
>>> ; Minh Hon Chau
>> 
>>> Cc: opensaf-devel@lists.sourceforge.net
>>> Subject: Re: [PATCH 0/1] Review Request for mds: support multicast 
>>> fragmented messages [#3033] V3
>>>
>>> Hi Vu and Thuan,
>>>
>>> a few question, is the text in the ticket description correct? E.g 
>>> it says unicast is used if a multicast message is fragmented, (I 
>>> think multicast still is used
>>>
>>> to send the fragments), this is what you mean with 2 different channels?
>>> (only one socket is used, BSRsock),
>> [Vu] Yes. Unicast is used to send fragmented messages. Here is the 
>> current logic in case of sending a large package:
>> Iterate over destinations { // mcm_pvt_process_svc_bcast_common() @ 
>> mds_c_sndrcv.c
>>  1) Fragment the package // mdtm_frag_and_send() @ mds_dt_tipc.c
>>  2) Unicast to a specific adest  // mdtm_sendto() @
>> mds_dt_tipc.c
>>  4) Continue with next adest
>> }
>>
>>> The problem stated is sending one large multicast message and then 
>>> several smaller multicast messages, have you checked the
>>>
>>> fragment re-assembly part of the common code?
>> [Vu] Yes. At the receive side, if msg is fragmented, mds will not 
>> forward to upper layer until all fragmented msgs are collected.
>> If the message is not fragmented, mds will transfer the msg to upper 
>> right away.
>>
>> I checked with TIPC guys here, and he said that TIPC does not 
>> guarantee the order if we send msgs in different channels (unicast vs mcast).
>>
>>> /BR Hans
>>>
>>>
>>> On 2019-04-24 13:06, thuan.tran wrote:
 Summary: mds: support multicast fragmented messages [#3033]
 Review request for Ticket(s): 3033
 Peer Reviewer(s): Hans, Minh, Vu
 Pull request to: *** LIST THE PERSON WITH PUSH ACCESS HERE ***
 Affected branch(es): develop
 Development branch: ticket-3033
 Base revision: 7916ac316e86478c621c8359cf2aca4886288a38
 Personal repository: git://git.code.sf.net/u/thuantr/review

 
 Impacted area       Impact y/n

 Docs                    n
 Build system            n
 RPM/packaging           n
 Configuration files     n
 Startup scripts         n
 SAF services            y
 OpenSAF services        n
 Core libraries          n
 Samples                 n
 Tests                   n
 Other                   n

 NOTE: Patch(es) contain lines longer than 80 characters

 Comments (indicate scope for each "y" above):
 -
 N/A

 revision 568f09774f936506f5e05e03813fa572af0fe0d3
 Author:thuan.tran 
 Date:  Wed, 24 Apr 2019 17:54:25 +0700

 mds: support multicast fragmented messages [#3033]

 - Sender may send broadcast big messages (> 65K) then small messages (< 65K).
 Current MDS just loops over all destinations to unicast all fragmented messages
 to the destinations one by one, but sends non-fragmented messages by multicast
 to all destinations. Therefore, receivers may get messages in the wrong order:
 non-fragmented messages may arrive before fragmented messages.
 For example, it may lead to OUT OF ORDER for IMMNDs during IMMD sync.
 - Solution: support sending each fragmented message by multicast to avoid
 disorder of arrived broadcast messages.
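
To illustrate the ordering problem described in the commit message above, here is a minimal sketch of the receive-side rule Vu describes in this thread: a fragmented message reaches the upper layer only when its last fragment has arrived, while a non-fragmented message is delivered right away. The struct pkt type and deliver_order() are hypothetical names used for illustration only; they are not MDS code, and the arrival sequences are assumed inputs.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of one received packet; not the real MDS structures. */
struct pkt {
    int msg_id;      /* application message this packet belongs to */
    int is_fragment; /* non-zero if the message was sent in fragments */
    int is_last;     /* non-zero on the final fragment of a message */
};

/*
 * Record the order in which messages would reach the upper layer.
 * A non-fragmented message is delivered on arrival; a fragmented one is
 * delivered only when its last fragment arrives. Writes message ids to
 * out (which must hold at least n entries) and returns how many were
 * delivered.
 */
size_t deliver_order(const struct pkt *arrivals, size_t n, int *out)
{
    size_t delivered = 0;

    assert(arrivals != NULL && out != NULL);
    for (size_t i = 0; i < n; i++) {
        if (!arrivals[i].is_fragment || arrivals[i].is_last)
            out[delivered++] = arrivals[i].msg_id;
    }
    return delivered;
}
```

If a small multicast message (id 2) arrives between the two unicast fragments of message 1, the function reports message 2 first, which is the disorder the patch is meant to avoid. When all packets arrive in send order, message 1 is reported first.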


Re: [devel] [PATCH 1/1] mds: support multicast fragmented messages [#3033]

2019-04-25 Thread Tran Thuan
Hi bro.Vu,

Thanks for comments.
It seems it does not really matter whether these are changed or not, so can we keep the current patch?
Then we don't need to push another PS for the adaptation patch in our repo.

Best Regards,
ThuanTr

-Original Message-
From: Vu Minh Nguyen  
Sent: Thursday, April 25, 2019 1:51 PM
To: 'thuan.tran' ; hans.nordeb...@ericsson.com;
'Minh Hon Chau' 
Cc: opensaf-devel@lists.sourceforge.net
Subject: RE: [PATCH 1/1] mds: support multicast fragmented messages [#3033]

Hi Thuan,

Ack with minor comments below.

Regards, Vu

> -Original Message-
> From: thuan.tran 
> Sent: Wednesday, April 24, 2019 6:06 PM
> To: 'Vu Minh Nguyen' ; 
> hans.nordeb...@ericsson.com; Minh Hon Chau 
> Cc: opensaf-devel@lists.sourceforge.net; thuan.tran 
> 
> Subject: [PATCH 1/1] mds: support multicast fragmented messages 
> [#3033]
> 
> - Sender may send broadcast big messages (> 65K) then small messages 
> (< 65K).
> Current MDS just loop via all destinations to unicast all fragmented
messages
> to one by one destinations. But sending multicast non-fragment 
> messages to all destinations. Therefor, receivers may get messages 
> with incorrect order, non-fragment messages may come before fragmented 
> messages.
> For example, it may lead to OUT OF ORDER for IMMNDs during IMMD sync.
> - Solution: support send multicast each fragmented messages to avoid 
> disorder of arrived broadcast messages.
> ---
>  src/mds/mds_c_sndrcv.c |   3 +-
>  src/mds/mds_dt_tipc.c  | 104
+++
> --
>  2 files changed, 40 insertions(+), 67 deletions(-)
> 
> diff --git a/src/mds/mds_c_sndrcv.c b/src/mds/mds_c_sndrcv.c index 
> 703bc8e..7850ac7 100644
> --- a/src/mds/mds_c_sndrcv.c
> +++ b/src/mds/mds_c_sndrcv.c
> @@ -4496,8 +4496,7 @@ static uint32_t
> mcm_pvt_process_svc_bcast_common(
> info_result->key.vdest_id,
req, 0,
> info_result->key.adest, pri);
>   if ((svc_cb->subtn_info->prev_ver_sub_count == 0) &&
> - (tipc_mode_enabled) && (tipc_mcast_enabled) &&
> - (to_msg.bcast_buff_len < MDS_DIRECT_BUF_MAXSIZE)) {
> + (tipc_mode_enabled) && (tipc_mcast_enabled)) {
>   m_MDS_LOG_DBG(
>   "MDTM: Break while(1) prev_ver_sub_count: %d
svc_id =%s(%d)  
> to_msg.bcast_buff_len: %d ",
>   svc_cb->subtn_info->prev_ver_sub_count,
> diff --git a/src/mds/mds_dt_tipc.c b/src/mds/mds_dt_tipc.c index 
> a3abff5..d8f8c78 100644
> --- a/src/mds/mds_dt_tipc.c
> +++ b/src/mds/mds_dt_tipc.c
> @@ -2856,6 +2856,7 @@ uint32_t mdtm_frag_and_send(MDTM_SEND_REQ *req, 
> uint32_t seq_num,
>   uint16_t frag_val = 0;
>   uint32_t sum_mds_hdr_plus_mdtm_hdr_plus_len;
>   int version = req->msg_arch_word & 0x7;
> + uint32_t ret = NCSCC_RC_SUCCESS;
> 
>   if (version > 1) {
>   sum_mds_hdr_plus_mdtm_hdr_plus_len = @@ -2914,95 +2915,66 @@

> uint32_t mdtm_frag_and_send(MDTM_SEND_REQ *req, uint32_t seq_num,
>   frag_val = NO_FRAG_BIT | i;
>   }
>   {
> + uint32_t hdr_plus = (i == 1) ?
> + sum_mds_hdr_plus_mdtm_hdr_plus_len :
> MDTM_FRAG_HDR_PLUS_LEN_2;
>   uint8_t *body = NULL;
>   body = calloc(1, len_buf);
> + p8 = (uint8_t *)m_MMGR_DATA_AT_START(usrbuf,
> len_buf - hdr_plus,
> + (char *)(body + hdr_plus));
> + if (p8 != (body + hdr_plus))
> + memcpy((body + hdr_plus), p8, len_buf -
> hdr_plus);
>   if (i == 1) {
[Vu] Would it be better to combine these two ifs into one, as:
if ((i == 1) && NCSCC_RC_SUCCESS != mdtm_add_mds_hdr(body, req)) {
// error handling.
}
[Thuan] I think it doesn't really matter; it's the same.
> - p8 = (uint8_t *)m_MMGR_DATA_AT_START(
> - usrbuf,
> - (len_buf -
> -  sum_mds_hdr_plus_mdtm_hdr_plus_len),
> - (char
> -  *)(body +
> -
> sum_mds_hdr_plus_mdtm_hdr_plus_len));
> -
> - if (p8 !=
> - (body +
> sum_mds_hdr_plus_mdtm_hdr_plus_len))
> - memcpy(
> - (body +
> -
> sum_mds_hdr_plus_mdtm_hdr_plus_len),
> - p8,
> - (len_buf -
> -
> sum_mds_hdr_plus_mdtm_hdr_plus_len));
> -
>   if (NCSCC_RC_SUCCESS !=
>   mdtm_add_mds_hdr(body, req)) {
>   m_MDS_LOG_ERR(
>   "MDTM: frg MDS hdr addition
> failed\n");
> - 

Re: [devel] [PATCH 1/1] mds: support multicast fragmented messages [#3033]

2019-04-24 Thread Tran Thuan
Hi bro.Vu,

OK, you are right.
It should call m_MMGR_FREE_BUFR_LIST(usrbuf); before break
I will send out V2.

Best Regards,
ThuanTr

-Original Message-
From: Vu Minh Nguyen  
Sent: Wednesday, April 24, 2019 2:30 PM
To: 'Tran Thuan' ; hans.nordeb...@ericsson.com;
'Minh Hon Chau' 
Cc: opensaf-devel@lists.sourceforge.net
Subject: RE: [PATCH 1/1] mds: support multicast fragmented messages [#3033]

Hi Thuan,

m_MMGR_REMOVE_FROM_START() only removes n bytes from the start, not the whole
memory chain.

You can refer to the descriptions of these macros to see the differences between
them.
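
A minimal sketch of the difference, assuming a simplified buffer chain that only tracks lengths. The struct buf type and the chain_*() helpers are hypothetical stand-ins for illustration; they are not the real MDS macros or the USRBUF type.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical buffer chain node: only the payload length is modelled. */
struct buf {
    struct buf *next;
    size_t len;
};

/* Free every node in the chain and reset the head pointer. */
void chain_free_all(struct buf **head)
{
    assert(head != NULL);
    while (*head != NULL) {
        struct buf *b = *head;
        *head = b->next;
        free(b);
    }
}

/* Build a chain with one node per entry in lens; returns NULL on failure. */
struct buf *chain_make(const size_t *lens, size_t n)
{
    struct buf *head = NULL, **tail = &head;

    for (size_t i = 0; i < n; i++) {
        struct buf *b = malloc(sizeof(*b));
        if (b == NULL) {
            chain_free_all(&head);
            return NULL;
        }
        b->next = NULL;
        b->len = lens[i];
        *tail = b;
        tail = &b->next;
    }
    return head;
}

/* Sum of the payload lengths still held by the chain. */
size_t chain_total(const struct buf *head)
{
    size_t total = 0;

    for (; head != NULL; head = head->next)
        total += head->len;
    return total;
}

/* Trim n bytes from the front; nodes that become empty are freed,
 * but whatever lies beyond the first n bytes stays allocated. */
void chain_remove_from_start(struct buf **head, size_t n)
{
    assert(head != NULL);
    while (*head != NULL && n > 0) {
        struct buf *b = *head;
        if (b->len > n) {
            b->len -= n;
            return;
        }
        n -= b->len;
        *head = b->next;
        free(b);
    }
}
```

In this model, trimming 150 bytes from a chain of 100 + 200 bytes leaves 150 bytes allocated, so leaving the loop at that point without a call like chain_free_all() would leak the remainder.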

Regards, Vu

> -Original Message-
> From: Tran Thuan 
> Sent: Wednesday, April 24, 2019 2:17 PM
> To: 'Vu Minh Nguyen' ; 
> hans.nordeb...@ericsson.com; 'Minh Hon Chau'  c...@users.sourceforge.net>
> Cc: opensaf-devel@lists.sourceforge.net
> Subject: RE: [PATCH 1/1] mds: support multicast fragmented messages 
> [#3033]
> 
> Hi bro.Vu,
> 
> I think it won't cause memleak since I break after
> m_MMGR_REMOVE_FROM_START()
> free(body)
> 
> Best Regards,
> ThuanTr
> 
> -Original Message-
> From: Vu Minh Nguyen 
> Sent: Wednesday, April 24, 2019 1:45 PM
> To: 'thuan.tran' ; 
> hans.nordeb...@ericsson.com; 'Minh Hon Chau' 
> 
> Cc: opensaf-devel@lists.sourceforge.net
> Subject: RE: [PATCH 1/1] mds: support multicast fragmented messages 
> [#3033]
> 
> Hi Thuan,
> 
> Tested broadcasting with large packages (each 1M).
> 
> Ack with below comments.
> 
> Regards, Vu
> 
> > -Original Message-
> > From: thuan.tran 
> > Sent: Wednesday, April 24, 2019 12:00 PM
> > To: hans.nordeb...@ericsson.com; Vu Minh Nguyen 
> > ; Minh Hon Chau  > c...@users.sourceforge.net>
> > Cc: opensaf-devel@lists.sourceforge.net; thuan.tran 
> > 
> > Subject: [PATCH 1/1] mds: support multicast fragmented messages 
> > [#3033]
> >
> > - Sender may send broadcast big messages (> 65K) then small messages 
> > (< 65K).
> > Current MDS just loop via all destinations to unicast all fragmented
> messages
> > to one by one destinations. But sending multicast non-fragment 
> > messages to all destinations. Therefor, receivers may get messages 
> > with incorrect order, non-fragment messages may come before
> fragmented
> > messages.
> > For example, it may lead to OUT OF ORDER for IMMNDs during IMMD sync.
> > - Solution: support send multicast each fragmented messages to avoid 
> > disorder of arrived broadcast messages.
> > ---
> >  src/mds/mds_c_sndrcv.c |  3 +--
> >  src/mds/mds_dt_tipc.c  | 55
> > --
> >  2 files changed, 45 insertions(+), 13 deletions(-)
> >
> > diff --git a/src/mds/mds_c_sndrcv.c b/src/mds/mds_c_sndrcv.c index
> > 703bc8e..7850ac7 100644
> > --- a/src/mds/mds_c_sndrcv.c
> > +++ b/src/mds/mds_c_sndrcv.c
> > @@ -4496,8 +4496,7 @@ static uint32_t 
> > mcm_pvt_process_svc_bcast_common(
> >   info_result->key.vdest_id,
> req, 0,
> >   info_result->key.adest, pri);
> > if ((svc_cb->subtn_info->prev_ver_sub_count == 0) &&
> > -   (tipc_mode_enabled) && (tipc_mcast_enabled) &&
> > -   (to_msg.bcast_buff_len < MDS_DIRECT_BUF_MAXSIZE)) {
> > +   (tipc_mode_enabled) && (tipc_mcast_enabled)) {
> > m_MDS_LOG_DBG(
> > "MDTM: Break while(1) prev_ver_sub_count: %d
> svc_id =%s(%d)
> > to_msg.bcast_buff_len: %d ",
> > svc_cb->subtn_info->prev_ver_sub_count,
> > diff --git a/src/mds/mds_dt_tipc.c b/src/mds/mds_dt_tipc.c index 
> > a3abff5..656f79c 100644
> > --- a/src/mds/mds_dt_tipc.c
> > +++ b/src/mds/mds_dt_tipc.c
> > @@ -2856,6 +2856,7 @@ uint32_t
> mdtm_frag_and_send(MDTM_SEND_REQ *req,
> > uint32_t seq_num,
> > uint16_t frag_val = 0;
> > uint32_t sum_mds_hdr_plus_mdtm_hdr_plus_len;
> > int version = req->msg_arch_word & 0x7;
> > +   uint32_t ret = NCSCC_RC_SUCCESS;
> >
> > if (version > 1) {
> > sum_mds_hdr_plus_mdtm_hdr_plus_len = @@ -2952,11
> +2953,23 @@
> 
> > uint32_t mdtm_frag_and_send(MDTM_SEND_REQ *req, uint32_t seq_num,
> > free(body);
> > return NCSCC_RC_FAILURE;
> > }
> > -   m_MDS_LOG_DBG(
> > -   "MDTM:Sending me

Re: [devel] [PATCH 1/1] mds: support multicast fragmented messages [#3033]

2019-04-24 Thread Tran Thuan
Hi bro.Vu,

I think it won't cause memleak since I break after
m_MMGR_REMOVE_FROM_START()
free(body)

Best Regards,
ThuanTr

-Original Message-
From: Vu Minh Nguyen  
Sent: Wednesday, April 24, 2019 1:45 PM
To: 'thuan.tran' ; hans.nordeb...@ericsson.com;
'Minh Hon Chau' 
Cc: opensaf-devel@lists.sourceforge.net
Subject: RE: [PATCH 1/1] mds: support multicast fragmented messages [#3033]

Hi Thuan,

Tested broadcasting with large packages (each 1M). 

Ack with below comments.

Regards, Vu

> -Original Message-
> From: thuan.tran 
> Sent: Wednesday, April 24, 2019 12:00 PM
> To: hans.nordeb...@ericsson.com; Vu Minh Nguyen 
> ; Minh Hon Chau  c...@users.sourceforge.net>
> Cc: opensaf-devel@lists.sourceforge.net; thuan.tran 
> 
> Subject: [PATCH 1/1] mds: support multicast fragmented messages 
> [#3033]
> 
> - Sender may send broadcast big messages (> 65K) then small messages 
> (< 65K).
> Current MDS just loop via all destinations to unicast all fragmented
messages
> to one by one destinations. But sending multicast non-fragment 
> messages to all destinations. Therefor, receivers may get messages 
> with incorrect order, non-fragment messages may come before fragmented 
> messages.
> For example, it may lead to OUT OF ORDER for IMMNDs during IMMD sync.
> - Solution: support send multicast each fragmented messages to avoid 
> disorder of arrived broadcast messages.
> ---
>  src/mds/mds_c_sndrcv.c |  3 +--
>  src/mds/mds_dt_tipc.c  | 55
> --
>  2 files changed, 45 insertions(+), 13 deletions(-)
> 
> diff --git a/src/mds/mds_c_sndrcv.c b/src/mds/mds_c_sndrcv.c index 
> 703bc8e..7850ac7 100644
> --- a/src/mds/mds_c_sndrcv.c
> +++ b/src/mds/mds_c_sndrcv.c
> @@ -4496,8 +4496,7 @@ static uint32_t
> mcm_pvt_process_svc_bcast_common(
> info_result->key.vdest_id,
req, 0,
> info_result->key.adest, pri);
>   if ((svc_cb->subtn_info->prev_ver_sub_count == 0) &&
> - (tipc_mode_enabled) && (tipc_mcast_enabled) &&
> - (to_msg.bcast_buff_len < MDS_DIRECT_BUF_MAXSIZE)) {
> + (tipc_mode_enabled) && (tipc_mcast_enabled)) {
>   m_MDS_LOG_DBG(
>   "MDTM: Break while(1) prev_ver_sub_count: %d
svc_id =%s(%d)  
> to_msg.bcast_buff_len: %d ",
>   svc_cb->subtn_info->prev_ver_sub_count,
> diff --git a/src/mds/mds_dt_tipc.c b/src/mds/mds_dt_tipc.c index 
> a3abff5..656f79c 100644
> --- a/src/mds/mds_dt_tipc.c
> +++ b/src/mds/mds_dt_tipc.c
> @@ -2856,6 +2856,7 @@ uint32_t mdtm_frag_and_send(MDTM_SEND_REQ *req, 
> uint32_t seq_num,
>   uint16_t frag_val = 0;
>   uint32_t sum_mds_hdr_plus_mdtm_hdr_plus_len;
>   int version = req->msg_arch_word & 0x7;
> + uint32_t ret = NCSCC_RC_SUCCESS;
> 
>   if (version > 1) {
>   sum_mds_hdr_plus_mdtm_hdr_plus_len = @@ -2952,11 +2953,23 @@

> uint32_t mdtm_frag_and_send(MDTM_SEND_REQ *req, uint32_t seq_num,
>   free(body);
>   return NCSCC_RC_FAILURE;
>   }
> - m_MDS_LOG_DBG(
> - "MDTM:Sending message with Service
> Seqno=%d, Fragment Seqnum=%d, frag_num=%d, TO 
> Dest_Tipc_id=<0x%08x:%u>",
> - req->svc_seq_num, seq_num, frag_val,
> - id.node, id.ref);
> - mdtm_sendto(body, len_buf, id);
> + if (((req->snd_type == MDS_SENDTYPE_RBCAST)
> ||
> +  (req->snd_type == MDS_SENDTYPE_BCAST))
> &&
> + (version > 0) && (tipc_mcast_enabled)) {
> + m_MDS_LOG_DBG(
> + "MDTM:Send Multicast message
with
> Service Seqno=%d, Fragment Seqnum=%d, frag_num=%d "
> + "From svc_id = %s(%d) TO svc_id
=
> %s(%d)",
> + req->svc_seq_num, seq_num,
> frag_val,
> + get_svc_names(req->src_svc_id),
req-
> >src_svc_id,
> + get_svc_names(req->dest_svc_id),
> req->dest_svc_id);
> + ret = mdtm_mcast_sendto(body,
> len_buf, req);
> + } else {
> + m_MDS_LOG_DBG(
> + "MDTM:Sending message with
Service
> Seqno=%d, Fragment Seqnum=%d, frag_num=%d, TO 
> Dest_Tipc_id=<0x%08x:%u>",
> + req->svc_seq_num, seq_num,
> frag_val,
> + id.node, id.ref);
> + ret = mdtm_sendto(body, len_buf,
id);
> + }

Re: [devel] [PATCH 1/1] amf: fix no active assignment even one in-service SU can be assigned [#3020]

2019-04-07 Thread Tran Thuan
Hi Thang,

Thanks for your comment.
Yes, the scenario is not recommended in the documentation, but every step is allowed
by AMF.
And I think the code can be improved. That's why I created the ticket
as an enhancement, not a defect.
Let's see what the AMF experts' opinion is.

Best Regards,
ThuanTr

-Original Message-
From: Thang Nguyen  
Sent: Monday, April 8, 2019 10:21 AM
To: 'thuan.tran' ; gary@dektech.com.au;
minh.c...@dektech.com.au
Cc: opensaf-devel@lists.sourceforge.net
Subject: RE: [devel] [PATCH 1/1] amf: fix no active assignment even one
in-service SU can be assigned [#3020]

Hi Thuan,

Currently, in the AMF Release 5.2 Programmer's Reference (Nov 2018), there
are already two use cases that support adding a new component/CSI to an SG (i.e.,
dynamic configuration changes).
You can refer to sections 7.1.1.1 and 7.1.1.2 for more info.

So I'm not sure whether your new use case is valid. If it is valid, the document
needs to be updated too.


B.R
/Thang

-Original Message-
From: thuan.tran 
Sent: Monday, March 18, 2019 3:05 PM
To: gary@dektech.com.au; minh.c...@dektech.com.au
Cc: opensaf-devel@lists.sourceforge.net
Subject: [devel] [PATCH 1/1] amf: fix no active assignment even one
in-service SU can be assigned [#3020]

AMFD should try to assign the SI active to other in-service SUs if it fails to
assign it to the current in-service SU
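
The intent stated above can be sketched as follows, assuming a simplified model in which each SU only records whether it is in service and whether an active assignment to it would succeed. The struct su_model type and both pick_*() functions are hypothetical and are not the AMFD classes touched by the diff below. The sketch also simplifies the failure case by returning -1 when no assignment succeeds.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified view of a service unit; not the AMFD AVD_SU class. */
struct su_model {
    int in_service;   /* readiness state is IN_SERVICE */
    int assign_works; /* an active SI assignment to this SU would succeed */
};

/* Old behaviour: stop at the first in-service SU, whether or not the
 * assignment to it can succeed. Returns its index, or -1 if none. */
int pick_first_in_service(const struct su_model *sus, size_t n)
{
    assert(sus != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (sus[i].in_service)
            return (int)i;
    }
    return -1;
}

/* Proposed behaviour: keep trying in-service SUs until one assignment
 * succeeds. Returns its index, or -1 if no SU could be assigned. */
int pick_first_assignable(const struct su_model *sus, size_t n)
{
    assert(sus != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (sus[i].in_service && sus[i].assign_works)
            return (int)i;
    }
    return -1;
}
```

With SUs {in service, cannot assign}, {out of service}, {in service, can assign}, the old selection settles on index 0 and ends up with no active assignment, while the proposed selection moves on to index 2.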
---
 src/amf/amfd/sg_2n_fsm.cc | 75
+--
 1 file changed, 46 insertions(+), 29 deletions(-)

diff --git a/src/amf/amfd/sg_2n_fsm.cc b/src/amf/amfd/sg_2n_fsm.cc index
91ffc63..ba0f72e 100644
--- a/src/amf/amfd/sg_2n_fsm.cc
+++ b/src/amf/amfd/sg_2n_fsm.cc
@@ -630,6 +630,43 @@ done:
 }
 
 
/***
**
+ * Function: avd_sg_2n_assign_si
+ *
+ * Purpose:  This function choose and assign SIs in the SG that dont have
+ *   active assignment.
+ *
+ * Input: cb - the AVD control block
+ *sg - The pointer to the service group.
+ *su - The pointer to the service unit to be assigned ACTIVE.
+ *
+ * Returns: True if assign succeed, otherwise return false
+ *
+ 
+***
+***/
+static bool avd_sg_2n_assign_si(AVD_CL_CB *cb, AVD_SG *sg, AVD_SU *su) {
+  bool l_flag = false;
+  AVD_SU_SI_REL *tmp_susi;
+  /* choose and assign SIs in the SG that dont have active assignment */
+  for (const auto &i_si : sg->list_of_si) {
+if ((i_si->saAmfSIAdminState == SA_AMF_ADMIN_UNLOCKED) &&
+(i_si->list_of_csi != nullptr) &&
+(i_si->si_dep_state != AVD_SI_SPONSOR_UNASSIGNED) &&
+(i_si->si_dep_state != AVD_SI_UNASSIGNING_DUE_TO_DEP) &&
+(i_si->si_dep_state != AVD_SI_READY_TO_UNASSIGN) &&
+(i_si->list_of_sisu == AVD_SU_SI_REL_NULL) &&
+(su->saAmfSUNumCurrActiveSIs < sg->saAmfSGMaxActiveSIsperSU)) {
+  /* found a SI that needs active assignment. */
+  if (avd_new_assgn_susi(cb, su, i_si, SA_AMF_HA_ACTIVE, false,
+ &tmp_susi) == NCSCC_RC_SUCCESS) {
+l_flag = true;
+  } else {
+LOG_ER("%s:%u: %s", __FILE__, __LINE__, i_si->name.c_str());
+  }
+}
+  }
+  return l_flag;
+}
+
+/**
+***
  * Function: avd_sg_2n_su_chose_asgn
  *
  * Purpose:  This function will identify the current active SU.
@@ -675,7 +712,10 @@ static AVD_SU *avd_sg_2n_su_chose_asgn(AVD_CL_CB *cb,
AVD_SG *sg) {
 for (const auto &iter : sg->list_of_su) {
   if (iter->saAmfSuReadinessState == SA_AMF_READINESS_IN_SERVICE) {
 a_su = iter;
-break;
+l_flag = avd_sg_2n_assign_si(cb, sg, a_su);
+if (l_flag == true) {
+  break;
+}
   }
 }
 
@@ -683,36 +723,13 @@ static AVD_SU *avd_sg_2n_su_chose_asgn(AVD_CL_CB *cb,
AVD_SG *sg) {
   TRACE("No in service SUs available in the SG");
   goto done;
 }
-  } else { /* if (a_susi == AVD_SU_SI_REL_NULL) */
-
+  } else { /* if (a_susi != AVD_SU_SI_REL_NULL) */
 a_su = a_susi->su;
-  }
-
-  if (a_su->saAmfSuReadinessState != SA_AMF_READINESS_IN_SERVICE) {
-TRACE("The current active SU is OOS so return");
-goto done;
-  }
-
-  /* check if any more active SIs can be assigned to this SU */
-  l_flag = false;
-
-  /* choose and assign SIs in the SG that dont have active assignment */
-  for (const auto &i_si : sg->list_of_si) {
-if ((i_si->saAmfSIAdminState == SA_AMF_ADMIN_UNLOCKED) &&
-(i_si->list_of_csi != nullptr) &&
-(i_si->si_dep_state != AVD_SI_SPONSOR_UNASSIGNED) &&
-(i_si->si_dep_state != AVD_SI_UNASSIGNING_DUE_TO_DEP) &&
-(i_si->si_dep_state != AVD_SI_READY_TO_UNASSIGN) &&
-(i_si->list_of_sisu == AVD_SU_SI_REL_NULL) &&
-(a_su->saAmfSUNumCurrActiveSIs < sg->saAmfSGMaxActiveSIsperSU)) {
-  /* found a SI that needs active assignment. */
-  if (avd_new_assgn_susi(cb, a_su, i_si, SA_AMF_HA_ACTIVE, false,
-  

Re: [devel] [PATCH 1/1] amf: disallow delete csi that another csi is dependent on [#2969]

2019-03-31 Thread Tran Thuan
Hi Thang,

Let me correct my comment.
Your fix may allow "delete sponsor CSI + modify (not delete) dependent CSI
in the same CCB".
In the end, the dependent CSI will be missing its sponsor CSI and the crash will still happen.
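
The stricter check suggested here can be sketched as follows, assuming a simplified model of a CCB as a list of (DN, operation) entries. The enum ccb_op, struct ccb_entry and sponsor_delete_allowed() names are hypothetical and are not the ccbutil API used in the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model of one operation recorded in a CCB. */
enum ccb_op { CCB_OP_CREATE, CCB_OP_MODIFY, CCB_OP_DELETE };

struct ccb_entry {
    const char *dn; /* object the operation applies to */
    enum ccb_op op; /* kind of operation */
};

/* Look up the operation recorded for dn in this CCB, or NULL if none. */
static const struct ccb_entry *ccb_find(const struct ccb_entry *ccb, size_t n,
                                        const char *dn)
{
    for (size_t i = 0; i < n; i++) {
        if (strcmp(ccb[i].dn, dn) == 0)
            return &ccb[i];
    }
    return NULL;
}

/*
 * Deleting a sponsor CSI is allowed only if every CSI that depends on it
 * is itself deleted in the same CCB. A dependent that is absent from the
 * CCB, or present with a create/modify operation, rejects the delete.
 */
bool sponsor_delete_allowed(const struct ccb_entry *ccb, size_t n,
                            const char *const *dependents, size_t n_deps)
{
    assert(ccb != NULL || n == 0);
    for (size_t i = 0; i < n_deps; i++) {
        const struct ccb_entry *e = ccb_find(ccb, n, dependents[i]);
        if (e == NULL || e->op != CCB_OP_DELETE)
            return false;
    }
    return true;
}
```

A check that only tests whether the dependent CSI appears in the CCB would accept the "delete sponsor + modify dependent" case; comparing the operation type as well rejects it.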

Best Regards,
ThuanTr

-Original Message-
From: Tran Thuan  
Sent: Friday, March 29, 2019 6:19 PM
To: 'thang.d.nguyen' ;
gary@dektech.com.au; minh.c...@dektech.com.au
Cc: opensaf-devel@lists.sourceforge.net
Subject: Re: [devel] [PATCH 1/1] amf: disallow delete csi that another csi
is dependent on [#2969]

Hi Thang,

I mean one CCB that deletes both the sponsor and dependent CSIs.
Then the CCB should be OK, not rejected.

Best Regards,
ThuanTr


Hi Thuan,

I don't follow your point. The latest patch I sent is from 2/14/2019.
That solution allows the deletion if the sponsor CSI and dependent CSI are removed in the same CCB.
And the code is in csi_ccb_completed_delete_hdlr(), i.e., the delete operation
type.

B.R
/Thang


-Original Message-
From: thang.d.nguyen 
Sent: Thursday, February 14, 2019 4:33 PM
To: gary@dektech.com.au; minh.c...@dektech.com.au
Cc: opensaf-devel@lists.sourceforge.net
Subject: [devel] [PATCH 1/1] amf: disallow delete csi that another csi is
dependent on [#2969]

Disallow deleting a csi that another csi is dependent on. Allowing it
caused AMFD to crash or fall into a cyclic restart.
Allow it if the csi(s) are deleted in the same ccb.
---
 src/amf/amfd/csi.cc | 23 +++
 1 file changed, 23 insertions(+)

diff --git a/src/amf/amfd/csi.cc b/src/amf/amfd/csi.cc index
8282956..c53e62f 100644
--- a/src/amf/amfd/csi.cc
+++ b/src/amf/amfd/csi.cc
@@ -783,6 +783,7 @@ static SaAisErrorT csi_ccb_completed_delete_hdlr(
 CcbUtilOperationData_t *opdata) {
   SaAisErrorT rc = SA_AIS_ERR_BAD_OPERATION;
   AVD_CSI *csi;
+  AVD_CSI *t_csi;
   AVD_SU_SI_REL *t_sisu;
   const std::string object_name(Amf::to_string(&opdata->objectName));
 
@@ -841,6 +842,28 @@ static SaAisErrorT csi_ccb_completed_delete_hdlr(
 }
 t_sisu = t_sisu->si_next;
   } /*  while(t_sisu) */
+
+  // not allow delete if there are csi(s) dependency
+  t_csi = csi->si->list_of_csi;
+  while (t_csi) {
+AVD_CSI_DEPS *csidep;
+/* Verify that there are no CSI dependencies to this CSI  */
+for (csidep = t_csi->saAmfCSIDependencies; csidep != nullptr;
+ csidep = csidep->csi_dep_next) {
+  if (csidep->csi_dep_name_value == csi->name) {
+SaNameT csidepDn;
+osaf_extended_name_lend(t_csi->name.c_str(), &csidepDn);
+if (ccbutil_getCcbOpDataByDN(opdata->ccbId, &csidepDn) == nullptr) {
[Thuan] In the NOT null case, we should also check the operation type and
reject this CCB if it is NOT "delete", because a non-null result can be any
of "create/modify/delete" (not necessarily delete).
+  report_ccb_validation_error(
+  opdata, "csi '%s' depends on  '%s'",
+  t_csi->name.c_str(), csi->name.c_str());
+  rc = SA_AIS_ERR_BAD_OPERATION;
+  goto done;
+}
+  }
+}
+t_csi = t_csi->si_list_of_csi_next;
+  } /*  while(t_csi) */
 }
   } else {
 if (csi->list_compcsi != nullptr) {
--
2.7.4



___
Opensaf-devel mailing list
Opensaf-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/opensaf-devel





Re: [devel] [PATCH 1/1] amf: disallow delete csi that another csi is dependent on [#2969]

2019-03-29 Thread Tran Thuan
Hi Thang,

I mean a single CCB that deletes both the sponsor and the dependent CSIs.
That CCB should be accepted, not rejected.

Best Regards,
ThuanTr


Hi Thuan,

I don't quite follow your point. The latest patch I sent is from 2/14/2019.
That solution allows the delete if the sponsor csi and the dependent csi are
removed in the same CCB. The code is in csi_ccb_completed_delete_hdlr(),
i.e. the delete operation type.

B.R
/Thang


-----Original Message-----
From: thang.d.nguyen  
Sent: Thursday, February 14, 2019 4:33 PM
To: gary@dektech.com.au; minh.c...@dektech.com.au
Cc: opensaf-devel@lists.sourceforge.net
Subject: [devel] [PATCH 1/1] amf: disallow delete csi that another csi is
dependent on [#2969]

Disallow deleting a csi that another csi is dependent on. Allowing it
caused AMFD to crash or fall into a cyclic restart.
Allow it if the csi(s) are deleted in the same ccb.
---
 src/amf/amfd/csi.cc | 23 +++
 1 file changed, 23 insertions(+)

diff --git a/src/amf/amfd/csi.cc b/src/amf/amfd/csi.cc index
8282956..c53e62f 100644
--- a/src/amf/amfd/csi.cc
+++ b/src/amf/amfd/csi.cc
@@ -783,6 +783,7 @@ static SaAisErrorT csi_ccb_completed_delete_hdlr(
 CcbUtilOperationData_t *opdata) {
   SaAisErrorT rc = SA_AIS_ERR_BAD_OPERATION;
   AVD_CSI *csi;
+  AVD_CSI *t_csi;
   AVD_SU_SI_REL *t_sisu;
   const std::string object_name(Amf::to_string(&opdata->objectName));
 
@@ -841,6 +842,28 @@ static SaAisErrorT csi_ccb_completed_delete_hdlr(
 }
 t_sisu = t_sisu->si_next;
   } /*  while(t_sisu) */
+
+  // not allow delete if there are csi(s) dependency
+  t_csi = csi->si->list_of_csi;
+  while (t_csi) {
+AVD_CSI_DEPS *csidep;
+/* Verify that there are no CSI dependencies to this CSI  */
+for (csidep = t_csi->saAmfCSIDependencies; csidep != nullptr;
+ csidep = csidep->csi_dep_next) {
+  if (csidep->csi_dep_name_value == csi->name) {
+SaNameT csidepDn;
+osaf_extended_name_lend(t_csi->name.c_str(), &csidepDn);
+if (ccbutil_getCcbOpDataByDN(opdata->ccbId, &csidepDn) == nullptr) {
[Thuan] In the NOT null case, we should check the operation type and reject
this CCB if it is NOT "delete". If both CSIs are deleted in the same CCB,
it should be OK.
+  report_ccb_validation_error(
+  opdata, "csi '%s' depends on  '%s'",
+  t_csi->name.c_str(), csi->name.c_str());
+  rc = SA_AIS_ERR_BAD_OPERATION;
+  goto done;
+}
+  }
+}
+t_csi = t_csi->si_list_of_csi_next;
+  } /*  while(t_csi) */
 }
   } else {
 if (csi->list_compcsi != nullptr) {
--
2.7.4



___
Opensaf-devel mailing list
Opensaf-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/opensaf-devel





Re: [devel] [PATCH 1/1] amf: disallow delete csi that another csi is dependent on [#2969]

2019-03-27 Thread Tran Thuan
Hi Thang,

See my comment inline.

Best Regards,
ThuanTr

-----Original Message-----
From: thang.nguyen  
Sent: Thursday, December 6, 2018 2:36 AM
To: gary@dektech.com.au; minh.c...@dektech.com.au
Cc: opensaf-devel@lists.sourceforge.net
Subject: [devel] [PATCH 1/1] amf: disallow delete csi that another csi is
dependent on [#2969]

Disallow deleting a csi that another csi is dependent on. Allowing it
caused AMFD to crash or fall into a cyclic restart.
Allow it if the csi(s) are deleted in the same ccb.
---
 src/amf/amfd/csi.cc | 23 +++
 1 file changed, 23 insertions(+)

diff --git a/src/amf/amfd/csi.cc b/src/amf/amfd/csi.cc index
8282956..0a6c964 100644
--- a/src/amf/amfd/csi.cc
+++ b/src/amf/amfd/csi.cc
@@ -783,6 +783,7 @@ static SaAisErrorT csi_ccb_completed_delete_hdlr(
 CcbUtilOperationData_t *opdata) {
   SaAisErrorT rc = SA_AIS_ERR_BAD_OPERATION;
   AVD_CSI *csi;
+  AVD_CSI *t_csi;
   AVD_SU_SI_REL *t_sisu;
   const std::string object_name(Amf::to_string(&opdata->objectName));
 
@@ -851,6 +852,28 @@ static SaAisErrorT csi_ccb_completed_delete_hdlr(
 }
   }
 
+  t_csi = csi->si->list_of_csi;
+
+  while (t_csi) {
+AVD_CSI_DEPS *csidep;
+/* Verify that there are no CSI dependencies to this CSI  */
+for (csidep = t_csi->saAmfCSIDependencies; csidep != nullptr;
+ csidep = csidep->csi_dep_next) {
+  if (csidep->csi_dep_name_value.compare(csi->name) == 0) {
+SaNameT csidepDn;
+osaf_extended_name_lend(csidep->csi_dep_name_value.c_str(), &csidepDn);
+if (ccbutil_getCcbOpDataByDN(opdata->ccbId, &csidepDn) == nullptr) {
[Thuan] In the NOT null case, we should check the operation type and reject
this CCB if it is NOT "delete". If both CSIs are deleted in the same CCB,
it should be OK.
+  report_ccb_validation_error(
+  opdata, "csi '%s' depends on  '%s'",
+  t_csi->name.c_str(), csi->name.c_str());
+  rc = SA_AIS_ERR_BAD_OPERATION;
+  goto done;
+}
+  }
+}
+t_csi = t_csi->si_list_of_csi_next;
+  } /*  while(t_csi) */
+
   rc = SA_AIS_OK;
   opdata->userData = csi; /* Save for later use in apply */
 done:
--
2.7.4



___
Opensaf-devel mailing list
Opensaf-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/opensaf-devel





Re: [devel] [PATCH 1/1] osaf: improve response time in etcd3.plugin [#3016]

2019-03-12 Thread Tran Thuan
Hi Gary,

ACK from me.
Thank you.

Best Regards,
ThuanTr

-----Original Message-----
From: Gary Lee  
Sent: Tuesday, March 12, 2019 7:32 AM
To: thuan.t...@dektech.com.au; hans.nordeb...@ericsson.com;
minh.c...@dektech.com.au
Cc: opensaf-devel@lists.sourceforge.net; Gary Lee 
Subject: [PATCH 1/1] osaf: improve response time in etcd3.plugin [#3016]

if the initial call to watch takeover request in etcd3.plugin is made when
etcd has already been shutdown (for example, when etcd is running locally
and the node is being shutdown), the plugin should return 0 with a fake
takeover request to ensure rded shuts down promptly. Otherwise, it will keep
calling watch, delaying node shutdown.
---
 src/osaf/consensus/plugins/etcd3.plugin | 11 +--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/src/osaf/consensus/plugins/etcd3.plugin
b/src/osaf/consensus/plugins/etcd3.plugin
index acccd98..d926885 100644
--- a/src/osaf/consensus/plugins/etcd3.plugin
+++ b/src/osaf/consensus/plugins/etcd3.plugin
@@ -357,9 +357,16 @@ watch() {
 return 0
   fi
 done
+  else
+# etcd down?
+if [ "$watch_key" == "$takeover_request" ]; then
+  hostname=`cat $node_name_file`
+  echo "$hostname SC-0 1000 UNDEFINED"
+  return 0
+else
+  return 1
+fi
   fi
-
-  return 1
 }
 
 # argument parsing
--
2.7.4




___
Opensaf-devel mailing list
Opensaf-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/opensaf-devel


Re: [devel] [PATCH 1/1] smf: clear old attributes before re-filling if ccb abort to sync [#3010]

2019-02-24 Thread Tran Thuan
Hi Thang,

ACK for code review.

Best Regards,
ThuanTr

-----Original Message-----
From: thang.d.nguyen  
Sent: Monday, February 25, 2019 1:25 PM
To: lennart.l...@ericsson.com; thuan.t...@dektech.com.au
Cc: opensaf-devel@lists.sourceforge.net; thang.d.nguyen

Subject: [PATCH 1/1] smf: clear old attributes before re-filling if ccb
abort to sync [#3010]

During single step install, the ccb is aborted to sync. The old attributes
in object creator/modifier must be erased before re-filling.
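
A minimal sketch of why the clear is needed, assuming the fill step appends to
the descriptor each time it runs. The names CreateDescriptor, Operation, Fill
and run_attempts are hypothetical and only model the retry; the real
SmfImmOperation keeps object_create_.attributes and
object_modify_.modifications.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical, simplified descriptor (not the real SMF class layout).
struct CreateDescriptor {
  std::vector<std::string> attributes;
};

class Operation {
 public:
  // Plays the role of DeleteAttributes() in the patch.
  void DeleteAttributes() { descriptor_.attributes.clear(); }

  // Appends the input attributes; calling it twice without clearing
  // duplicates every attribute.
  void Fill(const std::vector<std::string> &input) {
    for (const auto &a : input) descriptor_.attributes.push_back(a);
  }

  std::size_t size() const { return descriptor_.attributes.size(); }

 private:
  CreateDescriptor descriptor_;
};

// Runs the fill step `attempts` times, as happens when a CCB is aborted and
// retried. `clear_first` selects the behaviour with or without the fix.
std::size_t run_attempts(int attempts, bool clear_first) {
  assert(attempts > 0);
  Operation op;
  const std::vector<std::string> input = {"attrA", "attrB"};
  for (int i = 0; i < attempts; ++i) {
    if (clear_first) op.DeleteAttributes();
    op.Fill(input);
  }
  return op.size();
}
```

In this model, two attempts without the clear leave four attributes in the
descriptor, while clearing first leaves the expected two.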
---
 src/smf/smfd/SmfImmOperation.h | 6 ++
 src/smf/smfd/SmfUtils.cc   | 3 +++
 2 files changed, 9 insertions(+)

diff --git a/src/smf/smfd/SmfImmOperation.h b/src/smf/smfd/SmfImmOperation.h
index bc91431..04b4455 100644
--- a/src/smf/smfd/SmfImmOperation.h
+++ b/src/smf/smfd/SmfImmOperation.h
@@ -113,6 +113,12 @@ class SmfImmOperation {
 LOG_NO("addValue must be specialised");
   }
 
+  // Delete all attribute of object_create_ or object_modify_
+  void DeleteAttributes() {
+object_create_.attributes.clear();
+object_modify_.modifications.clear();
+  }
+
   // Create and add a new attribute if the attribute does not already
   // exist (based on name). If an attribute with the given name already
exist
   // no attribute is created but the value (i_value) is added (multivalue).
diff --git a/src/smf/smfd/SmfUtils.cc b/src/smf/smfd/SmfUtils.cc index
c2931c8..882a3e6 100644
--- a/src/smf/smfd/SmfUtils.cc
+++ b/src/smf/smfd/SmfUtils.cc
@@ -698,6 +698,9 @@ SaAisErrorT SmfImmUtils::doImmOperations(
   }
 }
 
+// Delete all attributes in the create/modify descriptor before fill
+imm_operation->DeleteAttributes();
+
 // Verify the create descriptor for this operation and fill in the
 // attributes
 result = imm_operation->Execute(rollbackData);
--
2.7.4




___
Opensaf-devel mailing list
Opensaf-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/opensaf-devel


Re: [devel] [PATCH 1/1] smf: clear old attributes before re-filling if ccb abort to sync [#3010]

2019-02-24 Thread Tran Thuan
Hi Thang,

See one comment inline [Thuan]

Best Regards,
ThuanTr

-----Original Message-----
From: thang.d.nguyen  
Sent: Monday, February 25, 2019 11:04 AM
To: lennart.l...@ericsson.com; thua...@users.sourceforge.net
Cc: opensaf-devel@lists.sourceforge.net
Subject: [devel] [PATCH 1/1] smf: clear old attributes before re-filling if
ccb abort to sync [#3010]

During single step install, the ccb is aborted to sync. The old attributes
in object creator must be erased before re-filling.
---
 src/smf/smfd/SmfImmOperation.h | 5 +
 src/smf/smfd/SmfUtils.cc   | 3 +++
 2 files changed, 8 insertions(+)

diff --git a/src/smf/smfd/SmfImmOperation.h b/src/smf/smfd/SmfImmOperation.h
index bc91431..c2c1be3 100644
--- a/src/smf/smfd/SmfImmOperation.h
+++ b/src/smf/smfd/SmfImmOperation.h
@@ -113,6 +113,11 @@ class SmfImmOperation {
 LOG_NO("addValue must be specialised");
   }
 
+  // Delete all attribute of object_create_
+  void DeleteAttributes() {
+object_create_.attributes.clear();
[Thuan] I think we need to check that "imm_operation_" is "Create" before
doing "object_create_.attributes.clear();"
+  }
+
   // Create and add a new attribute if the attribute does not already
   // exist (based on name). If an attribute with the given name already
exist
   // no attribute is created but the value (i_value) is added (multivalue).
diff --git a/src/smf/smfd/SmfUtils.cc b/src/smf/smfd/SmfUtils.cc index
c2931c8..dfe9953 100644
--- a/src/smf/smfd/SmfUtils.cc
+++ b/src/smf/smfd/SmfUtils.cc
@@ -698,6 +698,9 @@ SaAisErrorT SmfImmUtils::doImmOperations(
   }
 }
 
+// Delete all attributes in the create descriptor before fill
+imm_operation->DeleteAttributes();
+
 // Verify the create descriptor for this operation and fill in the
 // attributes
 result = imm_operation->Execute(rollbackData);
--
2.7.4



___
Opensaf-devel mailing list
Opensaf-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/opensaf-devel





Re: [devel] [PATCH 1/1] clmd: not send sync respond to client if node down [#3004]

2019-01-31 Thread Tran Thuan
Hi Thang,

ACK from me for code review, not tested.

Best Regards,
ThuanTr

-----Original Message-----
From: thang.d.nguyen  
Sent: Wednesday, January 30, 2019 1:20 AM
To: gary@dektech.com.au; minh.c...@dektech.com.au;
thuan.t...@dektech.com.au
Cc: opensaf-devel@lists.sourceforge.net; thang.d.nguyen

Subject: [PATCH 1/1] clmd: not send sync respond to client if node down
[#3004]

Clmd will not send the sync response to a client if the node that the client
resides on is down. This avoids a timeout when clmd sends via mds.
---
 src/clm/clmd/clms_cb.h   |  3 +++
 src/clm/clmd/clms_evt.cc | 22 +-
src/clm/clmd/clms_mds.cc |  2 +-
 3 files changed, 21 insertions(+), 6 deletions(-)

diff --git a/src/clm/clmd/clms_cb.h b/src/clm/clmd/clms_cb.h index
4d7fdc7..637d53a 100644
--- a/src/clm/clmd/clms_cb.h
+++ b/src/clm/clmd/clms_cb.h
@@ -22,6 +22,7 @@
 #include "osaf/config.h"
 #endif
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -238,6 +239,8 @@ typedef struct clms_cb_t {
   *node_down_list_head; /*NODE_DOWN record - Fix when active node goes
down
  */
   NODE_DOWN_LIST *node_down_list_tail;
+  // Record node id when receive MDS node down
+  std::set mds_node_down_list;
   bool is_impl_set;
   bool nid_started; /**< true if started by NID */
   NCS_PATRICIA_TREE iplist; /* To temporarily store ipaddress information
diff --git a/src/clm/clmd/clms_evt.cc b/src/clm/clmd/clms_evt.cc index
c2b83c2..5265002 100644
--- a/src/clm/clmd/clms_evt.cc
+++ b/src/clm/clmd/clms_evt.cc
@@ -943,6 +943,8 @@ static uint32_t proc_mds_node_evt(CLMSV_CLMS_EVT *evt) {
 goto done;
   }
 
+  clms_cb->mds_node_down_list.insert(node_id);
+
   if ((clms_cb->ha_state == SA_AMF_HA_ACTIVE) ||
   (clms_cb->ha_state == SA_AMF_HA_QUIESCED)) {
 clms_track_send_node_down(node);
@@ -1531,19 +1533,24 @@ static uint32_t proc_initialize_msg(CLMS_CB *cb,
CLMSV_CLMS_EVT *evt) {
 
   TRACE_ENTER2("dest %" PRIx64, evt->fr_dest);
 
-  /*Handle the wrap around */
-  if (clms_cb->last_client_id == INT_MAX) clms_cb->last_client_id = 0;
-
-  clms_cb->last_client_id++;
-
   node = clms_node_get_by_id(node_id);
   TRACE("Node id = %d", node_id);
   if (node == nullptr) {
 LOG_IN("Initialize request of client on an unconfigured node: node_id =
%d",
node_id);
 ais_rc = SA_AIS_ERR_UNAVAILABLE;
+std::set::iterator it =
+  clms_cb->mds_node_down_list.find(node_id);
+if (it != clms_cb->mds_node_down_list.end()) {
+  return (uint32_t)ais_rc;
+}
   }
 
+  /*Handle the wrap around */
+  if (clms_cb->last_client_id == INT_MAX) clms_cb->last_client_id = 0;
+
+  clms_cb->last_client_id++;
+
   if ((client = clms_client_new(evt->fr_dest, clms_cb->last_client_id)) ==
   nullptr) {
 TRACE("Creating a new client failed");
@@ -1564,6 +1571,11 @@ static uint32_t proc_initialize_msg(CLMS_CB *cb, CLMSV_CLMS_EVT *evt) {
 return rc;
   }
 
+  std::set::iterator it = 
+ clms_cb->mds_node_down_list.find(node_id);
+  if (it != clms_cb->mds_node_down_list.end()) {
+clms_cb->mds_node_down_list.erase(it);
+  }
+
   if (node) {
 if (node->member == false) {
   rc = clms_send_is_member_info(clms_cb, node->node_id, node->member,
diff --git a/src/clm/clmd/clms_mds.cc b/src/clm/clmd/clms_mds.cc index
58552cc..833d18c 100644
--- a/src/clm/clmd/clms_mds.cc
+++ b/src/clm/clmd/clms_mds.cc
@@ -1097,7 +1097,7 @@ static uint32_t clms_mds_node_event(struct
ncsmds_callback_info *mds_info) {
 clmsv_evt->info.node_mds_info.node_id =
mds_info->info.node_evt.node_id;
 clmsv_evt->info.node_mds_info.nodeup = SA_TRUE;
 
-rc = m_NCS_IPC_SEND(_cb->mbx, clmsv_evt, NCS_IPC_PRIORITY_HIGH);
+rc = m_NCS_IPC_SEND(_cb->mbx, clmsv_evt, 
+ NCS_IPC_PRIORITY_VERY_HIGH);
 if (rc != NCSCC_RC_SUCCESS) {
   TRACE("IPC send failed %d", rc);
   free(clmsv_evt);
--
2.7.4




___
Opensaf-devel mailing list
Opensaf-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/opensaf-devel


Re: [devel] [PATCH 1/1] clmd: not send sync respond to client if node down [#3004]

2019-01-29 Thread Tran Thuan
Hi Thang,

Some comments inline with [Thuan]

Best Regards,
ThuanTr

-----Original Message-----
From: thang.d.nguyen  
Sent: Tuesday, January 29, 2019 4:53 AM
To: gary@dektech.com.au; minh.c...@dektech.com.au
Cc: opensaf-devel@lists.sourceforge.net
Subject: [devel] [PATCH 1/1] clmd: not send sync respond to client if node
down [#3004]

clmd will not send the sync response to a client if the node that the client
resides on is down. This avoids a timeout when clmd sends via mds.
---
 src/clm/clmd/clms_cb.h|  5 
 src/clm/clmd/clms_evt.cc  | 35 ++-
 src/clm/clmd/clms_evt.h   |  1 +
 src/clm/clmd/clms_main.cc |  4 
 src/clm/clmd/clms_mds.cc  | 61
+++
 5 files changed, 105 insertions(+), 1 deletion(-)

diff --git a/src/clm/clmd/clms_cb.h b/src/clm/clmd/clms_cb.h index
4d7fdc7..6999761 100644
--- a/src/clm/clmd/clms_cb.h
+++ b/src/clm/clmd/clms_cb.h
@@ -22,6 +22,7 @@
 #include "osaf/config.h"
 #endif
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -238,6 +239,8 @@ typedef struct clms_cb_t {
   *node_down_list_head; /*NODE_DOWN record - Fix when active node goes
down
  */
   NODE_DOWN_LIST *node_down_list_tail;
+  // Node down list - Updated by MDS thread
+  std::list mds_node_down_list;

[Thuan]: The element is simple and small, so I suggest using std::set.
That avoids new/delete, and a set also avoids duplicate elements.
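
A minimal sketch of the suggestion, assuming the element is a 32-bit node id
(the template argument is missing from the quoted patch, so uint32_t is an
assumption). NodeDownTracker and its methods are hypothetical names used only
for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <set>

// Hypothetical tracker holding node ids by value in a std::set.
class NodeDownTracker {
 public:
  // Records a node as down; inserting the same id twice keeps one entry.
  void MarkDown(uint32_t node_id) { down_.insert(node_id); }

  // Forgets a node; returns true if it was present.
  bool Clear(uint32_t node_id) { return down_.erase(node_id) > 0; }

  bool IsDown(uint32_t node_id) const { return down_.count(node_id) > 0; }

  std::size_t size() const { return down_.size(); }

 private:
  std::set<uint32_t> down_;  // values, so no new/delete per element
};

// Small usage example of the two properties mentioned in the comment.
void node_down_tracker_example() {
  NodeDownTracker tracker;
  tracker.MarkDown(0x2010f);
  tracker.MarkDown(0x2010f);  // duplicate is ignored by the set
  assert(tracker.size() == 1);
  assert(tracker.IsDown(0x2010f));
  assert(tracker.Clear(0x2010f));  // nothing to delete by hand
  assert(!tracker.IsDown(0x2010f));
}
```

Lookup is a single find/count call rather than a hand-written loop over
pointers, which also removes the need for the clms_is_node_down() iteration.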

   bool is_impl_set;
   bool nid_started; /**< true if started by NID */
   NCS_PATRICIA_TREE iplist; /* To temporarily store ipaddress information
@@ -245,6 +248,8 @@ typedef struct clms_cb_t {
 
   /* Mutex protecting shared data used by the scale-out functionality */
   pthread_mutex_t scale_out_data_mutex;
+  /* Mutex protecting shared data used by the delete/add node-id */
+  pthread_mutex_t node_down_list_mutex;
   /* Number of occupied indices in the vectors pending_nodes[] and
* pending_node_ids[] */
   size_t no_of_pending_nodes;
diff --git a/src/clm/clmd/clms_evt.cc b/src/clm/clmd/clms_evt.cc index
c2b83c2..08d4acd 100644
--- a/src/clm/clmd/clms_evt.cc
+++ b/src/clm/clmd/clms_evt.cc
@@ -17,7 +17,6 @@
  *
  */
 
-#include "osaf/configmake.h"
 #include 
 #include 
 #include 
@@ -31,6 +30,9 @@
 #include 
 #include 
 #include 
+#include 
+#include 
+#include "osaf/configmake.h"
 #include "base/logtrace.h"
 #include "base/ncsgl_defs.h"
 #include "base/osaf_utility.h"
@@ -1514,6 +1516,31 @@ static uint32_t proc_node_get_async_msg(CLMS_CB *cb,
CLMSV_CLMS_EVT *evt) {  }
 
 /**
+ * Return true if mds node down exist
+ * @param node id
+ *
+ * @return bool
+ */
+bool clms_is_node_down(uint32_t node_id) {
+  TRACE_ENTER();
+  bool found = false;
+  std::list::iterator it;
+  osaf_mutex_lock_ordie(_cb->node_down_list_mutex);
+
+  for (it = clms_cb->mds_node_down_list.begin();
+it != clms_cb->mds_node_down_list.end(); ++it) {
+if (*(*it) == node_id) {
+  found = true;
+  break;
+}
+  }
+
+  osaf_mutex_unlock_ordie(_cb->node_down_list_mutex);
+  TRACE_LEAVE();
+  return found;
+}
+
+/**
  * Handle a initialize message
  * @param cb
  * @param evt
@@ -1556,6 +1583,12 @@ static uint32_t proc_initialize_msg(CLMS_CB *cb,
CLMSV_CLMS_EVT *evt) {
   if (client != nullptr)
 msg.info.api_resp_info.param.client_id = client->client_id;
 
+  if (clms_is_node_down(node_id) == true) {
+LOG_NO("node_id = %d already down, no need sending sync respond",
node_id);
+if (client != nullptr) clms_client_delete(client->client_id);
+return (uint32_t)ais_rc;
+  }
+
[Thuan] Why not move this block before clms_client_new()? Then there is no
need to call clms_client_delete().
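
A minimal sketch of the reordering, assuming a simplified handler. Server,
Initialize and the return convention (0 means "dropped") are hypothetical and
do not match the real clmd code; the point is only that checking first leaves
nothing to roll back.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <set>

// Hypothetical model of the initialize handler (not the clmd data model).
struct Server {
  std::set<uint32_t> down_nodes;         // nodes reported down
  std::map<uint32_t, uint32_t> clients;  // client_id -> node_id
  uint32_t last_client_id = 0;

  // Returns the new client id, or 0 if the request was dropped because the
  // client's node is already down. The check runs before any allocation,
  // so the early return has no client to delete and no id to give back.
  uint32_t Initialize(uint32_t node_id) {
    if (down_nodes.count(node_id) > 0) return 0;
    ++last_client_id;
    clients[last_client_id] = node_id;
    return last_client_id;
  }
};

// Small usage example: a request from a down node changes no state.
void early_check_example() {
  Server server;
  server.down_nodes.insert(42);
  assert(server.Initialize(42) == 0);
  assert(server.clients.empty());
  assert(server.last_client_id == 0);
  assert(server.Initialize(7) == 1);
}
```

The same ordering also avoids consuming a client id for a request that is
never answered.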

   rc = clms_mds_msg_send(cb, , >fr_dest, >mds_ctxt,
  MDS_SEND_PRIORITY_HIGH, NCSMDS_SVC_ID_CLMA);
   if (rc != NCSCC_RC_SUCCESS) {
diff --git a/src/clm/clmd/clms_evt.h b/src/clm/clmd/clms_evt.h index
1005456..ef35cbc 100644
--- a/src/clm/clmd/clms_evt.h
+++ b/src/clm/clmd/clms_evt.h
@@ -92,6 +92,7 @@ extern uint32_t clms_clmresp_ok(CLMS_CB *cb, CLMS_CLUSTER_NODE *op_node,
 CLMS_TRACK_INFO *trkrec);
 extern uint32_t clms_remove_clma_down_rec(CLMS_CB *cb, MDS_DEST mds_dest);
 extern void clms_remove_node_down_rec(SaClmNodeIdT node_id);
+extern bool clms_is_node_down(SaClmNodeIdT node_id);
 extern uint32_t clms_node_add(CLMS_CLUSTER_NODE *node, int i);
 extern void clms_clmresp_error_timeout(CLMS_CB *cb, CLMS_CLUSTER_NODE *node);
 extern bool clms_clma_entry_valid(CLMS_CB *cb, MDS_DEST mds_dest);
diff --git a/src/clm/clmd/clms_main.cc b/src/clm/clmd/clms_main.cc
index ad6e12e..e2c4f21 100644
--- a/src/clm/clmd/clms_main.cc
+++ b/src/clm/clmd/clms_main.cc
@@ -245,6 +245,10 @@ uint32_t clms_cb_init(CLMS_CB *clms_cb) {
   if (pthread_mutex_init(_cb->scale_out_data_mutex, nullptr) != 0) {
 return NCSCC_RC_FAILURE;
   }
+  if (pthread_mutex_init(_cb->node_down_list_mutex, nullptr) != 0) {
+return NCSCC_RC_FAILURE;
+  }
+
   

Re: [devel] [PATCH 1/1] imm: initialize attrDefinitionsOut with NULL [#2981]

2018-12-16 Thread Tran Thuan
Hi,

Do you mean [#2987]?

Best Regards,
ThuanTr

-----Original Message-----
From: Mohan Kanakam  
Sent: Monday, December 17, 2018 2:22 PM
To: vu.m.ngu...@dektech.com.au; hans.nordeb...@ericsson.com
Cc: opensaf-devel@lists.sourceforge.net
Subject: [devel] [PATCH 1/1] imm: initialize attrDefinitionsOut with NULL
[#2981]

---
 src/imm/apitest/management/test_saImmOmClassDescriptionGet_2.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/imm/apitest/management/test_saImmOmClassDescriptionGet_2.c
b/src/imm/apitest/management/test_saImmOmClassDescriptionGet_2.c
index 1d4e8eb..be512f6 100644
--- a/src/imm/apitest/management/test_saImmOmClassDescriptionGet_2.c
+++ b/src/imm/apitest/management/test_saImmOmClassDescriptionGet_2.c
@@ -191,7 +191,7 @@ void
saImmOmClassDescriptionGet_2_with_className_as_null(void)
SA_IMM_ATTR_RUNTIME | SA_IMM_ATTR_RDN | SA_IMM_ATTR_CACHED,
NULL};
const SaImmAttrDefinitionT_2 *attrDefinitionsIn[] = {, NULL};
SaImmClassCategoryT classCategory;
-   SaImmAttrDefinitionT_2 **attrDefinitionsOut;
+   SaImmAttrDefinitionT_2 **attrDefinitionsOut = NULL;
 
safassert(immutil_saImmOmInitialize(, ,
),
  SA_AIS_OK);
-- 
2.7.4



___
Opensaf-devel mailing list
Opensaf-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/opensaf-devel





Re: [devel] [PATCH 1/1] ntf: Increase the priority when sending MDS_DOWN to NTFD maind thread [#2973]

2018-11-27 Thread Tran Thuan
Hi Canh,

I wonder whether any notification could be missed because of this change.
Example:
  The agent sends a notification and then terminates.
  Will NTFD handle the agent DOWN first and then skip the notification it
  already received from the agent?
Also, what is the problem/consequence of current behavior? Is ticket a
defect or an enhancement?
Make clear what is defect or what improve with new change?

Best Regards,
ThuanTr

-----Original Message-----
From: Canh Van Truong  
Sent: Wednesday, November 28, 2018 10:12 AM
To: 'Lennart Lund' ; 'Minh Hon Chau'

Cc: opensaf-devel@lists.sourceforge.net
Subject: Re: [devel] [PATCH 1/1] ntf: Increase the priority when sending
MDS_DOWN to NTFD maind thread [#2973]

Thanks Lennart,

Yes, I sent the wrong patch. It included my test changes.
The solution only increases the priority when sending the MDS_DOWN event.
I am sending the right one.

Regards
Canh


-----Original Message-----
From: Lennart Lund 
Sent: Tuesday, November 27, 2018 10:40 PM
To: Canh Van Truong ; Minh Hon Chau

Cc: opensaf-devel@lists.sourceforge.net; Canh Van Truong
; Lennart Lund 
Subject: RE: [PATCH 1/1] ntf: Increase the priority when sending MDS_DOWN to
NTFD maind thread [#2973]

Hi Canh,

Ack with comments, see [Lennart] below. I have not tested it, so I cannot say
whether this is a good solution or not.

Thanks
Lennart

-----Original Message-----
From: Canh Van Truong 
Sent: den 27 november 2018 14:17
To: Lennart Lund ; Minh Hon Chau

Cc: opensaf-devel@lists.sourceforge.net; Canh Van Truong

Subject: [PATCH 1/1] ntf: Increase the priority when sending MDS_DOWN to
NTFD maind thread [#2973]

When send-notification requests come to NTFD, they are put in the mbx
to wait for processing. If the client of these requests goes down, the MDS
thread receives the down event and also puts it in the mbx. The down event
and the send requests have the same priority, so NTF processes the send
requests before the down event.

The patch increases the priority when putting the MDS_DOWN event to the main
thread.
---
 src/dtm/transport/transportd.conf | 4 ++--
 src/ntf/ntfd/NtfAdmin.cc  | 5 +
 src/ntf/ntfd/ntfs_mds.c   | 2 +-
 3 files changed, 8 insertions(+), 3 deletions(-)

diff --git a/src/dtm/transport/transportd.conf
b/src/dtm/transport/transportd.conf
index 23e87a0a1..10e41338b 100644
--- a/src/dtm/transport/transportd.conf
+++ b/src/dtm/transport/transportd.conf
@@ -4,10 +4,10 @@
[Lennart] This has nothing to do with the problem. Also it does not seem to
do anything. The new values are the same as the default.
If this change is relevant in some way, the commit message must at least
document why it is needed.
Otherwise revert the change.
 # TRANSPORT_MAX_FILE_SIZE: The  maximum size of the log file. The size value should
 # be in Bytes i.e if you want to give 5 MB please mention it as 5242880. it is treated
 # as 5 MB. By default value will be 5242880 Bytes
-#TRANSPORT_MAX_FILE_SIZE=5242880
+TRANSPORT_MAX_FILE_SIZE=52428800
 
 #
 # TRANSPORT_MAX_BACKUPS: Number of backup files to maintain. Log rotation will
 # be done based on this value. Default value will be 9
 # i.e totally 10 log files will be maintain.
-#TRANSPORT_MAX_BACKUPS=9
+TRANSPORT_MAX_BACKUPS=9
diff --git a/src/ntf/ntfd/NtfAdmin.cc b/src/ntf/ntfd/NtfAdmin.cc index
2cb99457c..f0996f487 100644
--- a/src/ntf/ntfd/NtfAdmin.cc
+++ b/src/ntf/ntfd/NtfAdmin.cc
@@ -200,10 +200,15 @@ void NtfAdmin::processNotification(unsigned int
clientId,
   sendNotificationUpdate(clientId, notification->getNotInfo());
 
   ClientMap::iterator pos;
[Lennart] Seems to be some test printouts that you have used when
testing/debugging. Shall probably be removed.
+  int enc = 0;
   for (pos = clientMap.begin(); pos != clientMap.end(); pos++) {
+ TRACE("xcantru: %d", enc);
 NtfClient *client = pos->second;
+
 client->notificationReceived(clientId, notification, mdsCtxt);
+enc++;
   }
+  TRACE("xcantru0: %d", enc);
 
   /* remove notification if sent to all subscribers and logged */
   if (notification->isSubscriptionListEmpty() && notification->loggedOk()) {
diff --git a/src/ntf/ntfd/ntfs_mds.c b/src/ntf/ntfd/ntfs_mds.c
index a4b4a5f09..bede30cef 100644
--- a/src/ntf/ntfd/ntfs_mds.c
+++ b/src/ntf/ntfd/ntfs_mds.c
@@ -954,7 +954,7 @@ static uint32_t mds_svc_event(struct
ncsmds_callback_info *info)
 
/* Push the event and we are done */
if (m_NCS_IPC_SEND(_cb->mbx, evt,
-  NCS_IPC_PRIORITY_HIGH) !=
+  NCS_IPC_PRIORITY_VERY_HIGH) !=
NCSCC_RC_SUCCESS) {
TRACE("ipc send failed");
ntfs_evt_destroy(evt);
--
2.15.1




___
Opensaf-devel mailing list
Opensaf-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/opensaf-devel




Re: [devel] [PATCH 0/2] Review Request for amfd: checkpoint node state to standby [#2971]

2018-11-26 Thread Tran Thuan
Hi Gary,

ACK. Thank you!

Best Regards,
ThuanTr

-----Original Message-----
From: Gary Lee  
Sent: Monday, November 26, 2018 1:30 PM
To: Nagendra Kumar ; hans.nordeb...@ericsson.com;
minh.c...@dektech.com.au; thuan . tran 
Cc: opensaf-devel@lists.sourceforge.net; Gary Lee 
Subject: [PATCH 0/2] Review Request for amfd: checkpoint node state to
standby [#2971]

Summary: amfd: checkpoint node state to standby [#2971]
Review request for Ticket(s): 2971
Peer Reviewer(s): Nagendra, Thuan, Minh, Hans
Pull request to: *** LIST THE PERSON WITH PUSH ACCESS HERE ***
Affected branch(es): develop
Development branch: ticket-2971
Base revision: 8c5f9ab333231b093489e60071083a1452b93d0e
Personal repository: git://git.code.sf.net/u/userid-2226215/review


Impacted area       Impact y/n

 Docs                    n
 Build system            n
 RPM/packaging           n
 Configuration files     n
 Startup scripts         n
 SAF services            y
 OpenSAF services        n
 Core libraries          n
 Samples                 n
 Tests                   n
 Other                   n


Comments (indicate scope for each "y" above):
-

revision 846d1b4410f47f808f7f29cdba8e4abec167d99d
Author: Gary Lee 
Date:   Mon, 26 Nov 2018 17:18:55 +1100

amfd: set userData [#2971]

Depending on timing, it's possible for node_info.member to be set after this
ccb callback. We should populate userData anyway, in case the active
validates this callback and then a SC failover to the standby occurs.



revision e5a149513f6425d36cfa61039d343656fc5c75d0
Author: Gary Lee 
Date:   Mon, 26 Nov 2018 17:18:51 +1100

amfd: checkpoint node state to standby [#2971]

we need to checkpoint change to node_info.member to the standby



Complete diffstat:
--
 src/amf/amfd/ndfsm.cc | 3 +++
 src/amf/amfd/node.cc  | 1 +
 2 files changed, 4 insertions(+)


Testing Commands:
-
Scale in a node and perform a SC failover

Testing, Expected Results:
--
New active AMFD does not assert


Conditions of Submission:
-
Ack from any reviewer


Arch      Built     Started    Linux distro
-------------------------------------------
mips        n          n
mips64      n          n
x86         n          n
x86_64      y          y
powerpc     n          n
powerpc64   n          n


Reviewer Checklist:
---
[Submitters: make sure that your review doesn't trigger any checkmarks!]


Your checkin has not passed review because (see checked entries):

___ Your RR template is generally incomplete; it has too many blank entries
that need proper data filled in.

___ You have failed to nominate the proper persons for review and push.

___ Your patches do not have proper short+long header

___ You have grammar/spelling in your header that is unacceptable.

___ You have exceeded a sensible line length in your headers/comments/text.

___ You have failed to put in a proper Trac Ticket # into your commits.

___ You have incorrectly put/left internal data in your comments/files
(i.e. internal bug tracking tool IDs, product names etc)

___ You have not given any evidence of testing beyond basic build tests.
Demonstrate some level of runtime or other sanity testing.

___ You have ^M present in some of your files. These have to be removed.

___ You have needlessly changed whitespace or added whitespace crimes
like trailing spaces, or spaces before tabs.

___ You have mixed real technical changes with whitespace and other
cosmetic code cleanup changes. These have to be separate commits.

___ You need to refactor your submission into logical chunks; there is
too much content into a single commit.

___ You have extraneous garbage in your review (merge commits etc)

___ You have giant attachments which should never have been sent;
Instead you should place your content in a public tree to be pulled.

___ You have too many commits attached to an e-mail; resend as threaded
commits, or place in a public tree for a pull.

___ You have resent this content multiple times without a clear indication
of what has changed between each re-send.

___ You have failed to adequately and individually address all of the
comments and change requests that were proposed in the initial review.

___ You have a misconfigured ~/.gitconfig file (i.e. user.name, user.email
etc)

___ Your computer has a badly configured date and time, confusing the
threaded patch review.

___ Your changes affect the IPC mechanism, and you don't present any results
for an in-service upgradability test.

___ Your changes affect the user manual and documentation, but your patch
series does not contain the patch that updates the Doxygen manual.




___
Opensaf-devel mailing list
Opensaf-devel@lists.sourceforge.net

Re: [devel] [PATCH 1/1] amf: add support for container/contained [#70]

2018-10-01 Thread Tran Thuan
Hi Alex,

 

Here are my steps:

Try unlock-in/lock container/contained.

Try lock/unlock SI

Try restart component.

 

Best Regards,

Thuan

 

From: Jones, Alex  
Sent: Friday, September 28, 2018 11:29 PM
To: Tran Thuan ; nagen...@hasolutions.in; 'Gary Lee' 
; hans.nordeb...@ericsson.com; 
ravisekhar.ko...@oracle.com
Cc: opensaf-devel@lists.sourceforge.net
Subject: Re: [devel] [PATCH 1/1] amf: add support for container/contained [#70]

 

Hi Thuan,

I can't seem to reproduce this. Can you tell me the exact steps for how you 
are setting it up?

Alex

 

On 09/27/2018 06:55 AM, Tran Thuan wrote:

  _  

NOTICE: This email was received from an EXTERNAL sender

  _  


Hi Alex,

When I try an admin restart of the component, I get an error and the state is
stuck in RESTARTING.
Please check.

root@SC-1:/# amf-adm restart 
safComp=Contained_1,safSu=SU1,safSg=Contained_2N,safApp=Contained_2N

2018-09-27 17:52:51.808 SC-1 osafamfnd[515]: NO Admin restart requested for 
'safComp=Contained_1,safSu=SU1,safSg=Contained_2N,safApp=Contained_2N'
2018-09-27 17:52:51.810 SC-1 osafamfnd[515]: NO 
'safSu=SU1,safSg=Contained_2N,safApp=Contained_2N' Presence State INSTANTIATED 
=> RESTARTING
2018-09-27 17:52:51.811 SC-1 amf_container_demo[1074]: =Contained Terminate 
Callback>
2018-09-27 17:52:51.811 SC-1 amf_container_demo[1074]: 
comp:'safComp=Contained_1,safSu=SU1,safSg=Contained_2N,safApp=Contained_2N'
2018-09-27 17:52:51.815 SC-1 amf_container_demo[1074]: 
<=
2018-09-27 17:52:51.815 SC-1 amf_container_demo[1074]: =Contained 
Instantiate Callback>
2018-09-27 17:52:51.815 SC-1 amf_container_demo[1074]: 
comp:safComp=Contained_1,safSu=SU1,safSg=Contained_2N,safApp=Contained_2N
2018-09-27 17:52:51.817 SC-1 amf_container_demo[1074]: 
<===
2018-09-27 17:52:51.817 SC-1 osafamfnd[515]: ER nodeid_mdsdest rec doesn't 
exist, Rec get failed: NodeId:0
2018-09-27 17:52:51.817 SC-1 osafamfnd[515]: ER avnd_comp_cbq_rec_send:Msg Send 
to AvND 
Failed:safComp=Contained_1,safSu=SU1,safSg=Contained_2N,safApp=Contained_2N, 0

root@SC-1:/# amf-state su | grep -A 4 SU1.*Contain
safSu=SU1,safSg=Contained_2N,safApp=Contained_2N
saAmfSUAdminState=UNLOCKED(1)
saAmfSUOperState=ENABLED(1)
saAmfSUPresenceState=RESTARTING(5)
saAmfSUReadinessState=IN-SERVICE(2)
--
safSu=SU1,safSg=Container,safApp=Container
saAmfSUAdminState=UNLOCKED(1)
saAmfSUOperState=ENABLED(1)
saAmfSUPresenceState=INSTANTIATED(3)
saAmfSUReadinessState=IN-SERVICE(2)

Best Regards,
Thuan

-----Original Message-----
From: Tran Thuan  <mailto:thuan.t...@dektech.com.au> 
 
Sent: Wednesday, September 26, 2018 6:12 PM
To: 'Jones, Alex'  <mailto:ajo...@rbbn.com> ; 
nagen...@hasolutions.in <mailto:nagen...@hasolutions.in> ; 'Gary Lee'  
<mailto:gary@dektech.com.au> ; 
hans.nordeb...@ericsson.com <mailto:hans.nordeb...@ericsson.com> ; 
ravisekhar.ko...@oracle.com <mailto:ravisekhar.ko...@oracle.com> 
Cc: opensaf-devel@lists.sourceforge.net 
<mailto:opensaf-devel@lists.sourceforge.net> 
Subject: Re: [devel] [PATCH 1/1] amf: add support for container/contained [#70]

Hi Alex,



I can fetch your code. Will try it and reply if any concern… 



* 430dbfdd4 - (HEAD -> ticket-70) amf: fix crash of amfd [#70] (5 days ago)
* bf998b543 - amfd: don't start contained if it is in locked-in state [#70] (3 weeks ago)
* 8b8f2fe30 - amf: fix restart of container app and check container model [#70] (6 weeks ago)
* 9c9f7e04c - amf: add support for container/contained [#70] (6 weeks ago)
* cf9d75653 - amf: add support for container/contained [#70] (6 weeks ago)
* 199d81e0e - amf: add support for container/contained [#70] (6 weeks ago)
* d53a063d3 - amfnd: add support for container/contained [#70] (6 weeks ago)
* 308b8d733 - amfd: add support for container/contained [#70] (6 weeks ago)




Best Regards,

Thuan



From: Jones, Alex  <mailto:ajo...@rbbn.com> 
Sent: Wednesday, September 26, 2018 12:10 AM
To: Tran Thuan  <mailto:thuan.t...@dektech.com.au> ; 
nagen...@hasolutions.in <mailto:nagen...@hasolutions.in> ; 'Gary Lee'  
<mailto:gary@dektech.com.au> ; 
hans.nordeb...@ericsson.com <mailto:hans.nordeb...@ericsson.com> ; 
ravisekhar.ko...@oracle.com <mailto:ravisekhar.ko...@oracle.com> 
Cc: opensaf-devel@lists.sourceforge.net 
<mailto:opensaf-devel@lists.sourceforge.net> 
Subject: Re: [devel] [PATCH 1/1] amf: add support for container/contained [#70]



Hi Thuan,

I pushed the latest ticket-70 to my review repo.

For the concerns you have, try cleaning out your code and pulling the latest
from my review repo for ticket-70. Contained should not get instantiated until
you unlock-in the contained su. I'm guessing you are missing some patches...

Let me know if you still see these issues.

Alex



On 09/24/2018 12:24 AM, Tran Thuan wrote:

_ 

NOTICE: This email was received from an E

Re: [devel] [PATCH 1/1] amf: add support for container/contained [#70]

2018-09-27 Thread Tran Thuan
Hi Alex,

When I try an admin restart of the component, I get an error and the state is
stuck in RESTARTING.
Please check.

root@SC-1:/# amf-adm restart 
safComp=Contained_1,safSu=SU1,safSg=Contained_2N,safApp=Contained_2N

2018-09-27 17:52:51.808 SC-1 osafamfnd[515]: NO Admin restart requested for 
'safComp=Contained_1,safSu=SU1,safSg=Contained_2N,safApp=Contained_2N'
2018-09-27 17:52:51.810 SC-1 osafamfnd[515]: NO 
'safSu=SU1,safSg=Contained_2N,safApp=Contained_2N' Presence State INSTANTIATED 
=> RESTARTING
2018-09-27 17:52:51.811 SC-1 amf_container_demo[1074]: =Contained Terminate 
Callback>
2018-09-27 17:52:51.811 SC-1 amf_container_demo[1074]:
comp:'safComp=Contained_1,safSu=SU1,safSg=Contained_2N,safApp=Contained_2N'
2018-09-27 17:52:51.815 SC-1 amf_container_demo[1074]: 
<=
2018-09-27 17:52:51.815 SC-1 amf_container_demo[1074]: =Contained 
Instantiate Callback>
2018-09-27 17:52:51.815 SC-1 amf_container_demo[1074]:  
comp:safComp=Contained_1,safSu=SU1,safSg=Contained_2N,safApp=Contained_2N
2018-09-27 17:52:51.817 SC-1 amf_container_demo[1074]: 
<===
2018-09-27 17:52:51.817 SC-1 osafamfnd[515]: ER nodeid_mdsdest rec doesn't 
exist, Rec get failed: NodeId:0
2018-09-27 17:52:51.817 SC-1 osafamfnd[515]: ER avnd_comp_cbq_rec_send:Msg Send 
to AvND 
Failed:safComp=Contained_1,safSu=SU1,safSg=Contained_2N,safApp=Contained_2N, 0

root@SC-1:/# amf-state su | grep -A 4 SU1.*Contain
safSu=SU1,safSg=Contained_2N,safApp=Contained_2N
saAmfSUAdminState=UNLOCKED(1)
saAmfSUOperState=ENABLED(1)
saAmfSUPresenceState=RESTARTING(5)
saAmfSUReadinessState=IN-SERVICE(2)
--
safSu=SU1,safSg=Container,safApp=Container
saAmfSUAdminState=UNLOCKED(1)
saAmfSUOperState=ENABLED(1)
saAmfSUPresenceState=INSTANTIATED(3)
saAmfSUReadinessState=IN-SERVICE(2)

Best Regards,
Thuan

-----Original Message-----
From: Tran Thuan  
Sent: Wednesday, September 26, 2018 6:12 PM
To: 'Jones, Alex' ; nagen...@hasolutions.in; 'Gary Lee' 
; hans.nordeb...@ericsson.com; 
ravisekhar.ko...@oracle.com
Cc: opensaf-devel@lists.sourceforge.net
Subject: Re: [devel] [PATCH 1/1] amf: add support for container/contained [#70]

Hi Alex,

 

I can fetch your code. Will try it and reply if any concern… 

 

* 430dbfdd4 - (HEAD -> ticket-70) amf: fix crash of amfd [#70] (5 days ago)
* bf998b543 - amfd: don't start contained if it is in locked-in state [#70] (3 weeks ago)
* 8b8f2fe30 - amf: fix restart of container app and check container model [#70] (6 weeks ago)
* 9c9f7e04c - amf: add support for container/contained [#70] (6 weeks ago)
* cf9d75653 - amf: add support for container/contained [#70] (6 weeks ago)
* 199d81e0e - amf: add support for container/contained [#70] (6 weeks ago)
* d53a063d3 - amfnd: add support for container/contained [#70] (6 weeks ago)
* 308b8d733 - amfd: add support for container/contained [#70] (6 weeks ago)


 

Best Regards,

Thuan

 

From: Jones, Alex 
Sent: Wednesday, September 26, 2018 12:10 AM
To: Tran Thuan ; nagen...@hasolutions.in; 'Gary Lee' 
; hans.nordeb...@ericsson.com; 
ravisekhar.ko...@oracle.com
Cc: opensaf-devel@lists.sourceforge.net
Subject: Re: [devel] [PATCH 1/1] amf: add support for container/contained [#70]

 

Hi Thuan,

I pushed the latest ticket-70 to my review repo.

For the concerns you have, try cleaning out your code and pulling the latest
from my review repo for ticket-70. Contained should not get instantiated until
you unlock-in the contained su. I'm guessing you are missing some patches...

Let me know if you still see these issues.

Alex

 

On 09/24/2018 12:24 AM, Tran Thuan wrote:

  _  

NOTICE: This email was received from an EXTERNAL sender

  _  

 

Hi Alex,

 

Can you send out new version of review?

Then I can fetch latest version from your review repo.

 

Some more concerns without using your additional patch plm-70 yet.

 

1.  After unlock-in, unlock container, contained got instantiated but admin 
state is still locked-in

 

safSu=SU1,safSg=Contained_2N,safApp=Contained_2N

saAmfSUAdminState=LOCKED-INSTANTIATION(3)

saAmfSUOperState=ENABLED(1)

saAmfSUPresenceState=INSTANTIATED(3)

saAmfSUReadinessState=OUT-OF-SERVICE(1)

--

safSu=SU1,safSg=Container,safApp=Container

saAmfSUAdminState=UNLOCKED(1)

saAmfSUOperState=ENABLED(1)

saAmfSUPresenceState=INSTANTIATED(3)

saAmfSUReadinessState=IN-SERVICE(2)

 

2.  Also, I don't see how to make contained get assignment?

 

root@SC-1:~# amf-adm unlock safSu=SU1,safSg=Contained_2N,safApp=Contained_2N

error - saImmOmAdminOperationInvoke_2 admin-op RETURNED: 
SA_AIS_ERR_BAD_OPERATION (20)

error-string: State transition invalid, state 3, op 1

root@SC-1:~# amf-adm unlock-in safSu=SU1,safSg=Contained_2N,safApp=Contained_2N

error - saImmOmAdminOperationInvoke_2 admin-op RETURNED: 
SA_AIS_ERR_BAD_OPERATION (20)

error-string: Can't insta

Re: [devel] [PATCH 1/1] smf: campaign is executing forever until cluster reset [#1353]

2018-09-27 Thread Tran Thuan
Hi Lennart,

Can you help check this? Thanks.

Best Regards,
Thuan

-----Original Message-----
From: thuan.tran  
Sent: Tuesday, September 25, 2018 2:04 PM
To: lennart.l...@ericsson.com; gary@dektech.com.au
Cc: opensaf-devel@lists.sourceforge.net; thuan.tran

Subject: [PATCH 1/1] smf: campaign is executing forever until cluster reset
[#1353]

The function getNodeDestination() resets elapsedTime to zero on every call, so
the node reboot timeout in waitForNodeDestination() is never reached.
If a rebooted node never comes back, the campaign is stuck in the executing
state until the cluster is reset.
---
 src/smf/smfd/SmfUpgradeStep.cc |  1 +
 src/smf/smfd/SmfUtils.cc   | 11 ---
 2 files changed, 5 insertions(+), 7 deletions(-)

diff --git a/src/smf/smfd/SmfUpgradeStep.cc b/src/smf/smfd/SmfUpgradeStep.cc
index 4c0ddd192..80da668de 100644
--- a/src/smf/smfd/SmfUpgradeStep.cc
+++ b/src/smf/smfd/SmfUpgradeStep.cc
@@ -2399,6 +2399,7 @@ bool SmfUpgradeStep::nodeReboot() {
   "SmfUpgradeStep::nodeReboot: Waiting to get node destination with increased UP counter");
 
   while (true) {
+elapsedTime = 0;
 for (nodeIt = rebootedNodeList.begin(); nodeIt != rebootedNodeList.end();) {
   if (getNodeDestination((*nodeIt).node_name, , ,
  -1)) {
diff --git a/src/smf/smfd/SmfUtils.cc b/src/smf/smfd/SmfUtils.cc
index 915c086a5..4ac5af163 100644
--- a/src/smf/smfd/SmfUtils.cc
+++ b/src/smf/smfd/SmfUtils.cc
@@ -95,9 +95,6 @@ bool getNodeDestination(const std::string &i_node, SmfndNodeDest *o_nodeDest,
 
   TRACE("Find destination for node '%s'", i_node.c_str());
 
-  if (elapsedTime)  // Initialize elapsedTime to zero.
-*elapsedTime = 0;
-
   /* It seems SaAmfNode objects can be stored, but the code
* indicates that SaClmNode's are expected. Anyway an attempt
* to go for it is probably faster that examining IMM classes
@@ -133,10 +130,10 @@ bool getNodeDestination(const std::string &i_node, SmfndNodeDest *o_nodeDest,
   }
   struct timespec time = {2 * ONE_SECOND, 0};
   osaf_nanosleep(&time);
-  timeout--;
+  timeout -= 2;
   if (elapsedTime) *elapsedTime = *elapsedTime + 2 * ONE_SECOND;
   if (maxWaitTime != -1) {
-if (*elapsedTime >= maxWaitTime) {
+if ((elapsedTime) && (*elapsedTime >= maxWaitTime)) {
   LOG_NO("Failed to get node dest for clm node %s", i_node.c_str());
   return false;
 }
@@ -165,11 +162,11 @@ bool getNodeDestination(const std::string &i_node, SmfndNodeDest *o_nodeDest,
   }
   struct timespec time = {2 * ONE_SECOND, 0};
   osaf_nanosleep(&time);
-  timeout--;
+  timeout -= 2;
   if (elapsedTime) *elapsedTime = *elapsedTime + 2 * ONE_SECOND;
 
   if (maxWaitTime != -1) {
-if (*elapsedTime >= maxWaitTime) {
+if ((elapsedTime) && (*elapsedTime >= maxWaitTime)) {
   LOG_NO("Failed to get node dest for clm node %s", i_node.c_str());
   free(nodeName);
   return false;
--
2.18.0
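
For readers following the thread, the timeout problem described in the commit
message can be sketched in isolation. This is only an illustration under
stated assumptions, not the OpenSAF code: pollOnce, pollsUntilTimeout and
kPollCost are hypothetical names, and the sleep is left out so the example
runs instantly. It shows that when the helper zeroes the caller's elapsed-time
counter on every call, the caller's limit is never reached, whereas a counter
that accumulates across calls lets the wait terminate.

```cpp
#include <cassert>
#include <cstdint>

// Assumed cost of one polling attempt, in arbitrary time units
// (the real code would sleep here instead).
constexpr int64_t kPollCost = 2;

// Hypothetical helper: one polling attempt for a node that never comes back.
// With resetOnEntry == true it zeroes the caller's counter first, which is
// the behaviour the commit message describes as the cause of the hang.
void pollOnce(bool resetOnEntry, int64_t *elapsed) {
  if (resetOnEntry && elapsed) *elapsed = 0;
  if (elapsed) *elapsed += kPollCost;
}

// Returns the number of polls made before the caller's limit is reached,
// or -1 if the limit is never reached within maxAttempts polls.
int pollsUntilTimeout(bool resetOnEntry, int64_t maxWait, int maxAttempts) {
  assert(maxAttempts > 0);
  int64_t elapsed = 0;
  for (int attempt = 1; attempt <= maxAttempts; ++attempt) {
    pollOnce(resetOnEntry, &elapsed);
    if (elapsed >= maxWait) return attempt;  // caller gives up here
  }
  return -1;  // counter never grew past one poll's worth of time
}
```

Calling pollsUntilTimeout(false, 10, 100) reaches the limit after five polls,
while pollsUntilTimeout(true, 10, 100) never does and returns -1.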




___
Opensaf-devel mailing list
Opensaf-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/opensaf-devel


Re: [devel] [PATCH 1/1] amf: add support for container/contained [#70]

2018-09-26 Thread Tran Thuan
Hi Alex,

 

I can fetch your code. Will try it and reply if any concern… 

 

* 430dbfdd4 - (HEAD -> ticket-70) amf: fix crash of amfd [#70] (5 days ago)
* bf998b543 - amfd: don't start contained if it is in locked-in state [#70] (3 weeks ago)
* 8b8f2fe30 - amf: fix restart of container app and check container model [#70] (6 weeks ago)
* 9c9f7e04c - amf: add support for container/contained [#70] (6 weeks ago)
* cf9d75653 - amf: add support for container/contained [#70] (6 weeks ago)
* 199d81e0e - amf: add support for container/contained [#70] (6 weeks ago)
* d53a063d3 - amfnd: add support for container/contained [#70] (6 weeks ago)
* 308b8d733 - amfd: add support for container/contained [#70] (6 weeks ago)


 

Best Regards,

Thuan

 

From: Jones, Alex  
Sent: Wednesday, September 26, 2018 12:10 AM
To: Tran Thuan ; nagen...@hasolutions.in; 'Gary Lee' 
; hans.nordeb...@ericsson.com; 
ravisekhar.ko...@oracle.com
Cc: opensaf-devel@lists.sourceforge.net
Subject: Re: [devel] [PATCH 1/1] amf: add support for container/contained [#70]

 

Hi Thuan,

I pushed the latest ticket-70 to my review repo.

For the concerns you have, try cleaning out your code and pulling the latest
from my review repo for ticket-70. Contained should not get instantiated until
you unlock-in the contained su. I'm guessing you are missing some patches...

Let me know if you still see these issues.

Alex

 

On 09/24/2018 12:24 AM, Tran Thuan wrote:

  _  

NOTICE: This email was received from an EXTERNAL sender

  _  

 

Hi Alex,

 

Can you send out new version of review?

Then I can fetch latest version from your review repo.

 

Some more concerns without using your additional patch plm-70 yet.

 

1.  After unlock-in, unlock container, contained got instantiated but admin 
state is still locked-in

 

safSu=SU1,safSg=Contained_2N,safApp=Contained_2N

saAmfSUAdminState=LOCKED-INSTANTIATION(3)

saAmfSUOperState=ENABLED(1)

saAmfSUPresenceState=INSTANTIATED(3)

saAmfSUReadinessState=OUT-OF-SERVICE(1)

--

safSu=SU1,safSg=Container,safApp=Container

saAmfSUAdminState=UNLOCKED(1)

saAmfSUOperState=ENABLED(1)

saAmfSUPresenceState=INSTANTIATED(3)

saAmfSUReadinessState=IN-SERVICE(2)

 

2.  Also, I don't see how to make contained get assignment?

 

root@SC-1:~# amf-adm unlock safSu=SU1,safSg=Contained_2N,safApp=Contained_2N

error - saImmOmAdminOperationInvoke_2 admin-op RETURNED: 
SA_AIS_ERR_BAD_OPERATION (20)

error-string: State transition invalid, state 3, op 1

root@SC-1:~# amf-adm unlock-in safSu=SU1,safSg=Contained_2N,safApp=Contained_2N

error - saImmOmAdminOperationInvoke_2 admin-op RETURNED: 
SA_AIS_ERR_BAD_OPERATION (20)

error-string: Can't instantiate 
'safSu=SU1,safSg=Contained_2N,safApp=Contained_2N', whose presence state is '3'

 

3.  When I kill -9 `pidof amf_container_demo`, it seems recovery fail?

 

2018-09-24 10:59:59.273 SC-1 osafamfnd[478]: NO 
'safSu=SU1,safSg=Contained_2N,safApp=Contained_2N' component restart probation 
timer started (timeout: 40 ns)

2018-09-24 10:59:59.273 SC-1 osafamfnd[478]: NO Restarting a component of 
'safSu=SU1,safSg=Contained_2N,safApp=Contained_2N' (comp restart count: 1)

2018-09-24 10:59:59.273 SC-1 osafamfnd[478]: NO 
'safComp=Contained_1,safSu=SU1,safSg=Contained_2N,safApp=Contained_2N' faulted 
due to 'avaDown' : Recovery is 'componentRestart'

2018-09-24 10:59:59.274 SC-1 osafamfnd[478]: ER ncsmds_api for 0 FAILED, 
dest=2010f0280

2018-09-24 10:59:59.274 SC-1 osafamfnd[478]: NO Component CLC fsm exited with 
error for 
comp:safComp=Contained_1,safSu=SU1,safSg=Contained_2N,safApp=Contained_2N

2018-09-24 11:00:03.295 SC-1 osafamfnd[478]: NO 
'safSu=SU1,safSg=Contained_2N,safApp=Contained_2N' Component or SU restart 
probation timer expired

 

safSu=SU1,safSg=Contained_2N,safApp=Contained_2N

saAmfSUAdminState=LOCKED-INSTANTIATION(3)

saAmfSUOperState=ENABLED(1)

saAmfSUPresenceState=INSTANTIATED(3)

saAmfSUReadinessState=OUT-OF-SERVICE(1)

--

safSu=SU1,safSg=Container,safApp=Container

saAmfSUAdminState=UNLOCKED(1)

saAmfSUOperState=ENABLED(1)

saAmfSUPresenceState=INSTANTIATED(3)

saAmfSUReadinessState=IN-SERVICE(2)

 

safSi=Contained_2N_1,safApp=Contained_2N

saAmfSIAdminState=UNLOCKED(1)

saAmfSIAssignmentState=UNASSIGNED(1)

safSi=Container,safApp=Container

saAmfSIAdminState=UNLOCKED(1)

saAmfSIAssignmentState=PARTIALLY_ASSIGNED(3)

 

Best Regards,

Thuan

 

From: Jones, Alex  <mailto:ajo...@rbbn.com>  
Sent: Saturday, September 22, 2018 1:01 AM
To: Tran Thuan  <mailto:thuan.t...@dektech.com.au> ; 
nagen...@hasolutions.in <mailto:nagen...@hasolutions.in> ; 'Gary Lee'  
<mailto:gary@dektech.com.au> ; 
hans.nordeb...@ericsson.com <mailto:hans.nordeb...@ericsson.com&g

Re: [devel] [PATCH 1/1] amf: add support for container/contained [#70]

2018-09-21 Thread Tran Thuan
Hi Alex,

I think you need to update samples/amf/container/README.
Also, when I try the following steps, AMFD crashes.

root@SC-1:/opt/amf_demo# immcfg -f AppConfig-container.xml
root@SC-1:/opt/amf_demo# immcfg -f AppConfig-contained-2N.xml 
root@SC-1:/opt/amf_demo# amf-adm unlock-in
safSu=SU1,safSg=Container,safApp=Container
root@SC-1:/opt/amf_demo# amf-adm unlock
safSu=SU1,safSg=Container,safApp=Container
root@SC-1:/opt/amf_demo# amf-adm lock
safSu=SU1,safSg=Container,safApp=Container

2018-09-21 16:58:38.338 SC-1 osafamfnd[512]: NO Assigning
'safSi=Container,safApp=Container' ACTIVE to
'safSu=SU1,safSg=Container,safApp=Container'
2018-09-21 16:58:38.339 SC-1 amf_container_demo[739]: csi set callback for
comp: safComp=Container,safSu=SU1,safSg=Container,safApp=Container
2018-09-21 16:58:38.339 SC-1 amf_container_demo[739]: CSI Set - add
'safCsi=Container1,safSi=Container,safApp=Container' HAState Active
2018-09-21 16:58:38.339 SC-1 amf_container_demo[739]: name: Contained1,
value: 
2018-09-21 16:58:38.339 SC-1 amf_container_demo[739]: name: Contained1,
value: 
2018-09-21 16:58:38.340 SC-1 osafamfnd[512]: NO Assigned
'safSi=Container,safApp=Container' ACTIVE to
'safSu=SU1,safSg=Container,safApp=Container'
2018-09-21 16:58:38.413 SC-1 osafamfnd[512]: NO
'safSu=SU1,safSg=Contained_2N,safApp=Contained_2N' Presence State
UNINSTANTIATED => INSTANTIATING
2018-09-21 16:58:38.414 SC-1 amf_container_demo[739]: =Contained
Instantiate Callback>
2018-09-21 16:58:38.414 SC-1 amf_container_demo[739]:
comp:safComp=Contained_1,safSu=SU1,safSg=Contained_2N,safApp=Contained_2N
2018-09-21 16:58:38.414 SC-1 amf_container_demo[739]: responding with
TRY_AGAIN
2018-09-21 16:58:38.417 SC-1 amf_container_demo[739]:
<===
2018-09-21 16:58:38.418 SC-1 amf_container_demo[739]: =Contained Clean
Up Callback>
2018-09-21 16:58:38.418 SC-1 amf_container_demo[739]:
comp:safComp=Contained_1,safSu=SU1,safSg=Contained_2N,safApp=Contained_2N
2018-09-21 16:58:38.418 SC-1 amf_container_demo[739]:
<===
2018-09-21 16:58:38.422 SC-1 amf_container_demo[739]: =Contained
Instantiate Callback>
2018-09-21 16:58:38.422 SC-1 amf_container_demo[739]:
comp:safComp=Contained_1,safSu=SU1,safSg=Contained_2N,safApp=Contained_2N
2018-09-21 16:58:38.422 SC-1 amf_container_demo[739]:
<===
2018-09-21 16:58:38.424 SC-1 osafamfnd[512]: NO
'safSu=SU1,safSg=Contained_2N,safApp=Contained_2N' Presence State
INSTANTIATING => INSTANTIATED

2018-09-21 17:19:07.776 SC-1 osafamfnd[512]: ER AMFD has unexpectedly
crashed. Rebooting node
2018-09-21 17:19:07.776 SC-1 osafamfnd[512]: Rebooting OpenSAF NodeId =
131343 EE Name = , Reason: AMFD has unexpectedly crashed. Rebooting node,
OwnNodeId = 131343, SupervisionTime = 60
2018-09-21 17:19:07.778 SC-1 osafimmnd[438]: NO Implementer locally
disconnected. Marking it as doomed 5 <23, 2010f> (safAmfService)
2018-09-21 17:19:07.799 SC-1 opensaf_reboot: Rebooting local node;
timeout=60

Best Regards,
Thuan

-----Original Message-----
From: nagen...@hasolutions.in  
Sent: Friday, September 7, 2018 12:57 PM
To: Alex Jones ; Gary Lee ;
hans.nordeb...@ericsson.com; ravisekhar.ko...@oracle.com
Cc: opensaf-devel@lists.sourceforge.net
Subject: Re: [devel] [PATCH 1/1] amf: add support for container/contained
[#70]

Hi Alex,
Thanks for the patch.
From my side, Ack. I wish that I could have tested the following areas (I
assume you would have covered them):
- Headless enabled test cases
 - CSI Dep, SI Dep testing(in 2N red model)
- Combinations of Admin operations on Container and contained (in all 5 red
models for contained) with fault scenarios.
- Escalations of contained components.
 
Thanks,
Nagendra, 91-9866424860
High Availability Solutions Pvt. Ltd. (www.hasolutions.in)
- OpenSAF Support and Services
 
 
 
 
 
 
 
----- Original Message -----
Subject: Re: [PATCH 1/1] amf: add support for container/contained [#70]
From: "Alex Jones" 
Date: 9/6/18 8:52 pm
To: nagen...@hasolutions.in, "Gary Lee" ,
hans.nordeb...@ericsson.com, ravisekhar.ko...@oracle.com
Cc: opensaf-devel@lists.sourceforge.net

 Hi Nagu,
 Here's a patch that fixes your issue in test #1.
 For the other code review issues, is it OK if I just add them when I
push the final patch. Or do you want to review them now?
 Alex
 
 On 08/30/2018 01:44 AM, nagen...@hasolutions.in wrote:
NOTICE: This email was received from an EXTERNAL sender

 Hi Alex,
Thanks for your response.
 
For Test #2, I had configured all SUs on the single node SC-1. So, 2
container SUs and 2 contained SUs are on the same node. In such cases, we
can have the implementation as having only one SU of that node(higher rank
SUs may be) to be the container for all the contained SUs of that node.
 
 
Thanks,
 Nagendra, 91-9866424860
 High Availability Solutions Pvt. Ltd. (www.hasolutions.in)
 - OpenSAF Support and Services
 
 
 
 
 
 
 
- Original 

Re: [devel] [PATCH 1/1] amf: Recover node that disconnnect from active AMFD [#2880]

2018-07-24 Thread Tran Thuan
Hi Nagu,

I have tested remote fencing: it works, and the PL does not get multiple reboots.

Jul  9 14:48:27 SC-2-1 osafamfd[4594]: WA avd_msg_sanity_chk: invalid msg id
45, msg type 8, from 2030f should be 1
Jul  9 14:48:27 SC-2-1 osafamfd[4594]: WA avd_msg_sanity_chk: reboot node
2030f to recover it
Jul  9 14:48:27 SC-2-1 osafamfd[4594]: Rebooting OpenSAF NodeId = 131855 EE
Name = PL-2-3, Reason: Fencing remote node, OwnNodeId = 131343,
SupervisionTime = 60
Jul  9 14:48:27 SC-2-1 systemd[1]: Starting Session c3 of user root.
Jul  9 14:48:27 SC-2-1 systemd[1]: Started Session c3 of user root.
Jul  9 14:48:27 SC-2-1 external/libvirt[8939]: notice: Domain FT-REG-12-PL-3
was stopped
Jul  9 14:48:30 SC-2-1 external/libvirt[8939]: notice: Domain FT-REG-12-PL-3
was started
Jul  9 14:48:31 SC-2-1 osafamfd[4594]: WA avd_msg_sanity_chk: invalid node
ID (2030f)
Jul  9 14:48:31 SC-2-1 osafamfd[4594]: WA avd_msg_sanity_chk: invalid node
ID (2030f)


Best Regards,
Thuan

-----Original Message-----
From: nagen...@hasolutions.in  
Sent: Tuesday, July 24, 2018 1:24 PM
To: opensaf-devel@lists.sourceforge.net
Subject: Re: [devel] [PATCH 1/1] amf: Recover node that disconnnect from
active AMFD [#2880]

Hi Gary,
Thanks for reminding me. My guess was that if there is a node leave at SC-1, then
there should be a node leave at PL-16 as well, and that needs to be debugged.
The patch looks OK to me as well, except that it will result in sending multiple
reboot commands to PL-16 if there are more messages from PL-16.
 
Thanks,
Nagendra, 91-9866424860
www.hasolutions.in
https://www.linkedin.com/company/hasolutions/
High Availability Solutions Pvt. Ltd.
- OpenSAF Support and Services
 
----- Original Message -----
Subject: Re: [PATCH 1/1] amf: Recover node that disconnnect from active AMFD [#2880]
From: "Gary Lee" 
Date: 7/24/18 10:01 am
To: "thuan.tran" , hans.nordeb...@ericsson.com
Cc: opensaf-devel@lists.sourceforge.net, nagen...@hasolutions.in

Hi Nagu
 
 Do you have any comments on this? It seems OK to me, but I know you've
worked on similar scenarios with TIPC flickering before, where reboot is
issued from the PL side.
 
 Thanks
 Gary
 
 On 09/07/18 16:37, thuan.tran wrote:
 > There is a abnormal state that AMFND on remote node keep sending
 > message to active AMFD but active AMFD see that node already left.
 > The msg_id expected is not matched and the remote node keep stuck
 > as out of control of active AMFD.
 > In this case, active AMFD can trigger remote fencing for that node
 > if possible, otherwise send reboot order directly.
 > ---
 > src/amf/amfd/ndfsm.cc | 2 --
 > src/amf/amfd/ndproc.cc | 16
 > 2 files changed, 16 insertions(+), 2 deletions(-)
 >
 > diff --git a/src/amf/amfd/ndfsm.cc b/src/amf/amfd/ndfsm.cc
 > index 9d54df13d..2d407be12 100644
 > --- a/src/amf/amfd/ndfsm.cc
 > +++ b/src/amf/amfd/ndfsm.cc
 > @@ -796,7 +796,6 @@ void avd_mds_avnd_down_evh(AVD_CL_CB *cb, AVD_EVT *evt) {
 > */
 > node->node_state = AVD_AVND_STATE_ABSENT;
 > node->saAmfNodeOperState = SA_AMF_OPERATIONAL_DISABLED;
 > - node->adest = 0;
 > node->rcv_msg_id = 0;
 > node->snd_msg_id = 0;
 > node->recvr_fail_sw = false;
 > @@ -1115,7 +1114,6 @@ void avd_node_mark_absent(AVD_AVND *node) {
 >
 > LOG_NO("Node '%s' left the cluster", node->node_name.c_str());
 >
 > - node->adest = 0;
 > node->rcv_msg_id = 0;
 > node->snd_msg_id = 0;
 > node->recvr_fail_sw = false;
 > diff --git a/src/amf/amfd/ndproc.cc b/src/amf/amfd/ndproc.cc
 > index 428c26085..31d2263d2 100644
 > --- a/src/amf/amfd/ndproc.cc
 > +++ b/src/amf/amfd/ndproc.cc
 > @@ -73,6 +73,22 @@ AVD_AVND *avd_msg_sanity_chk(AVD_EVT *evt, SaClmNodeIdT node_id,
 > LOG_WA("%s: invalid msg id %u, msg type %u, from %x should be %u",
 > __FUNCTION__, msg_id, evt->info.avnd_msg->msg_type, node_id,
 > node->rcv_msg_id + 1);
 > + if (node->rcv_msg_id == 0) {
 > + /* Active AMFD see node left but node still see active AMFD
 > + and keep sending messages with msg_id increment */
 > + LOG_WA("%s: reboot node %x to recover it", __FUNCTION__, node_id);
 > + Consensus consensus_service;
 > + if (consensus_service.IsRemoteFencingEnabled() == true) {
 > + std::string host_name =
 > + osaf_extended_name_borrow(>node_info.nodeName);
 > + int first = host_name.find_first_of("=") + 1;
 > + int end = host_name.find_first_of(",");
 > + host_name = host_name.substr(first, end-first);
 > + opensaf_reboot(node_id, host_name.c_str(), "Fencing remote node");
 > + } else {
 > + avd_send_reboot_msg_directly(node);
 > + }
 > + }
 > return nullptr;
 > }
 >
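
The host-name extraction in the quoted patch takes the text between the first
'=' and the first ',' of the node name. A minimal stand-alone sketch of that
step follows; it is not the OpenSAF code. The function name firstRdnValue and
the sample name "safNode=PL-3,safCluster=myClmCluster" are assumptions made for
illustration, and unlike the quoted patch the sketch also handles names that
lack '=' or ','.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: return the value of the first "key=value" component
// of a comma-separated name, e.g. "safNode=PL-3,safCluster=x" -> "PL-3".
// Returns an empty string when the name contains no '='.
std::string firstRdnValue(const std::string &dn) {
  const std::string::size_type eq = dn.find('=');
  if (eq == std::string::npos) return "";
  const std::string::size_type first = eq + 1;
  const std::string::size_type comma = dn.find(',', first);
  // With no comma, take everything up to the end of the string.
  if (comma == std::string::npos) return dn.substr(first);
  return dn.substr(first, comma - first);
}
```

Using std::string::size_type for the positions avoids the signed/unsigned
conversion that storing find results in an int involves, and makes the "not
found" case explicit.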


Re: [devel] [PATCH 1/1] smf: Increase cbk count before post the evt to client [#2882]

2018-07-19 Thread Tran Thuan
Hello Nagu,

I misunderstood your comment.
Yes, I need to free evt in case posting the evt fails.
I will send out new version.
Thank you.

Best Regards,
Thuan

-----Original Message-----
From: Tran Thuan  
Sent: Monday, July 16, 2018 8:22 AM
To: nagen...@hasolutions.in; nguyen.tk@dektech.com.au;
lennart.l...@ericsson.com; gary@dektech.com.au
Cc: opensaf-devel@lists.sourceforge.net
Subject: Re: [devel] [PATCH 1/1] smf: Increase cbk count before post the evt
to client [#2882]

Hi Nagu,

 

Thanks for your comment.

If m_NCS_IPC_SEND fails, smfa_cbk_ok_resp_process() is called to free
memory.

 

Best Regards,

Thuan

 

From: nagen...@hasolutions.in 
Sent: Friday, July 13, 2018 9:29 PM
To: thuan.tran ; nguyen.tk@dektech.com.au;
lennart.l...@ericsson.com; gary@dektech.com.au
Cc: opensaf-devel@lists.sourceforge.net
Subject: RE: [PATCH 1/1] smf: Increase cbk count before post the evt to
client [#2882]

 

Hi Thuan,

Nice work. Ack from me.

It would be nice if the memory allocated earlier in the function were
deallocated when m_NCS_IPC_SEND fails.

 

Thanks,

Nagendra, 91-9866424860

www.hasolutions.in <http://www.hasolutions.in> 

https://www.linkedin.com/company/hasolutions/

High Availability Solutions Pvt. Ltd.

- OpenSAF support and services

 

 

 

 

 

 

 

----- Original Message -----

Subject: [PATCH 1/1] smf: Increase cbk count before post the evt to client
[#2882]
From: "thuan.tran" <mailto:thuan.t...@dektech.com.au>
Date: 7/13/18 9:22 am
To: nagen...@hasolutions.in <mailto:nagen...@hasolutions.in> ,
nguyen.tk@dektech.com.au <mailto:nguyen.tk@dektech.com.au> ,
lennart.l...@ericsson.com <mailto:lennart.l...@ericsson.com> ,
gary@dektech.com.au <mailto:gary@dektech.com.au>
Cc: opensaf-devel@lists.sourceforge.net
<mailto:opensaf-devel@lists.sourceforge.net> , "thuan.tran"


Sometimes the callback agent dispatches and fails at saSmfResponse() because the
cbk list is empty: the agent somehow handles the evt before the cbk count is
increased. To avoid this, increase the cbk count before posting the evt.
---
src/smf/agent/smfa_utils.c | 40 +-
1 file changed, 22 insertions(+), 18 deletions(-)

diff --git a/src/smf/agent/smfa_utils.c b/src/smf/agent/smfa_utils.c
index fb31a9ae1..3436785cd 100644
--- a/src/smf/agent/smfa_utils.c
+++ b/src/smf/agent/smfa_utils.c
@@ -615,8 +615,8 @@ SMFA_CBK_HDL_LIST *smfa_inv_hdl_add(SaInvocationT inv_id, SaSmfHandleT hdl)
}

/***
-@brief : Match the filter. If matches, post the evt to the client MBX
- and increment the cbk count of the the corresponding hdl node.
+@brief : Match the filter. If matches, increment the cbk count of  the 
+corresponding hdl node and post the evt to the client MBX.
If for a client, more than one scope matches, then those many no of evts are
posted to the MBX.
@param[in] : client_info - For which filter match to be performed.
@@ -694,27 +694,31 @@ uint32_t smfa_cbk_filter_match(SMFA_CLIENT_INFO
*client_info, _evt->object_name), >evt.cbk_evt.object_name);

- if (m_NCS_IPC_SEND(_info->cbk_mbx,
- (NCSCONTEXT)evt,
- NCS_IPC_PRIORITY_NORMAL)) {
- /* Increment the cbk count.*/
- if (NULL != hdl_list) {
- /* There are two scope id
- * matching for the same hdl.*/
- } else {
- /* First scope id matching for
- * this hdl.*/
- hdl_list = smfa_inv_hdl_add(
- cbk_evt->inv_id,
- client_info->client_hdl);
- }
- hdl_list->cnt++;
- rc = NCSCC_RC_SUCCESS;
+ /* Increment the cbk count.*/
+ if (NULL != hdl_list) {
+ /* There are two scope id
+ * matching for the same hdl.*/
} else {
+ /* First scope id matching for
+ * this hdl.*/
+ hdl_list = smfa_inv_hdl_add(
+ cbk_evt->inv_id,
+ client_info->client_hdl);
+ }
+ hdl_list->cnt++;
+ rc = m_NCS_IPC_SEND(
+ _info->cbk_mbx,
+ (NCSCONTEXT)evt,
+ NCS_IPC_PRIORITY_NORMAL);
+ if (rc != NCSCC_RC_SUCCESS) {
LOG_ER(
"SMFA: Posting to MBX failed. hdl: %llu, scoe_id: %u",
client_info->client_hdl, cbk_evt->scope_id);
+ /* Descrease the cbk count */
+ smfa_cbk_ok_resp_process(
+ client_info->client_hdl,
+ cbk_evt->inv_id);
}

/* If one of the filter matches then go to the
--
2.18.0
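
The ordering change in this patch (count the pending callback first, post the
event second, undo the count if posting fails) can be sketched on its own as
below. This is a simplified illustration under stated assumptions, not the SMF
agent code: PendingCallbacks, post and respond are hypothetical names, a
std::map stands in for the handle list, and the "send" step is a caller-supplied
function so that a client which reacts immediately can be simulated.

```cpp
#include <cassert>
#include <functional>
#include <map>

// Hypothetical bookkeeping: number of pending callbacks per invocation id.
struct PendingCallbacks {
  std::map<unsigned long long, int> count;

  // Record the pending callback BEFORE posting, so a client that handles the
  // event immediately already finds its entry. Roll back if posting fails.
  bool post(unsigned long long inv_id, const std::function<bool()> &send) {
    ++count[inv_id];
    if (!send()) {
      auto it = count.find(inv_id);
      if (it != count.end() && --it->second == 0) count.erase(it);
      return false;
    }
    return true;
  }

  // Called when the client responds; fails if nothing is pending for inv_id.
  bool respond(unsigned long long inv_id) {
    auto it = count.find(inv_id);
    if (it == count.end()) return false;
    if (--it->second == 0) count.erase(it);
    return true;
  }
};
```

If the increment came after the send instead, a respond() issued from within
the send step would find no entry and fail, which matches the symptom
described in the commit message.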



