Re: [devel] [PATCH 0/5] Review Request for rded: add relaxed node promotion feature [#2996]

2019-01-21 Thread Minh Hon Chau

Hi Gary,

I'm trying to understand the patch 3/5 and 4/5, there seems to be logic 
of *relaxed mode* left in 3/5 and 4/5.


Thanks

Minh

On 21/1/19 2:52 pm, Gary Lee wrote:

Summary: rded: add relaxed node promotion feature [#2996]
Review request for Ticket(s): 2996
Peer Reviewer(s): Hans, Minh
Pull request to: *** LIST THE PERSON WITH PUSH ACCESS HERE ***
Affected branch(es): develop
Development branch: ticket-2996
Base revision: 35035599567d1add6975a89f1286f20738d67bf1
Personal repository: git://git.code.sf.net/u/userid-2226215/review


Impacted area   Impact y/n

  Docsn
  Build systemn
  RPM/packaging   n
  Configuration files n
  Startup scripts n
  SAF servicesn
  OpenSAF servicesy
  Core libraries  y
  Samples n
  Tests   n
  Other   n


Comments (indicate scope for each "y" above):
-

revision 9a681198810be2e2ad3f512ff966fe1d9eceb1ab
Author: Gary Lee 
Date:   Mon, 21 Jan 2019 14:35:49 +1100

rded: add relaxed node promotion feature [#2996]

Allow promotion of node to active at cluster startup, even if the
consensus service is unavailable, if the peer SC can be seen.

During normal cluster operation, if the consensus service becomes
unavailable but the peer SC can still be seen, allow the existing
active SC to remain active.

A new NCSMDS_SVC_ID_RDE_DISCOVERY service ID is exported by rded.
This is installed as soon as rded is started, unlike
NCSMDS_SVC_ID_RDE which is only installed when it becomes
a candidate for election.



revision d2fad05f5ab3b502403493763f5f2bb31608444f
Author: Gary Lee 
Date:   Mon, 21 Jan 2019 14:35:49 +1100

amfd: allow node to remain active is peer SC can be seen [#2996]

If relaxed node promotion is enabled, allow a SC to remain
active if the peer SC can be seen, even if access to the
consensus service is lost.



revision 4e1bbbd4997a6ea8307695e81a64dd9c53da15aa
Author: Gary Lee 
Date:   Mon, 21 Jan 2019 14:35:42 +1100

osaf: allow active SC to be preferred during network split [#2996]

Add FMS_TAKEOVER_PRIORITISE_PARTITION_SIZE option to allow
active SC to be preferred during a network split. The default
behavior is to prefer the larger partition to maintain
existing behaviour.

Add configuration support for FMS_RELAXED_NODE_PROMOTION.



revision 7b50ffd37aafb82e71c726781824f8d6883c5aa5
Author: Gary Lee 
Date:   Mon, 21 Jan 2019 14:27:38 +1100

fmd: add configuration parameters [#2996]

Add parameters FMS_TAKEOVER_PRIORITISE_PARTITION_SIZE and
FMS_RELAXED_NODE_PROMOTION.



revision 1bb52d591e6014e013c8335f7f1a1f516ecc8566
Author: Gary Lee 
Date:   Mon, 21 Jan 2019 14:01:08 +1100

osaf: update etcd3 to poll instead of watch [#2996]

The 'watch' command does not return if the etcd server goes down.
We need to poll the etcd server to properly check we still have
connectivity to the etcd server.



Complete diffstat:
--
  src/amf/amfd/ndfsm.cc   |  2 +-
  src/amf/amfd/ndproc.cc  | 13 -
  src/amf/amfd/proc.h |  2 +-
  src/fm/fmd/fmd.conf | 17 ++
  src/mds/mds_papi.h  |  1 +
  src/osaf/consensus/consensus.cc | 39 -
  src/osaf/consensus/consensus.h  |  9 ++-
  src/osaf/consensus/key_value.cc |  8 ++-
  src/osaf/consensus/plugins/etcd3.plugin | 50 +
  src/rde/rded/rde_cb.h   | 12 +++-
  src/rde/rded/rde_main.cc| 71 +---
  src/rde/rded/rde_mds.cc | 94 ++--
  src/rde/rded/role.cc| 97 +
  src/rde/rded/role.h |  4 +-
  14 files changed, 375 insertions(+), 44 deletions(-)


Testing Commands:
-
*** LIST THE COMMAND LINE TOOLS/STEPS TO TEST YOUR CHANGES ***


Testing, Expected Results:
--
*** PASTE COMMAND OUTPUTS / TEST RESULTS ***


Conditions of Submission:
-
Ack from any reviewer, or in 1 week

Arch  Built StartedLinux distro
---
mipsn  n
mips64  n  n
x86 n  n
x86_64  y  y
powerpc n  n
powerpc64   n  n


Reviewer Checklist:
---
[Submitters: make sure that your review doesn't trigger any checkmarks!]


Your checkin has not passed review because (see checked entries):

___ Your RR template is generally incomplete; it has too many blank entries
 that need proper data filled in.

___ You have failed to nominate the proper persons for review and push.

___ Your patches do not have proper short+long header

___ You have grammar/spelling in your header that is unacceptable.

___ You have exceeded a sensible li

Re: [devel] [PATCH 5/5] logd: fix crash in logd [#2999]

2019-01-21 Thread Lennart Lund
Hi Alex,


Ack, for this part.


Please add information in the commit message(s) about what's fixed. Just to say 
that a crash is fixed is not enough. There should be some info about what is 
crashing and how the fix is fixing it. This is applicable for the other parts 
as well, at least if the fix affects the code in a significant way.


Thanks

Lennart



Från: Jones, Alex 
Skickat: den 17 januari 2019 20:27
Till: Hans Nordebäck; Lennart Lund; Anders Widell; Vu Minh Nguyen
Kopia: opensaf-devel@lists.sourceforge.net; Jones, Alex
Ämne: [PATCH 5/5] logd: fix crash in logd [#2999]

---
src/log/logd/lgs_dest.cc | 2 +-
src/log/logd/lgs_unixsock_dest.cc | 5 ++---
2 files changed, 3 insertions(+), 4 deletions(-)

diff --git a/src/log/logd/lgs_dest.cc b/src/log/logd/lgs_dest.cc
index 0bd1fd3b0..0e70ddd12 100644
--- a/src/log/logd/lgs_dest.cc
+++ b/src/log/logd/lgs_dest.cc
@@ -114,7 +114,7 @@ void DestinationHandler::UpdateDestTypeDb(

case MsgType::kNoDest: {
TRACE("%s kNoDest", __func__);
- for (auto& it : nametype_map_) nametype_map_.erase(it.first);
+ nametype_map_.clear();
break;
}

diff --git a/src/log/logd/lgs_unixsock_dest.cc 
b/src/log/logd/lgs_unixsock_dest.cc
index 35ef43175..a48250063 100644
--- a/src/log/logd/lgs_unixsock_dest.cc
+++ b/src/log/logd/lgs_unixsock_dest.cc
@@ -329,10 +329,9 @@ ErrCode UnixSocketType::ProcessMsg(const 
DestinationHandler::HandleMsg& msg) {
return ErrCode::kDrop;
}

- for (auto& it : name_sockethdlr_map_) {
+ for (auto& it : name_sockethdlr_map_)
if (it.second != nullptr) delete it.second;
- name_sockethdlr_map_.erase(it.first);
- }
+ name_sockethdlr_map_.clear();
cfgsent = false;
break;
}
--
2.17.2



Notice: This e-mail together with any attachments may contain information of 
Ribbon Communications Inc. that is confidential and/or proprietary for the sole 
use of the intended recipient. Any review, disclosure, reliance or distribution 
by others or forwarding without express permission is strictly prohibited. If 
you are not the intended recipient, please notify the sender immediately and 
then delete all copies, including any attachments.


___
Opensaf-devel mailing list
Opensaf-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/opensaf-devel


Re: [devel] [PATCH 4/5] logd: fix truncation warnings [#2999]

2019-01-21 Thread Lennart Lund
Hi Alex,


Ack, for the change as such. There should be a note in the function header 
informing that information fields has a max size and that information exceeding 
this size will be truncated.


Thanks

Lennart



Från: Jones, Alex 
Skickat: den 17 januari 2019 20:27
Till: Hans Nordebäck; Lennart Lund; Anders Widell; Vu Minh Nguyen
Kopia: opensaf-devel@lists.sourceforge.net; Jones, Alex
Ämne: [PATCH 4/5] logd: fix truncation warnings [#2999]

---
src/log/logd/lgs_util.cc | 13 -
1 file changed, 4 insertions(+), 9 deletions(-)

diff --git a/src/log/logd/lgs_util.cc b/src/log/logd/lgs_util.cc
index 9b0707fd7..28b2a3271 100644
--- a/src/log/logd/lgs_util.cc
+++ b/src/log/logd/lgs_util.cc
@@ -199,19 +199,14 @@ char *lgs_get_time(time_t *time_in) {
osafassert(timeStampData);

stringSize = 5 * sizeof(char);
- if (snprintf(srcString, (size_t)stringSize, "%d",
- (timeStampData->tm_year + START_YEAR)) >=
- static_cast(stringSize)) {
- LOG_WA("truncation on log year: %s", srcString);
- }
+ snprintf(srcString, (size_t)stringSize, "%u",
+ (timeStampData->tm_year + START_YEAR));

strncpy(timeStampString, srcString, stringSize);

stringSize = 3 * sizeof(char);
- if (snprintf(srcString, (size_t)stringSize, "%02d",
- timeStampData->tm_mon + 1) >= static_cast(stringSize)) {
- LOG_WA("trunction on log month: %s", srcString);
- }
+ snprintf(srcString, (size_t)stringSize, "%02u",
+ timeStampData->tm_mon + 1);

strncat(timeStampString, srcString, stringSize);

--
2.17.2



Notice: This e-mail together with any attachments may contain information of 
Ribbon Communications Inc. that is confidential and/or proprietary for the sole 
use of the intended recipient. Any review, disclosure, reliance or distribution 
by others or forwarding without express permission is strictly prohibited. If 
you are not the intended recipient, please notify the sender immediately and 
then delete all copies, including any attachments.


___
Opensaf-devel mailing list
Opensaf-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/opensaf-devel


Re: [devel] [PATCH 3/5] log: fix truncation warning [#2999]

2019-01-21 Thread Lennart Lund
Hi Alex,


Ack


Thanks

Lennart



Från: Jones, Alex 
Skickat: den 17 januari 2019 20:27
Till: Hans Nordebäck; Lennart Lund; Anders Widell; Vu Minh Nguyen
Kopia: opensaf-devel@lists.sourceforge.net; Jones, Alex
Ämne: [PATCH 3/5] log: fix truncation warning [#2999]

---
src/log/apitest/tet_LogOiOps.c | 7 +--
1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/src/log/apitest/tet_LogOiOps.c b/src/log/apitest/tet_LogOiOps.c
index 23fe13838..e74cd5331 100644
--- a/src/log/apitest/tet_LogOiOps.c
+++ b/src/log/apitest/tet_LogOiOps.c
@@ -99,8 +99,11 @@ void m_restoreData(SaNameT objName, char *attr, void *v_attr,
attr, *(SaUint64T *)v_attr);
break;
case SA_IMM_ATTR_SASTRINGT:
- sprintf(command, "immcfg %s -a %s='%s' 2> /dev/null", name,
- attr, (char *)v_attr);
+ if (snprintf(command, MAX_DATA,
+ "immcfg %s -a %s='%s' 2> /dev/null", name,
+ attr, (char *)v_attr) >= MAX_DATA) {
+ fprintf(stderr, "command truncated: %s", command);
+ }
break;
default:
fprintf(stderr, "Unsupported data type (%s) \n", attr);
--
2.17.2



Notice: This e-mail together with any attachments may contain information of 
Ribbon Communications Inc. that is confidential and/or proprietary for the sole 
use of the intended recipient. Any review, disclosure, reliance or distribution 
by others or forwarding without express permission is strictly prohibited. If 
you are not the intended recipient, please notify the sender immediately and 
then delete all copies, including any attachments.


___
Opensaf-devel mailing list
Opensaf-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/opensaf-devel


Re: [devel] [PATCH 2/5] smfd: fix uninitialized variable warning [#2999]

2019-01-21 Thread Lennart Lund
Hi ALex,


Ack


Thanks

Lennart



Från: Jones, Alex 
Skickat: den 17 januari 2019 20:27
Till: Hans Nordebäck; Lennart Lund; Anders Widell; Vu Minh Nguyen
Kopia: opensaf-devel@lists.sourceforge.net; Jones, Alex
Ämne: [PATCH 2/5] smfd: fix uninitialized variable warning [#2999]

---
src/smf/smfd/SmfUpgradeStep.cc | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/smf/smfd/SmfUpgradeStep.cc b/src/smf/smfd/SmfUpgradeStep.cc
index 80da668de..0a8d52363 100644
--- a/src/smf/smfd/SmfUpgradeStep.cc
+++ b/src/smf/smfd/SmfUpgradeStep.cc
@@ -1419,7 +1419,7 @@ SaAisErrorT SmfUpgradeStep::calculateStepType() {
will be ordered within the step */

SaAisErrorT rc;
- bool isSingleNode;
+ bool isSingleNode(false);
if ((rc = this->isSingleNodeSystem(isSingleNode)) != SA_AIS_OK) {
LOG_NO("Fail to read if this is a single node system rc=%s",
saf_error(rc));
--
2.17.2



Notice: This e-mail together with any attachments may contain information of 
Ribbon Communications Inc. that is confidential and/or proprietary for the sole 
use of the intended recipient. Any review, disclosure, reliance or distribution 
by others or forwarding without express permission is strictly prohibited. If 
you are not the intended recipient, please notify the sender immediately and 
then delete all copies, including any attachments.


___
Opensaf-devel mailing list
Opensaf-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/opensaf-devel


Re: [devel] [PATCH 1/5] osaf: update etcd3 to poll instead of watch [#2996]

2019-01-21 Thread Hans Nordebäck
ack, review only/Thanks HansN

On 1/21/19 04:52, Gary Lee wrote:
> The 'watch' command does not return if the etcd server goes down.
> We need to poll the etcd server to properly check we still have
> connectivity to the etcd server.
> ---
>   src/osaf/consensus/plugins/etcd3.plugin | 50 
> ++---
>   1 file changed, 40 insertions(+), 10 deletions(-)
>
> diff --git a/src/osaf/consensus/plugins/etcd3.plugin 
> b/src/osaf/consensus/plugins/etcd3.plugin
> index b3814c9..4998df0 100644
> --- a/src/osaf/consensus/plugins/etcd3.plugin
> +++ b/src/osaf/consensus/plugins/etcd3.plugin
> @@ -17,9 +17,12 @@
>   # backward compatible. This plugin may need to be adapted.
>   
>   readonly keyname="opensaf_consensus_lock"
> +readonly takeover_request="takeover_request"
> +readonly node_name_file="/etc/opensaf/node_name"
>   readonly directory="/opensaf/"
>   readonly etcd_options=""
> -readonly etcd_timeout="10s"
> +readonly etcd_timeout="3s"
> +readonly heartbeat_interval=2
>   
>   export ETCDCTL_API=3
>   
> @@ -29,7 +32,8 @@ export ETCDCTL_API=3
>   #   $1 - 
>   # returns:
>   #   0 - success,  is echoed to stdout
> -#   non-zero - failure
> +#   1 - invalid param
> +#   other - failure
>   get() {
> readonly key="$1"
>   
> @@ -51,7 +55,7 @@ get() {
> return 1
>   fi
> else
> -return 1
> +return 2
> fi
>   }
>   
> @@ -101,7 +105,8 @@ setkey() {
>   # returns:
>   #   0 - success
>   #   1 - already exists
> -#   2 or above - other failure
> +#   2 - invalid param
> +#   3 or above - other failure
>   create_key() {
> readonly key="$1"
> readonly value="$2"
> @@ -114,7 +119,7 @@ create_key() {
> lease_id=$(echo $output | awk '{print $2}')
> lease_param="--lease="$lease_id""
>   else
> -  return 2
> +  return 3
>   fi
> else
>   lease_param=""
> @@ -135,7 +140,7 @@ create_key() {
> then
>   return 1
> else
> -return 2
> +return 3
> fi
>   }
>   
> @@ -149,6 +154,7 @@ create_key() {
>   #   $4 - 
>   # returns:
>   #   0 - success
> +#   1 - invalid param
>   #   non-zero - failure
>   setkey_match_prev() {
> readonly key="$1"
> @@ -326,10 +332,34 @@ unlock() {
>   #   non-zero - failure
>   watch() {
> readonly watch_key="$1"
> -  etcdctl $etcd_options --dial-timeout $etcd_timeout \
> -watch "$directory$watch_key" | grep -m0 \"\" 2>&1
> -  get "$watch_key"
> -  return 0
> +
> +  # get baseline
> +  orig_value=$(get "$watch_key")
> +  result=$?
> +
> +  if [ "$result" -le "1" ]; then
> +while true
> +do
> +  sleep $heartbeat_interval
> +  current_value=$(get "$watch_key")
> +  result=$?
> +  if [ "$result" -gt "1" ]; then
> +# etcd down?
> +if [ "$watch_key" == "$takeover_request" ]; then
> +  hostname=`cat $node_name_file`
> +  echo "$hostname SC-0 1000 UNDEFINED"
> +  return 0
> +else
> +  return 1
> +fi
> +  elif [ "$orig_value" != "$current_value" ]; then
> +echo $current_value
> +return 0
> +  fi
> +done
> +  fi
> +
> +  return 1
>   }
>   
>   # argument parsing

___
Opensaf-devel mailing list
Opensaf-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/opensaf-devel


Re: [devel] [PATCH 2/5] fmd: add configuration parameters [#2996]

2019-01-21 Thread Hans Nordebäck
ack, review only/Thanks HansN

On 1/21/19 04:52, Gary Lee wrote:
> Add parameters FMS_TAKEOVER_PRIORITISE_PARTITION_SIZE and
> FMS_RELAXED_NODE_PROMOTION.
> ---
>   src/fm/fmd/fmd.conf | 17 +
>   1 file changed, 17 insertions(+)
>
> diff --git a/src/fm/fmd/fmd.conf b/src/fm/fmd/fmd.conf
> index 9a106bf..209e484 100644
> --- a/src/fm/fmd/fmd.conf
> +++ b/src/fm/fmd/fmd.conf
> @@ -30,6 +30,23 @@ export FMS_TAKEOVER_REQUEST_VALID_TIME=20
>   # Full path to key-value store plugin
>   #export FMS_KEYVALUE_STORE_PLUGIN_CMD=
>   
> +# In the event of SCs being split into network partitions, we can try to make
> +# the active SC reside in the largest network partition. If it is preferable
> +# to keep the current SC active, then set this to 0
> +# Default is 1
> +#export FMS_TAKEOVER_PRIORITISE_PARTITION_SIZE=1
> +
> +# Default behaviour is not to allow promotion of this node to Active
> +# unless a lock can be obtained, if split brain prevention is enabled.
> +# Uncomment the next line to allow promotion of this node at cluster startup,
> +# if a peer SC can be seen and we have a lower node ID, in the event the
> +# consensus service is not available.
> +# Also if the consensus service is down, but a peer SC can be seen,
> +# then an active SC may remain active.
> +# This mode should not be used together with the roaming SC feature
> +# Default is 0
> +#export FMS_RELAXED_NODE_PROMOTION=0
> +
>   # FM will supervise transitions to the ACTIVE role when this variable is 
> set to
>   # a non-zero value. The value is the time in the unit of 10 ms to wait for a
>   # role change to ACTIVE to take effect. If AMF has not give FM an active

___
Opensaf-devel mailing list
Opensaf-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/opensaf-devel


Re: [devel] [PATCH 5/5] logd: fix crash in logd [#2999]

2019-01-21 Thread Lennart Lund
Hi Alex,


Ack.


You have added a printout or log WA if data is truncated in many places but 
have removed that warning in some other. What is the logic for this. It is that 
in some cases there will be an error if the truncation is done e.g. if a 
command line command is created and in other cases it will just be a message 
that is truncated?


Thanks
Lennart


Från: Jones, Alex 
Skickat: den 17 januari 2019 20:27
Till: Hans Nordebäck; Lennart Lund; Anders Widell; Vu Minh Nguyen
Kopia: opensaf-devel@lists.sourceforge.net; Jones, Alex
Ämne: [PATCH 5/5] logd: fix crash in logd [#2999]

---
src/log/logd/lgs_dest.cc | 2 +-
src/log/logd/lgs_unixsock_dest.cc | 5 ++---
2 files changed, 3 insertions(+), 4 deletions(-)

diff --git a/src/log/logd/lgs_dest.cc b/src/log/logd/lgs_dest.cc
index 0bd1fd3b0..0e70ddd12 100644
--- a/src/log/logd/lgs_dest.cc
+++ b/src/log/logd/lgs_dest.cc
@@ -114,7 +114,7 @@ void DestinationHandler::UpdateDestTypeDb(

case MsgType::kNoDest: {
TRACE("%s kNoDest", __func__);
- for (auto& it : nametype_map_) nametype_map_.erase(it.first);
+ nametype_map_.clear();
break;
}

diff --git a/src/log/logd/lgs_unixsock_dest.cc 
b/src/log/logd/lgs_unixsock_dest.cc
index 35ef43175..a48250063 100644
--- a/src/log/logd/lgs_unixsock_dest.cc
+++ b/src/log/logd/lgs_unixsock_dest.cc
@@ -329,10 +329,9 @@ ErrCode UnixSocketType::ProcessMsg(const 
DestinationHandler::HandleMsg& msg) {
return ErrCode::kDrop;
}

- for (auto& it : name_sockethdlr_map_) {
+ for (auto& it : name_sockethdlr_map_)
if (it.second != nullptr) delete it.second;
- name_sockethdlr_map_.erase(it.first);
- }
+ name_sockethdlr_map_.clear();
cfgsent = false;
break;
}
--
2.17.2



Notice: This e-mail together with any attachments may contain information of 
Ribbon Communications Inc. that is confidential and/or proprietary for the sole 
use of the intended recipient. Any review, disclosure, reliance or distribution 
by others or forwarding without express permission is strictly prohibited. If 
you are not the intended recipient, please notify the sender immediately and 
then delete all copies, including any attachments.


___
Opensaf-devel mailing list
Opensaf-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/opensaf-devel


Re: [devel] [PATCH 1/5] all: fix warnings from gcc 8.2.1 [#2999]

2019-01-21 Thread Lennart Lund
Hi Alex,


Ack


Thanks

Lennart



Från: Jones, Alex 
Skickat: den 17 januari 2019 20:27
Till: Hans Nordebäck; Lennart Lund; Anders Widell; Vu Minh Nguyen
Kopia: opensaf-devel@lists.sourceforge.net; Jones, Alex
Ämne: [PATCH 1/5] all: fix warnings from gcc 8.2.1 [#2999]

These are mostly string overflow issues.
---
src/base/conf.cc | 4 +--
src/base/daemon.c | 7 +++--
src/dtm/dtmnd/dtm_main.cc | 2 +-
src/dtm/dtmnd/multicast.cc | 2 +-
src/imm/immnd/immnd_proc.c | 2 +-
src/imm/immnd/immnd_utils.cc | 7 +++--
src/log/apitest/logtestfr.c | 17 ---
src/log/apitest/saflogtest.c | 7 +++--
src/log/apitest/tet_LogOiOps.c | 49 +++---
src/log/apitest/tet_Log_recov.c | 8 -
src/log/apitest/tet_log_longDN.c | 7 +++--
src/log/logd/lgs_util.cc | 12 ++--
src/mds/mds_c_db.c | 34 +
src/nid/nodeinit.cc | 5 ++-
src/ntf/apitest/tet_ntf_clm.cc | 26 +++-
src/rde/rded/rde_amf.cc | 2 +-
src/smf/smfd/SmfUpgradeCampaign.cc | 2 +-
17 files changed, 123 insertions(+), 70 deletions(-)

diff --git a/src/base/conf.cc b/src/base/conf.cc
index d5755a11f..e27e7167a 100644
--- a/src/base/conf.cc
+++ b/src/base/conf.cc
@@ -189,7 +189,7 @@ std::string Conf::ReadFile(const std::string& path_name,
try {
str.open(path_name);
str >> contents;
- } catch (std::ifstream::failure) {
+ } catch (std::ifstream::failure&) {
contents.clear();
}
return (str.fail() || contents.empty()) ? default_contents : contents;
@@ -203,7 +203,7 @@ void Conf::WriteFileAtomically(const std::string& path_name,
try {
str.open(tmp_file, std::ofstream::out | std::ofstream::trunc);
str << contents << std::endl;
- } catch (std::ofstream::failure) {
+ } catch (std::ofstream::failure&) {
success = false;
}
str.close();
diff --git a/src/base/daemon.c b/src/base/daemon.c
index cdde7fde0..f5a43c93d 100644
--- a/src/base/daemon.c
+++ b/src/base/daemon.c
@@ -99,7 +99,8 @@ static int __create_pidfile(const char *pidfile)
pid_t pid;

pid = getpid();
- snprintf(pidfiletmp, NAME_MAX, "%s.%u.tmp", pidfile, pid);
+ if (snprintf(pidfiletmp, NAME_MAX, "%s.%u.tmp", pidfile, pid) >= NAME_MAX)
+ syslog(LOG_WARNING,"truncation occurred writing pid file: %s", pidfiletmp);

/* open the file and associate a stream with it */
if (((fd = open(pidfiletmp, O_RDWR | O_CREAT, 0644)) == -1) ||
@@ -289,8 +290,8 @@ void daemonize(int argc, char *argv[])
int prio_val = sched_get_priority_min(policy);
int i;
char t_str[256];
- char buf1[256] = {0};
- char buf2[256] = {0};
+ char buf1[256 + sizeof("_SCHED_PRIORITY")] = {0};
+ char buf2[256 + sizeof("_SCHED_POLICY")] = {0};

internal_version_id_ = strdup("@(#) $Id: " INTERNAL_VERSION_ID " $");

diff --git a/src/dtm/dtmnd/dtm_main.cc b/src/dtm/dtmnd/dtm_main.cc
index 585e11ed1..04230165d 100644
--- a/src/dtm/dtmnd/dtm_main.cc
+++ b/src/dtm/dtmnd/dtm_main.cc
@@ -367,7 +367,7 @@ void UpdateNodeIdFile(DTM_INTERNODE_CB *cb) {
try {
str.open(PKGLOCALSTATEDIR "/node_id", std::ofstream::out);
str << std::hex << node_id << std::endl;
- } catch (std::ofstream::failure) {
+ } catch (std::ofstream::failure&) {
}
str.close();
}
diff --git a/src/dtm/dtmnd/multicast.cc b/src/dtm/dtmnd/multicast.cc
index cadc002bd..9752a01c9 100644
--- a/src/dtm/dtmnd/multicast.cc
+++ b/src/dtm/dtmnd/multicast.cc
@@ -199,7 +199,7 @@ bool Multicast::GetPeersFromFile(const std::string 
&path_name) {
}
}
}
- } catch (std::ifstream::failure) {
+ } catch (std::ifstream::failure&) {
LOG_ER("Caught std::ifstream::failure when reading file '%s', peers=%zu",
path_name.c_str(), static_cast(peers_.size()));
peers_.clear();
diff --git a/src/imm/immnd/immnd_proc.c b/src/imm/immnd/immnd_proc.c
index 48363b1a4..87b8a728d 100644
--- a/src/imm/immnd/immnd_proc.c
+++ b/src/imm/immnd/immnd_proc.c
@@ -1887,7 +1887,7 @@ static int immnd_forkPbe(IMMND_CB *cb)
char execPath[1024];
char dbFilePath[1024];
int pid = (-1);
- int execDirLen = (int)(strlen(cb->mProgName) - strlen(base));
+ size_t execDirLen = strlen(cb->mProgName) - strlen(base);
int dirLen = (int)strlen(cb->mDir);
int pbeLen = (int)strlen(cb->mPbeFile);
int i, j;
diff --git a/src/imm/immnd/immnd_utils.cc b/src/imm/immnd/immnd_utils.cc
index f1dae4452..351a0b2cd 100644
--- a/src/imm/immnd/immnd_utils.cc
+++ b/src/imm/immnd/immnd_utils.cc
@@ -192,8 +192,11 @@ void CollectRecentFevsLogData(const IMMND_EVT *evt, 
SaUint64T msg_no) {
break;
}

- snprintf(evt_data, MAX_LENGTH_FEVS_MSG,
- "<%llu>[%s -> %s]", msg_no, evt_name, evt_info);
+ if (snprintf(evt_data, MAX_LENGTH_FEVS_MSG,
+ "<%llu>[%s -> %s]", msg_no, evt_name, evt_info) >=
+ MAX_LENGTH_FEVS_MSG) {
+ LOG_WA("truncation occurred on evt_data: %s", evt_data);
+ }

static int index = 0;
latest_fevs[index] = std::string(evt_data);
diff --git a/src/log/apitest/logtestfr.c b/src/log/apitest/logtestfr.c
index a11024025..7ed82767c 100644
--- a/src/log/apitest/logtestfr.c
+++ b/src/log/apitest/logtestfr.c
@@ -452,7 +452,8 @@ static int truncate_filenames(void)
size_t str_end;

/* Create path to log files */
- snpr

Re: [devel] [PATCH 5/5] logd: fix crash in logd [#2999]

2019-01-21 Thread Jones, Alex
Hi Lennart,


  The reason I removed it in some cases is because the warning was removed 
simply by using "%u" instead of "%d". I thought it was better to have no call 
to LOG_WA, than to add one. If you think we should still have a warning 
printed, I can add it.


  I will add a reason for the logd crash in the commit.


Alex


From: Lennart Lund 
Sent: Monday, January 21, 2019 9:20:12 AM
To: Jones, Alex; Hans Nordebäck; Anders Widell; Vu Minh Nguyen
Cc: opensaf-devel@lists.sourceforge.net
Subject: SV: [PATCH 5/5] logd: fix crash in logd [#2999]


NOTICE: This email was received from an EXTERNAL sender



Hi Alex,


Ack.


You have added a printout or log WA if data is truncated in many places but 
have removed that warning in some other. What is the logic for this. It is that 
in some cases there will be an error if the truncation is done e.g. if a 
command line command is created and in other cases it will just be a message 
that is truncated?


Thanks
Lennart


Från: Jones, Alex 
Skickat: den 17 januari 2019 20:27
Till: Hans Nordebäck; Lennart Lund; Anders Widell; Vu Minh Nguyen
Kopia: opensaf-devel@lists.sourceforge.net; Jones, Alex
Ämne: [PATCH 5/5] logd: fix crash in logd [#2999]

---
src/log/logd/lgs_dest.cc | 2 +-
src/log/logd/lgs_unixsock_dest.cc | 5 ++---
2 files changed, 3 insertions(+), 4 deletions(-)

diff --git a/src/log/logd/lgs_dest.cc b/src/log/logd/lgs_dest.cc
index 0bd1fd3b0..0e70ddd12 100644
--- a/src/log/logd/lgs_dest.cc
+++ b/src/log/logd/lgs_dest.cc
@@ -114,7 +114,7 @@ void DestinationHandler::UpdateDestTypeDb(

case MsgType::kNoDest: {
TRACE("%s kNoDest", __func__);
- for (auto& it : nametype_map_) nametype_map_.erase(it.first);
+ nametype_map_.clear();
break;
}

diff --git a/src/log/logd/lgs_unixsock_dest.cc 
b/src/log/logd/lgs_unixsock_dest.cc
index 35ef43175..a48250063 100644
--- a/src/log/logd/lgs_unixsock_dest.cc
+++ b/src/log/logd/lgs_unixsock_dest.cc
@@ -329,10 +329,9 @@ ErrCode UnixSocketType::ProcessMsg(const 
DestinationHandler::HandleMsg& msg) {
return ErrCode::kDrop;
}

- for (auto& it : name_sockethdlr_map_) {
+ for (auto& it : name_sockethdlr_map_)
if (it.second != nullptr) delete it.second;
- name_sockethdlr_map_.erase(it.first);
- }
+ name_sockethdlr_map_.clear();
cfgsent = false;
break;
}
--
2.17.2



Notice: This e-mail together with any attachments may contain information of 
Ribbon Communications Inc. that is confidential and/or proprietary for the sole 
use of the intended recipient. Any review, disclosure, reliance or distribution 
by others or forwarding without express permission is strictly prohibited. If 
you are not the intended recipient, please notify the sender immediately and 
then delete all copies, including any attachments.


___
Opensaf-devel mailing list
Opensaf-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/opensaf-devel


Re: [devel] [PATCH 5/5] logd: fix crash in logd [#2999]

2019-01-21 Thread Lennart Lund
Hi Alex,

I have no comments on this. If you think the printout is not needed then I see 
no reason not to remove it
If I understand you correctly you mean that the possibility of truncation is 
removed and no printout is needed.

Thanks
Lennart

From: Jones, Alex 
Sent: den 21 januari 2019 15:30
To: Lennart Lund ; Hans Nordebäck 
; Anders Widell ; Vu 
Minh Nguyen 
Cc: opensaf-devel@lists.sourceforge.net
Subject: Re: [PATCH 5/5] logd: fix crash in logd [#2999]


Hi Lennart,



  The reason I removed it in some cases is because the warning was removed 
simply by using "%u" instead of "%d". I thought it was better to have no call 
to LOG_WA, than to add one. If you think we should still have a warning 
printed, I can add it.



  I will add a reason for the logd crash in the commit.



Alex


From: Lennart Lund mailto:lennart.l...@ericsson.com>>
Sent: Monday, January 21, 2019 9:20:12 AM
To: Jones, Alex; Hans Nordebäck; Anders Widell; Vu Minh Nguyen
Cc: 
opensaf-devel@lists.sourceforge.net
Subject: SV: [PATCH 5/5] logd: fix crash in logd [#2999]


NOTICE: This email was received from an EXTERNAL sender



Hi Alex,



Ack.



You have added a printout or log WA if data is truncated in many places but 
have removed that warning in some other. What is the logic for this. It is that 
in some cases there will be an error if the truncation is done e.g. if a 
command line command is created and in other cases it will just be a message 
that is truncated?


Thanks
Lennart

Från: Jones, Alex mailto:ajo...@rbbn.com>>
Skickat: den 17 januari 2019 20:27
Till: Hans Nordebäck; Lennart Lund; Anders Widell; Vu Minh Nguyen
Kopia: 
opensaf-devel@lists.sourceforge.net;
 Jones, Alex
Ämne: [PATCH 5/5] logd: fix crash in logd [#2999]

---
src/log/logd/lgs_dest.cc | 2 +-
src/log/logd/lgs_unixsock_dest.cc | 5 ++---
2 files changed, 3 insertions(+), 4 deletions(-)

diff --git a/src/log/logd/lgs_dest.cc b/src/log/logd/lgs_dest.cc
index 0bd1fd3b0..0e70ddd12 100644
--- a/src/log/logd/lgs_dest.cc
+++ b/src/log/logd/lgs_dest.cc
@@ -114,7 +114,7 @@ void DestinationHandler::UpdateDestTypeDb(

case MsgType::kNoDest: {
TRACE("%s kNoDest", __func__);
- for (auto& it : nametype_map_) nametype_map_.erase(it.first);
+ nametype_map_.clear();
break;
}

diff --git a/src/log/logd/lgs_unixsock_dest.cc 
b/src/log/logd/lgs_unixsock_dest.cc
index 35ef43175..a48250063 100644
--- a/src/log/logd/lgs_unixsock_dest.cc
+++ b/src/log/logd/lgs_unixsock_dest.cc
@@ -329,10 +329,9 @@ ErrCode UnixSocketType::ProcessMsg(const 
DestinationHandler::HandleMsg& msg) {
return ErrCode::kDrop;
}

- for (auto& it : name_sockethdlr_map_) {
+ for (auto& it : name_sockethdlr_map_)
if (it.second != nullptr) delete it.second;
- name_sockethdlr_map_.erase(it.first);
- }
+ name_sockethdlr_map_.clear();
cfgsent = false;
break;
}
--
2.17.2


Notice: This e-mail together with any attachments may contain information of 
Ribbon Communications Inc. that is confidential and/or proprietary for the sole 
use of the intended recipient. Any review, disclosure, reliance or distribution 
by others or forwarding without express permission is strictly prohibited. If 
you are not the intended recipient, please notify the sender immediately and 
then delete all copies, including any attachments.


___
Opensaf-devel mailing list
Opensaf-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/opensaf-devel


Re: [devel] [PATCH 0/5] Review Request for rded: add relaxed node promotion feature [#2996]

2019-01-21 Thread Gary Lee

Hi Minh

[3] contains the osaf/consensus changes to support the two new 
parameters. I will change the subject line to make it clearer:


osaf: add support for FMS_TAKEOVER_PRIORITISE_PARTITION_SIZE and 
FMS_RELAXED_NODE_PROMOTION


[4] contains the changes required for FMS_RELAXED_NODE_PROMOTION in 
amfd. I intentionally separated it out from [5], so [5] has rded changes 
only.


On 21/1/19 9:43 pm, Minh Hon Chau wrote:

Hi Gary,

I'm trying to understand the patch 3/5 and 4/5, there seems to be 
logic of *relaxed mode* left in 3/5 and 4/5.


Thanks

Minh

On 21/1/19 2:52 pm, Gary Lee wrote:

Summary: rded: add relaxed node promotion feature [#2996]
Review request for Ticket(s): 2996
Peer Reviewer(s): Hans, Minh
Pull request to: *** LIST THE PERSON WITH PUSH ACCESS HERE ***
Affected branch(es): develop
Development branch: ticket-2996
Base revision: 35035599567d1add6975a89f1286f20738d67bf1
Personal repository: git://git.code.sf.net/u/userid-2226215/review


Impacted area   Impact y/n

  Docs    n
  Build system    n
  RPM/packaging   n
  Configuration files n
  Startup scripts n
  SAF services    n
  OpenSAF services    y
  Core libraries  y
  Samples n
  Tests   n
  Other   n


Comments (indicate scope for each "y" above):
-

revision 9a681198810be2e2ad3f512ff966fe1d9eceb1ab
Author:    Gary Lee 
Date:    Mon, 21 Jan 2019 14:35:49 +1100

rded: add relaxed node promotion feature [#2996]

Allow promotion of node to active at cluster startup, even if the
consensus service is unavailable, if the peer SC can be seen.

During normal cluster operation, if the consensus service becomes
unavailable but the peer SC can still be seen, allow the existing
active SC to remain active.

A new NCSMDS_SVC_ID_RDE_DISCOVERY service ID is exported by rded.
This is installed as soon as rded is started, unlike
NCSMDS_SVC_ID_RDE which is only installed when it becomes
a candidate for election.



revision d2fad05f5ab3b502403493763f5f2bb31608444f
Author:    Gary Lee 
Date:    Mon, 21 Jan 2019 14:35:49 +1100

amfd: allow node to remain active is peer SC can be seen [#2996]

If relaxed node promotion is enabled, allow a SC to remain
active if the peer SC can be seen, even if access to the
consensus service is lost.



revision 4e1bbbd4997a6ea8307695e81a64dd9c53da15aa
Author:    Gary Lee 
Date:    Mon, 21 Jan 2019 14:35:42 +1100

osaf: allow active SC to be preferred during network split [#2996]

Add FMS_TAKEOVER_PRIORITISE_PARTITION_SIZE option to allow
active SC to be preferred during a network split. The default
behavior is to prefer the larger partition to maintain
existing behaviour.

Add configuration support for FMS_RELAXED_NODE_PROMOTION.



revision 7b50ffd37aafb82e71c726781824f8d6883c5aa5
Author:    Gary Lee 
Date:    Mon, 21 Jan 2019 14:27:38 +1100

fmd: add configuration parameters [#2996]

Add parameters FMS_TAKEOVER_PRIORITISE_PARTITION_SIZE and
FMS_RELAXED_NODE_PROMOTION.



revision 1bb52d591e6014e013c8335f7f1a1f516ecc8566
Author:    Gary Lee 
Date:    Mon, 21 Jan 2019 14:01:08 +1100

osaf: update etcd3 to poll instead of watch [#2996]

The 'watch' command does not return if the etcd server goes down.
We need to poll the etcd server to properly check we still have
connectivity to the etcd server.



Complete diffstat:
--
  src/amf/amfd/ndfsm.cc   |  2 +-
  src/amf/amfd/ndproc.cc  | 13 -
  src/amf/amfd/proc.h |  2 +-
  src/fm/fmd/fmd.conf | 17 ++
  src/mds/mds_papi.h  |  1 +
  src/osaf/consensus/consensus.cc | 39 -
  src/osaf/consensus/consensus.h  |  9 ++-
  src/osaf/consensus/key_value.cc |  8 ++-
  src/osaf/consensus/plugins/etcd3.plugin | 50 +
  src/rde/rded/rde_cb.h   | 12 +++-
  src/rde/rded/rde_main.cc    | 71 +---
  src/rde/rded/rde_mds.cc | 94 
++--
  src/rde/rded/role.cc    | 97 
+

  src/rde/rded/role.h |  4 +-
  14 files changed, 375 insertions(+), 44 deletions(-)


Testing Commands:
-
*** LIST THE COMMAND LINE TOOLS/STEPS TO TEST YOUR CHANGES ***


Testing, Expected Results:
--
*** PASTE COMMAND OUTPUTS / TEST RESULTS ***


Conditions of Submission:
-
Ack from any reviewer, or in 1 week

Arch  Built Started    Linux distro
---
mips    n  n
mips64  n  n
x86 n  n
x86_64  y  y
powerpc n  n
powerpc64   n  n


Reviewer Checklist:
---
[Submitters: make sure that your review

Re: [devel] [PATCH 4/5] amfd: allow node to remain active is peer SC can be seen [#2996]

2019-01-21 Thread Minh Hon Chau

ack, review only. Thanks/Minh

On 21/1/19 2:52 pm, Gary Lee wrote:

If relaxed node promotion is enabled, allow a SC to remain
active if the peer SC can be seen, even if access to the
consensus service is lost.
---
  src/amf/amfd/ndfsm.cc  |  2 +-
  src/amf/amfd/ndproc.cc | 13 +++--
  src/amf/amfd/proc.h|  2 +-
  3 files changed, 13 insertions(+), 4 deletions(-)

diff --git a/src/amf/amfd/ndfsm.cc b/src/amf/amfd/ndfsm.cc
index 4146ddc..8c8f3c5 100644
--- a/src/amf/amfd/ndfsm.cc
+++ b/src/amf/amfd/ndfsm.cc
@@ -817,7 +817,7 @@ void avd_mds_avnd_down_evh(AVD_CL_CB *cb, AVD_EVT *evt) {
if (cb->node_failover_delay == 0) {
  avd_node_failover(node);
}
-  check_quorum();
+  check_quorum(cb);
node->node_info.member = SA_FALSE;
// Update standby out of sync if standby sc goes down
if (avd_cb->node_id_avd_other == node->node_info.nodeId) {
diff --git a/src/amf/amfd/ndproc.cc b/src/amf/amfd/ndproc.cc
index c4eebb1..ec347fc 100644
--- a/src/amf/amfd/ndproc.cc
+++ b/src/amf/amfd/ndproc.cc
@@ -1245,15 +1245,24 @@ void avd_node_failover(AVD_AVND *node, const bool 
mw_only) {
TRACE_LEAVE();
  }
  
-void check_quorum() {

+void check_quorum(AVD_CL_CB *cb) {
TRACE_ENTER();
  
Consensus consensus_service;

if (consensus_service.IsRemoteFencingEnabled() == false &&
consensus_service.IsWritable() == false) {
+// if relaxed mode is enabled, ignore failure if peer SC is up
+if (consensus_service.IsRelaxedNodePromotionEnabled() == true) {
+  AVD_AVND* peer = avd_node_find_nodeid(cb->node_id_avd_other);
+  if (peer != nullptr && peer->node_state == AVD_AVND_STATE_PRESENT) {
+LOG_NO("Relaxed node promotion is enabled, peer SC is connected");
+return;
+  }
+}
+
  // remote fencing is disabled and we have lost write access
  // reboot this node to prevent split brain
  opensaf_reboot(0, nullptr,
"Quorum lost. Rebooting this node to prevent split-brain");
}
-}
\ No newline at end of file
+}
diff --git a/src/amf/amfd/proc.h b/src/amf/amfd/proc.h
index a378218..f1dc7ba 100644
--- a/src/amf/amfd/proc.h
+++ b/src/amf/amfd/proc.h
@@ -96,7 +96,7 @@ void avd_process_hb_event(AVD_CL_CB *cb_now, struct AVD_EVT 
*evt);
  extern void avd_node_mark_absent(AVD_AVND *node);
  extern void avd_tmr_snd_hb_evh(AVD_CL_CB *cb, AVD_EVT *evt);
  extern void avd_node_failover(AVD_AVND *node, const bool mw_only = false);
-extern void check_quorum();
+extern void check_quorum(AVD_CL_CB *cb);
  extern AVD_SU *get_other_su_from_oper_list(AVD_SU *su);
  extern void su_complete_admin_op(AVD_SU *su, SaAisErrorT result);
  extern void comp_complete_admin_op(AVD_COMP *comp, SaAisErrorT result);



___
Opensaf-devel mailing list
Opensaf-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/opensaf-devel


Re: [devel] [PATCH 3/5] osaf: allow active SC to be preferred during network split [#2996]

2019-01-21 Thread Minh Hon Chau

ack, review only. Thanks/Minh

On 21/1/19 2:52 pm, Gary Lee wrote:

Add FMS_TAKEOVER_PRIORITISE_PARTITION_SIZE option to allow
active SC to be preferred during a network split. The default
behavior is to prefer the larger partition to maintain
existing behaviour.

Add configuration support for FMS_RELAXED_NODE_PROMOTION.
---
  src/osaf/consensus/consensus.cc | 39 ---
  src/osaf/consensus/consensus.h  |  9 +++--
  src/osaf/consensus/key_value.cc |  8 ++--
  3 files changed, 49 insertions(+), 7 deletions(-)

diff --git a/src/osaf/consensus/consensus.cc b/src/osaf/consensus/consensus.cc
index 112af7d..5304c4f 100644
--- a/src/osaf/consensus/consensus.cc
+++ b/src/osaf/consensus/consensus.cc
@@ -64,6 +64,7 @@ SaAisErrorT Consensus::PromoteThisNode(const bool 
graceful_takeover,
 cluster_size);
  if (rc != SA_AIS_OK) {
LOG_WA("Takeover request failed (%d)", rc);
+  rc = SA_AIS_ERR_EXIST;
return rc;
  }
  take_over_request_created = true;
@@ -99,7 +100,7 @@ SaAisErrorT Consensus::PromoteThisNode(const bool 
graceful_takeover,
if (rc == SA_AIS_OK) {
  LOG_NO("Active controller set to %s", base::Conf::NodeName().c_str());
} else {
-LOG_ER("Failed to promote this node (%u)", rc);
+LOG_WA("Failed to promote this node (%u)", rc);
}
  
return rc;

@@ -197,6 +198,10 @@ bool Consensus::IsWritable() const {
  
  bool Consensus::IsRemoteFencingEnabled() const { return use_remote_fencing_; }
  
+bool Consensus::IsRelaxedNodePromotionEnabled() const {

+  return relaxed_node_promotion_;
+}
+
  std::string Consensus::CurrentActive() const {
TRACE_ENTER();
if (use_consensus_ == false) {
@@ -228,6 +233,10 @@ Consensus::Consensus() {
uint32_t split_brain_enable = base::GetEnv("FMS_SPLIT_BRAIN_PREVENTION", 0);
std::string kv_store_cmd = base::GetEnv("FMS_KEYVALUE_STORE_PLUGIN_CMD", 
"");
uint32_t use_remote_fencing = base::GetEnv("FMS_USE_REMOTE_FENCING", 0);
+  uint32_t prioritise_partition_size =
+base::GetEnv("FMS_TAKEOVER_PRIORITISE_PARTITION_SIZE", 1);
+  uint32_t relaxed_node_promotion =
+base::GetEnv("FMS_RELAXED_NODE_PROMOTION", 0);
  
// if not specified in fmd.conf,

// takeover requests are valid for 20 seconds
@@ -246,6 +255,14 @@ Consensus::Consensus() {
  use_remote_fencing_ = true;
}
  
+  if (prioritise_partition_size == 1) {

+prioritise_partition_size_ = true;
+  }
+
+  if (use_consensus_ == true && relaxed_node_promotion == 1) {
+relaxed_node_promotion_ = true;
+  }
+
// needed for base::Conf::NodeName() later
base::Conf::InitNodeName();
  }
@@ -373,6 +390,10 @@ SaAisErrorT Consensus::CreateTakeoverRequest(const 
std::string& current_owner,
  return CreateTakeoverRequest(current_owner, proposed_owner, cluster_size);
}
  
+  if (rc != SA_AIS_OK) {

+ return rc;
+  }
+
// wait up to max_takeover_retry seconds for request to be answered
retries = 0;
while (retries < max_takeover_retry) {
@@ -546,9 +567,21 @@ Consensus::TakeoverState Consensus::HandleTakeoverRequest(
LOG_NO("Other network size: %" PRIu64 ", our network size: %" PRIu64,
   proposed_cluster_size, cluster_size);
  
+  const std::string state_str =

+tokens[static_cast(TakeoverElements::STATE)];
+
TakeoverState result;
-  if (proposed_cluster_size > cluster_size) {
-result = TakeoverState::ACCEPTED;
+  if (state_str !=
+TakeoverStateStr[static_cast(TakeoverState::NEW)]) {
+return TakeoverState::UNDEFINED;
+  }
+
+  if (prioritise_partition_size_ == true) {
+if (proposed_cluster_size > cluster_size) {
+  result = TakeoverState::ACCEPTED;
+} else {
+  result = TakeoverState::REJECTED;
+}
} else {
  result = TakeoverState::REJECTED;
}
diff --git a/src/osaf/consensus/consensus.h b/src/osaf/consensus/consensus.h
index 6421c7c..2fbd3bd 100644
--- a/src/osaf/consensus/consensus.h
+++ b/src/osaf/consensus/consensus.h
@@ -57,6 +57,9 @@ class Consensus {
// Is remote fencing enabled?
bool IsRemoteFencingEnabled() const;
  
+  // Is relaxed node promotion enabled?

+  bool IsRelaxedNodePromotionEnabled() const;
+
Consensus();
virtual ~Consensus();
  
@@ -66,7 +69,7 @@ class Consensus {

  UNDEFINED = 0,
  NEW = 1,
  ACCEPTED = 2,
-REJECTED = 3,
+REJECTED = 3
};
  
enum class TakeoverElements : std::uint8_t {

@@ -85,13 +88,15 @@ class Consensus {
   private:
bool use_consensus_ = false;
bool use_remote_fencing_ = false;
+  bool prioritise_partition_size_ = false;
+  bool relaxed_node_promotion_ = false;
uint32_t takeover_valid_time;
uint32_t max_takeover_retry;
const std::string kTestKeyname = "opensaf_write_test";
const std::chrono::milliseconds kSleepInterval =
std::chrono::milliseconds(1000);  // in ms
static constexpr uint32_t kLockTimeout = 0;  // lock is persistent b

Re: [devel] [PATCH 0/1] Review Request for dtm: Fix dtm close socket due to duplication of adding node IP info [#2984]

2019-01-21 Thread Canh Van Truong
Hi all

Please help look at this ticket.

Thanks
Canh

-Original Message-
From: Canh Van Truong  
Sent: Tuesday, December 18, 2018 2:25 PM
To: anders.wid...@ericsson.com; hans.nordeb...@ericsson.com;
gary@dektech.com.au; minh.c...@dektech.com.au
Cc: opensaf-devel@lists.sourceforge.net; Canh Van Truong

Subject: [PATCH 0/1] Review Request for dtm: Fix dtm close socket due to
duplication of adding node IP info [#2984]

Summary: dtm: Fix dtm close socket due to duplication of adding node IP info
[#2984]
Review request for Ticket(s): 2984
Peer Reviewer(s): Gary, Minh, Ander, Hans 
Pull request to: *** LIST THE PERSON WITH PUSH ACCESS HERE ***
Affected branch(es): develop
Development branch: ticket-2984
Base revision: 9ce47c5a79cc080d89d192bc1947ebd04aa0a122
Personal repository: git://git.code.sf.net/u/canht32/review


Impacted area   Impact y/n

 Docsn
 Build systemn
 RPM/packaging   n
 Configuration files n
 Startup scripts n
 SAF servicesn
 OpenSAF servicesn
 Core libraries  n
 Samples n
 Tests   n
 Other   n


Comments (indicate scope for each "y" above):
-
*** EXPLAIN/COMMENT THE PATCH SERIES HERE ***

revision 031c6cd11485c103c8faab901eff5f6f36b02c3f
Author: Canh Van Truong 
Date:   Tue, 18 Dec 2018 14:03:26 +0700

dtm: Fix dtm close socket due to duplication of adding node IP info [#2984]

During cluster start, one node (node 1) broadcast up msg to other node. The
remote node (node 2) get this msg and send the connection to node 1
(connect()).
Similarly node 1 send the connection to  node 2 after node 2 broadcast up
msg to.
Beside of node 2 connect() to node 1, node 2 also add the IP and ID info of
node 1 to database.
But before of that, node 2 may also accept the connection that come from
node 1. The
acception is also add node ID of node 1. So there is 2 times adding the node
ID
info of node 1 to database in node 2. This causes the socket connection is
closed
and node is  restart again.

The patch change to retrieve node from database by node IP instead node ID
in
processing connection. This will reject the double of establishing
connection
between 2 nodes and also double of adding node IP to database.



Complete diffstat:
--
 src/dtm/dtmnd/dtm.h   | 11 --
 src/dtm/dtmnd/dtm_inter_trans.cc  |  3 +-
 src/dtm/dtmnd/dtm_node.cc |  2 +-
 src/dtm/dtmnd/dtm_node_db.cc  | 79
---
 src/dtm/dtmnd/dtm_node_sockets.cc | 20 ++
 5 files changed, 72 insertions(+), 43 deletions(-)


Testing Commands:
-
*** LIST THE COMMAND LINE TOOLS/STEPS TO TEST YOUR CHANGES ***


Testing, Expected Results:
--
*** PASTE COMMAND OUTPUTS / TEST RESULTS ***


Conditions of Submission:
-
*** HOW MANY DAYS BEFORE PUSHING, CONSENSUS ETC ***


Arch  Built StartedLinux distro
---
mipsn  n
mips64  n  n
x86 n  n
x86_64  n  n
powerpc n  n
powerpc64   n  n


Reviewer Checklist:
---
[Submitters: make sure that your review doesn't trigger any checkmarks!]


Your checkin has not passed review because (see checked entries):

___ Your RR template is generally incomplete; it has too many blank entries
that need proper data filled in.

___ You have failed to nominate the proper persons for review and push.

___ Your patches do not have proper short+long header

___ You have grammar/spelling in your header that is unacceptable.

___ You have exceeded a sensible line length in your headers/comments/text.

___ You have failed to put in a proper Trac Ticket # into your commits.

___ You have incorrectly put/left internal data in your comments/files
(i.e. internal bug tracking tool IDs, product names etc)

___ You have not given any evidence of testing beyond basic build tests.
Demonstrate some level of runtime or other sanity testing.

___ You have ^M present in some of your files. These have to be removed.

___ You have needlessly changed whitespace or added whitespace crimes
like trailing spaces, or spaces before tabs.

___ You have mixed real technical changes with whitespace and other
cosmetic code cleanup changes. These have to be separate commits.

___ You need to refactor your submission into logical chunks; there is
too much content into a single commit.

___ You have extraneous garbage in your review (merge commits etc)

___ You have giant attachments which should never have been sent;
Instead you should place your content in a public tree to be pulled.

___ You have too many commits attached to an e-mail; resend as threaded
commits, or place in a public tree for a pull.

___ Y