Re: [devel] [PATCH 1/1] imm: make version parameter in immutil_xxx non-const [#2830]

2018-04-12 Thread Lennart Lund
Hi Vu,

Ack, but an important issue to fix. See comments below [Lennart]

Thanks
Lennart

> -Original Message-
> From: Vu Minh Nguyen [mailto:vu.m.ngu...@dektech.com.au]
> Sent: den 5 april 2018 12:39
> To: ravisekhar.ko...@oracle.com; Hans Nordebäck
> ; Anders Widell
> ; Lennart Lund 
> Cc: opensaf-devel@lists.sourceforge.net; Vu Minh Nguyen
> 
> Subject: [PATCH 1/1] imm: make version parameter in immutil_xxx non-
> const [#2830]
> 
> The version in saImmO{m,i}Initialize is input/output parameter and is
> declared
> as non-constant for both IMM OM and OI API according to SAF spec.
> But in immutil wrapper library, some are declared as constant and don't
> update
> the in/out version before returning from such wrappers.
> 
> This patch makes that parameter non-const and do update the version
> before returning from wrapper APIs. Also fix the wrong usage of
> these wrapper, passed const version, in some services/applications.
> ---
>  src/amf/amfd/imm.cc  | 11 +++
>  src/amf/amfnd/util.cc|  3 ++-
>  src/log/apitest/imm_tstutil.c|  5 -
>  src/log/apitest/logtest.c|  9 ++---
>  src/log/apitest/logtestfr.c  |  6 --
>  src/log/apitest/tet_log_runtime_cfgobj.c |  3 ++-
>  src/log/logd/lgs_config.cc   |  3 ++-
>  src/log/logd/lgs_imm.cc  | 15 ++-
>  src/log/logd/lgs_imm_gcfg.cc |  7 +--
>  src/osaf/immutil/immutil.c   | 20 +++-
>  src/osaf/immutil/immutil.h   |  6 +++---
>  src/smf/smfd/SmfAdminState.cc|  4 ++--
>  src/smf/smfd/SmfExecControlHdl.cc|  3 ++-
>  13 files changed, 64 insertions(+), 31 deletions(-)
> 
> diff --git a/src/amf/amfd/imm.cc b/src/amf/amfd/imm.cc
> index 47c0e5a..8c70325 100644
> --- a/src/amf/amfd/imm.cc
> +++ b/src/amf/amfd/imm.cc
> @@ -1461,6 +1461,7 @@ done:
>  SaAisErrorT avd_imm_init(void *avd_cb) {
>SaAisErrorT error = SA_AIS_OK;
>AVD_CL_CB *cb = (AVD_CL_CB *)avd_cb;
> +  SaVersionT local_version = immVersion;
> 
>TRACE_ENTER();
> 
> @@ -1471,13 +1472,13 @@ SaAisErrorT avd_imm_init(void *avd_cb) {
> 
>cb->avd_imm_status = AVD_IMM_INIT_ONGOING;
>if ((error = immutil_saImmOiInitialize_2(&cb->immOiHandle,
> &avd_callbacks,
> -   &immVersion)) != SA_AIS_OK) {
> +   &local_version)) != SA_AIS_OK) {
>  LOG_ER("saImmOiInitialize failed %u", error);
>  goto done;
>}
> 
[Lennart] local_version has to be set again to immVersion here. This is the 
case in all functions where initialize using an IMM version is done more than 
once. In this case both OM and OI is initialized. In some cases it may be a try 
again loop where an initialize API may be called several times.
Note: I have only written this comment once but it may be applicable in several 
places.
>if ((error = immutil_saImmOmInitialize(&cb->immOmHandle, nullptr,
> - &immVersion)) != SA_AIS_OK) {
> + &local_version)) != SA_AIS_OK) {
>  LOG_ER("saImmOmInitialize failed %u", error);
>  goto done;
>}
> @@ -2075,6 +2076,7 @@ void avd_imm_update_runtime_attrs(void) {
>   */
>  static void *avd_imm_reinit_bg_thread(void *_cb) {
>SaAisErrorT rc = SA_AIS_OK;
> +  SaVersionT local_version;
>AVD_CL_CB *cb = (AVD_CL_CB *)_cb;
>AVD_EVT *evt;
>uint32_t status;
> @@ -2098,9 +2100,10 @@ static void *avd_imm_reinit_bg_thread(void
> *_cb) {
> 
>  avd_cb->immOiHandle = 0;
>  avd_cb->is_implementer = false;
> +local_version = immVersion;
> 
>  if ((rc = immutil_saImmOiInitialize_2(&cb->immOiHandle, &avd_callbacks,
> -  &immVersion)) != SA_AIS_OK) {
> +  &local_version)) != SA_AIS_OK) {
>LOG_ER("saImmOiInitialize failed %u", rc);
>osaf_mutex_unlock_ordie(&imm_reinit_mutex);
>exit(EXIT_FAILURE);
> @@ -2141,7 +2144,7 @@ static void *avd_imm_reinit_bg_thread(void *_cb)
> {
>/* Lets re-initialize Om interface also. */
>(void)immutil_saImmOmFinalize(cb->immOmHandle);
>if ((rc = immutil_saImmOmInitialize(&cb->immOmHandle, nullptr,
> -  &immVersion)) != SA_AIS_OK) {
> +  &local_version)) != SA_AIS_OK) {
>  LOG_ER("saImmOmInitialize failed %u", rc);
>  continue;
>}
> diff --git a/src/amf/amfnd/util.cc b/src/amf/amfnd/util.cc
> index f6dbb49..38bf426 100644
> --- a/src/amf/amfnd/util.cc
> +++ b/src/amf/amfnd/util.cc
> @@ -250,8 +250,9 @@ const char *avnd_failed_state_file_location(void) {
>  SaAisErrorT saImmOmInitialize_cond(SaImmHandleT *immHandle,
> const SaImmCallbacksT *immCallbacks,
> const SaVersionT *vers

Re: [devel] [PATCH 1/1] imm: make version parameter in immutil_xxx non-const [#2830]

2018-04-12 Thread Vu Minh Nguyen
Hi Lennart,

See my response inline, started with [Vu].

Regards, Vu

> -Original Message-
> From: Lennart Lund [mailto:lennart.l...@ericsson.com]
> Sent: Thursday, April 12, 2018 5:13 PM
> To: Vu Minh Nguyen ;
> ravisekhar.ko...@oracle.com; Hans Nordebäck
> ; Anders Widell
> 
> Cc: opensaf-devel@lists.sourceforge.net; Vu Minh Nguyen
> 
> Subject: RE: [PATCH 1/1] imm: make version parameter in immutil_xxx non-
> const [#2830]
> 
> Hi Vu,
> 
> Ack, but an important issue to fix. See comments below [Lennart]
> 
> Thanks
> Lennart
> 
> > -Original Message-
> > From: Vu Minh Nguyen [mailto:vu.m.ngu...@dektech.com.au]
> > Sent: den 5 april 2018 12:39
> > To: ravisekhar.ko...@oracle.com; Hans Nordebäck
> > ; Anders Widell
> > ; Lennart Lund
> 
> > Cc: opensaf-devel@lists.sourceforge.net; Vu Minh Nguyen
> > 
> > Subject: [PATCH 1/1] imm: make version parameter in immutil_xxx non-
> > const [#2830]
> >
> > The version in saImmO{m,i}Initialize is input/output parameter and is
> > declared
> > as non-constant for both IMM OM and OI API according to SAF spec.
> > But in immutil wrapper library, some are declared as constant and don't
> > update
> > the in/out version before returning from such wrappers.
> >
> > This patch makes that parameter non-const and do update the version
> > before returning from wrapper APIs. Also fix the wrong usage of
> > these wrapper, passed const version, in some services/applications.
> > ---
> >  src/amf/amfd/imm.cc  | 11 +++
> >  src/amf/amfnd/util.cc|  3 ++-
> >  src/log/apitest/imm_tstutil.c|  5 -
> >  src/log/apitest/logtest.c|  9 ++---
> >  src/log/apitest/logtestfr.c  |  6 --
> >  src/log/apitest/tet_log_runtime_cfgobj.c |  3 ++-
> >  src/log/logd/lgs_config.cc   |  3 ++-
> >  src/log/logd/lgs_imm.cc  | 15 ++-
> >  src/log/logd/lgs_imm_gcfg.cc |  7 +--
> >  src/osaf/immutil/immutil.c   | 20 +++-
> >  src/osaf/immutil/immutil.h   |  6 +++---
> >  src/smf/smfd/SmfAdminState.cc|  4 ++--
> >  src/smf/smfd/SmfExecControlHdl.cc|  3 ++-
> >  13 files changed, 64 insertions(+), 31 deletions(-)
> >
> > diff --git a/src/amf/amfd/imm.cc b/src/amf/amfd/imm.cc
> > index 47c0e5a..8c70325 100644
> > --- a/src/amf/amfd/imm.cc
> > +++ b/src/amf/amfd/imm.cc
> > @@ -1461,6 +1461,7 @@ done:
> >  SaAisErrorT avd_imm_init(void *avd_cb) {
> >SaAisErrorT error = SA_AIS_OK;
> >AVD_CL_CB *cb = (AVD_CL_CB *)avd_cb;
> > +  SaVersionT local_version = immVersion;
> >
> >TRACE_ENTER();
> >
> > @@ -1471,13 +1472,13 @@ SaAisErrorT avd_imm_init(void *avd_cb) {
> >
> >cb->avd_imm_status = AVD_IMM_INIT_ONGOING;
> >if ((error = immutil_saImmOiInitialize_2(&cb->immOiHandle,
> > &avd_callbacks,
> > -   &immVersion)) != SA_AIS_OK)
{
> > +   &local_version)) !=
SA_AIS_OK) {
> >  LOG_ER("saImmOiInitialize failed %u", error);
> >  goto done;
> >}
> >
> [Lennart] local_version has to be set again to immVersion here. This is
the
> case in all functions where initialize using an IMM version is done more
than
> once. In this case both OM and OI is initialized. In some cases it may be
a try
> again loop where an initialize API may be called several times.
> Note: I have only written this comment once but it may be applicable in
> several places.
[Vu] I don't think it is needed here as `local_version` is backup and set
again inside that immutil_saImmOiInitialize_2() wrapper. 
Unless immutil_saImmOiInitialize_2() is called inside a loop(*), we don't
need to have `local_version` reset.

(*) Please refer to my next comment for the case - called inside a loop.

> >if ((error = immutil_saImmOmInitialize(&cb->immOmHandle, nullptr,
> > - &immVersion)) != SA_AIS_OK) {
> > + &local_version)) != SA_AIS_OK)
{
> >  LOG_ER("saImmOmInitialize failed %u", error);
> >  goto done;
> >}
> > @@ -2075,6 +2076,7 @@ void avd_imm_update_runtime_attrs(void) {
> >   */
> >  static void *avd_imm_reinit_bg_thread(void *_cb) {
> >SaAisErrorT rc = SA_AIS_OK;
> > +  SaVersionT local_version;
> >AVD_CL_CB *cb = (AVD_CL_CB *)_cb;
> >AVD_EVT *evt;
> >uint32_t status;
> > @@ -2098,9 +2100,10 @@ static void *avd_imm_reinit_bg_thread(void
> > *_cb) {
> >
> >  avd_cb->immOiHandle = 0;
> >  avd_cb->is_implementer = false;
> > +local_version = immVersion;
[Vu] Since immutil_saImmOiInitialize_2() is called inside a loop, we need to
reset `local_version`.
> >
> >  if ((rc = immutil_saImmOiInitialize_2(&cb->immOiHandle,
> &avd_callbacks,
> > -  &immVersion)) != SA_AIS_OK) {
> > +  &local_version)) !=
SA_AIS_OK) {
> >LOG

[devel] [PATCH 1/1] osaf: Isolate the node in the opensaf_reboot [#2833]

2018-04-12 Thread ravi-sekhar
---
 scripts/opensaf_reboot | 19 +--
 1 file changed, 13 insertions(+), 6 deletions(-)

diff --git a/scripts/opensaf_reboot b/scripts/opensaf_reboot
index df65c26..b219c39 100644
--- a/scripts/opensaf_reboot
+++ b/scripts/opensaf_reboot
@@ -37,6 +37,9 @@ export LD_LIBRARY_PATH=$libdir:$LD_LIBRARY_PATH
 if [ -f "$pkgsysconfdir/fmd.conf" ]; then
   . "$pkgsysconfdir/fmd.conf"
 fi
+if [ -f "$pkgsysconfdir/nid.conf" ]; then
+  . "$pkgsysconfdir/nid.conf"
+fi
 
 NODE_ID_FILE=$pkglocalstatedir/node_id
 
@@ -118,7 +121,17 @@ else
# uncomment the following line if debugging errors that keep 
restarting the node
# exit 0
 
+# If the application is using different interface for cluster 
communication, please
+# add your application specific isolation commands here
+
logger -t "opensaf_reboot" "Rebooting local node; 
timeout=$OPENSAF_REBOOT_TIMEOUT"
+  
+# Isolate the node
+if [ "$MDS_TRANSPORT" = "TIPC" ]; then
+   tipc-config -bd eth:$TIPC_ETH_IF
+else
+   $icmd pkill -STOP osafdtmd
+fi
 
# Start a reboot supervision background process. Note that a 
similar
# supervision is also done in the opensaf_reboot() function in 
LEAP.
@@ -128,12 +141,6 @@ else
(sleep "$OPENSAF_REBOOT_TIMEOUT"; echo -n "b" > 
"/proc/sysrq-trigger") &
fi
 
-   # Stop some important opensaf processes to prevent bad things 
from happening
-   $icmd pkill -STOP osafamfwd
-   $icmd pkill -STOP osafamfnd
-   $icmd pkill -STOP osafamfd
-   $icmd pkill -STOP osaffmd
-
# Flush OpenSAF internal log server messages to disk.
$bindir/osaflog --flush
 
-- 
1.9.1


--
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
___
Opensaf-devel mailing list
Opensaf-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/opensaf-devel


[devel] [PATCH 0/1] Review Request for osaf:Isolate the node in the opensaf_reboot[#2833]

2018-04-12 Thread ravi-sekhar
Summary: osaf:Isolate the node in the opensaf_reboot[#2833]
Review request for Ticket(s): 2833
Peer Reviewer(s): Anders, Hans 
Pull request to: *** LIST THE PERSON WITH PUSH ACCESS HERE ***
Affected branch(es): develop
Development branch: ticket-2833
Base revision: aff54ff091727f27830443332b830890668749cf
Personal repository: git://git.code.sf.net/u/ravi-sekhar/review


Impacted area   Impact y/n

 Docsn
 Build systemn
 RPM/packaging   n
 Configuration files n
 Startup scripts n
 SAF servicesy
 OpenSAF servicesn
 Core libraries  n
 Samples n
 Tests   n
 Other   n

NOTE: Patch(es) contain lines longer than 80 characers

Comments (indicate scope for each "y" above):
-
*** EXPLAIN/COMMENT THE PATCH SERIES HERE ***

revision 799e5b098e4932929a5710df36395eafbb155605
Author: ravi-sekhar 
Date:   Thu, 12 Apr 2018 18:40:19 +0530

osaf: Isolate the node in the opensaf_reboot [#2833]



Complete diffstat:
--
 scripts/opensaf_reboot | 19 +--
 1 file changed, 13 insertions(+), 6 deletions(-)


Testing Commands:
-
Bring up the demo application on SC-1(Active) and SC-2(Standby)
such that application on SC-2 is Active and on SC-1 is Standby 

 Simulate the link flickering
  1) TCP confifured systems 
 for simulating link down use: tipc-config -bd eth:eth0
 for simulating link down use: tipc-config -be eth:eth0
 
  2) TIPC configured systems: 
 for simulating link down use: ifconfig eth0 down
 for simulating link up use:ifconfig eth0 up 

Testing, Expected Results:
--
The node should get isolated immediatly, so that we won't see two active SUs 
ata a time 

Conditions of Submission:
-
Ack from reviewer on or before Aprili 20 

Arch  Built StartedLinux distro
---
mipsn  n
mips64  n  n
x86 n  n
x86_64  y  y
powerpc n  n
powerpc64   n  n


Reviewer Checklist:
---
[Submitters: make sure that your review doesn't trigger any checkmarks!]


Your checkin has not passed review because (see checked entries):

___ Your RR template is generally incomplete; it has too many blank entries
that need proper data filled in.

___ You have failed to nominate the proper persons for review and push.

___ Your patches do not have proper short+long header

___ You have grammar/spelling in your header that is unacceptable.

___ You have exceeded a sensible line length in your headers/comments/text.

___ You have failed to put in a proper Trac Ticket # into your commits.

___ You have incorrectly put/left internal data in your comments/files
(i.e. internal bug tracking tool IDs, product names etc)

___ You have not given any evidence of testing beyond basic build tests.
Demonstrate some level of runtime or other sanity testing.

___ You have ^M present in some of your files. These have to be removed.

___ You have needlessly changed whitespace or added whitespace crimes
like trailing spaces, or spaces before tabs.

___ You have mixed real technical changes with whitespace and other
cosmetic code cleanup changes. These have to be separate commits.

___ You need to refactor your submission into logical chunks; there is
too much content into a single commit.

___ You have extraneous garbage in your review (merge commits etc)

___ You have giant attachments which should never have been sent;
Instead you should place your content in a public tree to be pulled.

___ You have too many commits attached to an e-mail; resend as threaded
commits, or place in a public tree for a pull.

___ You have resent this content multiple times without a clear indication
of what has changed between each re-send.

___ You have failed to adequately and individually address all of the
comments and change requests that were proposed in the initial review.

___ You have a misconfigured ~/.gitconfig file (i.e. user.name, user.email etc)

___ Your computer have a badly configured date and time; confusing the
the threaded patch review.

___ Your changes affect IPC mechanism, and you don't present any results
for in-service upgradability test.

___ Your changes affect user manual and documentation, your patch series
do not contain the patch that updates the Doxygen manual.


--
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
___
Opensaf-devel mailing list
Opensaf-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo

Re: [devel] [PATCH 1/1] imm: make version parameter in immutil_xxx non-const [#2830]

2018-04-12 Thread Lennart Lund


> -Original Message-
> From: Vu Minh Nguyen [mailto:vu.m.ngu...@dektech.com.au]
> Sent: den 12 april 2018 12:37
> To: Lennart Lund ;
> ravisekhar.ko...@oracle.com; Hans Nordebäck
> ; Anders Widell
> 
> Cc: opensaf-devel@lists.sourceforge.net
> Subject: RE: [PATCH 1/1] imm: make version parameter in immutil_xxx non-
> const [#2830]
> 
> Hi Lennart,
> 
> See my response inline, started with [Vu].
> 
> Regards, Vu
> 
> > -Original Message-
> > From: Lennart Lund [mailto:lennart.l...@ericsson.com]
> > Sent: Thursday, April 12, 2018 5:13 PM
> > To: Vu Minh Nguyen ;
> > ravisekhar.ko...@oracle.com; Hans Nordebäck
> > ; Anders Widell
> > 
> > Cc: opensaf-devel@lists.sourceforge.net; Vu Minh Nguyen
> > 
> > Subject: RE: [PATCH 1/1] imm: make version parameter in immutil_xxx
> non-
> > const [#2830]
> >
> > Hi Vu,
> >
> > Ack, but an important issue to fix. See comments below [Lennart]
> >
> > Thanks
> > Lennart
> >
> > > -Original Message-
> > > From: Vu Minh Nguyen [mailto:vu.m.ngu...@dektech.com.au]
> > > Sent: den 5 april 2018 12:39
> > > To: ravisekhar.ko...@oracle.com; Hans Nordebäck
> > > ; Anders Widell
> > > ; Lennart Lund
> > 
> > > Cc: opensaf-devel@lists.sourceforge.net; Vu Minh Nguyen
> > > 
> > > Subject: [PATCH 1/1] imm: make version parameter in immutil_xxx non-
> > > const [#2830]
> > >
> > > The version in saImmO{m,i}Initialize is input/output parameter and is
> > > declared
> > > as non-constant for both IMM OM and OI API according to SAF spec.
> > > But in immutil wrapper library, some are declared as constant and don't
> > > update
> > > the in/out version before returning from such wrappers.
> > >
> > > This patch makes that parameter non-const and do update the version
> > > before returning from wrapper APIs. Also fix the wrong usage of
> > > these wrapper, passed const version, in some services/applications.
> > > ---
> > >  src/amf/amfd/imm.cc  | 11 +++
> > >  src/amf/amfnd/util.cc|  3 ++-
> > >  src/log/apitest/imm_tstutil.c|  5 -
> > >  src/log/apitest/logtest.c|  9 ++---
> > >  src/log/apitest/logtestfr.c  |  6 --
> > >  src/log/apitest/tet_log_runtime_cfgobj.c |  3 ++-
> > >  src/log/logd/lgs_config.cc   |  3 ++-
> > >  src/log/logd/lgs_imm.cc  | 15 ++-
> > >  src/log/logd/lgs_imm_gcfg.cc |  7 +--
> > >  src/osaf/immutil/immutil.c   | 20 +++-
> > >  src/osaf/immutil/immutil.h   |  6 +++---
> > >  src/smf/smfd/SmfAdminState.cc|  4 ++--
> > >  src/smf/smfd/SmfExecControlHdl.cc|  3 ++-
> > >  13 files changed, 64 insertions(+), 31 deletions(-)
> > >
> > > diff --git a/src/amf/amfd/imm.cc b/src/amf/amfd/imm.cc
> > > index 47c0e5a..8c70325 100644
> > > --- a/src/amf/amfd/imm.cc
> > > +++ b/src/amf/amfd/imm.cc
> > > @@ -1461,6 +1461,7 @@ done:
> > >  SaAisErrorT avd_imm_init(void *avd_cb) {
> > >SaAisErrorT error = SA_AIS_OK;
> > >AVD_CL_CB *cb = (AVD_CL_CB *)avd_cb;
> > > +  SaVersionT local_version = immVersion;
> > >
> > >TRACE_ENTER();
> > >
> > > @@ -1471,13 +1472,13 @@ SaAisErrorT avd_imm_init(void *avd_cb) {
> > >
> > >cb->avd_imm_status = AVD_IMM_INIT_ONGOING;
> > >if ((error = immutil_saImmOiInitialize_2(&cb->immOiHandle,
> > > &avd_callbacks,
> > > -   &immVersion)) != SA_AIS_OK)
> {
> > > +   &local_version)) !=
> SA_AIS_OK) {
> > >  LOG_ER("saImmOiInitialize failed %u", error);
> > >  goto done;
> > >}
> > >
> > [Lennart] local_version has to be set again to immVersion here. This is
> the
> > case in all functions where initialize using an IMM version is done more
> than
> > once. In this case both OM and OI is initialized. In some cases it may be
> a try
> > again loop where an initialize API may be called several times.
> > Note: I have only written this comment once but it may be applicable in
> > several places.
> [Vu] I don't think it is needed here as `local_version` is backup and set
> again inside that immutil_saImmOiInitialize_2() wrapper.
> Unless immutil_saImmOiInitialize_2() is called inside a loop(*), we don't
> need to have `local_version` reset.[Lennart] 

[Lennart] No, this is not the case and immutil initialize functions must return 
the "out" value and does just that.
The important fix done with this ticket is to make sure that immutil initialize 
functions sets the in/out version parameter to the out value. This means that 
this function that  calls immutil initialize functions has to use a local 
version variable and reset that variable back to the wanted state before each 
call to an immutil initialize function using the version.
In this case first immutil_saImmOiInitialize_2(..., *local_version) is called 
which will change content of local_version after that 
immutil_saImmOmInitialize(..., *local_version) is c

Re: [devel] [PATCH 0/5] Review Request for split-brain: select active SC from largest network partition V3 [#2795]

2018-04-12 Thread Anders Widell

Ack with comments:

* There is no need to use "const" when passing function arguments by 
value. E.g. the argument "const uint64_t cluster_size" should be 
"uint64_t cluster_size".


* You assume that all nodes in the cluster have synchronized clocks 
(probably using NTP). Would it be possible to use an expiration time for 
the etcd key instead of writing a time stamp in the value, so that etcd 
automatically deletes the takeover request when it expires? That way we 
would not require synchronized clocks.


regards,
Anders Widell

On 04/11/2018 09:35 AM, Gary Lee wrote:

Summary: split-brain: select active SC from largest network partition V3 [#2795]
Review request for Ticket(s): 2795
Peer Reviewer(s): Anders, Ravi, Hans
Pull request to: *** LIST THE PERSON WITH PUSH ACCESS HERE ***
Affected branch(es): develop
Development branch: ticket-2795
Base revision: 1c302a300e449e8a8527671fbd6c7f4e2b41e95d
Personal repository: git://git.code.sf.net/u/userid-2226215/review


Impacted area   Impact y/n

  Docsn
  Build systemn
  RPM/packaging   n
  Configuration files n
  Startup scripts n
  SAF servicesn
  OpenSAF servicesy
  Core libraries  y
  Samples n
  Tests   n
  Other   n


Comments (indicate scope for each "y" above):
-

*** Changes from V2: ***

fmd: made cluster_size atomic
fmd: wait 3 seconds before promoting to active, to allow topology events to be 
processed first
osaf: add check for existing takeover request, before trying to lock
etcdv3 plugin: reliablity improvements


revision c7bc78656d5de11f6147727bd8612274fb6e438f
Author: Gary Lee 
Date:   Wed, 11 Apr 2018 17:16:46 +1000

rded: adapt to new Consensus API [#2795]

- add 3 new internal message:

RDE_MSG_NODE_UP
RDE_MSG_NODE_DOWN
RDE_MSG_TAKEOVER_REQUEST_CALLBACK

- subscribe to AMFND service up events to keep track of the number
   of cluster members

- listen for takeover requests in KV store



revision 4899e5d0f5abdff8f15eca8ad17d3b13b6a00393
Author: Gary Lee 
Date:   Wed, 11 Apr 2018 17:16:18 +1000

fmd: adapt to new Consensus API [#2795]



revision 812a315af21df06b2f9fdcc3d8fd5b7bbad3e550
Author: Gary Lee 
Date:   Wed, 11 Apr 2018 17:15:41 +1000

amfd: adapt to new Consensus API [#2795]



revision b8a37c1b8965826e5faffbfebc44a84bdb6433a1
Author: Gary Lee 
Date:   Wed, 11 Apr 2018 17:14:39 +1000

osaf: add lock takeover request fuction [#2795]

- add create and set (if previous value matches) functions to KeyValue class
- add Consensus::MonitorTakeoverRequest() function for use by RDE to answer 
takeover requests
- add Consensus::CreateTakeoverRequest() - before a SC is promoted to active, 
it will
   create a takeover request in the KV store. An existing SC can reject the 
lock takeover



revision 955be872ba5887b1b521eac9f7732dd3f6afc593
Author: Gary Lee 
Date:   Wed, 11 Apr 2018 17:13:45 +1000

osaf: extend API to include a create key and an enhanced set key function 
[#2795]

- add create_key function (fails if key already exists)
- add setkey_match_prev function (set value if previous value matches)
- add missing quotes
- add etcd3.plugin



Added Files:

  src/osaf/consensus/plugins/etcd3.plugin


Complete diffstat:
--
  src/amf/amfd/role.cc |   2 +-
  src/fm/fmd/fm_cb.h   |   2 +-
  src/fm/fmd/fm_main.cc|  26 +-
  src/fm/fmd/fm_mds.cc |   2 +
  src/fm/fmd/fm_rda.cc |  27 +-
  src/osaf/consensus/consensus.cc  | 435 ++-
  src/osaf/consensus/consensus.h   |  55 +++-
  src/osaf/consensus/key_value.cc  | 105 +---
  src/osaf/consensus/key_value.h   |  19 +-
  src/osaf/consensus/plugins/etcd.plugin   |  86 +-
  src/osaf/consensus/plugins/etcd3.plugin  | 366 ++
  src/osaf/consensus/plugins/sample.plugin |  67 -
  src/rde/rded/rde_cb.h|  12 +-
  src/rde/rded/rde_main.cc |  75 --
  src/rde/rded/rde_mds.cc  |  39 ++-
  src/rde/rded/rde_rda.cc  |   2 +-
  src/rde/rded/role.cc |  46 +++-
  src/rde/rded/role.h  |   2 +-
  18 files changed, 1180 insertions(+), 188 deletions(-)


Testing Commands:
-
1) SI swap of safSi=SC-2N,safApp=OpenSAF
2) Isolate standby cluster (eg. use iptables to block port 6700 on a TCP system)
3) Isolate active cluster

Testing, Expected Results:
--
1) No error
2) Standby will fail to be promoted as active as the takeover request is 
rejected
3) Standby will be promoted

Conditions of Submission:
-
Ack from any reviewer


Arch  Built StartedLinux distro
---

Re: [devel] [PATCH 0/5] Review Request for split-brain: select active SC from largest network partition V3 [#2795]

2018-04-12 Thread Gary Lee

Hi


On 12/04/18 23:34, Anders Widell wrote:

Ack with comments:

* There is no need to use "const" when passing function arguments by 
value. E.g. the argument "const uint64_t cluster_size" should be 
"uint64_t cluster_size".




[GL] Sure, but it doesn't do any harm, and would stop accidental 
assignments (that would be lost anyway).


* You assume that all nodes in the cluster have synchronized clocks 
(probably using NTP). Would it be possible to use an expiration time 
for the etcd key instead of writing a time stamp in the value, so that 
etcd automatically deletes the takeover request when it expires? That 
way we would not require synchronized clocks.




[GL] Good idea. I did question why I hadn't use TTL/lease once I had 
finished the ticket. :-) Will see what I can do!



regards,
Anders Widell

On 04/11/2018 09:35 AM, Gary Lee wrote:
Summary: split-brain: select active SC from largest network partition 
V3 [#2795]

Review request for Ticket(s): 2795
Peer Reviewer(s): Anders, Ravi, Hans
Pull request to: *** LIST THE PERSON WITH PUSH ACCESS HERE ***
Affected branch(es): develop
Development branch: ticket-2795
Base revision: 1c302a300e449e8a8527671fbd6c7f4e2b41e95d
Personal repository: git://git.code.sf.net/u/userid-2226215/review


Impacted area   Impact y/n

  Docs    n
  Build system    n
  RPM/packaging   n
  Configuration files n
  Startup scripts n
  SAF services    n
  OpenSAF services    y
  Core libraries  y
  Samples n
  Tests   n
  Other   n


Comments (indicate scope for each "y" above):
-

*** Changes from V2: ***

fmd: made cluster_size atomic
fmd: wait 3 seconds before promoting to active, to allow topology 
events to be processed first

osaf: add check for existing takeover request, before trying to lock
etcdv3 plugin: reliablity improvements


revision c7bc78656d5de11f6147727bd8612274fb6e438f
Author:    Gary Lee 
Date:    Wed, 11 Apr 2018 17:16:46 +1000

rded: adapt to new Consensus API [#2795]

- add 3 new internal message:

RDE_MSG_NODE_UP
RDE_MSG_NODE_DOWN
RDE_MSG_TAKEOVER_REQUEST_CALLBACK

- subscribe to AMFND service up events to keep track of the number
   of cluster members

- listen for takeover requests in KV store



revision 4899e5d0f5abdff8f15eca8ad17d3b13b6a00393
Author:    Gary Lee 
Date:    Wed, 11 Apr 2018 17:16:18 +1000

fmd: adapt to new Consensus API [#2795]



revision 812a315af21df06b2f9fdcc3d8fd5b7bbad3e550
Author:    Gary Lee 
Date:    Wed, 11 Apr 2018 17:15:41 +1000

amfd: adapt to new Consensus API [#2795]



revision b8a37c1b8965826e5faffbfebc44a84bdb6433a1
Author:    Gary Lee 
Date:    Wed, 11 Apr 2018 17:14:39 +1000

osaf: add lock takeover request fuction [#2795]

- add create and set (if previous value matches) functions to 
KeyValue class
- add Consensus::MonitorTakeoverRequest() function for use by RDE to 
answer takeover requests
- add Consensus::CreateTakeoverRequest() - before a SC is promoted to 
active, it will
   create a takeover request in the KV store. An existing SC can 
reject the lock takeover




revision 955be872ba5887b1b521eac9f7732dd3f6afc593
Author:    Gary Lee 
Date:    Wed, 11 Apr 2018 17:13:45 +1000

osaf: extend API to include a create key and an enhanced set key 
function [#2795]


- add create_key function (fails if key already exists)
- add setkey_match_prev function (set value if previous value matches)
- add missing quotes
- add etcd3.plugin



Added Files:

  src/osaf/consensus/plugins/etcd3.plugin


Complete diffstat:
--
  src/amf/amfd/role.cc |   2 +-
  src/fm/fmd/fm_cb.h   |   2 +-
  src/fm/fmd/fm_main.cc    |  26 +-
  src/fm/fmd/fm_mds.cc |   2 +
  src/fm/fmd/fm_rda.cc |  27 +-
  src/osaf/consensus/consensus.cc  | 435 
++-

  src/osaf/consensus/consensus.h   |  55 +++-
  src/osaf/consensus/key_value.cc  | 105 +---
  src/osaf/consensus/key_value.h   |  19 +-
  src/osaf/consensus/plugins/etcd.plugin   |  86 +-
  src/osaf/consensus/plugins/etcd3.plugin  | 366 
++

  src/osaf/consensus/plugins/sample.plugin |  67 -
  src/rde/rded/rde_cb.h    |  12 +-
  src/rde/rded/rde_main.cc |  75 --
  src/rde/rded/rde_mds.cc  |  39 ++-
  src/rde/rded/rde_rda.cc  |   2 +-
  src/rde/rded/role.cc |  46 +++-
  src/rde/rded/role.h  |   2 +-
  18 files changed, 1180 insertions(+), 188 deletions(-)


Testing Commands:
-
1) SI swap of safSi=SC-2N,safApp=OpenSAF
2) Isolate standby cluster (eg. use iptables to block port 6700 on a 
TCP system)

3) Isolate active cluster

Testing, Expected

[devel] [PATCH 1/1] clmd: pass rootCauseEntity from PLM tracking to CLM tracking clients [#2834]

2018-04-12 Thread Alex Jones
CLM tracking clients have no context for the tracking callback.

PLM rootCauseEntity is not passed by CLM to its own tracking clients.

When CLM tracking is invoked because of PLM tracking, pass on the
rootCauseEntity.
---
 src/clm/clmd/clms_evt.cc  |  4 +--
 src/clm/clmd/clms_imm.cc  | 80 ++-
 src/clm/clmd/clms_imm.h   |  9 --
 src/clm/clmd/clms_plm.cc  |  3 +-
 src/clm/clmd/clms_util.cc | 13 
 5 files changed, 69 insertions(+), 40 deletions(-)

diff --git a/src/clm/clmd/clms_evt.cc b/src/clm/clmd/clms_evt.cc
index 5c1ca992c..c7d2f4ceb 100644
--- a/src/clm/clmd/clms_evt.cc
+++ b/src/clm/clmd/clms_evt.cc
@@ -692,7 +692,7 @@ uint32_t proc_node_up_msg(CLMS_CB *cb, CLMSV_CLMS_EVT *evt) 
{
   node->init_view = ++(clms_cb->cluster_view_num);
   TRACE("node->init_view %llu", node->init_view);
   node->change = SA_CLM_NODE_JOINED;
-  clms_send_track(clms_cb, node, SA_CLM_CHANGE_COMPLETED, false);
+  clms_send_track(clms_cb, node, SA_CLM_CHANGE_COMPLETED, false, 0);
   /* Clear node->stat_change after sending the callback to
* its clients */
   node->stat_change = SA_FALSE;
@@ -766,7 +766,7 @@ void clms_track_send_node_down(CLMS_CLUSTER_NODE *node) {
   node->stat_change = SA_TRUE;
   node->change = SA_CLM_NODE_LEFT;
   ++(clms_cb->cluster_view_num);
-  clms_send_track(clms_cb, node, SA_CLM_CHANGE_COMPLETED, true);
+  clms_send_track(clms_cb, node, SA_CLM_CHANGE_COMPLETED, true, 0);
   /* Clear node->stat_change after sending the callback to its clients */
   node->stat_change = SA_FALSE;
 
diff --git a/src/clm/clmd/clms_imm.cc b/src/clm/clmd/clms_imm.cc
index 65a6ae11e..190509b05 100644
--- a/src/clm/clmd/clms_imm.cc
+++ b/src/clm/clmd/clms_imm.cc
@@ -1216,7 +1216,8 @@ static void clms_create_track_resp_list(CLMS_CLUSTER_NODE 
*node,
  * @param[in] stepCLM step for which to send the callback
  */
 void clms_send_track(CLMS_CB *cb, CLMS_CLUSTER_NODE *node,
- SaClmChangeStepT step, bool node_reboot) {
+ SaClmChangeStepT step, bool node_reboot,
+ const SaNameT *rootCauseEntity) {
   CLMS_CLIENT_INFO *rec;
   uint32_t client_id = 0;
   SaClmClusterNotificationT_4 *notify_changes = nullptr;
@@ -1254,24 +1255,27 @@ void clms_send_track(CLMS_CB *cb, CLMS_CLUSTER_NODE 
*node,
 if (node_id == node->node_id) {
   /*Implies the change is
* on this local node */
-  rc = clms_send_track_local(node, rec, SA_CLM_CHANGE_START);
+  rc = clms_send_track_local(node, rec, SA_CLM_CHANGE_START,
+ rootCauseEntity);
 }
 
   } else
 rc = clms_prep_and_send_track(cb, node, rec, SA_CLM_CHANGE_START,
-  notify_changes_only);
+  notify_changes_only,
+  rootCauseEntity);
 
 } else if (rec->track_flags & SA_TRACK_CHANGES) {
   if (rec->track_flags & SA_TRACK_LOCAL) {
 if (node_id == node->node_id) {
   /*Implies the change is
* on this local node */
-  rc = clms_send_track_local(node, rec, SA_CLM_CHANGE_START);
+  rc = clms_send_track_local(node, rec, SA_CLM_CHANGE_START,
+ rootCauseEntity);
 }
 
   } else
 rc = clms_prep_and_send_track(cb, node, rec, SA_CLM_CHANGE_START,
-  notify_changes);
+  notify_changes, rootCauseEntity);
 }
 
 if (rc != NCSCC_RC_SUCCESS) {
@@ -1295,24 +1299,26 @@ void clms_send_track(CLMS_CB *cb, CLMS_CLUSTER_NODE 
*node,
 if (node_id == node->node_id) {
   /*Implies the change is
* on this local node */
-  rc = clms_send_track_local(node, rec, SA_CLM_CHANGE_VALIDATE);
+  rc = clms_send_track_local(node, rec, SA_CLM_CHANGE_VALIDATE,
+ rootCauseEntity);
 }
 
   } else
 rc = clms_prep_and_send_track(cb, node, rec, 
SA_CLM_CHANGE_VALIDATE,
-  notify_changes_only);
+  notify_changes_only, 
rootCauseEntity);
 
 } else if (rec->track_flags & SA_TRACK_CHANGES) {
   if (rec->track_flags & SA_TRACK_LOCAL) {
 if (node_id == node->node_id) {
   /*Implies the change is
* on this local node */
-  rc = clms_send_track_local(node, rec, SA_CLM_CHANGE_VALIDATE);
+  rc = clms_send_track_local(node, rec, SA_CLM_CHANGE_VALIDATE,
+ rootCauseEntity);
 }
 
   } else
 rc = clms_prep_and_send_track(cb, node, rec, 
SA_CLM_CHANGE_VALIDATE

[devel] [PATCH 0/1] Review Request for clmd: pass rootCauseEntity from PLM tracking to CLM tracking clients [#2834]

2018-04-12 Thread Alex Jones
Summary: clmd: pass rootCauseEntity from PLM tracking to CLM tracking clients 
[#2834]
Review request for Ticket(s): 2834
Peer Reviewer(s): Anders, Mathi
Pull request to: 
Affected branch(es): develop
Development branch: ticket-2834
Base revision: aff54ff091727f27830443332b830890668749cf
Personal repository: git://git.code.sf.net/u/trguitar/review


Impacted area   Impact y/n

 Docsn
 Build systemn
 RPM/packaging   n
 Configuration files n
 Startup scripts n
 SAF servicesy
 OpenSAF servicesn
 Core libraries  n
 Samples n
 Tests   n
 Other   n


Comments (indicate scope for each "y" above):
-

revision 62589757a43679a24bc4c1f863a68346a23b5a37
Author: Alex Jones 
Date:   Thu, 12 Apr 2018 10:53:19 -0400

clmd: pass rootCauseEntity from PLM tracking to CLM tracking clients [#2834]

CLM tracking clients have no context for the tracking callback.

PLM rootCauseEntity is not passed by CLM to its own tracking clients.

When CLM tracking is invoked because of PLM tracking, pass on the
rootCauseEntity.



Complete diffstat:
--
 src/clm/clmd/clms_evt.cc  |  4 +--
 src/clm/clmd/clms_imm.cc  | 80 ++-
 src/clm/clmd/clms_imm.h   |  9 --
 src/clm/clmd/clms_plm.cc  |  3 +-
 src/clm/clmd/clms_util.cc | 13 
 5 files changed, 69 insertions(+), 40 deletions(-)


Testing Commands:
-
1) Create a CLM tracking client.
2) Using PLM, lock a parent (host) EE, that also has child EEs.


Testing, Expected Results:
--
1) rootCauseEntity of host should be passed in the tracking callback
2) all EEs (child and parent) should be present in the notification

Conditions of Submission:
-
Apr 18 or ack from developer


Arch  Built StartedLinux distro
---
mipsn  n
mips64  n  n
x86 n  n
x86_64  y  y
powerpc n  n
powerpc64   n  n


Reviewer Checklist:
---
[Submitters: make sure that your review doesn't trigger any checkmarks!]


Your checkin has not passed review because (see checked entries):

___ Your RR template is generally incomplete; it has too many blank entries
that need proper data filled in.

___ You have failed to nominate the proper persons for review and push.

___ Your patches do not have proper short+long header

___ You have grammar/spelling in your header that is unacceptable.

___ You have exceeded a sensible line length in your headers/comments/text.

___ You have failed to put in a proper Trac Ticket # into your commits.

___ You have incorrectly put/left internal data in your comments/files
(i.e. internal bug tracking tool IDs, product names etc)

___ You have not given any evidence of testing beyond basic build tests.
Demonstrate some level of runtime or other sanity testing.

___ You have ^M present in some of your files. These have to be removed.

___ You have needlessly changed whitespace or added whitespace crimes
like trailing spaces, or spaces before tabs.

___ You have mixed real technical changes with whitespace and other
cosmetic code cleanup changes. These have to be separate commits.

___ You need to refactor your submission into logical chunks; there is
too much content into a single commit.

___ You have extraneous garbage in your review (merge commits etc)

___ You have giant attachments which should never have been sent;
Instead you should place your content in a public tree to be pulled.

___ You have too many commits attached to an e-mail; resend as threaded
commits, or place in a public tree for a pull.

___ You have resent this content multiple times without a clear indication
of what has changed between each re-send.

___ You have failed to adequately and individually address all of the
comments and change requests that were proposed in the initial review.

___ You have a misconfigured ~/.gitconfig file (i.e. user.name, user.email etc)

___ Your computer have a badly configured date and time; confusing the
the threaded patch review.

___ Your changes affect IPC mechanism, and you don't present any results
for in-service upgradability test.

___ Your changes affect user manual and documentation, your patch series
do not contain the patch that updates the Doxygen manual.


--
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
___
Opensaf-devel mailing list
Opensaf-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/opensaf-devel


[devel] [PATCH 0/1] Review Request for plmd: use virDomainDestroy and virDomainCreate to reset VM [#2836]

2018-04-12 Thread Alex Jones
Summary: plmd: use virDomainDestroy and virDomainCreate to reset VM [#2836]
Review request for Ticket(s): 2836
Peer Reviewer(s): Mathi, Ravi
Pull request to: 
Affected branch(es): develop
Development branch: ticket-2836
Base revision: b13a65123bfddcc6f5105fe340131e3bd8a5ac70
Personal repository: git://git.code.sf.net/u/trguitar/review


Impacted area   Impact y/n

 Docsn
 Build systemn
 RPM/packaging   n
 Configuration files n
 Startup scripts n
 SAF servicesy
 OpenSAF servicesn
 Core libraries  n
 Samples n
 Tests   n
 Other   n


Comments (indicate scope for each "y" above):
-

revision 56a0e35daf04083c5fb76270dbf0163b03500d58
Author: Alex Jones 
Date:   Thu, 12 Apr 2018 13:05:19 -0400

plmd: use virDomainDestroy and virDomainCreate to reset VM [#2836]

Abrupt restart or unlock-in of child EE does not always work.

virDomainReset() does not always work.

Use virDomainDestroy() and virDomainCreate() instead.



Complete diffstat:
--
 src/plm/plmd/plms_virt.cc | 16 ++--
 1 file changed, 14 insertions(+), 2 deletions(-)


Testing Commands:
-
1) Do lock/lock-in/unlock-in/unlock of child EE 50 times.


Testing, Expected Results:
--
1) The commands should never fail.


Conditions of Submission:
-
Apr 18 or ack from developer


Arch  Built StartedLinux distro
---
mipsn  n
mips64  n  n
x86 n  n
x86_64  y  y
powerpc n  n
powerpc64   n  n


Reviewer Checklist:
---
[Submitters: make sure that your review doesn't trigger any checkmarks!]


Your checkin has not passed review because (see checked entries):

___ Your RR template is generally incomplete; it has too many blank entries
that need proper data filled in.

___ You have failed to nominate the proper persons for review and push.

___ Your patches do not have proper short+long header

___ You have grammar/spelling in your header that is unacceptable.

___ You have exceeded a sensible line length in your headers/comments/text.

___ You have failed to put in a proper Trac Ticket # into your commits.

___ You have incorrectly put/left internal data in your comments/files
(i.e. internal bug tracking tool IDs, product names etc)

___ You have not given any evidence of testing beyond basic build tests.
Demonstrate some level of runtime or other sanity testing.

___ You have ^M present in some of your files. These have to be removed.

___ You have needlessly changed whitespace or added whitespace crimes
like trailing spaces, or spaces before tabs.

___ You have mixed real technical changes with whitespace and other
cosmetic code cleanup changes. These have to be separate commits.

___ You need to refactor your submission into logical chunks; there is
too much content into a single commit.

___ You have extraneous garbage in your review (merge commits etc)

___ You have giant attachments which should never have been sent;
Instead you should place your content in a public tree to be pulled.

___ You have too many commits attached to an e-mail; resend as threaded
commits, or place in a public tree for a pull.

___ You have resent this content multiple times without a clear indication
of what has changed between each re-send.

___ You have failed to adequately and individually address all of the
comments and change requests that were proposed in the initial review.

___ You have a misconfigured ~/.gitconfig file (i.e. user.name, user.email etc)

___ Your computer have a badly configured date and time; confusing the
the threaded patch review.

___ Your changes affect IPC mechanism, and you don't present any results
for in-service upgradability test.

___ Your changes affect user manual and documentation, your patch series
do not contain the patch that updates the Doxygen manual.


--
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
___
Opensaf-devel mailing list
Opensaf-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/opensaf-devel


[devel] [PATCH 1/1] plmd: use virDomainDestroy and virDomainCreate to reset VM [#2836]

2018-04-12 Thread Alex Jones
Abrupt restart or unlock-in of child EE does not always work.

virDomainReset() does not always work.

Use virDomainDestroy() and virDomainCreate() instead.
---
 src/plm/plmd/plms_virt.cc | 16 ++--
 1 file changed, 14 insertions(+), 2 deletions(-)

diff --git a/src/plm/plmd/plms_virt.cc b/src/plm/plmd/plms_virt.cc
index 2fd735ac0..0bf11e5a8 100644
--- a/src/plm/plmd/plms_virt.cc
+++ b/src/plm/plmd/plms_virt.cc
@@ -922,8 +922,20 @@ int PlmsVm::instantiate(virDomainPtr domain) {
 }
 
 int PlmsVm::restart(virDomainPtr domain) {
-  TRACE("calling virDomainReset to restart vm");
-  return virDomainReset(domain, 0);
+  TRACE("calling virDomainDestroy and virDomainCreate to restart vm");
+  int rc(-1);
+
+  do {
+rc = virDomainDestroy(domain);
+
+if (rc < 0) break;
+
+rc = virDomainCreate(domain);
+
+if (rc < 0) break;
+  } while (false);
+
+  return rc;
 }
 
 int PlmsVm::isolate(virDomainPtr domain) {
-- 
2.13.6


--
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
___
Opensaf-devel mailing list
Opensaf-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/opensaf-devel


Re: [devel] [PATCH 1/1] osaf: Isolate the node in the opensaf_reboot [#2833]

2018-04-12 Thread Hans Nordebäck
Hi Ravi,


I think stonith, implemented in ticket #1859, handles this case. This 
"flickering" was one the (manual) tests verifying the added stonith support.

It is important to have a separate interface for stonith, to be able to perform 
the remote fencing, similar to use a back plane.

Have you tested with stonith enabled?


/Regards HansN


Från: ravi-sekhar 
Skickat: den 12 april 2018 15:29:13
Till: Hans Nordebäck; Anders Widell
Kopia: opensaf-devel@lists.sourceforge.net; ravi-sekhar
Ämne: [PATCH 1/1] osaf: Isolate the node in the opensaf_reboot [#2833]

---
 scripts/opensaf_reboot | 19 +--
 1 file changed, 13 insertions(+), 6 deletions(-)

diff --git a/scripts/opensaf_reboot b/scripts/opensaf_reboot
index df65c26..b219c39 100644
--- a/scripts/opensaf_reboot
+++ b/scripts/opensaf_reboot
@@ -37,6 +37,9 @@ export LD_LIBRARY_PATH=$libdir:$LD_LIBRARY_PATH
 if [ -f "$pkgsysconfdir/fmd.conf" ]; then
   . "$pkgsysconfdir/fmd.conf"
 fi
+if [ -f "$pkgsysconfdir/nid.conf" ]; then
+  . "$pkgsysconfdir/nid.conf"
+fi

 NODE_ID_FILE=$pkglocalstatedir/node_id

@@ -118,7 +121,17 @@ else
 # uncomment the following line if debugging errors that keep 
restarting the node
 # exit 0

+# If the application is using different interface for cluster 
communication, please
+# add your application specific isolation commands here
+
 logger -t "opensaf_reboot" "Rebooting local node; 
timeout=$OPENSAF_REBOOT_TIMEOUT"
+
+# Isolate the node
+if [ "$MDS_TRANSPORT" = "TIPC" ]; then
+   tipc-config -bd eth:$TIPC_ETH_IF
+else
+   $icmd pkill -STOP osafdtmd
+fi

 # Start a reboot supervision background process. Note that a 
similar
 # supervision is also done in the opensaf_reboot() function in 
LEAP.
@@ -128,12 +141,6 @@ else
 (sleep "$OPENSAF_REBOOT_TIMEOUT"; echo -n "b" > 
"/proc/sysrq-trigger") &
 fi

-   # Stop some important opensaf processes to prevent bad things 
from happening
-   $icmd pkill -STOP osafamfwd
-   $icmd pkill -STOP osafamfnd
-   $icmd pkill -STOP osafamfd
-   $icmd pkill -STOP osaffmd
-
 # Flush OpenSAF internal log server messages to disk.
 $bindir/osaflog --flush

--
1.9.1

--
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
___
Opensaf-devel mailing list
Opensaf-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/opensaf-devel


Re: [devel] [PATCH 1/1] imm: make version parameter in immutil_xxx non-const [#2830]

2018-04-12 Thread Vu Minh Nguyen
Thanks Lennart. I will update the code as your comments before pushing.

@Ravi: Any comment from you? Thanks.

Regards, Vu

> -Original Message-
> From: Lennart Lund [mailto:lennart.l...@ericsson.com]
> Sent: Thursday, April 12, 2018 8:32 PM
> To: Vu Minh Nguyen ;
> ravisekhar.ko...@oracle.com; Hans Nordebäck
> ; Anders Widell
> 
> Cc: opensaf-devel@lists.sourceforge.net; Lennart Lund
> 
> Subject: RE: [PATCH 1/1] imm: make version parameter in immutil_xxx non-
> const [#2830]
> 
> 
> 
> > -Original Message-
> > From: Vu Minh Nguyen [mailto:vu.m.ngu...@dektech.com.au]
> > Sent: den 12 april 2018 12:37
> > To: Lennart Lund ;
> > ravisekhar.ko...@oracle.com; Hans Nordebäck
> > ; Anders Widell
> > 
> > Cc: opensaf-devel@lists.sourceforge.net
> > Subject: RE: [PATCH 1/1] imm: make version parameter in immutil_xxx non-
> > const [#2830]
> >
> > Hi Lennart,
> >
> > See my response inline, started with [Vu].
> >
> > Regards, Vu
> >
> > > -Original Message-
> > > From: Lennart Lund [mailto:lennart.l...@ericsson.com]
> > > Sent: Thursday, April 12, 2018 5:13 PM
> > > To: Vu Minh Nguyen ;
> > > ravisekhar.ko...@oracle.com; Hans Nordebäck
> > > ; Anders Widell
> > > 
> > > Cc: opensaf-devel@lists.sourceforge.net; Vu Minh Nguyen
> > > 
> > > Subject: RE: [PATCH 1/1] imm: make version parameter in immutil_xxx
> > non-
> > > const [#2830]
> > >
> > > Hi Vu,
> > >
> > > Ack, but an important issue to fix. See comments below [Lennart]
> > >
> > > Thanks
> > > Lennart
> > >
> > > > -Original Message-
> > > > From: Vu Minh Nguyen [mailto:vu.m.ngu...@dektech.com.au]
> > > > Sent: den 5 april 2018 12:39
> > > > To: ravisekhar.ko...@oracle.com; Hans Nordebäck
> > > > ; Anders Widell
> > > > ; Lennart Lund
> > > 
> > > > Cc: opensaf-devel@lists.sourceforge.net; Vu Minh Nguyen
> > > > 
> > > > Subject: [PATCH 1/1] imm: make version parameter in immutil_xxx non-
> > > > const [#2830]
> > > >
> > > > The version in saImmO{m,i}Initialize is input/output parameter and
is
> > > > declared
> > > > as non-constant for both IMM OM and OI API according to SAF spec.
> > > > But in immutil wrapper library, some are declared as constant and
don't
> > > > update
> > > > the in/out version before returning from such wrappers.
> > > >
> > > > This patch makes that parameter non-const and do update the version
> > > > before returning from wrapper APIs. Also fix the wrong usage of
> > > > these wrapper, passed const version, in some services/applications.
> > > > ---
> > > >  src/amf/amfd/imm.cc  | 11 +++
> > > >  src/amf/amfnd/util.cc|  3 ++-
> > > >  src/log/apitest/imm_tstutil.c|  5 -
> > > >  src/log/apitest/logtest.c|  9 ++---
> > > >  src/log/apitest/logtestfr.c  |  6 --
> > > >  src/log/apitest/tet_log_runtime_cfgobj.c |  3 ++-
> > > >  src/log/logd/lgs_config.cc   |  3 ++-
> > > >  src/log/logd/lgs_imm.cc  | 15 ++-
> > > >  src/log/logd/lgs_imm_gcfg.cc |  7 +--
> > > >  src/osaf/immutil/immutil.c   | 20 +++-
> > > >  src/osaf/immutil/immutil.h   |  6 +++---
> > > >  src/smf/smfd/SmfAdminState.cc|  4 ++--
> > > >  src/smf/smfd/SmfExecControlHdl.cc|  3 ++-
> > > >  13 files changed, 64 insertions(+), 31 deletions(-)
> > > >
> > > > diff --git a/src/amf/amfd/imm.cc b/src/amf/amfd/imm.cc
> > > > index 47c0e5a..8c70325 100644
> > > > --- a/src/amf/amfd/imm.cc
> > > > +++ b/src/amf/amfd/imm.cc
> > > > @@ -1461,6 +1461,7 @@ done:
> > > >  SaAisErrorT avd_imm_init(void *avd_cb) {
> > > >SaAisErrorT error = SA_AIS_OK;
> > > >AVD_CL_CB *cb = (AVD_CL_CB *)avd_cb;
> > > > +  SaVersionT local_version = immVersion;
> > > >
> > > >TRACE_ENTER();
> > > >
> > > > @@ -1471,13 +1472,13 @@ SaAisErrorT avd_imm_init(void *avd_cb) {
> > > >
> > > >cb->avd_imm_status = AVD_IMM_INIT_ONGOING;
> > > >if ((error = immutil_saImmOiInitialize_2(&cb->immOiHandle,
> > > > &avd_callbacks,
> > > > -   &immVersion)) !=
SA_AIS_OK)
> > {
> > > > +   &local_version)) !=
> > SA_AIS_OK) {
> > > >  LOG_ER("saImmOiInitialize failed %u", error);
> > > >  goto done;
> > > >}
> > > >
> > > [Lennart] local_version has to be set again to immVersion here. This
is
> > the
> > > case in all functions where initialize using an IMM version is done
more
> > than
> > > once. In this case both OM and OI is initialized. In some cases it may
be
> > a try
> > > again loop where an initialize API may be called several times.
> > > Note: I have only written this comment once but it may be applicable
in
> > > several places.
> > [Vu] I don't think it is needed here as `local_version` is backup and
set
> > again inside that immutil_saImmOiInitialize_2() wrapper.
> > Unless immutil_saImmOiInitialize_2() is called inside a loop(*), we
don't
> > ne

Re: [devel] [PATCH 0/5] Review Request for split-brain: select active SC from largest network partition V3 [#2795]

2018-04-12 Thread Hans Nordebäck

Hi,



On 04/12/2018 04:15 PM, Gary Lee wrote:

Hi


On 12/04/18 23:34, Anders Widell wrote:

Ack with comments:

* There is no need to use "const" when passing function arguments by 
value. E.g. the argument "const uint64_t cluster_size" should be 
"uint64_t cluster_size".




[GL] Sure, but it doesn't do any harm, and would stop accidental 
assignments (that would be lost anyway).
[HansN] perhaps these functions should be const member functions? E.g. 
SaAisErrorT PromoteThisNode(bool graceful_takeover, uint64_t 
cluster_size) const;


* You assume that all nodes in the cluster have synchronized clocks 
(probably using NTP). Would it be possible to use an expiration time 
for the etcd key instead of writing a time stamp in the value, so 
that etcd automatically deletes the takeover request when it expires? 
That way we would not require synchronized clocks.




[GL] Good idea. I did question why I hadn't use TTL/lease once I had 
finished the ticket. :-) Will see what I can do!



regards,
Anders Widell

On 04/11/2018 09:35 AM, Gary Lee wrote:
Summary: split-brain: select active SC from largest network 
partition V3 [#2795]

Review request for Ticket(s): 2795
Peer Reviewer(s): Anders, Ravi, Hans
Pull request to: *** LIST THE PERSON WITH PUSH ACCESS HERE ***
Affected branch(es): develop
Development branch: ticket-2795
Base revision: 1c302a300e449e8a8527671fbd6c7f4e2b41e95d
Personal repository: git://git.code.sf.net/u/userid-2226215/review


Impacted area   Impact y/n

  Docs    n
  Build system    n
  RPM/packaging   n
  Configuration files n
  Startup scripts n
  SAF services    n
  OpenSAF services    y
  Core libraries  y
  Samples n
  Tests   n
  Other   n


Comments (indicate scope for each "y" above):
-

*** Changes from V2: ***

fmd: made cluster_size atomic
fmd: wait 3 seconds before promoting to active, to allow topology 
events to be processed first

osaf: add check for existing takeover request, before trying to lock
etcdv3 plugin: reliablity improvements


revision c7bc78656d5de11f6147727bd8612274fb6e438f
Author:    Gary Lee 
Date:    Wed, 11 Apr 2018 17:16:46 +1000

rded: adapt to new Consensus API [#2795]

- add 3 new internal message:

RDE_MSG_NODE_UP
RDE_MSG_NODE_DOWN
RDE_MSG_TAKEOVER_REQUEST_CALLBACK

- subscribe to AMFND service up events to keep track of the number
   of cluster members

- listen for takeover requests in KV store



revision 4899e5d0f5abdff8f15eca8ad17d3b13b6a00393
Author:    Gary Lee 
Date:    Wed, 11 Apr 2018 17:16:18 +1000

fmd: adapt to new Consensus API [#2795]



revision 812a315af21df06b2f9fdcc3d8fd5b7bbad3e550
Author:    Gary Lee 
Date:    Wed, 11 Apr 2018 17:15:41 +1000

amfd: adapt to new Consensus API [#2795]



revision b8a37c1b8965826e5faffbfebc44a84bdb6433a1
Author:    Gary Lee 
Date:    Wed, 11 Apr 2018 17:14:39 +1000

osaf: add lock takeover request fuction [#2795]

- add create and set (if previous value matches) functions to 
KeyValue class
- add Consensus::MonitorTakeoverRequest() function for use by RDE to 
answer takeover requests
- add Consensus::CreateTakeoverRequest() - before a SC is promoted 
to active, it will
   create a takeover request in the KV store. An existing SC can 
reject the lock takeover




revision 955be872ba5887b1b521eac9f7732dd3f6afc593
Author:    Gary Lee 
Date:    Wed, 11 Apr 2018 17:13:45 +1000

osaf: extend API to include a create key and an enhanced set key 
function [#2795]


- add create_key function (fails if key already exists)
- add setkey_match_prev function (set value if previous value matches)
- add missing quotes
- add etcd3.plugin



Added Files:

  src/osaf/consensus/plugins/etcd3.plugin


Complete diffstat:
--
  src/amf/amfd/role.cc |   2 +-
  src/fm/fmd/fm_cb.h   |   2 +-
  src/fm/fmd/fm_main.cc    |  26 +-
  src/fm/fmd/fm_mds.cc |   2 +
  src/fm/fmd/fm_rda.cc |  27 +-
  src/osaf/consensus/consensus.cc  | 435 
++-

  src/osaf/consensus/consensus.h   |  55 +++-
  src/osaf/consensus/key_value.cc  | 105 +---
  src/osaf/consensus/key_value.h   |  19 +-
  src/osaf/consensus/plugins/etcd.plugin   |  86 +-
  src/osaf/consensus/plugins/etcd3.plugin  | 366 
++

  src/osaf/consensus/plugins/sample.plugin |  67 -
  src/rde/rded/rde_cb.h    |  12 +-
  src/rde/rded/rde_main.cc |  75 --
  src/rde/rded/rde_mds.cc  |  39 ++-
  src/rde/rded/rde_rda.cc  |   2 +-
  src/rde/rded/role.cc |  46 +++-
  src/rde/rded/role.h  |   2 +-
  18 files changed, 1180 insertions(+), 188 deletions(-)


T