Re: [devel] [PATCH 1/1] imm: make version parameter in immutil_xxx non-const [#2830]
Hi Vu, Ack, but an important issue to fix. See comments below [Lennart] Thanks Lennart > -Original Message- > From: Vu Minh Nguyen [mailto:vu.m.ngu...@dektech.com.au] > Sent: den 5 april 2018 12:39 > To: ravisekhar.ko...@oracle.com; Hans Nordebäck > ; Anders Widell > ; Lennart Lund > Cc: opensaf-devel@lists.sourceforge.net; Vu Minh Nguyen > > Subject: [PATCH 1/1] imm: make version parameter in immutil_xxx non- > const [#2830] > > The version in saImmO{m,i}Initialize is input/output parameter and is > declared > as non-constant for both IMM OM and OI API according to SAF spec. > But in immutil wrapper library, some are declared as constant and don't > update > the in/out version before returning from such wrappers. > > This patch makes that parameter non-const and do update the version > before returning from wrapper APIs. Also fix the wrong usage of > these wrapper, passed const version, in some services/applications. > --- > src/amf/amfd/imm.cc | 11 +++ > src/amf/amfnd/util.cc| 3 ++- > src/log/apitest/imm_tstutil.c| 5 - > src/log/apitest/logtest.c| 9 ++--- > src/log/apitest/logtestfr.c | 6 -- > src/log/apitest/tet_log_runtime_cfgobj.c | 3 ++- > src/log/logd/lgs_config.cc | 3 ++- > src/log/logd/lgs_imm.cc | 15 ++- > src/log/logd/lgs_imm_gcfg.cc | 7 +-- > src/osaf/immutil/immutil.c | 20 +++- > src/osaf/immutil/immutil.h | 6 +++--- > src/smf/smfd/SmfAdminState.cc| 4 ++-- > src/smf/smfd/SmfExecControlHdl.cc| 3 ++- > 13 files changed, 64 insertions(+), 31 deletions(-) > > diff --git a/src/amf/amfd/imm.cc b/src/amf/amfd/imm.cc > index 47c0e5a..8c70325 100644 > --- a/src/amf/amfd/imm.cc > +++ b/src/amf/amfd/imm.cc > @@ -1461,6 +1461,7 @@ done: > SaAisErrorT avd_imm_init(void *avd_cb) { >SaAisErrorT error = SA_AIS_OK; >AVD_CL_CB *cb = (AVD_CL_CB *)avd_cb; > + SaVersionT local_version = immVersion; > >TRACE_ENTER(); > > @@ -1471,13 +1472,13 @@ SaAisErrorT avd_imm_init(void *avd_cb) { > >cb->avd_imm_status = AVD_IMM_INIT_ONGOING; >if ((error = immutil_saImmOiInitialize_2(&cb->immOiHandle, > &avd_callbacks, > - &immVersion)) != SA_AIS_OK) { > + &local_version)) != SA_AIS_OK) { > LOG_ER("saImmOiInitialize failed %u", error); > goto done; >} > [Lennart] local_version has to be set again to immVersion here. This is the case in all functions where initialize using an IMM version is done more than once. In this case both OM and OI is initialized. In some cases it may be a try again loop where an initialize API may be called several times. Note: I have only written this comment once but it may be applicable in several places. >if ((error = immutil_saImmOmInitialize(&cb->immOmHandle, nullptr, > - &immVersion)) != SA_AIS_OK) { > + &local_version)) != SA_AIS_OK) { > LOG_ER("saImmOmInitialize failed %u", error); > goto done; >} > @@ -2075,6 +2076,7 @@ void avd_imm_update_runtime_attrs(void) { > */ > static void *avd_imm_reinit_bg_thread(void *_cb) { >SaAisErrorT rc = SA_AIS_OK; > + SaVersionT local_version; >AVD_CL_CB *cb = (AVD_CL_CB *)_cb; >AVD_EVT *evt; >uint32_t status; > @@ -2098,9 +2100,10 @@ static void *avd_imm_reinit_bg_thread(void > *_cb) { > > avd_cb->immOiHandle = 0; > avd_cb->is_implementer = false; > +local_version = immVersion; > > if ((rc = immutil_saImmOiInitialize_2(&cb->immOiHandle, &avd_callbacks, > - &immVersion)) != SA_AIS_OK) { > + &local_version)) != SA_AIS_OK) { >LOG_ER("saImmOiInitialize failed %u", rc); >osaf_mutex_unlock_ordie(&imm_reinit_mutex); >exit(EXIT_FAILURE); > @@ -2141,7 +2144,7 @@ static void *avd_imm_reinit_bg_thread(void *_cb) > { >/* Lets re-initialize Om interface also. */ >(void)immutil_saImmOmFinalize(cb->immOmHandle); >if ((rc = immutil_saImmOmInitialize(&cb->immOmHandle, nullptr, > - &immVersion)) != SA_AIS_OK) { > + &local_version)) != SA_AIS_OK) { > LOG_ER("saImmOmInitialize failed %u", rc); > continue; >} > diff --git a/src/amf/amfnd/util.cc b/src/amf/amfnd/util.cc > index f6dbb49..38bf426 100644 > --- a/src/amf/amfnd/util.cc > +++ b/src/amf/amfnd/util.cc > @@ -250,8 +250,9 @@ const char *avnd_failed_state_file_location(void) { > SaAisErrorT saImmOmInitialize_cond(SaImmHandleT *immHandle, > const SaImmCallbacksT *immCallbacks, > const SaVersionT *vers
Re: [devel] [PATCH 1/1] imm: make version parameter in immutil_xxx non-const [#2830]
Hi Lennart, See my response inline, started with [Vu]. Regards, Vu > -Original Message- > From: Lennart Lund [mailto:lennart.l...@ericsson.com] > Sent: Thursday, April 12, 2018 5:13 PM > To: Vu Minh Nguyen ; > ravisekhar.ko...@oracle.com; Hans Nordebäck > ; Anders Widell > > Cc: opensaf-devel@lists.sourceforge.net; Vu Minh Nguyen > > Subject: RE: [PATCH 1/1] imm: make version parameter in immutil_xxx non- > const [#2830] > > Hi Vu, > > Ack, but an important issue to fix. See comments below [Lennart] > > Thanks > Lennart > > > -Original Message- > > From: Vu Minh Nguyen [mailto:vu.m.ngu...@dektech.com.au] > > Sent: den 5 april 2018 12:39 > > To: ravisekhar.ko...@oracle.com; Hans Nordebäck > > ; Anders Widell > > ; Lennart Lund > > > Cc: opensaf-devel@lists.sourceforge.net; Vu Minh Nguyen > > > > Subject: [PATCH 1/1] imm: make version parameter in immutil_xxx non- > > const [#2830] > > > > The version in saImmO{m,i}Initialize is input/output parameter and is > > declared > > as non-constant for both IMM OM and OI API according to SAF spec. > > But in immutil wrapper library, some are declared as constant and don't > > update > > the in/out version before returning from such wrappers. > > > > This patch makes that parameter non-const and do update the version > > before returning from wrapper APIs. Also fix the wrong usage of > > these wrapper, passed const version, in some services/applications. > > --- > > src/amf/amfd/imm.cc | 11 +++ > > src/amf/amfnd/util.cc| 3 ++- > > src/log/apitest/imm_tstutil.c| 5 - > > src/log/apitest/logtest.c| 9 ++--- > > src/log/apitest/logtestfr.c | 6 -- > > src/log/apitest/tet_log_runtime_cfgobj.c | 3 ++- > > src/log/logd/lgs_config.cc | 3 ++- > > src/log/logd/lgs_imm.cc | 15 ++- > > src/log/logd/lgs_imm_gcfg.cc | 7 +-- > > src/osaf/immutil/immutil.c | 20 +++- > > src/osaf/immutil/immutil.h | 6 +++--- > > src/smf/smfd/SmfAdminState.cc| 4 ++-- > > src/smf/smfd/SmfExecControlHdl.cc| 3 ++- > > 13 files changed, 64 insertions(+), 31 deletions(-) > > > > diff --git a/src/amf/amfd/imm.cc b/src/amf/amfd/imm.cc > > index 47c0e5a..8c70325 100644 > > --- a/src/amf/amfd/imm.cc > > +++ b/src/amf/amfd/imm.cc > > @@ -1461,6 +1461,7 @@ done: > > SaAisErrorT avd_imm_init(void *avd_cb) { > >SaAisErrorT error = SA_AIS_OK; > >AVD_CL_CB *cb = (AVD_CL_CB *)avd_cb; > > + SaVersionT local_version = immVersion; > > > >TRACE_ENTER(); > > > > @@ -1471,13 +1472,13 @@ SaAisErrorT avd_imm_init(void *avd_cb) { > > > >cb->avd_imm_status = AVD_IMM_INIT_ONGOING; > >if ((error = immutil_saImmOiInitialize_2(&cb->immOiHandle, > > &avd_callbacks, > > - &immVersion)) != SA_AIS_OK) { > > + &local_version)) != SA_AIS_OK) { > > LOG_ER("saImmOiInitialize failed %u", error); > > goto done; > >} > > > [Lennart] local_version has to be set again to immVersion here. This is the > case in all functions where initialize using an IMM version is done more than > once. In this case both OM and OI is initialized. In some cases it may be a try > again loop where an initialize API may be called several times. > Note: I have only written this comment once but it may be applicable in > several places. [Vu] I don't think it is needed here as `local_version` is backup and set again inside that immutil_saImmOiInitialize_2() wrapper. Unless immutil_saImmOiInitialize_2() is called inside a loop(*), we don't need to have `local_version` reset. (*) Please refer to my next comment for the case - called inside a loop. > >if ((error = immutil_saImmOmInitialize(&cb->immOmHandle, nullptr, > > - &immVersion)) != SA_AIS_OK) { > > + &local_version)) != SA_AIS_OK) { > > LOG_ER("saImmOmInitialize failed %u", error); > > goto done; > >} > > @@ -2075,6 +2076,7 @@ void avd_imm_update_runtime_attrs(void) { > > */ > > static void *avd_imm_reinit_bg_thread(void *_cb) { > >SaAisErrorT rc = SA_AIS_OK; > > + SaVersionT local_version; > >AVD_CL_CB *cb = (AVD_CL_CB *)_cb; > >AVD_EVT *evt; > >uint32_t status; > > @@ -2098,9 +2100,10 @@ static void *avd_imm_reinit_bg_thread(void > > *_cb) { > > > > avd_cb->immOiHandle = 0; > > avd_cb->is_implementer = false; > > +local_version = immVersion; [Vu] Since immutil_saImmOiInitialize_2() is called inside a loop, we need to reset `local_version`. > > > > if ((rc = immutil_saImmOiInitialize_2(&cb->immOiHandle, > &avd_callbacks, > > - &immVersion)) != SA_AIS_OK) { > > + &local_version)) != SA_AIS_OK) { > >LOG
[devel] [PATCH 1/1] osaf: Isolate the node in the opensaf_reboot [#2833]
--- scripts/opensaf_reboot | 19 +-- 1 file changed, 13 insertions(+), 6 deletions(-) diff --git a/scripts/opensaf_reboot b/scripts/opensaf_reboot index df65c26..b219c39 100644 --- a/scripts/opensaf_reboot +++ b/scripts/opensaf_reboot @@ -37,6 +37,9 @@ export LD_LIBRARY_PATH=$libdir:$LD_LIBRARY_PATH if [ -f "$pkgsysconfdir/fmd.conf" ]; then . "$pkgsysconfdir/fmd.conf" fi +if [ -f "$pkgsysconfdir/nid.conf" ]; then + . "$pkgsysconfdir/nid.conf" +fi NODE_ID_FILE=$pkglocalstatedir/node_id @@ -118,7 +121,17 @@ else # uncomment the following line if debugging errors that keep restarting the node # exit 0 +# If the application is using different interface for cluster communication, please +# add your application specific isolation commands here + logger -t "opensaf_reboot" "Rebooting local node; timeout=$OPENSAF_REBOOT_TIMEOUT" + +# Isolate the node +if [ "$MDS_TRANSPORT" = "TIPC" ]; then + tipc-config -bd eth:$TIPC_ETH_IF +else + $icmd pkill -STOP osafdtmd +fi # Start a reboot supervision background process. Note that a similar # supervision is also done in the opensaf_reboot() function in LEAP. @@ -128,12 +141,6 @@ else (sleep "$OPENSAF_REBOOT_TIMEOUT"; echo -n "b" > "/proc/sysrq-trigger") & fi - # Stop some important opensaf processes to prevent bad things from happening - $icmd pkill -STOP osafamfwd - $icmd pkill -STOP osafamfnd - $icmd pkill -STOP osafamfd - $icmd pkill -STOP osaffmd - # Flush OpenSAF internal log server messages to disk. $bindir/osaflog --flush -- 1.9.1 -- Check out the vibrant tech community on one of the world's most engaging tech sites, Slashdot.org! http://sdm.link/slashdot ___ Opensaf-devel mailing list Opensaf-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/opensaf-devel
[devel] [PATCH 0/1] Review Request for osaf:Isolate the node in the opensaf_reboot[#2833]
Summary: osaf:Isolate the node in the opensaf_reboot[#2833] Review request for Ticket(s): 2833 Peer Reviewer(s): Anders, Hans Pull request to: *** LIST THE PERSON WITH PUSH ACCESS HERE *** Affected branch(es): develop Development branch: ticket-2833 Base revision: aff54ff091727f27830443332b830890668749cf Personal repository: git://git.code.sf.net/u/ravi-sekhar/review Impacted area Impact y/n Docsn Build systemn RPM/packaging n Configuration files n Startup scripts n SAF servicesy OpenSAF servicesn Core libraries n Samples n Tests n Other n NOTE: Patch(es) contain lines longer than 80 characers Comments (indicate scope for each "y" above): - *** EXPLAIN/COMMENT THE PATCH SERIES HERE *** revision 799e5b098e4932929a5710df36395eafbb155605 Author: ravi-sekhar Date: Thu, 12 Apr 2018 18:40:19 +0530 osaf: Isolate the node in the opensaf_reboot [#2833] Complete diffstat: -- scripts/opensaf_reboot | 19 +-- 1 file changed, 13 insertions(+), 6 deletions(-) Testing Commands: - Bring up the demo application on SC-1(Active) and SC-2(Standby) such that application on SC-2 is Active and on SC-1 is Standby Simulate the link flickering 1) TCP confifured systems for simulating link down use: tipc-config -bd eth:eth0 for simulating link down use: tipc-config -be eth:eth0 2) TIPC configured systems: for simulating link down use: ifconfig eth0 down for simulating link up use:ifconfig eth0 up Testing, Expected Results: -- The node should get isolated immediatly, so that we won't see two active SUs ata a time Conditions of Submission: - Ack from reviewer on or before Aprili 20 Arch Built StartedLinux distro --- mipsn n mips64 n n x86 n n x86_64 y y powerpc n n powerpc64 n n Reviewer Checklist: --- [Submitters: make sure that your review doesn't trigger any checkmarks!] Your checkin has not passed review because (see checked entries): ___ Your RR template is generally incomplete; it has too many blank entries that need proper data filled in. ___ You have failed to nominate the proper persons for review and push. ___ Your patches do not have proper short+long header ___ You have grammar/spelling in your header that is unacceptable. ___ You have exceeded a sensible line length in your headers/comments/text. ___ You have failed to put in a proper Trac Ticket # into your commits. ___ You have incorrectly put/left internal data in your comments/files (i.e. internal bug tracking tool IDs, product names etc) ___ You have not given any evidence of testing beyond basic build tests. Demonstrate some level of runtime or other sanity testing. ___ You have ^M present in some of your files. These have to be removed. ___ You have needlessly changed whitespace or added whitespace crimes like trailing spaces, or spaces before tabs. ___ You have mixed real technical changes with whitespace and other cosmetic code cleanup changes. These have to be separate commits. ___ You need to refactor your submission into logical chunks; there is too much content into a single commit. ___ You have extraneous garbage in your review (merge commits etc) ___ You have giant attachments which should never have been sent; Instead you should place your content in a public tree to be pulled. ___ You have too many commits attached to an e-mail; resend as threaded commits, or place in a public tree for a pull. ___ You have resent this content multiple times without a clear indication of what has changed between each re-send. ___ You have failed to adequately and individually address all of the comments and change requests that were proposed in the initial review. ___ You have a misconfigured ~/.gitconfig file (i.e. user.name, user.email etc) ___ Your computer have a badly configured date and time; confusing the the threaded patch review. ___ Your changes affect IPC mechanism, and you don't present any results for in-service upgradability test. ___ Your changes affect user manual and documentation, your patch series do not contain the patch that updates the Doxygen manual. -- Check out the vibrant tech community on one of the world's most engaging tech sites, Slashdot.org! http://sdm.link/slashdot ___ Opensaf-devel mailing list Opensaf-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo
Re: [devel] [PATCH 1/1] imm: make version parameter in immutil_xxx non-const [#2830]
> -Original Message- > From: Vu Minh Nguyen [mailto:vu.m.ngu...@dektech.com.au] > Sent: den 12 april 2018 12:37 > To: Lennart Lund ; > ravisekhar.ko...@oracle.com; Hans Nordebäck > ; Anders Widell > > Cc: opensaf-devel@lists.sourceforge.net > Subject: RE: [PATCH 1/1] imm: make version parameter in immutil_xxx non- > const [#2830] > > Hi Lennart, > > See my response inline, started with [Vu]. > > Regards, Vu > > > -Original Message- > > From: Lennart Lund [mailto:lennart.l...@ericsson.com] > > Sent: Thursday, April 12, 2018 5:13 PM > > To: Vu Minh Nguyen ; > > ravisekhar.ko...@oracle.com; Hans Nordebäck > > ; Anders Widell > > > > Cc: opensaf-devel@lists.sourceforge.net; Vu Minh Nguyen > > > > Subject: RE: [PATCH 1/1] imm: make version parameter in immutil_xxx > non- > > const [#2830] > > > > Hi Vu, > > > > Ack, but an important issue to fix. See comments below [Lennart] > > > > Thanks > > Lennart > > > > > -Original Message- > > > From: Vu Minh Nguyen [mailto:vu.m.ngu...@dektech.com.au] > > > Sent: den 5 april 2018 12:39 > > > To: ravisekhar.ko...@oracle.com; Hans Nordebäck > > > ; Anders Widell > > > ; Lennart Lund > > > > > Cc: opensaf-devel@lists.sourceforge.net; Vu Minh Nguyen > > > > > > Subject: [PATCH 1/1] imm: make version parameter in immutil_xxx non- > > > const [#2830] > > > > > > The version in saImmO{m,i}Initialize is input/output parameter and is > > > declared > > > as non-constant for both IMM OM and OI API according to SAF spec. > > > But in immutil wrapper library, some are declared as constant and don't > > > update > > > the in/out version before returning from such wrappers. > > > > > > This patch makes that parameter non-const and do update the version > > > before returning from wrapper APIs. Also fix the wrong usage of > > > these wrapper, passed const version, in some services/applications. > > > --- > > > src/amf/amfd/imm.cc | 11 +++ > > > src/amf/amfnd/util.cc| 3 ++- > > > src/log/apitest/imm_tstutil.c| 5 - > > > src/log/apitest/logtest.c| 9 ++--- > > > src/log/apitest/logtestfr.c | 6 -- > > > src/log/apitest/tet_log_runtime_cfgobj.c | 3 ++- > > > src/log/logd/lgs_config.cc | 3 ++- > > > src/log/logd/lgs_imm.cc | 15 ++- > > > src/log/logd/lgs_imm_gcfg.cc | 7 +-- > > > src/osaf/immutil/immutil.c | 20 +++- > > > src/osaf/immutil/immutil.h | 6 +++--- > > > src/smf/smfd/SmfAdminState.cc| 4 ++-- > > > src/smf/smfd/SmfExecControlHdl.cc| 3 ++- > > > 13 files changed, 64 insertions(+), 31 deletions(-) > > > > > > diff --git a/src/amf/amfd/imm.cc b/src/amf/amfd/imm.cc > > > index 47c0e5a..8c70325 100644 > > > --- a/src/amf/amfd/imm.cc > > > +++ b/src/amf/amfd/imm.cc > > > @@ -1461,6 +1461,7 @@ done: > > > SaAisErrorT avd_imm_init(void *avd_cb) { > > >SaAisErrorT error = SA_AIS_OK; > > >AVD_CL_CB *cb = (AVD_CL_CB *)avd_cb; > > > + SaVersionT local_version = immVersion; > > > > > >TRACE_ENTER(); > > > > > > @@ -1471,13 +1472,13 @@ SaAisErrorT avd_imm_init(void *avd_cb) { > > > > > >cb->avd_imm_status = AVD_IMM_INIT_ONGOING; > > >if ((error = immutil_saImmOiInitialize_2(&cb->immOiHandle, > > > &avd_callbacks, > > > - &immVersion)) != SA_AIS_OK) > { > > > + &local_version)) != > SA_AIS_OK) { > > > LOG_ER("saImmOiInitialize failed %u", error); > > > goto done; > > >} > > > > > [Lennart] local_version has to be set again to immVersion here. This is > the > > case in all functions where initialize using an IMM version is done more > than > > once. In this case both OM and OI is initialized. In some cases it may be > a try > > again loop where an initialize API may be called several times. > > Note: I have only written this comment once but it may be applicable in > > several places. > [Vu] I don't think it is needed here as `local_version` is backup and set > again inside that immutil_saImmOiInitialize_2() wrapper. > Unless immutil_saImmOiInitialize_2() is called inside a loop(*), we don't > need to have `local_version` reset.[Lennart] [Lennart] No, this is not the case and immutil initialize functions must return the "out" value and does just that. The important fix done with this ticket is to make sure that immutil initialize functions sets the in/out version parameter to the out value. This means that this function that calls immutil initialize functions has to use a local version variable and reset that variable back to the wanted state before each call to an immutil initialize function using the version. In this case first immutil_saImmOiInitialize_2(..., *local_version) is called which will change content of local_version after that immutil_saImmOmInitialize(..., *local_version) is c
Re: [devel] [PATCH 0/5] Review Request for split-brain: select active SC from largest network partition V3 [#2795]
Ack with comments: * There is no need to use "const" when passing function arguments by value. E.g. the argument "const uint64_t cluster_size" should be "uint64_t cluster_size". * You assume that all nodes in the cluster have synchronized clocks (probably using NTP). Would it be possible to use an expiration time for the etcd key instead of writing a time stamp in the value, so that etcd automatically deletes the takeover request when it expires? That way we would not require synchronized clocks. regards, Anders Widell On 04/11/2018 09:35 AM, Gary Lee wrote: Summary: split-brain: select active SC from largest network partition V3 [#2795] Review request for Ticket(s): 2795 Peer Reviewer(s): Anders, Ravi, Hans Pull request to: *** LIST THE PERSON WITH PUSH ACCESS HERE *** Affected branch(es): develop Development branch: ticket-2795 Base revision: 1c302a300e449e8a8527671fbd6c7f4e2b41e95d Personal repository: git://git.code.sf.net/u/userid-2226215/review Impacted area Impact y/n Docsn Build systemn RPM/packaging n Configuration files n Startup scripts n SAF servicesn OpenSAF servicesy Core libraries y Samples n Tests n Other n Comments (indicate scope for each "y" above): - *** Changes from V2: *** fmd: made cluster_size atomic fmd: wait 3 seconds before promoting to active, to allow topology events to be processed first osaf: add check for existing takeover request, before trying to lock etcdv3 plugin: reliablity improvements revision c7bc78656d5de11f6147727bd8612274fb6e438f Author: Gary Lee Date: Wed, 11 Apr 2018 17:16:46 +1000 rded: adapt to new Consensus API [#2795] - add 3 new internal message: RDE_MSG_NODE_UP RDE_MSG_NODE_DOWN RDE_MSG_TAKEOVER_REQUEST_CALLBACK - subscribe to AMFND service up events to keep track of the number of cluster members - listen for takeover requests in KV store revision 4899e5d0f5abdff8f15eca8ad17d3b13b6a00393 Author: Gary Lee Date: Wed, 11 Apr 2018 17:16:18 +1000 fmd: adapt to new Consensus API [#2795] revision 812a315af21df06b2f9fdcc3d8fd5b7bbad3e550 Author: Gary Lee Date: Wed, 11 Apr 2018 17:15:41 +1000 amfd: adapt to new Consensus API [#2795] revision b8a37c1b8965826e5faffbfebc44a84bdb6433a1 Author: Gary Lee Date: Wed, 11 Apr 2018 17:14:39 +1000 osaf: add lock takeover request fuction [#2795] - add create and set (if previous value matches) functions to KeyValue class - add Consensus::MonitorTakeoverRequest() function for use by RDE to answer takeover requests - add Consensus::CreateTakeoverRequest() - before a SC is promoted to active, it will create a takeover request in the KV store. An existing SC can reject the lock takeover revision 955be872ba5887b1b521eac9f7732dd3f6afc593 Author: Gary Lee Date: Wed, 11 Apr 2018 17:13:45 +1000 osaf: extend API to include a create key and an enhanced set key function [#2795] - add create_key function (fails if key already exists) - add setkey_match_prev function (set value if previous value matches) - add missing quotes - add etcd3.plugin Added Files: src/osaf/consensus/plugins/etcd3.plugin Complete diffstat: -- src/amf/amfd/role.cc | 2 +- src/fm/fmd/fm_cb.h | 2 +- src/fm/fmd/fm_main.cc| 26 +- src/fm/fmd/fm_mds.cc | 2 + src/fm/fmd/fm_rda.cc | 27 +- src/osaf/consensus/consensus.cc | 435 ++- src/osaf/consensus/consensus.h | 55 +++- src/osaf/consensus/key_value.cc | 105 +--- src/osaf/consensus/key_value.h | 19 +- src/osaf/consensus/plugins/etcd.plugin | 86 +- src/osaf/consensus/plugins/etcd3.plugin | 366 ++ src/osaf/consensus/plugins/sample.plugin | 67 - src/rde/rded/rde_cb.h| 12 +- src/rde/rded/rde_main.cc | 75 -- src/rde/rded/rde_mds.cc | 39 ++- src/rde/rded/rde_rda.cc | 2 +- src/rde/rded/role.cc | 46 +++- src/rde/rded/role.h | 2 +- 18 files changed, 1180 insertions(+), 188 deletions(-) Testing Commands: - 1) SI swap of safSi=SC-2N,safApp=OpenSAF 2) Isolate standby cluster (eg. use iptables to block port 6700 on a TCP system) 3) Isolate active cluster Testing, Expected Results: -- 1) No error 2) Standby will fail to be promoted as active as the takeover request is rejected 3) Standby will be promoted Conditions of Submission: - Ack from any reviewer Arch Built StartedLinux distro ---
Re: [devel] [PATCH 0/5] Review Request for split-brain: select active SC from largest network partition V3 [#2795]
Hi On 12/04/18 23:34, Anders Widell wrote: Ack with comments: * There is no need to use "const" when passing function arguments by value. E.g. the argument "const uint64_t cluster_size" should be "uint64_t cluster_size". [GL] Sure, but it doesn't do any harm, and would stop accidental assignments (that would be lost anyway). * You assume that all nodes in the cluster have synchronized clocks (probably using NTP). Would it be possible to use an expiration time for the etcd key instead of writing a time stamp in the value, so that etcd automatically deletes the takeover request when it expires? That way we would not require synchronized clocks. [GL] Good idea. I did question why I hadn't use TTL/lease once I had finished the ticket. :-) Will see what I can do! regards, Anders Widell On 04/11/2018 09:35 AM, Gary Lee wrote: Summary: split-brain: select active SC from largest network partition V3 [#2795] Review request for Ticket(s): 2795 Peer Reviewer(s): Anders, Ravi, Hans Pull request to: *** LIST THE PERSON WITH PUSH ACCESS HERE *** Affected branch(es): develop Development branch: ticket-2795 Base revision: 1c302a300e449e8a8527671fbd6c7f4e2b41e95d Personal repository: git://git.code.sf.net/u/userid-2226215/review Impacted area Impact y/n Docs n Build system n RPM/packaging n Configuration files n Startup scripts n SAF services n OpenSAF services y Core libraries y Samples n Tests n Other n Comments (indicate scope for each "y" above): - *** Changes from V2: *** fmd: made cluster_size atomic fmd: wait 3 seconds before promoting to active, to allow topology events to be processed first osaf: add check for existing takeover request, before trying to lock etcdv3 plugin: reliablity improvements revision c7bc78656d5de11f6147727bd8612274fb6e438f Author: Gary Lee Date: Wed, 11 Apr 2018 17:16:46 +1000 rded: adapt to new Consensus API [#2795] - add 3 new internal message: RDE_MSG_NODE_UP RDE_MSG_NODE_DOWN RDE_MSG_TAKEOVER_REQUEST_CALLBACK - subscribe to AMFND service up events to keep track of the number of cluster members - listen for takeover requests in KV store revision 4899e5d0f5abdff8f15eca8ad17d3b13b6a00393 Author: Gary Lee Date: Wed, 11 Apr 2018 17:16:18 +1000 fmd: adapt to new Consensus API [#2795] revision 812a315af21df06b2f9fdcc3d8fd5b7bbad3e550 Author: Gary Lee Date: Wed, 11 Apr 2018 17:15:41 +1000 amfd: adapt to new Consensus API [#2795] revision b8a37c1b8965826e5faffbfebc44a84bdb6433a1 Author: Gary Lee Date: Wed, 11 Apr 2018 17:14:39 +1000 osaf: add lock takeover request fuction [#2795] - add create and set (if previous value matches) functions to KeyValue class - add Consensus::MonitorTakeoverRequest() function for use by RDE to answer takeover requests - add Consensus::CreateTakeoverRequest() - before a SC is promoted to active, it will create a takeover request in the KV store. An existing SC can reject the lock takeover revision 955be872ba5887b1b521eac9f7732dd3f6afc593 Author: Gary Lee Date: Wed, 11 Apr 2018 17:13:45 +1000 osaf: extend API to include a create key and an enhanced set key function [#2795] - add create_key function (fails if key already exists) - add setkey_match_prev function (set value if previous value matches) - add missing quotes - add etcd3.plugin Added Files: src/osaf/consensus/plugins/etcd3.plugin Complete diffstat: -- src/amf/amfd/role.cc | 2 +- src/fm/fmd/fm_cb.h | 2 +- src/fm/fmd/fm_main.cc | 26 +- src/fm/fmd/fm_mds.cc | 2 + src/fm/fmd/fm_rda.cc | 27 +- src/osaf/consensus/consensus.cc | 435 ++- src/osaf/consensus/consensus.h | 55 +++- src/osaf/consensus/key_value.cc | 105 +--- src/osaf/consensus/key_value.h | 19 +- src/osaf/consensus/plugins/etcd.plugin | 86 +- src/osaf/consensus/plugins/etcd3.plugin | 366 ++ src/osaf/consensus/plugins/sample.plugin | 67 - src/rde/rded/rde_cb.h | 12 +- src/rde/rded/rde_main.cc | 75 -- src/rde/rded/rde_mds.cc | 39 ++- src/rde/rded/rde_rda.cc | 2 +- src/rde/rded/role.cc | 46 +++- src/rde/rded/role.h | 2 +- 18 files changed, 1180 insertions(+), 188 deletions(-) Testing Commands: - 1) SI swap of safSi=SC-2N,safApp=OpenSAF 2) Isolate standby cluster (eg. use iptables to block port 6700 on a TCP system) 3) Isolate active cluster Testing, Expected
[devel] [PATCH 1/1] clmd: pass rootCauseEntity from PLM tracking to CLM tracking clients [#2834]
CLM tracking clients have no context for the tracking callback. PLM rootCauseEntity is not passed by CLM to its own tracking clients. When CLM tracking is invoked because of PLM tracking, pass on the rootCauseEntity. --- src/clm/clmd/clms_evt.cc | 4 +-- src/clm/clmd/clms_imm.cc | 80 ++- src/clm/clmd/clms_imm.h | 9 -- src/clm/clmd/clms_plm.cc | 3 +- src/clm/clmd/clms_util.cc | 13 5 files changed, 69 insertions(+), 40 deletions(-) diff --git a/src/clm/clmd/clms_evt.cc b/src/clm/clmd/clms_evt.cc index 5c1ca992c..c7d2f4ceb 100644 --- a/src/clm/clmd/clms_evt.cc +++ b/src/clm/clmd/clms_evt.cc @@ -692,7 +692,7 @@ uint32_t proc_node_up_msg(CLMS_CB *cb, CLMSV_CLMS_EVT *evt) { node->init_view = ++(clms_cb->cluster_view_num); TRACE("node->init_view %llu", node->init_view); node->change = SA_CLM_NODE_JOINED; - clms_send_track(clms_cb, node, SA_CLM_CHANGE_COMPLETED, false); + clms_send_track(clms_cb, node, SA_CLM_CHANGE_COMPLETED, false, 0); /* Clear node->stat_change after sending the callback to * its clients */ node->stat_change = SA_FALSE; @@ -766,7 +766,7 @@ void clms_track_send_node_down(CLMS_CLUSTER_NODE *node) { node->stat_change = SA_TRUE; node->change = SA_CLM_NODE_LEFT; ++(clms_cb->cluster_view_num); - clms_send_track(clms_cb, node, SA_CLM_CHANGE_COMPLETED, true); + clms_send_track(clms_cb, node, SA_CLM_CHANGE_COMPLETED, true, 0); /* Clear node->stat_change after sending the callback to its clients */ node->stat_change = SA_FALSE; diff --git a/src/clm/clmd/clms_imm.cc b/src/clm/clmd/clms_imm.cc index 65a6ae11e..190509b05 100644 --- a/src/clm/clmd/clms_imm.cc +++ b/src/clm/clmd/clms_imm.cc @@ -1216,7 +1216,8 @@ static void clms_create_track_resp_list(CLMS_CLUSTER_NODE *node, * @param[in] stepCLM step for which to send the callback */ void clms_send_track(CLMS_CB *cb, CLMS_CLUSTER_NODE *node, - SaClmChangeStepT step, bool node_reboot) { + SaClmChangeStepT step, bool node_reboot, + const SaNameT *rootCauseEntity) { CLMS_CLIENT_INFO *rec; uint32_t client_id = 0; SaClmClusterNotificationT_4 *notify_changes = nullptr; @@ -1254,24 +1255,27 @@ void clms_send_track(CLMS_CB *cb, CLMS_CLUSTER_NODE *node, if (node_id == node->node_id) { /*Implies the change is * on this local node */ - rc = clms_send_track_local(node, rec, SA_CLM_CHANGE_START); + rc = clms_send_track_local(node, rec, SA_CLM_CHANGE_START, + rootCauseEntity); } } else rc = clms_prep_and_send_track(cb, node, rec, SA_CLM_CHANGE_START, - notify_changes_only); + notify_changes_only, + rootCauseEntity); } else if (rec->track_flags & SA_TRACK_CHANGES) { if (rec->track_flags & SA_TRACK_LOCAL) { if (node_id == node->node_id) { /*Implies the change is * on this local node */ - rc = clms_send_track_local(node, rec, SA_CLM_CHANGE_START); + rc = clms_send_track_local(node, rec, SA_CLM_CHANGE_START, + rootCauseEntity); } } else rc = clms_prep_and_send_track(cb, node, rec, SA_CLM_CHANGE_START, - notify_changes); + notify_changes, rootCauseEntity); } if (rc != NCSCC_RC_SUCCESS) { @@ -1295,24 +1299,26 @@ void clms_send_track(CLMS_CB *cb, CLMS_CLUSTER_NODE *node, if (node_id == node->node_id) { /*Implies the change is * on this local node */ - rc = clms_send_track_local(node, rec, SA_CLM_CHANGE_VALIDATE); + rc = clms_send_track_local(node, rec, SA_CLM_CHANGE_VALIDATE, + rootCauseEntity); } } else rc = clms_prep_and_send_track(cb, node, rec, SA_CLM_CHANGE_VALIDATE, - notify_changes_only); + notify_changes_only, rootCauseEntity); } else if (rec->track_flags & SA_TRACK_CHANGES) { if (rec->track_flags & SA_TRACK_LOCAL) { if (node_id == node->node_id) { /*Implies the change is * on this local node */ - rc = clms_send_track_local(node, rec, SA_CLM_CHANGE_VALIDATE); + rc = clms_send_track_local(node, rec, SA_CLM_CHANGE_VALIDATE, + rootCauseEntity); } } else rc = clms_prep_and_send_track(cb, node, rec, SA_CLM_CHANGE_VALIDATE
[devel] [PATCH 0/1] Review Request for clmd: pass rootCauseEntity from PLM tracking to CLM tracking clients [#2834]
Summary: clmd: pass rootCauseEntity from PLM tracking to CLM tracking clients [#2834] Review request for Ticket(s): 2834 Peer Reviewer(s): Anders, Mathi Pull request to: Affected branch(es): develop Development branch: ticket-2834 Base revision: aff54ff091727f27830443332b830890668749cf Personal repository: git://git.code.sf.net/u/trguitar/review Impacted area Impact y/n Docsn Build systemn RPM/packaging n Configuration files n Startup scripts n SAF servicesy OpenSAF servicesn Core libraries n Samples n Tests n Other n Comments (indicate scope for each "y" above): - revision 62589757a43679a24bc4c1f863a68346a23b5a37 Author: Alex Jones Date: Thu, 12 Apr 2018 10:53:19 -0400 clmd: pass rootCauseEntity from PLM tracking to CLM tracking clients [#2834] CLM tracking clients have no context for the tracking callback. PLM rootCauseEntity is not passed by CLM to its own tracking clients. When CLM tracking is invoked because of PLM tracking, pass on the rootCauseEntity. Complete diffstat: -- src/clm/clmd/clms_evt.cc | 4 +-- src/clm/clmd/clms_imm.cc | 80 ++- src/clm/clmd/clms_imm.h | 9 -- src/clm/clmd/clms_plm.cc | 3 +- src/clm/clmd/clms_util.cc | 13 5 files changed, 69 insertions(+), 40 deletions(-) Testing Commands: - 1) Create a CLM tracking client. 2) Using PLM, lock a parent (host) EE, that also has child EEs. Testing, Expected Results: -- 1) rootCauseEntity of host should be passed in the tracking callback 2) all EEs (child and parent) should be present in the notification Conditions of Submission: - Apr 18 or ack from developer Arch Built StartedLinux distro --- mipsn n mips64 n n x86 n n x86_64 y y powerpc n n powerpc64 n n Reviewer Checklist: --- [Submitters: make sure that your review doesn't trigger any checkmarks!] Your checkin has not passed review because (see checked entries): ___ Your RR template is generally incomplete; it has too many blank entries that need proper data filled in. ___ You have failed to nominate the proper persons for review and push. ___ Your patches do not have proper short+long header ___ You have grammar/spelling in your header that is unacceptable. ___ You have exceeded a sensible line length in your headers/comments/text. ___ You have failed to put in a proper Trac Ticket # into your commits. ___ You have incorrectly put/left internal data in your comments/files (i.e. internal bug tracking tool IDs, product names etc) ___ You have not given any evidence of testing beyond basic build tests. Demonstrate some level of runtime or other sanity testing. ___ You have ^M present in some of your files. These have to be removed. ___ You have needlessly changed whitespace or added whitespace crimes like trailing spaces, or spaces before tabs. ___ You have mixed real technical changes with whitespace and other cosmetic code cleanup changes. These have to be separate commits. ___ You need to refactor your submission into logical chunks; there is too much content into a single commit. ___ You have extraneous garbage in your review (merge commits etc) ___ You have giant attachments which should never have been sent; Instead you should place your content in a public tree to be pulled. ___ You have too many commits attached to an e-mail; resend as threaded commits, or place in a public tree for a pull. ___ You have resent this content multiple times without a clear indication of what has changed between each re-send. ___ You have failed to adequately and individually address all of the comments and change requests that were proposed in the initial review. ___ You have a misconfigured ~/.gitconfig file (i.e. user.name, user.email etc) ___ Your computer have a badly configured date and time; confusing the the threaded patch review. ___ Your changes affect IPC mechanism, and you don't present any results for in-service upgradability test. ___ Your changes affect user manual and documentation, your patch series do not contain the patch that updates the Doxygen manual. -- Check out the vibrant tech community on one of the world's most engaging tech sites, Slashdot.org! http://sdm.link/slashdot ___ Opensaf-devel mailing list Opensaf-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/opensaf-devel
[devel] [PATCH 0/1] Review Request for plmd: use virDomainDestroy and virDomainCreate to reset VM [#2836]
Summary: plmd: use virDomainDestroy and virDomainCreate to reset VM [#2836] Review request for Ticket(s): 2836 Peer Reviewer(s): Mathi, Ravi Pull request to: Affected branch(es): develop Development branch: ticket-2836 Base revision: b13a65123bfddcc6f5105fe340131e3bd8a5ac70 Personal repository: git://git.code.sf.net/u/trguitar/review Impacted area Impact y/n Docsn Build systemn RPM/packaging n Configuration files n Startup scripts n SAF servicesy OpenSAF servicesn Core libraries n Samples n Tests n Other n Comments (indicate scope for each "y" above): - revision 56a0e35daf04083c5fb76270dbf0163b03500d58 Author: Alex Jones Date: Thu, 12 Apr 2018 13:05:19 -0400 plmd: use virDomainDestroy and virDomainCreate to reset VM [#2836] Abrupt restart or unlock-in of child EE does not always work. virDomainReset() does not always work. Use virDomainDestroy() and virDomainCreate() instead. Complete diffstat: -- src/plm/plmd/plms_virt.cc | 16 ++-- 1 file changed, 14 insertions(+), 2 deletions(-) Testing Commands: - 1) Do lock/lock-in/unlock-in/unlock of child EE 50 times. Testing, Expected Results: -- 1) The commands should never fail. Conditions of Submission: - Apr 18 or ack from developer Arch Built StartedLinux distro --- mipsn n mips64 n n x86 n n x86_64 y y powerpc n n powerpc64 n n Reviewer Checklist: --- [Submitters: make sure that your review doesn't trigger any checkmarks!] Your checkin has not passed review because (see checked entries): ___ Your RR template is generally incomplete; it has too many blank entries that need proper data filled in. ___ You have failed to nominate the proper persons for review and push. ___ Your patches do not have proper short+long header ___ You have grammar/spelling in your header that is unacceptable. ___ You have exceeded a sensible line length in your headers/comments/text. ___ You have failed to put in a proper Trac Ticket # into your commits. ___ You have incorrectly put/left internal data in your comments/files (i.e. internal bug tracking tool IDs, product names etc) ___ You have not given any evidence of testing beyond basic build tests. Demonstrate some level of runtime or other sanity testing. ___ You have ^M present in some of your files. These have to be removed. ___ You have needlessly changed whitespace or added whitespace crimes like trailing spaces, or spaces before tabs. ___ You have mixed real technical changes with whitespace and other cosmetic code cleanup changes. These have to be separate commits. ___ You need to refactor your submission into logical chunks; there is too much content into a single commit. ___ You have extraneous garbage in your review (merge commits etc) ___ You have giant attachments which should never have been sent; Instead you should place your content in a public tree to be pulled. ___ You have too many commits attached to an e-mail; resend as threaded commits, or place in a public tree for a pull. ___ You have resent this content multiple times without a clear indication of what has changed between each re-send. ___ You have failed to adequately and individually address all of the comments and change requests that were proposed in the initial review. ___ You have a misconfigured ~/.gitconfig file (i.e. user.name, user.email etc) ___ Your computer have a badly configured date and time; confusing the the threaded patch review. ___ Your changes affect IPC mechanism, and you don't present any results for in-service upgradability test. ___ Your changes affect user manual and documentation, your patch series do not contain the patch that updates the Doxygen manual. -- Check out the vibrant tech community on one of the world's most engaging tech sites, Slashdot.org! http://sdm.link/slashdot ___ Opensaf-devel mailing list Opensaf-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/opensaf-devel
[devel] [PATCH 1/1] plmd: use virDomainDestroy and virDomainCreate to reset VM [#2836]
Abrupt restart or unlock-in of child EE does not always work. virDomainReset() does not always work. Use virDomainDestroy() and virDomainCreate() instead. --- src/plm/plmd/plms_virt.cc | 16 ++-- 1 file changed, 14 insertions(+), 2 deletions(-) diff --git a/src/plm/plmd/plms_virt.cc b/src/plm/plmd/plms_virt.cc index 2fd735ac0..0bf11e5a8 100644 --- a/src/plm/plmd/plms_virt.cc +++ b/src/plm/plmd/plms_virt.cc @@ -922,8 +922,20 @@ int PlmsVm::instantiate(virDomainPtr domain) { } int PlmsVm::restart(virDomainPtr domain) { - TRACE("calling virDomainReset to restart vm"); - return virDomainReset(domain, 0); + TRACE("calling virDomainDestroy and virDomainCreate to restart vm"); + int rc(-1); + + do { +rc = virDomainDestroy(domain); + +if (rc < 0) break; + +rc = virDomainCreate(domain); + +if (rc < 0) break; + } while (false); + + return rc; } int PlmsVm::isolate(virDomainPtr domain) { -- 2.13.6 -- Check out the vibrant tech community on one of the world's most engaging tech sites, Slashdot.org! http://sdm.link/slashdot ___ Opensaf-devel mailing list Opensaf-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/opensaf-devel
Re: [devel] [PATCH 1/1] osaf: Isolate the node in the opensaf_reboot [#2833]
Hi Ravi, I think stonith, implemented in ticket #1859, handles this case. This "flickering" was one the (manual) tests verifying the added stonith support. It is important to have a separate interface for stonith, to be able to perform the remote fencing, similar to use a back plane. Have you tested with stonith enabled? /Regards HansN Från: ravi-sekhar Skickat: den 12 april 2018 15:29:13 Till: Hans Nordebäck; Anders Widell Kopia: opensaf-devel@lists.sourceforge.net; ravi-sekhar Ämne: [PATCH 1/1] osaf: Isolate the node in the opensaf_reboot [#2833] --- scripts/opensaf_reboot | 19 +-- 1 file changed, 13 insertions(+), 6 deletions(-) diff --git a/scripts/opensaf_reboot b/scripts/opensaf_reboot index df65c26..b219c39 100644 --- a/scripts/opensaf_reboot +++ b/scripts/opensaf_reboot @@ -37,6 +37,9 @@ export LD_LIBRARY_PATH=$libdir:$LD_LIBRARY_PATH if [ -f "$pkgsysconfdir/fmd.conf" ]; then . "$pkgsysconfdir/fmd.conf" fi +if [ -f "$pkgsysconfdir/nid.conf" ]; then + . "$pkgsysconfdir/nid.conf" +fi NODE_ID_FILE=$pkglocalstatedir/node_id @@ -118,7 +121,17 @@ else # uncomment the following line if debugging errors that keep restarting the node # exit 0 +# If the application is using different interface for cluster communication, please +# add your application specific isolation commands here + logger -t "opensaf_reboot" "Rebooting local node; timeout=$OPENSAF_REBOOT_TIMEOUT" + +# Isolate the node +if [ "$MDS_TRANSPORT" = "TIPC" ]; then + tipc-config -bd eth:$TIPC_ETH_IF +else + $icmd pkill -STOP osafdtmd +fi # Start a reboot supervision background process. Note that a similar # supervision is also done in the opensaf_reboot() function in LEAP. @@ -128,12 +141,6 @@ else (sleep "$OPENSAF_REBOOT_TIMEOUT"; echo -n "b" > "/proc/sysrq-trigger") & fi - # Stop some important opensaf processes to prevent bad things from happening - $icmd pkill -STOP osafamfwd - $icmd pkill -STOP osafamfnd - $icmd pkill -STOP osafamfd - $icmd pkill -STOP osaffmd - # Flush OpenSAF internal log server messages to disk. $bindir/osaflog --flush -- 1.9.1 -- Check out the vibrant tech community on one of the world's most engaging tech sites, Slashdot.org! http://sdm.link/slashdot ___ Opensaf-devel mailing list Opensaf-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/opensaf-devel
Re: [devel] [PATCH 1/1] imm: make version parameter in immutil_xxx non-const [#2830]
Thanks Lennart. I will update the code as your comments before pushing. @Ravi: Any comment from you? Thanks. Regards, Vu > -Original Message- > From: Lennart Lund [mailto:lennart.l...@ericsson.com] > Sent: Thursday, April 12, 2018 8:32 PM > To: Vu Minh Nguyen ; > ravisekhar.ko...@oracle.com; Hans Nordebäck > ; Anders Widell > > Cc: opensaf-devel@lists.sourceforge.net; Lennart Lund > > Subject: RE: [PATCH 1/1] imm: make version parameter in immutil_xxx non- > const [#2830] > > > > > -Original Message- > > From: Vu Minh Nguyen [mailto:vu.m.ngu...@dektech.com.au] > > Sent: den 12 april 2018 12:37 > > To: Lennart Lund ; > > ravisekhar.ko...@oracle.com; Hans Nordebäck > > ; Anders Widell > > > > Cc: opensaf-devel@lists.sourceforge.net > > Subject: RE: [PATCH 1/1] imm: make version parameter in immutil_xxx non- > > const [#2830] > > > > Hi Lennart, > > > > See my response inline, started with [Vu]. > > > > Regards, Vu > > > > > -Original Message- > > > From: Lennart Lund [mailto:lennart.l...@ericsson.com] > > > Sent: Thursday, April 12, 2018 5:13 PM > > > To: Vu Minh Nguyen ; > > > ravisekhar.ko...@oracle.com; Hans Nordebäck > > > ; Anders Widell > > > > > > Cc: opensaf-devel@lists.sourceforge.net; Vu Minh Nguyen > > > > > > Subject: RE: [PATCH 1/1] imm: make version parameter in immutil_xxx > > non- > > > const [#2830] > > > > > > Hi Vu, > > > > > > Ack, but an important issue to fix. See comments below [Lennart] > > > > > > Thanks > > > Lennart > > > > > > > -Original Message- > > > > From: Vu Minh Nguyen [mailto:vu.m.ngu...@dektech.com.au] > > > > Sent: den 5 april 2018 12:39 > > > > To: ravisekhar.ko...@oracle.com; Hans Nordebäck > > > > ; Anders Widell > > > > ; Lennart Lund > > > > > > > Cc: opensaf-devel@lists.sourceforge.net; Vu Minh Nguyen > > > > > > > > Subject: [PATCH 1/1] imm: make version parameter in immutil_xxx non- > > > > const [#2830] > > > > > > > > The version in saImmO{m,i}Initialize is input/output parameter and is > > > > declared > > > > as non-constant for both IMM OM and OI API according to SAF spec. > > > > But in immutil wrapper library, some are declared as constant and don't > > > > update > > > > the in/out version before returning from such wrappers. > > > > > > > > This patch makes that parameter non-const and do update the version > > > > before returning from wrapper APIs. Also fix the wrong usage of > > > > these wrapper, passed const version, in some services/applications. > > > > --- > > > > src/amf/amfd/imm.cc | 11 +++ > > > > src/amf/amfnd/util.cc| 3 ++- > > > > src/log/apitest/imm_tstutil.c| 5 - > > > > src/log/apitest/logtest.c| 9 ++--- > > > > src/log/apitest/logtestfr.c | 6 -- > > > > src/log/apitest/tet_log_runtime_cfgobj.c | 3 ++- > > > > src/log/logd/lgs_config.cc | 3 ++- > > > > src/log/logd/lgs_imm.cc | 15 ++- > > > > src/log/logd/lgs_imm_gcfg.cc | 7 +-- > > > > src/osaf/immutil/immutil.c | 20 +++- > > > > src/osaf/immutil/immutil.h | 6 +++--- > > > > src/smf/smfd/SmfAdminState.cc| 4 ++-- > > > > src/smf/smfd/SmfExecControlHdl.cc| 3 ++- > > > > 13 files changed, 64 insertions(+), 31 deletions(-) > > > > > > > > diff --git a/src/amf/amfd/imm.cc b/src/amf/amfd/imm.cc > > > > index 47c0e5a..8c70325 100644 > > > > --- a/src/amf/amfd/imm.cc > > > > +++ b/src/amf/amfd/imm.cc > > > > @@ -1461,6 +1461,7 @@ done: > > > > SaAisErrorT avd_imm_init(void *avd_cb) { > > > >SaAisErrorT error = SA_AIS_OK; > > > >AVD_CL_CB *cb = (AVD_CL_CB *)avd_cb; > > > > + SaVersionT local_version = immVersion; > > > > > > > >TRACE_ENTER(); > > > > > > > > @@ -1471,13 +1472,13 @@ SaAisErrorT avd_imm_init(void *avd_cb) { > > > > > > > >cb->avd_imm_status = AVD_IMM_INIT_ONGOING; > > > >if ((error = immutil_saImmOiInitialize_2(&cb->immOiHandle, > > > > &avd_callbacks, > > > > - &immVersion)) != SA_AIS_OK) > > { > > > > + &local_version)) != > > SA_AIS_OK) { > > > > LOG_ER("saImmOiInitialize failed %u", error); > > > > goto done; > > > >} > > > > > > > [Lennart] local_version has to be set again to immVersion here. This is > > the > > > case in all functions where initialize using an IMM version is done more > > than > > > once. In this case both OM and OI is initialized. In some cases it may be > > a try > > > again loop where an initialize API may be called several times. > > > Note: I have only written this comment once but it may be applicable in > > > several places. > > [Vu] I don't think it is needed here as `local_version` is backup and set > > again inside that immutil_saImmOiInitialize_2() wrapper. > > Unless immutil_saImmOiInitialize_2() is called inside a loop(*), we don't > > ne
Re: [devel] [PATCH 0/5] Review Request for split-brain: select active SC from largest network partition V3 [#2795]
Hi, On 04/12/2018 04:15 PM, Gary Lee wrote: Hi On 12/04/18 23:34, Anders Widell wrote: Ack with comments: * There is no need to use "const" when passing function arguments by value. E.g. the argument "const uint64_t cluster_size" should be "uint64_t cluster_size". [GL] Sure, but it doesn't do any harm, and would stop accidental assignments (that would be lost anyway). [HansN] perhaps these functions should be const member functions? E.g. SaAisErrorT PromoteThisNode(bool graceful_takeover, uint64_t cluster_size) const; * You assume that all nodes in the cluster have synchronized clocks (probably using NTP). Would it be possible to use an expiration time for the etcd key instead of writing a time stamp in the value, so that etcd automatically deletes the takeover request when it expires? That way we would not require synchronized clocks. [GL] Good idea. I did question why I hadn't use TTL/lease once I had finished the ticket. :-) Will see what I can do! regards, Anders Widell On 04/11/2018 09:35 AM, Gary Lee wrote: Summary: split-brain: select active SC from largest network partition V3 [#2795] Review request for Ticket(s): 2795 Peer Reviewer(s): Anders, Ravi, Hans Pull request to: *** LIST THE PERSON WITH PUSH ACCESS HERE *** Affected branch(es): develop Development branch: ticket-2795 Base revision: 1c302a300e449e8a8527671fbd6c7f4e2b41e95d Personal repository: git://git.code.sf.net/u/userid-2226215/review Impacted area Impact y/n Docs n Build system n RPM/packaging n Configuration files n Startup scripts n SAF services n OpenSAF services y Core libraries y Samples n Tests n Other n Comments (indicate scope for each "y" above): - *** Changes from V2: *** fmd: made cluster_size atomic fmd: wait 3 seconds before promoting to active, to allow topology events to be processed first osaf: add check for existing takeover request, before trying to lock etcdv3 plugin: reliablity improvements revision c7bc78656d5de11f6147727bd8612274fb6e438f Author: Gary Lee Date: Wed, 11 Apr 2018 17:16:46 +1000 rded: adapt to new Consensus API [#2795] - add 3 new internal message: RDE_MSG_NODE_UP RDE_MSG_NODE_DOWN RDE_MSG_TAKEOVER_REQUEST_CALLBACK - subscribe to AMFND service up events to keep track of the number of cluster members - listen for takeover requests in KV store revision 4899e5d0f5abdff8f15eca8ad17d3b13b6a00393 Author: Gary Lee Date: Wed, 11 Apr 2018 17:16:18 +1000 fmd: adapt to new Consensus API [#2795] revision 812a315af21df06b2f9fdcc3d8fd5b7bbad3e550 Author: Gary Lee Date: Wed, 11 Apr 2018 17:15:41 +1000 amfd: adapt to new Consensus API [#2795] revision b8a37c1b8965826e5faffbfebc44a84bdb6433a1 Author: Gary Lee Date: Wed, 11 Apr 2018 17:14:39 +1000 osaf: add lock takeover request fuction [#2795] - add create and set (if previous value matches) functions to KeyValue class - add Consensus::MonitorTakeoverRequest() function for use by RDE to answer takeover requests - add Consensus::CreateTakeoverRequest() - before a SC is promoted to active, it will create a takeover request in the KV store. An existing SC can reject the lock takeover revision 955be872ba5887b1b521eac9f7732dd3f6afc593 Author: Gary Lee Date: Wed, 11 Apr 2018 17:13:45 +1000 osaf: extend API to include a create key and an enhanced set key function [#2795] - add create_key function (fails if key already exists) - add setkey_match_prev function (set value if previous value matches) - add missing quotes - add etcd3.plugin Added Files: src/osaf/consensus/plugins/etcd3.plugin Complete diffstat: -- src/amf/amfd/role.cc | 2 +- src/fm/fmd/fm_cb.h | 2 +- src/fm/fmd/fm_main.cc | 26 +- src/fm/fmd/fm_mds.cc | 2 + src/fm/fmd/fm_rda.cc | 27 +- src/osaf/consensus/consensus.cc | 435 ++- src/osaf/consensus/consensus.h | 55 +++- src/osaf/consensus/key_value.cc | 105 +--- src/osaf/consensus/key_value.h | 19 +- src/osaf/consensus/plugins/etcd.plugin | 86 +- src/osaf/consensus/plugins/etcd3.plugin | 366 ++ src/osaf/consensus/plugins/sample.plugin | 67 - src/rde/rded/rde_cb.h | 12 +- src/rde/rded/rde_main.cc | 75 -- src/rde/rded/rde_mds.cc | 39 ++- src/rde/rded/rde_rda.cc | 2 +- src/rde/rded/role.cc | 46 +++- src/rde/rded/role.h | 2 +- 18 files changed, 1180 insertions(+), 188 deletions(-) T