Re: [devel] [PATCH 1/1] imm: inform status to NID before exit during start-up phrase [#2845]

2018-05-09 Thread Vu Minh Nguyen
Thanks Hans,

I modified code a bit to test this change, and found that immnd was
respawned successfully as we expected.
Here are the syslogs:

May 10 10:20:09 PL-3 local0.notice osafimmnd[436]: NO SERVER STATE:
IMM_SERVER_LOADING_PENDING --> IMM_SERVER_SYNC_PENDING
May 10 10:20:09 PL-3 local0.notice osafimmnd[436]: NO NODE STATE->
IMM_NODE_ISOLATED
May 10 10:20:10 PL-3 local0.notice osafimmnd[436]: NO NODE STATE->
IMM_NODE_W_AVAILABLE
May 10 10:20:10 PL-3 local0.notice osafimmnd[436]: NO SERVER STATE:
IMM_SERVER_SYNC_PENDING --> IMM_SERVER_SYNC_CLIENT
May 10 10:20:10 PL-3 local0.err osafimmnd[436]: ER Sync MESSAGE:1470 OUT OF
ORDER my highest processed:1469 - exiting
May 10 10:20:10 PL-3 local0.err opensafd[400]: ER Failed   DESC:IMMND
May 10 10:20:10 PL-3 local0.err opensafd[400]: ER Going for recovery
May 10 10:20:10 PL-3 local0.err opensafd[400]: ER Trying To RESPAWN
/usr/local/lib/opensaf/clc-cli/osaf-immnd attempt #1
May 10 10:20:25 PL-3 local0.info osafimmnd[464]: mkfifo already exists:
/var/lib/opensaf/osafimmnd.fifo File exists
May 10 10:20:25 PL-3 local0.notice osafimmnd[464]: Started
...

Regards, Vu

> -Original Message-
> From: Hans Nordebäck 
> Sent: Wednesday, May 9, 2018 4:12 PM
> To: Vu Minh Nguyen ;
> ravisekhar.ko...@oracle.com; Anders Widell
> ; Lennart Lund 
> Cc: opensaf-devel@lists.sourceforge.net; Vu Minh Nguyen
> 
> Subject: RE: [PATCH 1/1] imm: inform status to NID before exit during
start-
> up phrase [#2845]
> 
> Hi Vu,
> 
> I'll revise my comment a bit, before sending nid_notify, the fifo
monitoring is
> not started. So removing the exit should not be necessary, good if
> you can test this.
> 
> /Thanks HansN
> 
> -Original Message-
> From: Hans Nordebäck
> Sent: den 8 maj 2018 15:06
> To: 'Vu Minh Nguyen' ;
> ravisekhar.ko...@oracle.com; Anders Widell
> ; Lennart Lund 
> Cc: opensaf-devel@lists.sourceforge.net; Vu Minh Nguyen
> 
> Subject: RE: [PATCH 1/1] imm: inform status to NID before exit during
start-
> up phrase [#2845]
> 
> Hi Vu,
> 
> Ack review only with one comment. If the exit() is called after
> immnd_ackToNid() the fifo monitoring in nodeinit.cc will be activated.
> I think you should remove the exit().
> /Thanks HansN
> 
> -Original Message-
> From: Vu Minh Nguyen [mailto:vu.m.ngu...@dektech.com.au]
> Sent: den 3 maj 2018 12:20
> To: ravisekhar.ko...@oracle.com; Hans Nordebäck
> ; Anders Widell
> ; Lennart Lund 
> Cc: opensaf-devel@lists.sourceforge.net; Vu Minh Nguyen
> 
> Subject: [PATCH 1/1] imm: inform status to NID before exit during start-up
> phrase [#2845]
> 
> During node starts up phrase, which AMFD has not been come up, there is a
> case IMMND exit without informing failure result to NID (refer to the
ticket
> to see syslog). As the result, IMMND may not be respawned by NID process.
> 
> This patch ensures that NID is informed before exit.
> ---
>  src/imm/immnd/immnd_evt.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/src/imm/immnd/immnd_evt.c b/src/imm/immnd/immnd_evt.c
> index 8f3af92..2b9123d 100644
> --- a/src/imm/immnd/immnd_evt.c
> +++ b/src/imm/immnd/immnd_evt.c
> @@ -10779,6 +10779,7 @@ static uint32_t
> immnd_evt_proc_fevs_rcv(IMMND_CB *cb, IMMND_EVT *evt,
>   LOG_ER(
>   "MESSAGE:%llu OUT OF ORDER my highest
> processed:%llu - exiting",
>   msgNo, cb->highestProcessed);
> + immnd_ackToNid(NCSCC_RC_FAILURE);
>   exit(1);
>   } else if (cb
>  ->mSync) { /* If we receive out of sync
message
> --
> 1.9.1



--
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
___
Opensaf-devel mailing list
Opensaf-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/opensaf-devel


[devel] [PATCH 1/1] ntf: remove sending SIGTERM and SIGABRT when killing osafntfimcnd [#2851]

2018-05-09 Thread Canh Van Truong
The handler of SIGTERM is just _Exit,  so using SIGTERM/SIGABRT is not needed.
The patch changes to send SIGKILL when terminating osafntfimcnd process.
---
 src/ntf/ntfd/ntfs_imcnutil.c| 100 
 src/ntf/ntfimcnd/ntfimcn_main.c |  18 
 2 files changed, 29 insertions(+), 89 deletions(-)

diff --git a/src/ntf/ntfd/ntfs_imcnutil.c b/src/ntf/ntfd/ntfs_imcnutil.c
index 00c2c0039..69c5d2e45 100644
--- a/src/ntf/ntfd/ntfs_imcnutil.c
+++ b/src/ntf/ntfd/ntfs_imcnutil.c
@@ -51,37 +51,6 @@ typedef struct {
 static init_params_t ipar = {0, 0, 0, false};
 pthread_mutex_t ntfimcn_mutex;
 
-/**
- * Kill the osafntfimcn child process using previously saved Pid
- * Error handled
- *
- * @param signal A valid signal
- * @return -1 on error (kill failed
- */
-static void kill_imcnproc(int signal)
-{
-   int rc = 0;
-
-   rc = kill(ipar.pid, signal);
-   if (rc == -1) {
-   if (errno == EPERM) {
-   /* We don't have permission to kill the process! */
-   LOG_ER(
-   "Child process osafntfimcn could not be killed: %s",
-   strerror(errno));
-   } else if (errno == ESRCH) {
-   /* The process was not found */
-   LOG_NO(
-   "Child process osafntfimcn could not be killed: %s",
-   strerror(errno));
-   } else {
-   /* EINVAL Invalid or unsupported signal number */
-   LOG_ER("%s Fatal error when killing %s", __FUNCTION__,
-  strerror(errno));
-   osaf_abort(0);
-   }
-   }
-}
 
 /**
  * Timed out wait for the imcnproc to terminate.
@@ -137,58 +106,47 @@ static int wait_imcnproc_termination(void)
 }
 
 /**
- * Wait for imcn process to exit.
- *  1.Send SIGTERM to the imcn process
- *  - Wait for process to terminate
- *~ Prevents "zombie" process
- *~ Guaranty that process is terminated before return
- *If the process does not terminate after a timeout the
- *termination escalates with the following steps:
- *  2. Abort signal is sent to imcn process
- * Wait for termination. If fail do next step.
- *  3. Send SIGKILL
- * Wait for termination. If no termination is detected, ignore and
- * continue NTF. Something is wrong and we may end up with a 'zombie'
- * process but there is nothing more to do except aborting NTF causing a
- * node restart.
- *
- * Also handle the case that the process already is killed before
- * this function is called.
+ * Send SIGKILL to osafntfimcnd then waiting for termination. If no termination
+ * is detected, ignore and continue NTF. Something is wrong and we may end up
+ * with a 'zombie' process but there is nothing more to do  except aborting NTF
+ * causing a node restart.
  *
  */
-static void timedwait_imcn_exit(void)
+static void kill_imcnproc_in_timewait(void)
 {
int rc = 0;
TRACE_ENTER();
 
-   /* Send SIGTERM for normal termination */
-   kill_imcnproc(SIGTERM);
-   rc = wait_imcnproc_termination();
-   if (rc == 0) {
-   TRACE("\tImcn successfully terminated");
-   goto done;
-   }
-
-   /* Normal termination has timed out. Escalate to step 2 */
-   TRACE("\tNormal termination failed. Escalate to abort");
-   kill_imcnproc(SIGABRT);
-   rc = wait_imcnproc_termination();
-   if (rc == 0) {
-   TRACE("\tImcn successfully aborted");
-   goto done;
+   LOG_NO("%s: SIGKILL sent to osafntfimcnd process pid = %d",
+  __FUNCTION__, ipar.pid);
+   rc = kill(ipar.pid, SIGKILL);
+   if (rc == -1) {
+   if (errno == EPERM) {
+   /* We don't have permission to kill the process! */
+   LOG_ER(
+   "Child process osafntfimcn could not be killed: %s",
+   strerror(errno));
+   } else if (errno == ESRCH) {
+   /* The process was not found */
+   LOG_NO(
+   "Child process osafntfimcn could not be killed: %s",
+   strerror(errno));
+   } else {
+   /* EINVAL Invalid or unsupported signal number */
+   LOG_ER("%s Fatal error when killing %s", __FUNCTION__,
+  strerror(errno));
+   osaf_abort(0);
+   }
}
 
-   /* Abort has also timed out. Escalate to step 3 */
-   TRACE("\tNormal termination failed. Escalate to kill");
-   kill_imcnproc(SIGKILL);
rc = wait_imcnproc_termination();
if (rc == 0) {
TRACE("\tImcn successfully killed");
} else {
-   LOG_WA("The osafntfimcnd process could not be killed!");

[devel] [PATCH 0/1] Review Request for ntf: remove sending SIGTERM and SIGABRT when killing osafntfimcnd V2 [#2851]

2018-05-09 Thread Canh Van Truong
Summary: ntf: remove sending SIGTERM and SIGABRT when killing osafntfimcnd 
[#2851]
Review request for Ticket(s): 2851
Peer Reviewer(s): Lennart, Vu
Pull request to: Vu 
Affected branch(es): develop, release
Development branch: ticket-2851
Base revision: fdb037f450570582df20d603193cffe1eff8b107
Personal repository: git://git.code.sf.net/u/canht32/review


Impacted area   Impact y/n

 Docsn
 Build systemn
 RPM/packaging   n
 Configuration files n
 Startup scripts n
 SAF servicesy
 OpenSAF servicesn
 Core libraries  n
 Samples n
 Tests   n
 Other   n


Comments (indicate scope for each "y" above):
-
*** EXPLAIN/COMMENT THE PATCH SERIES HERE ***

revision 6f811f16f7bcaca6b67dd624f8a184ca4135ab9d
Author: Canh Van Truong 
Date:   Wed, 9 May 2018 21:32:48 +0700

ntf: remove sending SIGTERM and SIGABRT when killing osafntfimcnd [#2851]

The handler of SIGTERM is just _Exit,  so using SIGTERM/SIGABRT is not needed.
The patch changes to send SIGKILL when terminating osafntfimcnd process.



Complete diffstat:
--
 src/ntf/ntfd/ntfs_imcnutil.c| 100 
 src/ntf/ntfimcnd/ntfimcn_main.c |  18 
 2 files changed, 29 insertions(+), 89 deletions(-)


Testing Commands:
-
*** LIST THE COMMAND LINE TOOLS/STEPS TO TEST YOUR CHANGES ***


Testing, Expected Results:
--
*** PASTE COMMAND OUTPUTS / TEST RESULTS ***


Conditions of Submission:
-
*** HOW MANY DAYS BEFORE PUSHING, CONSENSUS ETC ***


Arch  Built StartedLinux distro
---
mipsn  n
mips64  n  n
x86 n  n
x86_64  n  n
powerpc n  n
powerpc64   n  n


Reviewer Checklist:
---
[Submitters: make sure that your review doesn't trigger any checkmarks!]


Your checkin has not passed review because (see checked entries):

___ Your RR template is generally incomplete; it has too many blank entries
that need proper data filled in.

___ You have failed to nominate the proper persons for review and push.

___ Your patches do not have proper short+long header

___ You have grammar/spelling in your header that is unacceptable.

___ You have exceeded a sensible line length in your headers/comments/text.

___ You have failed to put in a proper Trac Ticket # into your commits.

___ You have incorrectly put/left internal data in your comments/files
(i.e. internal bug tracking tool IDs, product names etc)

___ You have not given any evidence of testing beyond basic build tests.
Demonstrate some level of runtime or other sanity testing.

___ You have ^M present in some of your files. These have to be removed.

___ You have needlessly changed whitespace or added whitespace crimes
like trailing spaces, or spaces before tabs.

___ You have mixed real technical changes with whitespace and other
cosmetic code cleanup changes. These have to be separate commits.

___ You need to refactor your submission into logical chunks; there is
too much content into a single commit.

___ You have extraneous garbage in your review (merge commits etc)

___ You have giant attachments which should never have been sent;
Instead you should place your content in a public tree to be pulled.

___ You have too many commits attached to an e-mail; resend as threaded
commits, or place in a public tree for a pull.

___ You have resent this content multiple times without a clear indication
of what has changed between each re-send.

___ You have failed to adequately and individually address all of the
comments and change requests that were proposed in the initial review.

___ You have a misconfigured ~/.gitconfig file (i.e. user.name, user.email etc)

___ Your computer have a badly configured date and time; confusing the
the threaded patch review.

___ Your changes affect IPC mechanism, and you don't present any results
for in-service upgradability test.

___ Your changes affect user manual and documentation, your patch series
do not contain the patch that updates the Doxygen manual.


--
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
___
Opensaf-devel mailing list
Opensaf-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/opensaf-devel


Re: [devel] [PATCH 1/1] ntf: remove sending SIGTERM and SIGABRT when killing osafntfimcnd [#2851]

2018-05-09 Thread Lennart Lund
Hi Canh,

You have removed the code that makes ntf wait for ntfimcn to terminate before 
returning from CSI set callback when doing a switch over. This will probably 
work but I don't think it is a good idea.
The only change needed is to remove kill_imcnproc(SIGTERM) and 
kill_imcnproc(SIGABRT) from the timedwait_imcn_exit() function and add a
LOG_NO("%s: NTF sent SIGKILL to ntfimcn process pid [%d]); just before calling 
kill_imcnproc(SIGKILL).

Removing waiting for ntfimcn process to terminate in handle_state_ntfimcn() 
means that when doing switch over ntfimcn on the new active may start before 
(the old) ntfimcn on the to become standby node is terminated. If this happen 
the special applier is not released when the new ntfimcn wants to be applier. 
For now two special applier names exist and if ntfimcn cannot use one it will 
try to use the other one so this may still work. If the wait mechanism is kept 
it is possible to remove using two special appliers in ntfimcn so the other one 
can be used for something else or. There is also a risk that someone in the 
future removes the applier name handling without thinking about the termination 
timing problem and gets a hard to find intermittent problem.

My conclusion:
Keep the wait for termination logic and go for the simpler solution described 
above

Thanks
Lennart

> -Original Message-
> From: Canh Van Truong [mailto:canh.v.tru...@dektech.com.au]
> Sent: den 9 maj 2018 10:30
> To: Lennart Lund ; Vu Minh Nguyen
> ; Minh Hon Chau
> 
> Cc: opensaf-devel@lists.sourceforge.net; Canh Van Truong
> 
> Subject: [PATCH 1/1] ntf: remove sending SIGTERM and SIGABRT when killing
> osafntfimcnd [#2851]
> 
> The handler of SIGTERM is just _Exit,  so using SIGTERM/SIGABRT is not
> needed.
> The patch changes to send SIGKILL when terminating osafntfimcnd process.
> ---
>  src/ntf/ntfd/ntfs_imcnutil.c| 136 
> +++-
>  src/ntf/ntfimcnd/ntfimcn_main.c |  18 --
>  2 files changed, 8 insertions(+), 146 deletions(-)
> 
> diff --git a/src/ntf/ntfd/ntfs_imcnutil.c b/src/ntf/ntfd/ntfs_imcnutil.c
> index 00c2c0039..8e9db29ea 100644
> --- a/src/ntf/ntfd/ntfs_imcnutil.c
> +++ b/src/ntf/ntfd/ntfs_imcnutil.c
> @@ -55,27 +55,19 @@ pthread_mutex_t ntfimcn_mutex;
>   * Kill the osafntfimcn child process using previously saved Pid
>   * Error handled
>   *
> - * @param signal A valid signal
> - * @return -1 on error (kill failed
>   */
> -static void kill_imcnproc(int signal)
> +static void kill_imcnproc()
>  {
> - int rc = 0;
> -
> - rc = kill(ipar.pid, signal);
> + LOG_NO("%s: Terminating osafntfimcnd (%d) process",
> +__FUNCTION__, ipar.pid);
> + int rc = kill(ipar.pid, SIGKILL);
>   if (rc == -1) {
>   if (errno == EPERM) {
> - /* We don't have permission to kill the process! */
> + // We don't have permission to kill the process!
>   LOG_ER(
>   "Child process osafntfimcn could not be killed: %s",
>   strerror(errno));
> - } else if (errno == ESRCH) {
> - /* The process was not found */
> - LOG_NO(
> - "Child process osafntfimcn could not be killed: %s",
> - strerror(errno));
> - } else {
> - /* EINVAL Invalid or unsupported signal number */
> + } else if (errno != ESRCH) {
>   LOG_ER("%s Fatal error when killing %s",
> __FUNCTION__,
>  strerror(errno));
>   osaf_abort(0);
> @@ -83,115 +75,6 @@ static void kill_imcnproc(int signal)
>   }
>  }
> 
> -/**
> - * Timed out wait for the imcnproc to terminate.
> - *
> - * @return -1 If wait timed out
> - */
> -static int wait_imcnproc_termination(void)
> -{
> - struct timespec wait_poll;
> - pid_t rc_pid = (pid_t)0;
> - int retry_cnt = 0;
> - const int max_retry = 100;
> - const uint64_t wait_poll_ms = 100;
> - int status = 0;
> - int rc = 0;
> -
> - TRACE_ENTER();
> - osaf_millis_to_timespec(wait_poll_ms, _poll);
> -
> - /* Wait for child process to terminate */
> - retry_cnt = 0;
> - do {
> - rc_pid = waitpid(ipar.pid, , WNOHANG);
> - if (rc_pid == -1) {
> - if (errno == EINTR) {
> - TRACE("\tGot EINTR continue waiting");
> - rc_pid = (pid_t)0;
> - } else if (errno == ECHILD) {
> - TRACE("\t Done, the process is terminated");
> - rc = 0;
> - break;
> - } else {
> - /* EINVAL Invalid option */
> -   

Re: [devel] [PATCH 1/1] ntf: remove sending SIGTERM and SIGABRT when killing osafntfimcnd [#2851]

2018-05-09 Thread Canh Van Truong
Hi aVu,

"[Vu] Use "else {" instead (without if) in order to cover all error cases?"

Look like function "KillProc()", I think we don't need handle printout in
the case with the error "ESRCH" (The process or process group specified in
pid cannot be found) ?

Regards
Canh

-Original Message-
From: Vu Minh Nguyen  
Sent: Wednesday, May 9, 2018 4:33 PM
To: 'Canh Van Truong' ;
lennart.l...@ericsson.com; minh.c...@dektech.com.au
Cc: opensaf-devel@lists.sourceforge.net
Subject: RE: [PATCH 1/1] ntf: remove sending SIGTERM and SIGABRT when
killing osafntfimcnd [#2851]

Hi Canh,

Ack with a minor comment, started with [Vu].

Regards, Vu

> -Original Message-
> From: Canh Van Truong 
> Sent: Wednesday, May 9, 2018 3:30 PM
> To: lennart.l...@ericsson.com; vu.m.ngu...@dektech.com.au;
> minh.c...@dektech.com.au
> Cc: opensaf-devel@lists.sourceforge.net; Canh Van Truong
> 
> Subject: [PATCH 1/1] ntf: remove sending SIGTERM and SIGABRT when killing
> osafntfimcnd [#2851]
> 
> The handler of SIGTERM is just _Exit,  so using SIGTERM/SIGABRT is not
> needed.
> The patch changes to send SIGKILL when terminating osafntfimcnd process.
> ---
>  src/ntf/ntfd/ntfs_imcnutil.c| 136
+++-
>  src/ntf/ntfimcnd/ntfimcn_main.c |  18 --
>  2 files changed, 8 insertions(+), 146 deletions(-)
> 
> diff --git a/src/ntf/ntfd/ntfs_imcnutil.c b/src/ntf/ntfd/ntfs_imcnutil.c
> index 00c2c0039..8e9db29ea 100644
> --- a/src/ntf/ntfd/ntfs_imcnutil.c
> +++ b/src/ntf/ntfd/ntfs_imcnutil.c
> @@ -55,27 +55,19 @@ pthread_mutex_t ntfimcn_mutex;
>   * Kill the osafntfimcn child process using previously saved Pid
>   * Error handled
>   *
> - * @param signal A valid signal
> - * @return -1 on error (kill failed
>   */
> -static void kill_imcnproc(int signal)
> +static void kill_imcnproc()
>  {
> - int rc = 0;
> -
> - rc = kill(ipar.pid, signal);
> + LOG_NO("%s: Terminating osafntfimcnd (%d) process",
> +__FUNCTION__, ipar.pid);
> + int rc = kill(ipar.pid, SIGKILL);
>   if (rc == -1) {
>   if (errno == EPERM) {
> - /* We don't have permission to kill the process! */
> + // We don't have permission to kill the process!
>   LOG_ER(
>   "Child process osafntfimcn could not be killed:
%s",
>   strerror(errno));
> - } else if (errno == ESRCH) {
> - /* The process was not found */
> - LOG_NO(
> - "Child process osafntfimcn could not be killed:
%s",
> - strerror(errno));
> - } else {
> - /* EINVAL Invalid or unsupported signal number */
> + } else if (errno != ESRCH) {
[Vu] Use "else {" instead (without if) in order to cover all error cases?
>   LOG_ER("%s Fatal error when killing %s",
> __FUNCTION__,
>  strerror(errno));
>   osaf_abort(0);
> @@ -83,115 +75,6 @@ static void kill_imcnproc(int signal)
>   }
>  }
> 
> -/**
> - * Timed out wait for the imcnproc to terminate.
> - *
> - * @return -1 If wait timed out
> - */
> -static int wait_imcnproc_termination(void)
> -{
> - struct timespec wait_poll;
> - pid_t rc_pid = (pid_t)0;
> - int retry_cnt = 0;
> - const int max_retry = 100;
> - const uint64_t wait_poll_ms = 100;
> - int status = 0;
> - int rc = 0;
> -
> - TRACE_ENTER();
> - osaf_millis_to_timespec(wait_poll_ms, _poll);
> -
> - /* Wait for child process to terminate */
> - retry_cnt = 0;
> - do {
> - rc_pid = waitpid(ipar.pid, , WNOHANG);
> - if (rc_pid == -1) {
> - if (errno == EINTR) {
> - TRACE("\tGot EINTR continue waiting");
> - rc_pid = (pid_t)0;
> - } else if (errno == ECHILD) {
> - TRACE("\t Done, the process is terminated");
> - rc = 0;
> - break;
> - } else {
> - /* EINVAL Invalid option */
> - LOG_ER("%s Internal error %s", __FUNCTION__,
> -strerror(errno));
> - osaf_abort(0);
> - }
> - }
> -
> - /* No status available. Wait and try again */
> - retry_cnt++;
> - if (retry_cnt <= max_retry) {
> - osaf_nanosleep(_poll);
> - } else {
> - TRACE("\tTermination timeout");
> - rc = -1;
> - break;
> - }
> - } while (rc_pid == 0);
> -
> - TRACE_LEAVE2("rc = %d, 

Re: [devel] [PATCH 1/1] ntf: remove sending SIGTERM and SIGABRT when killing osafntfimcnd [#2851]

2018-05-09 Thread Vu Minh Nguyen
Hi Canh,

Ack with a minor comment, started with [Vu].

Regards, Vu

> -Original Message-
> From: Canh Van Truong 
> Sent: Wednesday, May 9, 2018 3:30 PM
> To: lennart.l...@ericsson.com; vu.m.ngu...@dektech.com.au;
> minh.c...@dektech.com.au
> Cc: opensaf-devel@lists.sourceforge.net; Canh Van Truong
> 
> Subject: [PATCH 1/1] ntf: remove sending SIGTERM and SIGABRT when killing
> osafntfimcnd [#2851]
> 
> The handler of SIGTERM is just _Exit,  so using SIGTERM/SIGABRT is not
> needed.
> The patch changes to send SIGKILL when terminating osafntfimcnd process.
> ---
>  src/ntf/ntfd/ntfs_imcnutil.c| 136
+++-
>  src/ntf/ntfimcnd/ntfimcn_main.c |  18 --
>  2 files changed, 8 insertions(+), 146 deletions(-)
> 
> diff --git a/src/ntf/ntfd/ntfs_imcnutil.c b/src/ntf/ntfd/ntfs_imcnutil.c
> index 00c2c0039..8e9db29ea 100644
> --- a/src/ntf/ntfd/ntfs_imcnutil.c
> +++ b/src/ntf/ntfd/ntfs_imcnutil.c
> @@ -55,27 +55,19 @@ pthread_mutex_t ntfimcn_mutex;
>   * Kill the osafntfimcn child process using previously saved Pid
>   * Error handled
>   *
> - * @param signal A valid signal
> - * @return -1 on error (kill failed
>   */
> -static void kill_imcnproc(int signal)
> +static void kill_imcnproc()
>  {
> - int rc = 0;
> -
> - rc = kill(ipar.pid, signal);
> + LOG_NO("%s: Terminating osafntfimcnd (%d) process",
> +__FUNCTION__, ipar.pid);
> + int rc = kill(ipar.pid, SIGKILL);
>   if (rc == -1) {
>   if (errno == EPERM) {
> - /* We don't have permission to kill the process! */
> + // We don't have permission to kill the process!
>   LOG_ER(
>   "Child process osafntfimcn could not be killed:
%s",
>   strerror(errno));
> - } else if (errno == ESRCH) {
> - /* The process was not found */
> - LOG_NO(
> - "Child process osafntfimcn could not be killed:
%s",
> - strerror(errno));
> - } else {
> - /* EINVAL Invalid or unsupported signal number */
> + } else if (errno != ESRCH) {
[Vu] Use "else {" instead (without if) in order to cover all error cases?
>   LOG_ER("%s Fatal error when killing %s",
> __FUNCTION__,
>  strerror(errno));
>   osaf_abort(0);
> @@ -83,115 +75,6 @@ static void kill_imcnproc(int signal)
>   }
>  }
> 
> -/**
> - * Timed out wait for the imcnproc to terminate.
> - *
> - * @return -1 If wait timed out
> - */
> -static int wait_imcnproc_termination(void)
> -{
> - struct timespec wait_poll;
> - pid_t rc_pid = (pid_t)0;
> - int retry_cnt = 0;
> - const int max_retry = 100;
> - const uint64_t wait_poll_ms = 100;
> - int status = 0;
> - int rc = 0;
> -
> - TRACE_ENTER();
> - osaf_millis_to_timespec(wait_poll_ms, _poll);
> -
> - /* Wait for child process to terminate */
> - retry_cnt = 0;
> - do {
> - rc_pid = waitpid(ipar.pid, , WNOHANG);
> - if (rc_pid == -1) {
> - if (errno == EINTR) {
> - TRACE("\tGot EINTR continue waiting");
> - rc_pid = (pid_t)0;
> - } else if (errno == ECHILD) {
> - TRACE("\t Done, the process is terminated");
> - rc = 0;
> - break;
> - } else {
> - /* EINVAL Invalid option */
> - LOG_ER("%s Internal error %s", __FUNCTION__,
> -strerror(errno));
> - osaf_abort(0);
> - }
> - }
> -
> - /* No status available. Wait and try again */
> - retry_cnt++;
> - if (retry_cnt <= max_retry) {
> - osaf_nanosleep(_poll);
> - } else {
> - TRACE("\tTermination timeout");
> - rc = -1;
> - break;
> - }
> - } while (rc_pid == 0);
> -
> - TRACE_LEAVE2("rc = %d, retry_cnt = %d", rc, retry_cnt);
> - return rc;
> -}
> -
> -/**
> - * Wait for imcn process to exit.
> - *  1.Send SIGTERM to the imcn process
> - *  - Wait for process to terminate
> - *~ Prevents "zombie" process
> - *~ Guaranty that process is terminated before return
> - *If the process does not terminate after a timeout the
> - *termination escalates with the following steps:
> - *  2. Abort signal is sent to imcn process
> - * Wait for termination. If fail do next step.
> - *  3. Send SIGKILL
> - * Wait for termination. If no termination is detected, ignore and
> - * continue NTF. Something 

Re: [devel] [PATCH 1/1] imm: inform status to NID before exit during start-up phrase [#2845]

2018-05-09 Thread Hans Nordebäck
Hi Vu,

I'll revise my comment a bit, before sending nid_notify, the fifo monitoring is 
not started. So removing the exit should not be necessary, good if 
you can test this.

/Thanks HansN

-Original Message-
From: Hans Nordebäck 
Sent: den 8 maj 2018 15:06
To: 'Vu Minh Nguyen' ; ravisekhar.ko...@oracle.com; 
Anders Widell ; Lennart Lund 

Cc: opensaf-devel@lists.sourceforge.net; Vu Minh Nguyen 

Subject: RE: [PATCH 1/1] imm: inform status to NID before exit during start-up 
phrase [#2845]

Hi Vu,

Ack review only with one comment. If the exit() is called after 
immnd_ackToNid() the fifo monitoring in nodeinit.cc will be activated.
I think you should remove the exit().
/Thanks HansN

-Original Message-
From: Vu Minh Nguyen [mailto:vu.m.ngu...@dektech.com.au] 
Sent: den 3 maj 2018 12:20
To: ravisekhar.ko...@oracle.com; Hans Nordebäck ; 
Anders Widell ; Lennart Lund 

Cc: opensaf-devel@lists.sourceforge.net; Vu Minh Nguyen 

Subject: [PATCH 1/1] imm: inform status to NID before exit during start-up 
phrase [#2845]

During node starts up phrase, which AMFD has not been come up, there is a case 
IMMND exit without informing failure result to NID (refer to the ticket to see 
syslog). As the result, IMMND may not be respawned by NID process.

This patch ensures that NID is informed before exit.
---
 src/imm/immnd/immnd_evt.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/src/imm/immnd/immnd_evt.c b/src/imm/immnd/immnd_evt.c index 
8f3af92..2b9123d 100644
--- a/src/imm/immnd/immnd_evt.c
+++ b/src/imm/immnd/immnd_evt.c
@@ -10779,6 +10779,7 @@ static uint32_t immnd_evt_proc_fevs_rcv(IMMND_CB *cb, 
IMMND_EVT *evt,
LOG_ER(
"MESSAGE:%llu OUT OF ORDER my highest 
processed:%llu - exiting",
msgNo, cb->highestProcessed);
+   immnd_ackToNid(NCSCC_RC_FAILURE);
exit(1);
} else if (cb
   ->mSync) { /* If we receive out of sync message
--
1.9.1


--
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
___
Opensaf-devel mailing list
Opensaf-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/opensaf-devel


[devel] [PATCH 1/1] ntf: remove sending SIGTERM and SIGABRT when killing osafntfimcnd [#2851]

2018-05-09 Thread Canh Van Truong
The handler of SIGTERM is just _Exit,  so using SIGTERM/SIGABRT is not needed.
The patch changes to send SIGKILL when terminating osafntfimcnd process.
---
 src/ntf/ntfd/ntfs_imcnutil.c| 136 +++-
 src/ntf/ntfimcnd/ntfimcn_main.c |  18 --
 2 files changed, 8 insertions(+), 146 deletions(-)

diff --git a/src/ntf/ntfd/ntfs_imcnutil.c b/src/ntf/ntfd/ntfs_imcnutil.c
index 00c2c0039..8e9db29ea 100644
--- a/src/ntf/ntfd/ntfs_imcnutil.c
+++ b/src/ntf/ntfd/ntfs_imcnutil.c
@@ -55,27 +55,19 @@ pthread_mutex_t ntfimcn_mutex;
  * Kill the osafntfimcn child process using previously saved Pid
  * Error handled
  *
- * @param signal A valid signal
- * @return -1 on error (kill failed
  */
-static void kill_imcnproc(int signal)
+static void kill_imcnproc()
 {
-   int rc = 0;
-
-   rc = kill(ipar.pid, signal);
+   LOG_NO("%s: Terminating osafntfimcnd (%d) process",
+  __FUNCTION__, ipar.pid);
+   int rc = kill(ipar.pid, SIGKILL);
if (rc == -1) {
if (errno == EPERM) {
-   /* We don't have permission to kill the process! */
+   // We don't have permission to kill the process!
LOG_ER(
"Child process osafntfimcn could not be killed: %s",
strerror(errno));
-   } else if (errno == ESRCH) {
-   /* The process was not found */
-   LOG_NO(
-   "Child process osafntfimcn could not be killed: %s",
-   strerror(errno));
-   } else {
-   /* EINVAL Invalid or unsupported signal number */
+   } else if (errno != ESRCH) {
LOG_ER("%s Fatal error when killing %s", __FUNCTION__,
   strerror(errno));
osaf_abort(0);
@@ -83,115 +75,6 @@ static void kill_imcnproc(int signal)
}
 }
 
-/**
- * Timed out wait for the imcnproc to terminate.
- *
- * @return -1 If wait timed out
- */
-static int wait_imcnproc_termination(void)
-{
-   struct timespec wait_poll;
-   pid_t rc_pid = (pid_t)0;
-   int retry_cnt = 0;
-   const int max_retry = 100;
-   const uint64_t wait_poll_ms = 100;
-   int status = 0;
-   int rc = 0;
-
-   TRACE_ENTER();
-   osaf_millis_to_timespec(wait_poll_ms, _poll);
-
-   /* Wait for child process to terminate */
-   retry_cnt = 0;
-   do {
-   rc_pid = waitpid(ipar.pid, , WNOHANG);
-   if (rc_pid == -1) {
-   if (errno == EINTR) {
-   TRACE("\tGot EINTR continue waiting");
-   rc_pid = (pid_t)0;
-   } else if (errno == ECHILD) {
-   TRACE("\t Done, the process is terminated");
-   rc = 0;
-   break;
-   } else {
-   /* EINVAL Invalid option */
-   LOG_ER("%s Internal error %s", __FUNCTION__,
-  strerror(errno));
-   osaf_abort(0);
-   }
-   }
-
-   /* No status available. Wait and try again */
-   retry_cnt++;
-   if (retry_cnt <= max_retry) {
-   osaf_nanosleep(_poll);
-   } else {
-   TRACE("\tTermination timeout");
-   rc = -1;
-   break;
-   }
-   } while (rc_pid == 0);
-
-   TRACE_LEAVE2("rc = %d, retry_cnt = %d", rc, retry_cnt);
-   return rc;
-}
-
-/**
- * Wait for imcn process to exit.
- *  1.Send SIGTERM to the imcn process
- *  - Wait for process to terminate
- *~ Prevents "zombie" process
- *~ Guaranty that process is terminated before return
- *If the process does not terminate after a timeout the
- *termination escalates with the following steps:
- *  2. Abort signal is sent to imcn process
- * Wait for termination. If fail do next step.
- *  3. Send SIGKILL
- * Wait for termination. If no termination is detected, ignore and
- * continue NTF. Something is wrong and we may end up with a 'zombie'
- * process but there is nothing more to do except aborting NTF causing a
- * node restart.
- *
- * Also handle the case that the process already is killed before
- * this function is called.
- *
- */
-static void timedwait_imcn_exit(void)
-{
-   int rc = 0;
-   TRACE_ENTER();
-
-   /* Send SIGTERM for normal termination */
-   kill_imcnproc(SIGTERM);
-   rc = wait_imcnproc_termination();
-   if (rc == 0) {
-   TRACE("\tImcn successfully terminated");
-   goto done;
-   }
-
-   /* Normal termination has timed out. Escalate to step 2 */
-   

[devel] [PATCH 0/1] Review Request for ntf: remove sending SIGTERM and SIGABRT when killing osafntfimcnd [#2851]

2018-05-09 Thread Canh Van Truong
Summary: ntf: remove sending SIGTERM and SIGABRT when killing osafntfimcnd 
[#2851]
Review request for Ticket(s): 2851
Peer Reviewer(s): Lennart, Minh, Vu 
Pull request to: Minh
Affected branch(es): develop, release
Development branch: ticket-2851
Base revision: c44d5c9f076bdfbc9bd5fded69bcbb30e65d0f14
Personal repository: git://git.code.sf.net/u/canht32/review


Impacted area   Impact y/n

 Docsn
 Build systemn
 RPM/packaging   n
 Configuration files n
 Startup scripts n
 SAF servicesy
 OpenSAF servicesn
 Core libraries  n
 Samples n
 Tests   n
 Other   n


Comments (indicate scope for each "y" above):
-
*** EXPLAIN/COMMENT THE PATCH SERIES HERE ***

revision 54579eaa3f6cd05492e0f13f7e3c7835168cc81f
Author: Canh Van Truong 
Date:   Wed, 9 May 2018 15:16:25 +0700

ntf: remove sending SIGTERM and SIGABRT when killing osafntfimcnd [#2851]

The handler of SIGTERM is just _Exit,  so using SIGTERM/SIGABRT is not needed.
The patch changes to send SIGKILL when terminating osafntfimcnd process.



Complete diffstat:
--
 src/ntf/ntfd/ntfs_imcnutil.c| 136 +++-
 src/ntf/ntfimcnd/ntfimcn_main.c |  18 --
 2 files changed, 8 insertions(+), 146 deletions(-)


Testing Commands:
-
*** LIST THE COMMAND LINE TOOLS/STEPS TO TEST YOUR CHANGES ***


Testing, Expected Results:
--
*** PASTE COMMAND OUTPUTS / TEST RESULTS ***


Conditions of Submission:
-
Ack from reviewers


Arch  Built StartedLinux distro
---
mipsn  n
mips64  n  n
x86 n  n
x86_64  n  n
powerpc n  n
powerpc64   n  n


Reviewer Checklist:
---
[Submitters: make sure that your review doesn't trigger any checkmarks!]


Your checkin has not passed review because (see checked entries):

___ Your RR template is generally incomplete; it has too many blank entries
that need proper data filled in.

___ You have failed to nominate the proper persons for review and push.

___ Your patches do not have proper short+long header

___ You have grammar/spelling in your header that is unacceptable.

___ You have exceeded a sensible line length in your headers/comments/text.

___ You have failed to put in a proper Trac Ticket # into your commits.

___ You have incorrectly put/left internal data in your comments/files
(i.e. internal bug tracking tool IDs, product names etc)

___ You have not given any evidence of testing beyond basic build tests.
Demonstrate some level of runtime or other sanity testing.

___ You have ^M present in some of your files. These have to be removed.

___ You have needlessly changed whitespace or added whitespace crimes
like trailing spaces, or spaces before tabs.

___ You have mixed real technical changes with whitespace and other
cosmetic code cleanup changes. These have to be separate commits.

___ You need to refactor your submission into logical chunks; there is
too much content into a single commit.

___ You have extraneous garbage in your review (merge commits etc)

___ You have giant attachments which should never have been sent;
Instead you should place your content in a public tree to be pulled.

___ You have too many commits attached to an e-mail; resend as threaded
commits, or place in a public tree for a pull.

___ You have resent this content multiple times without a clear indication
of what has changed between each re-send.

___ You have failed to adequately and individually address all of the
comments and change requests that were proposed in the initial review.

___ You have a misconfigured ~/.gitconfig file (i.e. user.name, user.email etc)

___ Your computer have a badly configured date and time; confusing the
the threaded patch review.

___ Your changes affect IPC mechanism, and you don't present any results
for in-service upgradability test.

___ Your changes affect user manual and documentation, your patch series
do not contain the patch that updates the Doxygen manual.


--
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
___
Opensaf-devel mailing list
Opensaf-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/opensaf-devel


Re: [devel] [PATCH 1/1] plm: don't instantiate child EEs twice when unlocking parent EE [#2846]

2018-05-09 Thread Ravi Sekhar Reddy Konda
Hi Alex,

Ack for the patch, code  review only

Regards,
Ravi

-Original Message-
From: Alex Jones [mailto:ajo...@rbbn.com] 
Sent: Thursday, May 03, 2018 9:08 PM
To: mathi.np@gmail.com; Ravi Sekhar Reddy Konda 

Cc: opensaf-devel@lists.sourceforge.net; Alex Jones 
Subject: [PATCH 1/1] plm: don't instantiate child EEs twice when unlocking 
parent EE [#2846]

Child EEs (VMs) can fail to boot up when unlocking the parent EE.

The current code resets the VM when unlocking the parent EE. This is done in 
plms_move_chld_ent_to_insvc(). Later in the unlock function, the child EEs are 
reset again. libvirt does not like these resets being done in less than 1 
second, and often will not boot the VM.

Don't reset the child EEs twice when unlocking the parent EE.
---
 src/plm/plmd/plms_adm_fsm.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/src/plm/plmd/plms_adm_fsm.c b/src/plm/plmd/plms_adm_fsm.c index 
8f5725cd8..a29dc28e0 100644
--- a/src/plm/plmd/plms_adm_fsm.c
+++ b/src/plm/plmd/plms_adm_fsm.c
@@ -4520,7 +4520,10 @@ static SaUint32T plms_ent_unlock(PLMS_ENTITY *ent, 
PLMS_TRACK_INFO *trk_info,
 
if ((PLMS_EE_ENTITY == head->plm_entity->entity_type) &&
(!plms_rdness_flag_is_set(head->plm_entity,
- SA_PLM_RF_DEPENDENCY))) {
+ SA_PLM_RF_DEPENDENCY)) &&
+   /* child EEs have already been instantiated above */
+   head->plm_entity->parent->entity_type !=
+   PLMS_EE_ENTITY) {
ret_err = plms_ee_instantiate(head->plm_entity,
  false, true);
if (NCSCC_RC_SUCCESS != ret_err) {
--
2.13.6


--
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
___
Opensaf-devel mailing list
Opensaf-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/opensaf-devel


Re: [devel] [PATCH 1/1] imm: fix PBE terminated when adding data with duplicated values [#2422]

2018-05-09 Thread Vu Minh Nguyen
Hi Danh,

Ack from me.

Thanks, Vu.

> -Original Message-
> From: Danh Vo 
> Sent: Tuesday, May 8, 2018 4:35 PM
> To: ravisekhar.ko...@oracle.com; hans.nordeb...@ericsson.com;
> zoran.milinko...@ericsson.com; anders.wid...@ericsson.com;
> lennart.l...@ericsson.com; vu.m.ngu...@dektech.com.au
> Cc: opensaf-devel@lists.sourceforge.net; Danh Vo
> 
> Subject: [PATCH 1/1] imm: fix PBE terminated when adding data with
> duplicated values [#2422]
> 
> When adding value set which has duplicated values into non-pure
> runtime attribute (cached or persistent), the first loop (doIt=0) does
> not validate in the correct way. It tries to detect duplicated
> values between current values and provided values without updating
> current values. So the current values remain the old values and
> validation just checks on that value set. Duplicated values cannot be
> dectected in provided values by current values. As a result, err is
> still SA_AIS_OK even though provided values are duplicated in the first
> loop.
> 
> This fix also increases performance for the previous fix. The validation
> should be performed in first loop (doIt=0) instead of both loops.
> ---
>  src/imm/immnd/ImmModel.cc | 42 +++-
> --
>  1 file changed, 27 insertions(+), 15 deletions(-)
> 
> diff --git a/src/imm/immnd/ImmModel.cc b/src/imm/immnd/ImmModel.cc
> index 156c308..80ba6ef 100644
> --- a/src/imm/immnd/ImmModel.cc
> +++ b/src/imm/immnd/ImmModel.cc
> @@ -18119,7 +18119,7 @@ SaAisErrorT ImmModel::rtObjectUpdate(
>  eduAtValToOs(, &(p->attrValue.attrValue),
>  (SaImmValueTypeT)p->attrValue.attrValueType);
> 
> -if ((attr->mFlags & SA_IMM_ATTR_NO_DUPLICATES) &&
> +if (!doIt && (attr->mFlags & SA_IMM_ATTR_NO_DUPLICATES) &&
>  (multiattr->hasMatchingValue(tmpos))) {
>LOG_NO(
>"ERR_INVALID_PARAM: multivalued attr '%s' with "
> @@ -18127,9 +18127,7 @@ SaAisErrorT ImmModel::rtObjectUpdate(
>"call. Object:'%s'.", attrName.c_str(),
objectName.c_str());
>err = SA_AIS_ERR_INVALID_PARAM;
>break;  // out of for switch
> -}
> -
> -if (doIt) {
> +} else if (doIt){
>multiattr->setExtraValue(tmpos);
>  }
>}
> @@ -18147,22 +18145,36 @@ SaAisErrorT ImmModel::rtObjectUpdate(
> 
>  osafassert(attrValue->isMultiValued());
>  ImmAttrMultiValue* multiattr = (ImmAttrMultiValue*)attrValue;
> -IMMSV_EDU_ATTR_VAL_LIST* al = p->attrValue.attrMoreValues;
> 
> +// Note: tmpMultiVal is used for validation purpose. It is
only
> +// valid when doIt = 0. It holds the first value which does
not
> +// exist in p->attrValue.attrMoreValues.
> +ImmAttrMultiValue tmpMultiVal;
> +if (!doIt && (attr->mFlags & SA_IMM_ATTR_NO_DUPLICATES)) {
> +  eduAtValToOs(, &(p->attrValue.attrValue),
> +
(SaImmValueTypeT)p->attrValue.attrValueType);
> +  tmpMultiVal = *multiattr;
> +  tmpMultiVal.setExtraValue(tmpos);
> +}
> +
> +IMMSV_EDU_ATTR_VAL_LIST* al = p->attrValue.attrMoreValues;
>  while (al) {
>eduAtValToOs(, &(al->n),
>(SaImmValueTypeT)p->attrValue.attrValueType);
> -  if ((attr->mFlags & SA_IMM_ATTR_NO_DUPLICATES) &&
> -  (multiattr->hasMatchingValue(tmpos))) {
> -LOG_NO(
> -"ERR_INVALID_PARAM: multivalued attr '%s' with "
> -"NO_DUPLICATES yet duplicate values provided in
rta-update "
> -"call. Object:'%s'.", attrName.c_str(),
objectName.c_str());
> -err = SA_AIS_ERR_INVALID_PARAM;
> -break;  // out of loop
> -  }
> 
> -  if (doIt) {
> +  if (!doIt && (attr->mFlags & SA_IMM_ATTR_NO_DUPLICATES)) {
> +if (tmpMultiVal.hasMatchingValue(tmpos)) {
> +  LOG_NO(
> +  "ERR_INVALID_PARAM: multivalued attr '%s' with "
> +  "NO_DUPLICATES yet duplicate values provided in "
> +  "rta-update call. Object:'%s'.",
> +  attrName.c_str(), objectName.c_str());
> +  err = SA_AIS_ERR_INVALID_PARAM;
> +  break;  // out of loop
> +} else {
> +  tmpMultiVal.setExtraValue(tmpos);
> +}
> +  } else if (doIt) {
>  multiattr->setExtraValue(tmpos);
>}
> 
> --
> 2.7.4



--
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot