Re: [devel] [PATCH 1 of 1] leap: ncs_os_process_execute_timed child process takes too long time before exec (#514)

2013-07-29 Thread Nagendra Kumar
Hi Hans N,

>> 1. OPENSAF_CHILD_EXEC_TIME_TOLERANCE is the name of a new environment
>> variable whose value is used as input to alarm(); if it is not set, the
>> default is 2 seconds.
Do we have a placeholder for this variable in the configuration, and are we
going to document it in the README?

>> if the child "hangs" before exec, this extra core dump should give
>> information about where/what is wrong.
This means that fork hangs, am I right? If so, the dump is not going to
provide any information: since fork is a system call, it can only show that
the process hangs in fork.

>> After exec, it will work as usual
This confirms that we are only targeting fork for debugging.

Thanks
-Nagu

-Original Message-
From: Hans Nordebäck [mailto:hans.nordeb...@ericsson.com] 
Sent: 30 July 2013 11:57
To: Nagendra Kumar
Cc: opensaf-devel@lists.sourceforge.net; Praveen Malviya; Ramesh Babu Betham; 
Hans Feldt; Hans Nordebäck
Subject: RE: [PATCH 1 of 1] leap: ncs_os_process_execute_timed child process 
takes too long time before exec (#514)

Hi Nagu,

1. OPENSAF_CHILD_EXEC_TIME_TOLERANCE is the name of a new environment variable
whose value is used as input to alarm(); if it is not set, the default is 2 seconds.
2. Yes, you are right. In this particular case the timeout is set to 10 seconds;
that is why the environment variable above can be set.
3. This alarm is just an additional precaution, at no extra cost, to check the
child part before the exec. After exec it works as usual, but if the child
"hangs" before exec, this extra core dump should give information about
where/what is wrong.

/BR HansN
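A sketch of reading the tolerance defensively, assuming one wants to guard against bad values: the patch passes the string straight to strtol(alarm_time, NULL, 0), so "abc" or "0" yields 0 and alarm(0) arms no alarm at all, while "-1" wraps to a very large unsigned value. The helper name parse_exec_tolerance() and the choice to fall back to the 2 second default on zero or malformed input are assumptions for illustration, not part of the patch. It also parses base 10 only, whereas base 0 in the patch additionally accepts hex and octal.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Default used by the patch when the variable is not set. */
#define DEFAULT_EXEC_TOLERANCE_SEC 2u

/*
 * Hypothetical helper (not part of the patch): convert the string read
 * from OPENSAF_CHILD_EXEC_TIME_TOLERANCE into seconds for alarm().
 * NULL, empty, non-numeric, trailing garbage, zero, negative or
 * out-of-range input falls back to the default instead of becoming 0,
 * which would disable the alarm.
 */
static unsigned int parse_exec_tolerance(const char *value)
{
    char *end;
    long sec;

    if (value == NULL || *value == '\0')
        return DEFAULT_EXEC_TOLERANCE_SEC;

    errno = 0;
    sec = strtol(value, &end, 10);        /* decimal only */
    if (errno != 0 || *end != '\0' || sec <= 0 ||
        (unsigned long)sec > UINT_MAX)
        return DEFAULT_EXEC_TOLERANCE_SEC;

    return (unsigned int)sec;
}
```

In the child it would be used as alarm(parse_exec_tolerance(getenv("OPENSAF_CHILD_EXEC_TIME_TOLERANCE"))).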

-Original Message-
From: Nagendra Kumar [mailto:nagendr...@oracle.com] 
Sent: den 30 juli 2013 07:11
To: Hans Nordebäck; Praveen Malviya; Hans Feldt; Ramesh Babu Betham
Cc: opensaf-devel@lists.sourceforge.net
Subject: RE: [PATCH 1 of 1] leap: ncs_os_process_execute_timed child process 
takes too long time before exec (#514)

Hi Hans N,
For my understanding, can you please provide the following
information:

1.  I can't find OPENSAF_CHILD_EXEC_TIME_TOLERANCE in the OpenSAF source code.
2.  I assume the child process hung for longer than saAmfCtDefClcCliTimeout,
resulting in a CLC timeout. Am I right?
3.  Even if we add an assert in the child process and get a core dump, it may
not give any information, since the delay was caused by a system issue. Are
we trying to find out in which system call the child process hangs?

Thanks
-Nagu

-Original Message-
From: Hans Nordeback [mailto:hans.nordeb...@ericsson.com] 
Sent: 22 July 2013 17:07
To: Nagendra Kumar; Praveen Malviya; hans.fe...@ericsson.com; Ramesh Babu Betham
Cc: opensaf-devel@lists.sourceforge.net
Subject: [PATCH 1 of 1] leap: ncs_os_process_execute_timed child process takes 
too long time before exec (#514)

 osaf/libs/core/leap/os_defs.c |  27 +++
 1 files changed, 27 insertions(+), 0 deletions(-)


amfnd calls ncs_os_process_execute_timed, and the child process takes too long
before exec (10 sec timeout). An alarm is set in the
ncs_os_process_execute_timed child process; if it times out, a core dump is
produced to make troubleshooting possible.

diff --git a/osaf/libs/core/leap/os_defs.c b/osaf/libs/core/leap/os_defs.c
--- a/osaf/libs/core/leap/os_defs.c
+++ b/osaf/libs/core/leap/os_defs.c
@@ -65,6 +65,15 @@ bool gl_ncs_atomic_mtx_initialise = fals
  * description of SOCK_CLOEXEC. */
 static pthread_mutex_t s_cloexec_mutex = PTHREAD_MUTEX_INITIALIZER;
 
+/*
+ * SIGALRM is used to detect whether the child process takes too long before exec.
+ * 
+ * @param sig
+ */
+static void sigalrm_handler(int sig)
+{
+   abort();
+}
 /***
  *
  * uns64
@@ -999,6 +1008,22 @@ uint32_t ncs_os_process_execute_timed(NC
osaf_mutex_lock_ordie(&s_cloexec_mutex);
 
if ((pid = fork()) == 0) {
+unsigned int alarm_time_sec;
+char* alarm_time;
+
+if (signal(SIGALRM, sigalrm_handler) == SIG_ERR) {
+LOG_ER("signal ALRM failed: %s", strerror(errno));
+}
+if ((alarm_time = getenv("OPENSAF_CHILD_EXEC_TIME_TOLERANCE")) != NULL) {
+alarm_time_sec = strtol(alarm_time, NULL, 0);
+}
+else {
+// default alarm timeout 2 seconds
+alarm_time_sec = 2;
+}
+
+alarm(alarm_time_sec);
+
/*
 ** Make sure forked processes have default scheduling class
 ** independent of the callers scheduling class.
@@ -1054,6 +1079,8 @@ uint32_t ncs_os_process_execute_timed(NC
}
 #endif
 
+alarm(0);
+
/* child part */
if (execvp(req->i_script, req->i_argv) == -1) {
syslog(LOG_ERR, "%s: execvp '%s' failed - %s", 
__FUNCTION__, req->i_script, strerror(errno));
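The child-side watchdog in the patch can be sketched as a standalone helper. Two POSIX details make the ordering matter: alarm timers are not inherited by a child created with fork(), so the alarm has to be armed in the child, and a pending alarm survives execve(), so alarm(0) before execvp() keeps the watchdog from killing the exec'd script. The helper spawn_with_exec_watchdog() and its pre_exec hook are hypothetical names; sigaction() is used here instead of signal() because its semantics are portable.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* SIGALRM handler: abort() is async-signal-safe and, with core dumps
 * enabled, leaves a core showing where the child was stuck. */
static void exec_watchdog_expired(int sig)
{
    (void)sig;
    abort();
}

/*
 * Hypothetical helper modelled on the patch: fork, arm a watchdog in the
 * child, run the pre-exec work, cancel the watchdog, then exec.
 * pre_exec may be NULL; its return value is ignored.
 * Returns the child's pid to the parent, or -1 if fork() failed.
 */
static pid_t spawn_with_exec_watchdog(char *const argv[],
                                      unsigned int tolerance_sec,
                                      int (*pre_exec)(void))
{
    struct sigaction sa;
    pid_t pid;

    assert(argv != NULL && argv[0] != NULL);

    pid = fork();
    if (pid != 0)
        return pid;                  /* parent, or fork() failure (-1) */

    /* Child: alarm() timers are not inherited across fork(). */
    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = exec_watchdog_expired;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGALRM, &sa, NULL) == -1)
        _exit(126);                  /* unlike the patch, do not run unguarded */
    alarm(tolerance_sec);

    if (pre_exec != NULL)
        (void)pre_exec();            /* e.g. scheduling or fd setup */

    /* A pending alarm survives execve(), so cancel it first. */
    alarm(0);
    execvp(argv[0], argv);
    _exit(127);                      /* exec failed */
}
```

Passing pause as the pre_exec hook (it blocks until a signal arrives) is an easy way to watch the watchdog fire: the child is killed by SIGABRT after tolerance_sec seconds, and the parent sees WIFSIGNALED(status) with WTERMSIG(status) == SIGABRT.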

Re: [devel] [PATCH 1 of 1] leap: ncs_os_process_execute_timed child process takes too long time before exec (#514)

2013-07-29 Thread Hans Nordebäck
Hi Nagu,

1. OPENSAF_CHILD_EXEC_TIME_TOLERANCE is the name of a new environment variable
whose value is used as input to alarm(); if it is not set, the default is 2 seconds.
2. Yes, you are right. In this particular case the timeout is set to 10 seconds;
that is why the environment variable above can be set.
3. This alarm is just an additional precaution, at no extra cost, to check the
child part before the exec. After exec it works as usual, but if the child
"hangs" before exec, this extra core dump should give information about
where/what is wrong.

/BR HansN


--
Get your SQL database under version control now!
Version control is standard for application code, but databases havent 
caught up. So what steps can you take to put your SQL databases under 
version control? Why should you start doing it? Read more to find out.
http://pubads.g.doubleclick.net/gampad/clk?id=49501711&iu=/4140/ostg.clktrk
___
Opensaf-devel mailing list
Opensaf-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/opensaf-devel


Re: [devel] [PATCH 1 of 1] leap: ncs_os_process_execute_timed child process takes too long time before exec (#514)

2013-07-29 Thread Nagendra Kumar
Hi Hans N,
For my understanding, can you please provide the following
information:

1.  I can't find OPENSAF_CHILD_EXEC_TIME_TOLERANCE in the OpenSAF source code.
2.  I assume the child process hung for longer than saAmfCtDefClcCliTimeout,
resulting in a CLC timeout. Am I right?
3.  Even if we add an assert in the child process and get a core dump, it may
not give any information, since the delay was caused by a system issue. Are
we trying to find out in which system call the child process hangs?

Thanks
-Nagu



Re: [devel] AMF static code analysis regression in 4.2.4 and 4.3.1

2013-07-29 Thread Nagendra Kumar
Sorry, the diagrams are lost. Adding text instead of diagrams:

-Original Message-
From: Nagendra Kumar 
Sent: 30 July 2013 10:15
To: Anders Widell; opensaf-devel@lists.sourceforge.net
Subject: Re: [devel] AMF static code analysis regression in 4.2.4 and 4.3.1

Hi,

 

Just briefing it so that we are all on the same page:

 

Let us say SU1 has three SIs (SI1, SI2 and SI3) assigned, and the
i_su->list_of_susi pointer points to:

i_su->list_of_susi -> SUSI1->SUSI2->SUSI3->NULL
 

    while (i_su->list_of_susi != AVD_SU_SI_REL_NULL) {
        /* free all the CSI assignments  */
        avd_compcsi_delete(cb, i_su->list_of_susi, false);
        /* Unassign the SUSI */
        m_AVD_SU_SI_TRG_DEL(cb, i_su->list_of_susi);
    }

 

First iteration of the above code:

1.   avd_compcsi_delete deletes comp csi of SUSI1 from list_of_csicomp.

2.   m_AVD_SU_SI_TRG_DEL calls avd_susi_delete(SUSI1). After the lines below
execute, p_su_si is NULL and i_su_si points to SUSI1.

 

    /* check the SU list to get the prev pointer */
    i_su_si = susi->su->list_of_susi;
    p_su_si = NULL;
    while ((i_su_si != NULL) && (i_su_si != susi)) {
        p_su_si = i_su_si;
        i_su_si = i_su_si->su_next;
    }

 

3.   Now the lines below execute:

    /* now delete it from the SU list */
    if (p_su_si == NULL) {
        susi->su->list_of_susi = susi->su_next;
        susi->su_next = NULL;
    } else {
        p_su_si->su_next = susi->su_next;
        susi->su_next = NULL;
    }

 

After these lines execute, i_su->list_of_susi points to SUSI2, and
SUSI1->su_next is now NULL (earlier, SUSI1->su_next pointed to SUSI2).

 

4.   After the lines below execute, SUSI1->si and SUSI1->su are NULL.

    susi->si = NULL;
    susi->su = NULL;

5.   The line below frees SUSI1.

    free(susi);

6.   At this point the linked list is:

i_su->list_of_susi -> SUSI2->SUSI3->NULL
 

In the next iteration of the while loop SUSI2 is deleted; after the third
iteration SUSI3 is deleted, i_su->list_of_susi becomes NULL and the while
loop exits.
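The walkthrough can be checked with a small model that keeps only the fields needed for the SU list. struct su, struct susi, su_push(), susi_delete() and drain_su() are hypothetical stand-ins for AVD_SU, AVD_SU_SI_REL, avd_susi_delete() and the avd_sgproc.c loop; the SI-list unlinking and the CSI cleanup are left out. Because the loop always passes the current head, the predecessor search stops immediately, the head pointer advances, and the loop condition reads a fresh pointer on every iteration.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, stripped-down stand-ins for AVD_SU and AVD_SU_SI_REL. */
struct su;
struct susi {
    int id;
    struct su *su;
    struct susi *su_next;
};
struct su {
    struct susi *list_of_susi;
};

/* Prepend a new SUSI to su's list; returns NULL if malloc fails. */
static struct susi *su_push(struct su *su, int id)
{
    struct susi *s = malloc(sizeof(*s));
    if (s == NULL)
        return NULL;
    s->id = id;
    s->su = su;
    s->su_next = su->list_of_susi;
    su->list_of_susi = s;
    return s;
}

/* Models the SU-list part of avd_susi_delete() as quoted in this thread. */
static void susi_delete(struct susi *susi)
{
    struct susi *i_su_si = susi->su->list_of_susi;
    struct susi *p_su_si = NULL;

    /* find the predecessor of susi */
    while (i_su_si != NULL && i_su_si != susi) {
        p_su_si = i_su_si;
        i_su_si = i_su_si->su_next;
    }
    if (i_su_si == NULL)
        return;                      /* guard added for this sketch only */

    if (p_su_si == NULL)             /* susi is the head */
        susi->su->list_of_susi = susi->su_next;
    else
        p_su_si->su_next = susi->su_next;
    susi->su_next = NULL;
    susi->su = NULL;
    free(susi);
}

/* The loop from avd_sgproc.c: it always deletes the current head.
 * Returns the number of SUSIs deleted. */
static int drain_su(struct su *su)
{
    int deleted = 0;

    while (su->list_of_susi != NULL) {
        susi_delete(su->list_of_susi);
        deleted++;
    }
    return deleted;
}
```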


Let me know if any further clarification is needed.
 

Thanks

-Nagu

 



Re: [devel] AMF static code analysis regression in 4.2.4 and 4.3.1

2013-07-29 Thread Anders Widell
I assume the side effect that modifies the variable i_su->list_of_susi 
in the loop conditional happens at the following lines in the code you sent:

 /* now delete it from the SU list */
 if (p_su_si == NULL) {
 susi->su->list_of_susi = susi->su_next;

This only happens when p_su_si is NULL. What happens if p_su_si is not 
NULL? Will i_su->list_of_susi have the same value also in the next 
iteration? free(susi) is executed unconditionally at the end of the 
avd_susi_delete() function, though there are a couple of return 
statements in some branches in the code above it.

regards,
Anders Widell
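On the question of p_su_si: in this loop avd_susi_delete() is always handed the current head (per Nagu's walkthrough), so the predecessor search stops immediately with p_su_si == NULL and only the first branch runs; the else-branch matters only when some other caller deletes a SUSI from the middle of the list. As an aside, the two branches can be folded into one with the common pointer-to-pointer unlinking idiom. This is a swapped-in technique shown for comparison, not what OpenSAF does, and struct node and list_unlink() are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

struct node {
    int id;
    struct node *next;
};

/*
 * Unlink victim from the list whose head pointer is *head.
 * link always points at the pointer that refers to the current node
 * (first the head pointer itself, then some node's next field), so
 * removing the head and removing a middle node are the same assignment.
 * Returns 1 if the node was found and unlinked, 0 otherwise.
 */
static int list_unlink(struct node **head, struct node *victim)
{
    struct node **link = head;

    while (*link != NULL && *link != victim)
        link = &(*link)->next;
    if (*link == NULL)
        return 0;
    *link = victim->next;
    victim->next = NULL;
    return 1;
}
```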

The code is
2013-07-29 14:14, Anders Widell skrev:
> Thanks for your analysis. I still don't understand the code, but if you
> think this warning is a false positive I take your word for it. Some
> additional info from the warning:
>
> The free happens at line 504 in file avd_siass.c:
>   free(susi);
>
> The dereference happens at line 1070 in file avd_csi.c:
>   while (susi->list_of_csicomp != NULL) {
>
> regards,
> Anders Widell
>
> 2013-07-29 13:50, Nagendra Kumar skrev:
>> Hi,
>>  i_su->list_of_susi holds a list of SUSIs. m_AVD_SU_SI_TRG_DEL calls
>> avd_susi_delete, which in turn unlinks the susi from i_su->list_of_susi and
>> deletes it.
>>
>> Code is below:
>>   /* now delete it from the SU list */
>>   if (p_su_si == NULL) {
>>   susi->su->list_of_susi = susi->su_next;
>>   susi->su_next = NULL;
>>   } else {
>>   p_su_si->su_next = susi->su_next;
>>   susi->su_next = NULL;
>>   }
>>
>>   /* now delete it from the SI list */
>>   if (p_si_su == NULL) {
>>   susi->si->list_of_sisu = susi->si_next;
>>   susi->si_next = NULL;
>>   } else {
>>   p_si_su->si_next = susi->si_next;
>>   susi->si_next = NULL;
>>   }
>>
>> And then deletes it. This means one susi is deleted from
>> i_su->list_of_susi. When the last susi is deleted from i_su->list_of_susi,
>> i_su->list_of_susi becomes NULL and the 'while' loop exits.
>>
>> So, I see no problem.
>>
>> Let me know if any further clarification is required.
>>
>> Thanks
>> -Nagu
>>
>> -Original Message-
>> From: Anders Widell [mailto:anders.wid...@ericsson.com]
>> Sent: 29 July 2013 16:12
>> To: opensaf-devel@lists.sourceforge.net
>> Cc: Nagendra Kumar; Hans Feldt; praveen malviya; UABHANO
>> Subject: AMF static code analysis regression in 4.2.4 and 4.3.1
>>
>> Hi!
>>
>> I ran some static code analysis on the release candidates for OpenSAF
>> 4.2.4 and 4.3.1. I got a few regressions towards 4.2.3 and 4.3.0, and I need 
>> your help to analyze the following in avd_sgproc.c. The warning says that 
>> i_su->list_of_susi is used after free(). It is freed by 
>> m_AVD_SU_SI_TRG_DEL(), and then dereferenced by avd_compcsi_delete() in the 
>> next iteration.
>>
>> When I look at the code, I don't understand it at all. Does this loop below 
>> terminate? The loop terminates when i_su->list_of_susi is NULL, but it is 
>> not modified within the loop body! If the loop terminates, it must be 
>> because i_su->list_of_susi is somehow modified as a side-effect of calling 
>> avd_compcsi_delete() or m_AVD_SU_SI_TRG_DEL(). This is a very ugly way of
>> coding!!!
>>
>> Line 1398 - 1408 in osaf/services/saf/avsv/avd/avd_sgproc.c on branch 
>> opensaf-4.3.x (tag 4.3.1RC1):
>> -
>>/* Free all the SU SI assignments for all the SIs on the
>> * the SU if there are any.
>> */
>>
>>while (i_su->list_of_susi != AVD_SU_SI_REL_NULL) {
>>
>>/* free all the CSI assignments  */
>>avd_compcsi_delete(cb, i_su->list_of_susi, false);
>>/* Unassign the SUSI */
>>m_AVD_SU_SI_TRG_DEL(cb, i_su->list_of_susi);
>>}
>> -
>>
>> regards,
>> Anders Widell
>>
>
> --
> See everything from the browser to the database with AppDynamics
> Get end-to-end visibility with application monitoring from AppDynamics
> Isolate bottlenecks and diagnose root cause in seconds.
> Start your free trial of AppDynamics Pro today!
> http://pubads.g.doubleclick.net/gampad/clk?id=48808831&iu=/4140/ostg.clktrk
> ___
> Opensaf-devel mailing list
> Opensaf-devel@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/opensaf-devel
>
>



Re: [devel] #227 clmd asserts on active controller during node lock timeout

2013-07-29 Thread Mathivanan Naickan Palanivelu
Finally I could get hold of the traces! It turned out to be a simpler case than
the initial analysis suggested. Thanks, Surender, for re-running and sharing the traces.

It's a simple case of 

-  Issue CLM Lock of a node (PL-5)

-  Make the PL-5 node non-member

-  Lock callback times out and the node entry is not found (which is fine),
and the abort gets hit.

 

While the root cause is an incorrectly placed abort, the fix is to look up the
node by name rather than by id, because the node with that id has gone down and
is no longer relevant.
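A sketch of that direction under stated assumptions: struct clm_node, find_by_id(), find_by_name() and node_lock_timer_expired() are hypothetical, not the clms_evt.c code. The model assumes the id lookup only covers current members while the name lookup covers all configured nodes, so after PL-5 becomes a non-member the id lookup fails (the 'op_node != NULL' situation in the trace) but the name lookup still finds it. A genuinely unknown node is reported to the caller instead of aborting.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified CLM node record (not OpenSAF's real type). */
struct clm_node {
    const char *name;       /* e.g. "safNode=PL-5,safCluster=myClmCluster" */
    unsigned int node_id;
    int member;             /* 0 once the node has left the cluster */
};

/* Models an id-indexed lookup that only knows current members. */
static struct clm_node *find_by_id(struct clm_node *tbl, size_t n,
                                   unsigned int node_id)
{
    for (size_t i = 0; i < n; i++)
        if (tbl[i].member && tbl[i].node_id == node_id)
            return &tbl[i];
    return NULL;
}

/* Models a name-indexed lookup over all configured nodes. */
static struct clm_node *find_by_name(struct clm_node *tbl, size_t n,
                                     const char *name)
{
    for (size_t i = 0; i < n; i++)
        if (strcmp(tbl[i].name, name) == 0)
            return &tbl[i];
    return NULL;
}

/* Lock-timer expiry sketch: look the node up by name and treat
 * "not found" as a reportable condition instead of asserting.
 * Returns 0 if the node was found, -1 otherwise. */
static int node_lock_timer_expired(struct clm_node *tbl, size_t n,
                                   const char *name)
{
    struct clm_node *node = find_by_name(tbl, n, name);

    if (node == NULL)
        return -1;          /* log and return; do not abort */
    /* ... complete the pending lock operation for node here ... */
    return 0;
}
```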

 

Cheers,

Mathi.

 

From: surender khetavath [mailto:surend...@users.sf.net] 
Sent: Friday, July 05, 2013 4:36 PM
To: [opensaf:tickets] 
Subject: [opensaf:tickets] #227 clmd asserts on active controller during node 
lock timeout

 

The issue is always reproducible. 
Test:

A campaign is modeled to include PL-5 and an SU on this node. For this, the
script '/usr/share/opensaf/immxml/immxml-modify-config' is used. While doing a
rollback, a CLM crash is observed. The campaign is doing a lock/lock-in op on
PL-5 while the script immxml-modify-config is simultaneously trying to perform
an admin lock: if the lines below are commented out in immxml-modify-config,
the rollback goes fine; if they are enabled, CLM crashes.

 PLMNODE=`cat $CURRENT_NODECFG | grep ".. $node " | awk '{ print $ 3 }'`
 trace "PLMNODE: $PLMNODE"
 cmd="amf-adm lock safNode=$PLMNODE,safCluster=myClmCluster"

The scripts and configuration are attached.

Attachment: scripts.tgz (4.9 kB; application/x-compressed-tar)

[tickets:#227] clmd asserts on active controller during node lock timeout
http://sourceforge.net/p/opensaf/tickets/227/

Status: unassigned
Created: Wed May 15, 2013 10:23 AM UTC by Mathi Naickan
Last Updated: Fri Jun 28, 2013 10:45 AM UTC
Owner: Mathi Naickan

I have asked for traces from the submitter.

changeset : 4007 with patch 2865
scenario:

Trying to do lock/lock-in of PL-5.
amf-adm lock safNode=PL-5,safCluster=myClmCluster
error - saImmOmAdminOperationInvoke_2 FAILED: SA_AIS_ERR_TIMEOUT (5)
error: failed to eval/store amf-adm lock safNode=PL-5,safCluster=myClmCluster 
failed. Aborting script! exitCode: 1


#0  0x7fb446240b55 in raise () from /lib64/libc.so.6

(gdb) bt
#0  0x7fb446240b55 in raise () from /lib64/libc.so.6
#1  0x7fb446242131 in abort () from /lib64/libc.so.6
#2  0x7fb447881e44 in osafassert_fail (file=0x420380 "clms_evt.c", line=390,
    func=0x420680 "proc_node_lock_tmr_exp_msg", assertion=0x42069b "op_node != NULL") at sysf_def.c:301
#3  0x0040954a in proc_node_lock_tmr_exp_msg (evt=0x655290) at clms_evt.c:390
#4  0x0040bc42 in clms_process_mbx (mbx=0x6298a0) at clms_evt.c:1272
#5  0x00412b3b in main (argc=1, argv=0x7fff3162cb28) at clms_main.c:455

(gdb) bt full
#0  0x7fb446240b55 in raise () from /lib64/libc.so.6
        No symbol table info available.
#1  0x7fb446242131 in abort () from /lib64/libc.so.6
        No symbol table info available.
#2  0x7fb447881e44 in osafassert_fail (file=0x420380 "clms_evt.c", line=390,
    func=0x420680 "proc_node_lock_tmr_exp_msg", assertion=0x42069b "op_node != NULL") at sysf_def.c:301
        No locals.
#3  0x0040954a in proc_node_lock_tmr_exp_msg (evt=0x655290) at clms_evt.c:390
        rc = 1
        node_id = 132367
        op_node = 0x0
        __FUNCTION__ = "proc_node_lock_tmr_exp_msg"
#4  0x0040bc42 in clms_process_mbx (mbx=0x6298a0) at clms_evt.c:1272
        msg = 0x655290
        __FUNCTION__ = "clms_process_mbx"
#5  0x00412b3b in main (argc=1, argv=0x7fff3162cb28) at clms_main.c:455
        ret = 1
        mbx_fd = {raise_obj = 11, rmv_obj = 12}
        error = SA_AIS_OK
        rc = 1
        __FUNCTION__ = "main"
syslog on sc-1:
==
Mar 13 12:27:23 SLES1 osafclmd[6575]: clms_evt.c:390: proc_node_lock_tmr_exp_msg: Assertion 'op_node != NULL' failed.
Mar 13 12:27:23 SLES1 osafamfnd[6604]: NO 'safComp=CLM,safSu=SC-1,safSg=2N,safApp=OpenSAF' faulted due to 'avaDown' : Recovery is 'nodeFailfast'
Mar 13 12:27:23 SLES1 osafamfnd[6604]: ER safComp=CLM,safSu=SC-1,safSg=2N,safApp=OpenSAF Faulted due to:avaDown Recovery is:nodeFailfast
Mar 13 12:27:23 SLES1 osafamfnd[6604]: Rebooting OpenSAF NodeId = 131343 EE Name = , Reason: Component faulted: recovery is node failfast
Mar 13 12:27:23 SLES1 opensaf_reboot: Rebooting local node
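The two numeric node ids in the trace can be cross-checked by hand. Assuming the node-id layout commonly seen in OpenSAF deployments (chassis id in bits 16-23, slot in bits 8-15, 0xF in the low byte; this layout is an assumption, not stated in the thread), NodeId = 131343 = 0x2010F decodes to chassis 2, slot 1, the rebooting controller SC-1, and node_id = 132367 = 0x2050F in frame 3 decodes to chassis 2, slot 5, consistent with PL-5, the node that had left. decode_node_id() is a hypothetical helper.

```c
#include <assert.h>

/* Assumed layout: chassis id in bits 16-23, physical slot in bits 8-15,
 * subslot in bits 0-7 (not confirmed by this thread). */
struct node_addr {
    unsigned int chassis;
    unsigned int slot;
    unsigned int subslot;
};

static struct node_addr decode_node_id(unsigned int node_id)
{
    struct node_addr a;

    a.chassis = (node_id >> 16) & 0xFFu;
    a.slot = (node_id >> 8) & 0xFFu;
    a.subslot = node_id & 0xFFu;
    return a;
}
```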


Re: [devel] AMF static code analysis regression in 4.2.4 and 4.3.1

2013-07-29 Thread Anders Widell
Thanks for your analysis. I still don't understand the code, but if you 
think this warning is a false positive I take your word for it. Some 
additional info from the warning:

The free happens at line 504 in file avd_siass.c:
 free(susi);

The dereference happens at line 1070 in file avd_csi.c:
 while (susi->list_of_csicomp != NULL) {

regards,
Anders Widell

2013-07-29 13:50, Nagendra Kumar skrev:
> Hi,
>   i_su->list_of_susi has list of susi. m_AVD_SU_SI_TRG_DEL calls 
> avd_susi_delete, which in tern separate susi from i_su->list_of_susi and 
> deletes it.
>
> Code is below:
>  /* now delete it from the SU list */
>  if (p_su_si == NULL) {
>  susi->su->list_of_susi = susi->su_next;
>  susi->su_next = NULL;
>  } else {
>  p_su_si->su_next = susi->su_next;
>  susi->su_next = NULL;
>  }
>
>  /* now delete it from the SI list */
>  if (p_si_su == NULL) {
>  susi->si->list_of_sisu = susi->si_next;
>  susi->si_next = NULL;
>  } else {
>  p_si_su->si_next = susi->si_next;
>  susi->si_next = NULL;
>  }
>
> And then deletes it. This means  one susi is deleted from from 
> i_su->list_of_susi. When last susi is deleted from i_su->list_of_susi, then 
> i_su->list_of_susi becomes null and it exists from 'while loop'.
>
> So, I see no problem.
>
> Let me know if any further clarifications is required.
>
> Thanks
> -Nagu
>
> -Original Message-
> From: Anders Widell [mailto:anders.wid...@ericsson.com]
> Sent: 29 July 2013 16:12
> To: opensaf-devel@lists.sourceforge.net
> Cc: Nagendra Kumar; Hans Feldt; praveen malviya; UABHANO
> Subject: AMF static code analysis regression in 4.2.4 and 4.3.1
>
> Hi!
>
> I ran some static code analysis on the release candidates for OpenSAF
> 4.2.4 and 4.3.1. I got a few regressions compared to 4.2.3 and 4.3.0, and I
> need your help to analyze the following in avd_sgproc.c. The warning says
> that i_su->list_of_susi is used after free(). It is freed by
> m_AVD_SU_SI_TRG_DEL() and then dereferenced by avd_compcsi_delete() in the
> next iteration.
>
> When I look at the code, I don't understand it at all. Does the loop below
> terminate? The loop terminates when i_su->list_of_susi is NULL, but
> i_su->list_of_susi is not modified within the loop body! If the loop
> terminates, it must be because i_su->list_of_susi is somehow modified as a
> side effect of calling avd_compcsi_delete() or m_AVD_SU_SI_TRG_DEL(). This
> is a very ugly way of coding!!!
>
> Lines 1398-1408 in osaf/services/saf/avsv/avd/avd_sgproc.c on branch
> opensaf-4.3.x (tag 4.3.1RC1):
> -
>     /* Free all the SU SI assignments for all the SIs on the
>      * the SU if there are any.
>      */
>
>     while (i_su->list_of_susi != AVD_SU_SI_REL_NULL) {
>
>         /* free all the CSI assignments  */
>         avd_compcsi_delete(cb, i_su->list_of_susi, false);
>         /* Unassign the SUSI */
>         m_AVD_SU_SI_TRG_DEL(cb, i_su->list_of_susi);
>     }
> -
>
> regards,
> Anders Widell
>




Re: [devel] AMF static code analysis regression in 4.2.4 and 4.3.1

2013-07-29 Thread Nagendra Kumar
Hi,
i_su->list_of_susi holds the list of SUSIs. m_AVD_SU_SI_TRG_DEL calls
avd_susi_delete, which in turn unlinks the susi from i_su->list_of_susi
(and from its SI's list_of_sisu) and deletes it.

Code is below:
    /* now delete it from the SU list */
    if (p_su_si == NULL) {
        susi->su->list_of_susi = susi->su_next;
        susi->su_next = NULL;
    } else {
        p_su_si->su_next = susi->su_next;
        susi->su_next = NULL;
    }

    /* now delete it from the SI list */
    if (p_si_su == NULL) {
        susi->si->list_of_sisu = susi->si_next;
        susi->si_next = NULL;
    } else {
        p_si_su->si_next = susi->si_next;
        susi->si_next = NULL;
    }

It then deletes the susi. So each iteration removes one susi from
i_su->list_of_susi; when the last one is deleted, i_su->list_of_susi
becomes NULL and the 'while' loop exits.

So, I see no problem.
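
The two-list unlink quoted above can be sketched as a self-contained C function. The struct layouts are hypothetical simplifications of AVD_SU, AVD_SI and AVD_SU_SI_REL that keep only the fields used in the quoted code, and the predecessor search (how `p_su_si` and `p_si_su` are found) is an assumption, since the mail does not show it.

```c
#include <assert.h>
#include <stddef.h>

struct susi;                                   /* forward declaration */

struct su { struct susi *list_of_susi; };      /* hypothetical AVD_SU */
struct si { struct susi *list_of_sisu; };      /* hypothetical AVD_SI */

/* Hypothetical AVD_SU_SI_REL: one node threaded on two singly linked lists. */
struct susi {
    struct su *su;
    struct si *si;
    struct susi *su_next;   /* next SUSI assigned to the same SU */
    struct susi *si_next;   /* next SUSI assigned to the same SI */
};

/* Unlink susi from both its SU list and its SI list; the caller frees it. */
static void susi_unlink(struct susi *susi)
{
    struct susi *p_su_si = NULL, *p_si_su = NULL, *cur;

    assert(susi != NULL && susi->su != NULL && susi->si != NULL);

    /* Find the predecessor on the SU list (stays NULL if susi is the head). */
    for (cur = susi->su->list_of_susi; cur != NULL && cur != susi; cur = cur->su_next)
        p_su_si = cur;

    if (cur != NULL) {                         /* found: delete from SU list */
        if (p_su_si == NULL)
            susi->su->list_of_susi = susi->su_next;
        else
            p_su_si->su_next = susi->su_next;
        susi->su_next = NULL;
    }

    /* Same for the SI list. */
    for (cur = susi->si->list_of_sisu; cur != NULL && cur != susi; cur = cur->si_next)
        p_si_su = cur;

    if (cur != NULL) {                         /* found: delete from SI list */
        if (p_si_su == NULL)
            susi->si->list_of_sisu = susi->si_next;
        else
            p_si_su->si_next = susi->si_next;
        susi->si_next = NULL;
    }
}
```

Removing the head of the SU list advances `list_of_susi` to the next node, which is the side effect the `while` loop in avd_sgproc.c depends on to terminate.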

Let me know if any further clarification is required.

Thanks
-Nagu

-Original Message-
From: Anders Widell [mailto:anders.wid...@ericsson.com] 
Sent: 29 July 2013 16:12
To: opensaf-devel@lists.sourceforge.net
Cc: Nagendra Kumar; Hans Feldt; praveen malviya; UABHANO
Subject: AMF static code analysis regression in 4.2.4 and 4.3.1

Hi!

I ran some static code analysis on the release candidates for OpenSAF
4.2.4 and 4.3.1. I got a few regressions compared to 4.2.3 and 4.3.0, and I need
your help to analyze the following in avd_sgproc.c. The warning says that
i_su->list_of_susi is used after free(). It is freed by m_AVD_SU_SI_TRG_DEL()
and then dereferenced by avd_compcsi_delete() in the next iteration.

When I look at the code, I don't understand it at all. Does the loop below
terminate? The loop terminates when i_su->list_of_susi is NULL, but
i_su->list_of_susi is not modified within the loop body! If the loop terminates,
it must be because i_su->list_of_susi is somehow modified as a side effect of
calling avd_compcsi_delete() or m_AVD_SU_SI_TRG_DEL(). This is a very ugly way of coding!!!

Lines 1398-1408 in osaf/services/saf/avsv/avd/avd_sgproc.c on branch
opensaf-4.3.x (tag 4.3.1RC1):
-
    /* Free all the SU SI assignments for all the SIs on the
     * the SU if there are any.
     */

    while (i_su->list_of_susi != AVD_SU_SI_REL_NULL) {

        /* free all the CSI assignments  */
        avd_compcsi_delete(cb, i_su->list_of_susi, false);
        /* Unassign the SUSI */
        m_AVD_SU_SI_TRG_DEL(cb, i_su->list_of_susi);
    }
-

regards,
Anders Widell




[devel] AMF static code analysis regression in 4.2.4 and 4.3.1

2013-07-29 Thread Anders Widell
Hi!

I ran some static code analysis on the release candidates for OpenSAF
4.2.4 and 4.3.1. I got a few regressions compared to 4.2.3 and 4.3.0, and I
need your help to analyze the following in avd_sgproc.c. The warning
says that i_su->list_of_susi is used after free(). It is freed by
m_AVD_SU_SI_TRG_DEL() and then dereferenced by avd_compcsi_delete() in
the next iteration.

When I look at the code, I don't understand it at all. Does the loop
below terminate? The loop terminates when i_su->list_of_susi is NULL,
but i_su->list_of_susi is not modified within the loop body! If the loop
terminates, it must be because i_su->list_of_susi is somehow modified as
a side effect of calling avd_compcsi_delete() or m_AVD_SU_SI_TRG_DEL().
This is a very ugly way of coding!!!

Lines 1398-1408 in osaf/services/saf/avsv/avd/avd_sgproc.c on branch
opensaf-4.3.x (tag 4.3.1RC1):
-
    /* Free all the SU SI assignments for all the SIs on the
     * the SU if there are any.
     */

    while (i_su->list_of_susi != AVD_SU_SI_REL_NULL) {

        /* free all the CSI assignments  */
        avd_compcsi_delete(cb, i_su->list_of_susi, false);
        /* Unassign the SUSI */
        m_AVD_SU_SI_TRG_DEL(cb, i_su->list_of_susi);
    }
-
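
One hypothetical way to make this loop's progress visible is to read the head into a local variable, capture its successor before deleting, and assert that the head advanced. The types, the callback typedefs and the `example_*` callbacks are assumptions standing in for avd_compcsi_delete() and m_AVD_SU_SI_TRG_DEL(); this is a sketch, not the OpenSAF fix, and it may or may not silence the analyzer.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for AVD_SU_SI_REL and AVD_SU (not OpenSAF types). */
struct susi {
    struct susi *su_next;
    int csi_count;            /* pretend CSI assignments on this SUSI */
};

struct su {
    struct susi *list_of_susi;
};

/* Hypothetical callbacks standing in for avd_compcsi_delete() and
 * m_AVD_SU_SI_TRG_DEL(); the second must unlink susi from su's list. */
typedef void (*csi_delete_fn)(struct susi *susi);
typedef void (*susi_delete_fn)(struct su *su, struct susi *susi);

static int drain_explicit(struct su *su, csi_delete_fn del_csi,
                          susi_delete_fn del_susi)
{
    int removed = 0;
    struct susi *susi;

    while ((susi = su->list_of_susi) != NULL) {
        struct susi *next = susi->su_next;   /* read before susi may be freed */

        del_csi(susi);                       /* free all CSI assignments */
        del_susi(su, susi);                  /* unassign; susi is dead after this */

        /* Make the hidden side effect explicit: the head must have advanced. */
        assert(su->list_of_susi == next);
        (void)next;                          /* unused when NDEBUG is defined */
        removed++;
    }
    return removed;
}

/* Example callback for this sketch only: drop all CSI assignments. */
static void example_csi_delete(struct susi *susi)
{
    susi->csi_count = 0;
}

/* Example callback: unlinks susi when it is the head, which is all
 * drain_explicit() ever passes. */
static void example_susi_delete(struct su *su, struct susi *susi)
{
    if (su->list_of_susi == susi)
        su->list_of_susi = susi->su_next;
    susi->su_next = NULL;
}
```

With a three-node list, `drain_explicit()` returns 3 and leaves the head NULL. If a delete callback ever failed to unlink the node, the assert would fail at once instead of the loop spinning forever or touching freed memory. The check is compiled out when NDEBUG is defined.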

regards,
Anders Widell

