Re: [devel] [PATCH 1 of 1] leap: ncs_os_process_execute_timed child process takes too long time before exec (#514)
Hi Hans N,

>> 1. OPENSAF_CHILD_EXEC_TIME_TOLERANCE is the name of a new environment
>> variable whose value is used as input to alarm; if not set, it defaults to 2
>> seconds.

Do we have a placeholder for this variable in the configuration, and are we going to document it in the README?

>> if the child "hangs" before exec this extra coredump should give
>> information where/what is wrong.

This means that fork hangs, am I right? If yes, the dump is not going to provide any information, since fork is a system call; it can only show that it hangs in fork.

>> After exec, it will work as usual

This confirms that we are only targeting fork for debugging.

Thanks
-Nagu

-----Original Message-----
From: Hans Nordebäck [mailto:hans.nordeb...@ericsson.com]
Sent: 30 July 2013 11:57
To: Nagendra Kumar
Cc: opensaf-devel@lists.sourceforge.net; Praveen Malviya; Ramesh Babu Betham; Hans Feldt; Hans Nordebäck
Subject: RE: [PATCH 1 of 1] leap: ncs_os_process_execute_timed child process takes too long time before exec (#514)

Hi Nagu,

1. OPENSAF_CHILD_EXEC_TIME_TOLERANCE is the name of a new environment variable whose value is used as input to alarm; if not set, it defaults to 2 seconds.
2. Yes, you are right; in this particular case it is set to 10 sec, which is why the env. variable above can be set.
3. This alarm is just an additional precaution, at no extra cost, to check the child part before the exec. After exec it will work as usual, but if the child "hangs" before exec, this extra coredump should give information about where/what is wrong.

/BR
HansN
Re: [devel] [PATCH 1 of 1] leap: ncs_os_process_execute_timed child process takes too long time before exec (#514)
Hi Nagu,

1. OPENSAF_CHILD_EXEC_TIME_TOLERANCE is the name of a new environment variable whose value is used as input to alarm; if not set, it defaults to 2 seconds.
2. Yes, you are right; in this particular case it is set to 10 sec, which is why the env. variable above can be set.
3. This alarm is just an additional precaution, at no extra cost, to check the child part before the exec. After exec it will work as usual, but if the child "hangs" before exec, this extra coredump should give information about where/what is wrong.

/BR
HansN

-----Original Message-----
From: Nagendra Kumar [mailto:nagendr...@oracle.com]
Sent: 30 July 2013 07:11
To: Hans Nordebäck; Praveen Malviya; Hans Feldt; Ramesh Babu Betham
Cc: opensaf-devel@lists.sourceforge.net
Subject: RE: [PATCH 1 of 1] leap: ncs_os_process_execute_timed child process takes too long time before exec (#514)

Hi Hans N,

For my understanding, can you please provide the below information:

1. I can't find OPENSAF_CHILD_EXEC_TIME_TOLERANCE in the opensaf source code.
2. I assume the child process hangs for longer than saAmfCtDefClcCliTimeout, resulting in a CLC timeout. Am I right?
3. Even if we add an assert in the child process and get a core dump, it may not give any information if the delay was caused by a system issue. Are we trying to find out in which system call the child process hangs?

Thanks
-Nagu

___
Opensaf-devel mailing list
Opensaf-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/opensaf-devel
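Hans's point 1 above says the tolerance comes from OPENSAF_CHILD_EXEC_TIME_TOLERANCE with a 2-second default. In the patch the value is read with strtol(alarm_time, NULL, 0) and no error checks, so an empty or non-numeric value yields 0 (and alarm(0) disables the watchdog), while a negative value wraps to a very large unsigned int, which in practice disables it too. A minimal sketch of a stricter parse, assuming the same variable name, base-0 parsing and default as the patch; parse_exec_tolerance, exec_tolerance_from_env and the 3600-second cap are hypothetical choices for illustration, not OpenSAF code:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Default from the patch: 2 seconds when the variable is unset. */
#define EXEC_TOLERANCE_DEFAULT_SEC 2u
/* Hypothetical sanity cap, not an OpenSAF constant. */
#define EXEC_TOLERANCE_MAX_SEC 3600l

/*
 * Parse a tolerance in seconds. NULL, empty, non-numeric, trailing junk,
 * zero, negative or out-of-range input falls back to the default, so the
 * watchdog is never silently disabled by a bad value.
 */
static unsigned int parse_exec_tolerance(const char *s)
{
	char *end;
	long v;

	if (s == NULL || *s == '\0')
		return EXEC_TOLERANCE_DEFAULT_SEC;

	errno = 0;
	v = strtol(s, &end, 0); /* base 0 as in the patch: "10", "0x10", "010" */
	if (errno != 0 || end == s || *end != '\0' ||
	    v <= 0 || v > EXEC_TOLERANCE_MAX_SEC)
		return EXEC_TOLERANCE_DEFAULT_SEC;

	return (unsigned int)v;
}

/* Hypothetical wrapper mirroring the patch's getenv() call. */
static unsigned int exec_tolerance_from_env(const char *env_name)
{
	assert(env_name != NULL);
	return parse_exec_tolerance(getenv(env_name));
}
```

Falling back to the default on a zero or invalid value is a design choice; a project could just as well treat an explicit "0" as "watchdog off".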
Re: [devel] [PATCH 1 of 1] leap: ncs_os_process_execute_timed child process takes too long time before exec (#514)
Hi Hans N,

For my understanding, can you please provide the below information:

1. I can't find OPENSAF_CHILD_EXEC_TIME_TOLERANCE in the opensaf source code.
2. I assume the child process hangs for longer than saAmfCtDefClcCliTimeout, resulting in a CLC timeout. Am I right?
3. Even if we add an assert in the child process and get a core dump, it may not give any information if the delay was caused by a system issue. Are we trying to find out in which system call the child process hangs?

Thanks
-Nagu

-----Original Message-----
From: Hans Nordeback [mailto:hans.nordeb...@ericsson.com]
Sent: 22 July 2013 17:07
To: Nagendra Kumar; Praveen Malviya; hans.fe...@ericsson.com; Ramesh Babu Betham
Cc: opensaf-devel@lists.sourceforge.net
Subject: [PATCH 1 of 1] leap: ncs_os_process_execute_timed child process takes too long time before exec (#514)

 osaf/libs/core/leap/os_defs.c | 27 +++
 1 files changed, 27 insertions(+), 0 deletions(-)

amfnd calls ncs_os_process_execute_timed and the child process takes too long before exec (10 sec timeout). An alarm is set in the ncs_os_process_execute_timed child process; if it expires, a core dump is produced to make troubleshooting possible.

diff --git a/osaf/libs/core/leap/os_defs.c b/osaf/libs/core/leap/os_defs.c
--- a/osaf/libs/core/leap/os_defs.c
+++ b/osaf/libs/core/leap/os_defs.c
@@ -65,6 +65,15 @@ bool gl_ncs_atomic_mtx_initialise = fals
  * description of SOCK_CLOEXEC.
  */
 static pthread_mutex_t s_cloexec_mutex = PTHREAD_MUTEX_INITIALIZER;
+/*
+ * ALRM signal is used to detect if child process takes too long time before exec.
+ *
+ * @param sig
+ */
+static void sigalrm_handler(int sig)
+{
+	abort();
+}
 /***
  *
  * uns64
@@ -999,6 +1008,22 @@ uint32_t ncs_os_process_execute_timed(NC
 	osaf_mutex_lock_ordie(&s_cloexec_mutex);
 
 	if ((pid = fork()) == 0) {
+		unsigned int alarm_time_sec;
+		char* alarm_time;
+
+		if (signal(SIGALRM, sigalrm_handler) == SIG_ERR) {
+			LOG_ER("signal ALRM failed: %s", strerror(errno));
+		}
+		if ((alarm_time = getenv("OPENSAF_CHILD_EXEC_TIME_TOLERANCE")) != NULL) {
+			alarm_time_sec = strtol(alarm_time, NULL, 0);
+		}
+		else {
+			// default alarm timeout 2 seconds
+			alarm_time_sec = 2;
+		}
+
+		alarm(alarm_time_sec);
+
 		/*
 		 ** Make sure forked processes have default scheduling class
 		 ** independent of the callers scheduling class.
@@ -1054,6 +1079,8 @@ uint32_t ncs_os_process_execute_timed(NC
 		}
 #endif
 
+		alarm(0);
+
 		/* child part */
 		if (execvp(req->i_script, req->i_argv) == -1) {
 			syslog(LOG_ERR, "%s: execvp '%s' failed - %s", __FUNCTION__, req->i_script, strerror(errno));
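The patch's mechanism (arm alarm() in the child right after fork(), abort() from the SIGALRM handler so a core is dumped, cancel with alarm(0) just before execvp()) can be tried in isolation with the sketch below. run_with_exec_watchdog and its pre_exec callback are hypothetical names standing in for the work ncs_os_process_execute_timed does between fork and exec; the sketch also swaps signal() for sigaction(), which is a substitution, not what the patch does. Note that the alarm is armed only after fork() has returned 0 in the child, so it covers the child code between fork() and execvp(), not fork() itself. Cancelling it before exec matters because a pending alarm is preserved across execve().

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Runs in the child on SIGALRM. abort() is async-signal-safe and, if core
 * dumps are enabled, leaves a core showing where the child was stuck. */
static void exec_watchdog_handler(int sig)
{
	(void)sig;
	abort();
}

/*
 * Hypothetical helper: fork a child that runs pre_exec() (may be NULL) and
 * then execvp(argv[0], argv). If the child has not reached execvp() within
 * tolerance_sec seconds, SIGALRM fires and the child aborts.
 * A non-zero return from pre_exec() makes the child exit with status 126.
 * Returns the raw wait status of the child, or -1 if fork/waitpid fails.
 */
static int run_with_exec_watchdog(char *const argv[], unsigned int tolerance_sec,
				  int (*pre_exec)(void))
{
	pid_t pid;
	int status;

	assert(argv != NULL && argv[0] != NULL);

	pid = fork();
	if (pid < 0)
		return -1;

	if (pid == 0) {
		/* child part: arm the watchdog first */
		struct sigaction sa;

		sa.sa_handler = exec_watchdog_handler;
		sigemptyset(&sa.sa_mask);
		sa.sa_flags = 0;
		/* As in the patch, a failure here only loses the watchdog. */
		(void)sigaction(SIGALRM, &sa, NULL);
		alarm(tolerance_sec);

		/* Stand-in for the setup done between fork() and exec. */
		if (pre_exec != NULL && pre_exec() != 0)
			_exit(126);

		/* Cancel before exec: a pending alarm survives execve(). */
		alarm(0);
		execvp(argv[0], argv);
		_exit(127); /* only reached if execvp() failed */
	}

	/* parent part: reap the child, retrying if interrupted */
	while (waitpid(pid, &status, 0) < 0) {
		if (errno != EINTR)
			return -1;
	}
	return status;
}
```

In a quick experiment, a pre_exec that never returns (pause, for example) should end with WIFSIGNALED(status) and WTERMSIG(status) == SIGABRT after roughly tolerance_sec seconds, while a child that reaches execvp() promptly reports the exec'd program's own exit status.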
Re: [devel] AMF static code analysis regression in 4.2.4 and 4.3.1
Sorry, the diagrams are lost. Adding text instead of diagrams:

-----Original Message-----
From: Nagendra Kumar
Sent: 30 July 2013 10:15
To: Anders Widell; opensaf-devel@lists.sourceforge.net
Subject: Re: [devel] AMF static code analysis regression in 4.2.4 and 4.3.1

Hi,

Just briefing it so that we are all on the same page:

Let us say SU1 has three SIs (SI1, SI2 and SI3) assigned, and the i_su->list_of_susi pointer points to:

i_su->list_of_susi -> SUSI1 -> SUSI2 -> SUSI3 -> NULL

	while (i_su->list_of_susi != AVD_SU_SI_REL_NULL) {
		/* free all the CSI assignments */
		avd_compcsi_delete(cb, i_su->list_of_susi, false);
		/* Unassign the SUSI */
		m_AVD_SU_SI_TRG_DEL(cb, i_su->list_of_susi);
	}

First iteration of the above code:

1. avd_compcsi_delete deletes the comp CSIs of SUSI1 from list_of_csicomp.

2. m_AVD_SU_SI_TRG_DEL calls avd_susi_delete(SUSI1). After the lines below execute, p_su_si is NULL and i_su_si points to SUSI1.

	/* check the SU list to get the prev pointer */
	i_su_si = susi->su->list_of_susi;
	p_su_si = NULL;
	while ((i_su_si != NULL) && (i_su_si != susi)) {
		p_su_si = i_su_si;
		i_su_si = i_su_si->su_next;
	}

3. Now the lines below execute.

	/* now delete it from the SU list */
	if (p_su_si == NULL) {
		susi->su->list_of_susi = susi->su_next;
		susi->su_next = NULL;
	} else {
		p_su_si->su_next = susi->su_next;
		susi->su_next = NULL;
	}

After these lines execute, i_su->list_of_susi points to SUSI2, and SUSI1->su_next is now NULL (earlier, SUSI1->su_next pointed to SUSI2).

4. After the lines below execute, SUSI1->si is NULL and SUSI1->su is NULL.

	susi->si = NULL;
	susi->su = NULL;

5. The line below frees SUSI1.

	free(susi);

6. At this point the linked list is:

i_su->list_of_susi -> SUSI2 -> SUSI3 -> NULL

The next time the while loop executes, SUSI2 will be deleted; after the third iteration SUSI3 will be deleted, i_su->list_of_susi will be NULL, and the while loop will exit.

Let me know if any further clarification is needed.

Thanks
-Nagu
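The walkthrough above can be checked with a small self-contained model. The names below (susi_t, su_t, susi_unlink_and_free, drain_susi_list, build_list) are simplified, hypothetical stand-ins for AVD_SU_SI_REL, AVD_SU and avd_susi_delete(); only the SU-list unlinking is modelled, not the SI list or the CSI deletion. Because the loop always passes the current head, the prev-pointer scan stops immediately, p_su_si stays NULL, and the head advances on every iteration, which is why the loop terminates.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Simplified, hypothetical stand-ins for the AMF director structures. */
typedef struct su su_t;
typedef struct susi {
	su_t *su;
	struct susi *su_next;
} susi_t;
struct su {
	susi_t *list_of_susi;
};

/* Mirrors the SU-list part of avd_susi_delete(): find prev, unlink, free.
 * Returns 1 if susi was the head (p_su_si == NULL), else 0. */
static int susi_unlink_and_free(susi_t *susi)
{
	susi_t *i_su_si = susi->su->list_of_susi;
	susi_t *p_su_si = NULL;
	int was_head;

	/* check the SU list to get the prev pointer */
	while (i_su_si != NULL && i_su_si != susi) {
		p_su_si = i_su_si;
		i_su_si = i_su_si->su_next;
	}
	assert(i_su_si == susi); /* susi must be on its SU's list */

	was_head = (p_su_si == NULL);
	if (was_head)
		susi->su->list_of_susi = susi->su_next; /* head advances */
	else
		p_su_si->su_next = susi->su_next;
	susi->su_next = NULL;
	susi->su = NULL;
	free(susi);
	return was_head;
}

/* The avd_sgproc.c loop, reduced: always delete the current head.
 * Returns the iteration count; *non_head counts deletions where
 * p_su_si was not NULL (expected to stay 0). */
static int drain_susi_list(su_t *su, int *non_head)
{
	int iterations = 0;

	*non_head = 0;
	while (su->list_of_susi != NULL) {
		if (!susi_unlink_and_free(su->list_of_susi))
			(*non_head)++;
		iterations++;
	}
	return iterations;
}

/* Build SU -> SUSI1 -> ... -> SUSIn (push-front) for experiments. */
static void build_list(su_t *su, int n)
{
	su->list_of_susi = NULL;
	for (int k = 0; k < n; k++) {
		susi_t *s = malloc(sizeof *s);
		assert(s != NULL);
		s->su = su;
		s->su_next = su->list_of_susi;
		su->list_of_susi = s;
	}
}
```

With three elements the model drains in three iterations and never takes the p_su_si != NULL branch, matching steps 1 to 6 above.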
Re: [devel] AMF static code analysis regression in 4.2.4 and 4.3.1
Hi,

Just briefing it so that we are all on the same page:

Let us say SU1 has three SIs (SI1, SI2 and SI3) assigned, and the i_su->list_of_susi pointer points to:

i_su->list_of_susi -> SUSI1 -> SUSI2 -> SUSI3 -> NULL

	while (i_su->list_of_susi != AVD_SU_SI_REL_NULL) {
		/* free all the CSI assignments */
		avd_compcsi_delete(cb, i_su->list_of_susi, false);
		/* Unassign the SUSI */
		m_AVD_SU_SI_TRG_DEL(cb, i_su->list_of_susi);
	}

First iteration of the above code:

1. avd_compcsi_delete deletes the comp CSIs of SUSI1 from list_of_csicomp.

2. m_AVD_SU_SI_TRG_DEL calls avd_susi_delete(SUSI1). After the lines below execute, p_su_si is NULL and i_su_si points to SUSI1.

	/* check the SU list to get the prev pointer */
	i_su_si = susi->su->list_of_susi;
	p_su_si = NULL;
	while ((i_su_si != NULL) && (i_su_si != susi)) {
		p_su_si = i_su_si;
		i_su_si = i_su_si->su_next;
	}

3. Now the lines below execute.

	/* now delete it from the SU list */
	if (p_su_si == NULL) {
		susi->su->list_of_susi = susi->su_next;
		susi->su_next = NULL;
	} else {
		p_su_si->su_next = susi->su_next;
		susi->su_next = NULL;
	}

After these lines execute, i_su->list_of_susi points to SUSI2, and SUSI1->su_next is now NULL (earlier, SUSI1->su_next pointed to SUSI2).

4. After the lines below execute, SUSI1->si is NULL and SUSI1->su is NULL.

	susi->si = NULL;
	susi->su = NULL;

5. The line below frees SUSI1.

	free(susi);

6. At this point the linked list is:

i_su->list_of_susi -> SUSI2 -> SUSI3 -> NULL

The next time the while loop executes, SUSI2 will be deleted; after the third iteration SUSI3 will be deleted, i_su->list_of_susi will be NULL, and the while loop will exit.

Let me know if any further clarification is needed.

Thanks
-Nagu

-----Original Message-----
From: Anders Widell [mailto:anders.wid...@ericsson.com]
Sent: 29 July 2013 20:16
To: opensaf-devel@lists.sourceforge.net
Cc: Nagendra Kumar; Hans Feldt; praveen malviya; UABHANO
Subject: Re: [devel] AMF static code analysis regression in 4.2.4 and 4.3.1

I assume the side effect that modifies the variable i_su->list_of_susi in the loop conditional happens at the following lines in the code you sent:

	/* now delete it from the SU list */
	if (p_su_si == NULL) {
		susi->su->list_of_susi = susi->su_next;

This only happens when p_su_si is NULL. What happens if p_su_si is not NULL? Will i_su->list_of_susi have the same value also in the next iteration?

free(susi) is executed unconditionally at the end of the avd_susi_delete() function, though there are a couple of return statements in some branches in the code above it.

regards,
Anders Widell
Re: [devel] AMF static code analysis regression in 4.2.4 and 4.3.1
I assume the side effect that modifies the variable i_su->list_of_susi in the loop conditional happens at the following lines in the code you sent:

	/* now delete it from the SU list */
	if (p_su_si == NULL) {
		susi->su->list_of_susi = susi->su_next;

This only happens when p_su_si is NULL. What happens if p_su_si is not NULL? Will i_su->list_of_susi have the same value also in the next iteration?

free(susi) is executed unconditionally at the end of the avd_susi_delete() function, though there are a couple of return statements in some branches in the code above it.

regards,
Anders Widell

2013-07-29 14:14, Anders Widell wrote:
> Thanks for your analysis. I still don't understand the code, but if you
> think this warning is a false positive I take your word for it. Some
> additional info from the warning:
>
> The free happens at line 504 in file avd_siass.c:
> free(susi);
>
> The dereference happens at line 1070 in file avd_csi.c:
> while (susi->list_of_csicomp != NULL) {
>
> regards,
> Anders Widell
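Anders's question (what if p_su_si is not NULL?) only matters if avd_susi_delete() is ever handed something other than the head. One way to make the loop's reliance on that side effect explicit for readers, and possibly for static analyzers, is to pick up the head in a local, remember its successor before the call, and check afterwards that the head really advanced; an early return in the deleter that skipped the unlink would then trip the assert instead of looping forever. A minimal sketch with hypothetical types and functions (node_t, list_t, delete_fn, delete_head, drain), not the AVD structures:

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical minimal list types, not the AVD ones. */
typedef struct node {
	struct node *next;
} node_t;

typedef struct list {
	node_t *head;
} list_t;

/* Hypothetical deleter contract: unlink n from l and free it. */
typedef void (*delete_fn)(list_t *l, node_t *n);

/* Example deleter: unlink the head node and free it. */
static void delete_head(list_t *l, node_t *n)
{
	assert(l->head == n);
	l->head = n->next;
	free(n);
}

/*
 * Drain the list by repeatedly deleting the head. The local 'cur' makes it
 * explicit which element is handed over, and the check after the call
 * documents the side effect the loop relies on: the head must advance.
 * Returns the number of elements deleted.
 */
static size_t drain(list_t *l, delete_fn del)
{
	size_t count = 0;
	node_t *cur;

	while ((cur = l->head) != NULL) {
		node_t *expected = cur->next; /* read before cur is freed */

		del(l, cur);
		/* Compare against the saved successor; cur itself is never used again. */
		assert(l->head == expected && "deleter must unlink the head");
		count++;
	}
	return count;
}
```

The successor is captured before the call because using the freed pointer's value afterwards, even just to compare it, is undefined behaviour in C.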
Re: [devel] #227 clmd asserts on active controller during node lock timeout
Finally I could get hold of the traces! It turned out to be a simpler case than the initial analysis. Thanks surender for re-running and sharing the traces.

It's a simple case of:
- Issue a CLM lock of a node (PL-5).
- Make the PL-5 node a non-member.
- The lock callback times out, the node entry is not found (which is fine), and the abort gets hit.

While the root cause is an incorrectly placed abort, the fix is to look the node up by name rather than by id, because the node with that id has gone down and is not relevant any more.

Cheers,
Mathi.

From: surender khetavath [mailto:surend...@users.sf.net]
Sent: Friday, July 05, 2013 4:36 PM
To: [opensaf:tickets]
Subject: [opensaf:tickets] #227 clmd asserts on active controller during node lock timeout

The issue is always reproducible.

Test: A campaign is modeled to include PL-5 and an SU on this node. For this the script '/usr/share/opensaf/immxml/immxml-modify-config' is being used. While doing rollback, a clm crash is observed. It is seen that the campaign is doing a lock/lock-in op on PL-5 and simultaneously the script immxml-modify-config is also trying to perform an admin lock, i.e. the lines below. If they are commented out in immxml-modify-config, the rollback goes fine; if enabled, clm crashes.

PLMNODE=`cat $CURRENT_NODECFG | grep ".. $node " | awk '{ print $ 3 }'`
trace "PLMNODE: $PLMNODE"
cmd="amf-adm lock safNode=$PLMNODE,safCluster=myClmCluster"

The scripts and configuration are attached.

Attachment: scripts.tgz (4.9 kB; application/x-compressed-tar)

[tickets:#227] clmd asserts on active controller during node lock timeout (http://sourceforge.net/p/opensaf/tickets/227/)

Status: unassigned
Created: Wed May 15, 2013 10:23 AM UTC by Mathi Naickan
Last Updated: Fri Jun 28, 2013 10:45 AM UTC
Owner: Mathi Naickan

I have asked for traces from the submitter.

changeset: 4007 with patch 2865

scenario: Trying to do lock/lock-in of PL-5.

amf-adm lock safNode=PL-5,safCluster=myClmCluster
error - saImmOmAdminOperationInvoke_2 FAILED: SA_AIS_ERR_TIMEOUT (5)
error: failed to eval/store amf-adm lock safNode=PL-5,safCluster=myClmCluster failed. Aborting script! exitCode: 1

#0 0x7fb446240b55 in raise () from /lib64/libc.so.6
(gdb) bt
#0 0x7fb446240b55 in raise () from /lib64/libc.so.6
#1 0x7fb446242131 in abort () from /lib64/libc.so.6
#2 0x7fb447881e44 in osafassert_fail (file=0x420380 "clms_evt.c", line=390, func=0x420680 "proc_node_lock_tmr_exp_msg", assertion=0x42069b "op_node != NULL") at sysf_def.c:301
#3 0x0040954a in proc_node_lock_tmr_exp_msg (evt=0x655290) at clms_evt.c:390
#4 0x0040bc42 in clms_process_mbx (mbx=0x6298a0) at clms_evt.c:1272
#5 0x00412b3b in main (argc=1, argv=0x7fff3162cb28) at clms_main.c:455
(gdb) bt full
#0 0x7fb446240b55 in raise () from /lib64/libc.so.6
No symbol table info available.
#1 0x7fb446242131 in abort () from /lib64/libc.so.6
No symbol table info available.
#2 0x7fb447881e44 in osafassert_fail (file=0x420380 "clms_evt.c", line=390, func=0x420680 "proc_node_lock_tmr_exp_msg", assertion=0x42069b "op_node != NULL") at sysf_def.c:301
No locals.
#3 0x0040954a in proc_node_lock_tmr_exp_msg (evt=0x655290) at clms_evt.c:390
        rc = 1
        node_id = 132367
        op_node = 0x0
        __FUNCTION__ = "proc_node_lock_tmr_exp_msg"
#4 0x0040bc42 in clms_process_mbx (mbx=0x6298a0) at clms_evt.c:1272
        msg = 0x655290
        __FUNCTION__ = "clms_process_mbx"
#5 0x00412b3b in main (argc=1, argv=0x7fff3162cb28) at clms_main.c:455
        ret = 1
        mbx_fd = {raise_obj = 11, rmv_obj = 12}
        error = SA_AIS_OK
        rc = 1
        __FUNCTION__ = "main"

syslog on sc-1:
==
Mar 13 12:27:23 SLES1 osafclmd[6575]: clms_evt.c:390: proc_node_lock_tmr_exp_msg: Assertion 'op_node != NULL' failed.
Mar 13 12:27:23 SLES1 osafamfnd[6604]: NO 'safComp=CLM,safSu=SC-1,safSg=2N,safApp=OpenSAF' faulted due to 'avaDown' : Recovery is 'nodeFailfast'
Mar 13 12:27:23 SLES1 osafamfnd[6604]: ER safComp=CLM,safSu=SC-1,safSg=2N,safApp=OpenSAF Faulted due to:avaDown Recovery is:nodeFailfast
Mar 13 12:27:23 SLES1 osafamfnd[6604]: Rebooting OpenSAF NodeId = 131343 EE Name = , Reason: Component faulted: recovery is node failfast
Mar 13 12:27:23 SLES1 opensaf_reboot: Rebooting local node
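Mathi's proposed direction (look the node up by name rather than by node id when the lock timer expires, and do not treat a missing entry as fatal) could look roughly like the sketch below. Everything here is hypothetical: a tiny fixed-size table stands in for clmd's node databases, node_id 0 is used as an illustrative "not a member" marker, and the sketch assumes a node's name-keyed record survives when the node leaves the cluster while its id becomes invalid. The real clms_evt.c code and data structures may differ.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define MAX_NODES 8

/* Hypothetical node record; not the clmd node type. */
typedef struct {
	char name[64];        /* e.g. "safNode=PL-5,safCluster=myClmCluster" */
	unsigned int node_id; /* hypothetical convention: 0 when not a member */
	bool in_use;
	bool lock_pending;
} node_rec_t;

typedef struct {
	node_rec_t nodes[MAX_NODES];
} node_db_t;

/* Id lookup: fails once the node has left, as in the crash above. */
static node_rec_t *find_by_id(node_db_t *db, unsigned int id)
{
	for (size_t i = 0; i < MAX_NODES; i++)
		if (db->nodes[i].in_use && id != 0 && db->nodes[i].node_id == id)
			return &db->nodes[i];
	return NULL;
}

/* Name lookup: the configured name stays valid after the node leaves. */
static node_rec_t *find_by_name(node_db_t *db, const char *name)
{
	for (size_t i = 0; i < MAX_NODES; i++)
		if (db->nodes[i].in_use && strcmp(db->nodes[i].name, name) == 0)
			return &db->nodes[i];
	return NULL;
}

/* Node becomes a non-member: its id is dropped, the record is kept. */
static void node_left(node_rec_t *n)
{
	n->node_id = 0;
}

/*
 * Lock-timer expiry keyed on the node name carried by the timer.
 * A missing node is logged and ignored instead of asserting.
 * Returns true if a pending lock was completed.
 */
static bool lock_timer_expired(node_db_t *db, const char *node_name)
{
	node_rec_t *n = find_by_name(db, node_name);

	if (n == NULL) {
		fprintf(stderr, "lock timer expired for unknown node %s\n", node_name);
		return false;
	}
	n->lock_pending = false; /* finish the pending lock operation here */
	return true;
}
```

In this model the id lookup fails after node_left(), reproducing the op_node == NULL situation from the backtrace, while the name lookup still finds the record and an unknown name is simply logged.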
Re: [devel] AMF static code analysis regression in 4.2.4 and 4.3.1
Thanks for your analysis. I still don't understand the code, but if you think this warning is a false positive I take your word for it. Some additional info from the warning:

The free happens at line 504 in file avd_siass.c:
	free(susi);

The dereference happens at line 1070 in file avd_csi.c:
	while (susi->list_of_csicomp != NULL) {

regards,
Anders Widell

2013-07-29 13:50, Nagendra Kumar wrote:
> Hi,
> i_su->list_of_susi holds the list of SUSIs. m_AVD_SU_SI_TRG_DEL calls
> avd_susi_delete, which in turn unlinks the susi from i_su->list_of_susi and
> deletes it.
>
> Code is below:
> /* now delete it from the SU list */
> if (p_su_si == NULL) {
> 	susi->su->list_of_susi = susi->su_next;
> 	susi->su_next = NULL;
> } else {
> 	p_su_si->su_next = susi->su_next;
> 	susi->su_next = NULL;
> }
>
> /* now delete it from the SI list */
> if (p_si_su == NULL) {
> 	susi->si->list_of_sisu = susi->si_next;
> 	susi->si_next = NULL;
> } else {
> 	p_si_su->si_next = susi->si_next;
> 	susi->si_next = NULL;
> }
>
> And then deletes it. This means one susi is removed from
> i_su->list_of_susi. When the last susi is removed from i_su->list_of_susi,
> i_su->list_of_susi becomes NULL and the 'while loop' exits.
>
> So, I see no problem.
>
> Let me know if any further clarification is required.
>
> Thanks
> -Nagu
Re: [devel] AMF static code analysis regression in 4.2.4 and 4.3.1
Hi,

i_su->list_of_susi holds a list of susi. m_AVD_SU_SI_TRG_DEL calls avd_susi_delete, which in turn separates the susi from i_su->list_of_susi and deletes it.

Code is below:

    /* now delete it from the SU list */
    if (p_su_si == NULL) {
        susi->su->list_of_susi = susi->su_next;
        susi->su_next = NULL;
    } else {
        p_su_si->su_next = susi->su_next;
        susi->su_next = NULL;
    }

    /* now delete it from the SI list */
    if (p_si_su == NULL) {
        susi->si->list_of_sisu = susi->si_next;
        susi->si_next = NULL;
    } else {
        p_si_su->si_next = susi->si_next;
        susi->si_next = NULL;
    }

And then deletes it. This means each call removes one susi from i_su->list_of_susi. When the last susi is deleted from i_su->list_of_susi, i_su->list_of_susi becomes NULL and the 'while' loop exits.

So, I see no problem.

Let me know if any further clarification is required.

Thanks
-Nagu

-----Original Message-----
From: Anders Widell [mailto:anders.wid...@ericsson.com]
Sent: 29 July 2013 16:12
To: opensaf-devel@lists.sourceforge.net
Cc: Nagendra Kumar; Hans Feldt; praveen malviya; UABHANO
Subject: AMF static code analysis regression in 4.2.4 and 4.3.1

Hi!

I ran some static code analysis on the release candidates for OpenSAF 4.2.4 and 4.3.1. I got a few regressions compared to 4.2.3 and 4.3.0, and I need your help to analyze the following in avd_sgproc.c. The warning says that i_su->list_of_susi is used after free(). It is freed by m_AVD_SU_SI_TRG_DEL(), and then dereferenced by avd_compcsi_delete() in the next iteration.

When I look at the code, I don't understand it at all. Does this loop below terminate? The loop terminates when i_su->list_of_susi is NULL, but it is not modified within the loop body! If the loop terminates, it must be because i_su->list_of_susi is somehow modified as a side effect of calling avd_compcsi_delete() or m_AVD_SU_SI_TRG_DEL(). This is a very ugly way of coding!!!
Lines 1398 - 1408 in osaf/services/saf/avsv/avd/avd_sgproc.c on branch opensaf-4.3.x (tag 4.3.1RC1):
-----
    /* Free all the SU SI assignments for all the SIs on the
     * the SU if there are any.
     */

    while (i_su->list_of_susi != AVD_SU_SI_REL_NULL) {

        /* free all the CSI assignments */
        avd_compcsi_delete(cb, i_su->list_of_susi, false);
        /* Unassign the SUSI */
        m_AVD_SU_SI_TRG_DEL(cb, i_su->list_of_susi);
    }
-----

regards,
Anders Widell
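Nagu's explanation rests on the fact that one SUSI node is threaded on two singly linked lists at once: the SU's list_of_susi (via su_next) and the SI's list_of_sisu (via si_next). The sketch below models that unlink-then-free step in C. The struct layouts, susi_create(), and the predecessor search are assumptions: the quoted excerpt does not show how avd_susi_delete() finds p_su_si and p_si_su, so the linear search here is only one plausible way to do it.

```c
#include <assert.h>
#include <stdlib.h>

struct su;
struct si;

/* Hypothetical reduced SU-SI relationship: one node on two lists. */
typedef struct susi {
    struct su *su;
    struct si *si;
    struct susi *su_next;  /* link on the SU's list */
    struct susi *si_next;  /* link on the SI's list */
} susi_t;

typedef struct su { susi_t *list_of_susi; } su_t;
typedef struct si { susi_t *list_of_sisu; } si_t;

/* Unlink susi from both lists (mirroring the quoted code), then free it. */
static void susi_delete(susi_t *susi)
{
    susi_t *p_su_si = NULL, *p_si_su = NULL, *i;

    /* Find the predecessor on the SU list (stays NULL if susi is head). */
    for (i = susi->su->list_of_susi; i != NULL && i != susi; i = i->su_next)
        p_su_si = i;
    assert(i == susi);  /* node must be on the SU list */

    /* Find the predecessor on the SI list. */
    for (i = susi->si->list_of_sisu; i != NULL && i != susi; i = i->si_next)
        p_si_su = i;
    assert(i == susi);  /* node must be on the SI list */

    /* now delete it from the SU list */
    if (p_su_si == NULL)
        susi->su->list_of_susi = susi->su_next;
    else
        p_su_si->su_next = susi->su_next;
    susi->su_next = NULL;

    /* now delete it from the SI list */
    if (p_si_su == NULL)
        susi->si->list_of_sisu = susi->si_next;
    else
        p_si_su->si_next = susi->si_next;
    susi->si_next = NULL;

    free(susi);
}

/* Hypothetical helper: allocate a SUSI and push it on both lists. */
static susi_t *susi_create(su_t *su, si_t *si)
{
    susi_t *s = calloc(1, sizeof *s);
    if (s == NULL)
        return NULL;
    s->su = su;
    s->si = si;
    s->su_next = su->list_of_susi;
    su->list_of_susi = s;
    s->si_next = si->list_of_sisu;
    si->list_of_sisu = s;
    return s;
}
```

Because the head case (p_su_si == NULL) writes the successor into susi->su->list_of_susi, repeatedly deleting i_su->list_of_susi walks the SU's list down to NULL, which is the termination argument Nagu gives.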
[devel] AMF static code analysis regression in 4.2.4 and 4.3.1
Hi!

I ran some static code analysis on the release candidates for OpenSAF 4.2.4 and 4.3.1. I got a few regressions compared to 4.2.3 and 4.3.0, and I need your help to analyze the following in avd_sgproc.c. The warning says that i_su->list_of_susi is used after free(). It is freed by m_AVD_SU_SI_TRG_DEL(), and then dereferenced by avd_compcsi_delete() in the next iteration.

When I look at the code, I don't understand it at all. Does this loop below terminate? The loop terminates when i_su->list_of_susi is NULL, but it is not modified within the loop body! If the loop terminates, it must be because i_su->list_of_susi is somehow modified as a side effect of calling avd_compcsi_delete() or m_AVD_SU_SI_TRG_DEL(). This is a very ugly way of coding!!!

Lines 1398 - 1408 in osaf/services/saf/avsv/avd/avd_sgproc.c on branch opensaf-4.3.x (tag 4.3.1RC1):
-----
    /* Free all the SU SI assignments for all the SIs on the
     * the SU if there are any.
     */

    while (i_su->list_of_susi != AVD_SU_SI_REL_NULL) {

        /* free all the CSI assignments */
        avd_compcsi_delete(cb, i_su->list_of_susi, false);
        /* Unassign the SUSI */
        m_AVD_SU_SI_TRG_DEL(cb, i_su->list_of_susi);
    }
-----

regards,
Anders Widell
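Anders's objection is that nothing in the loop body visibly changes i_su->list_of_susi. One possible way to make that progress explicit, sketched here on a hypothetical minimal model rather than on the real AVD types, is to let the deletion step return the SU's new head and use it as the loop's increment expression. susi_delete_and_next() and drain_susis() are invented names; this is not how avd_sgproc.c is written.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical minimal model: an SU owning a singly linked list of SUSIs. */
typedef struct susi {
    struct susi *su_next;
} susi_t;

typedef struct su {
    susi_t *list_of_susi;
} su_t;

/* Unlinks and frees the head SUSI, then returns the SU's new head.
 * Returning the head makes the list's progress visible to the caller. */
static susi_t *susi_delete_and_next(su_t *su)
{
    susi_t *head = su->list_of_susi;
    assert(head != NULL);
    su->list_of_susi = head->su_next;  /* advance before freeing */
    free(head);
    return su->list_of_susi;
}

/* The same drain loop, with the loop variable advanced explicitly by the
 * deletion call. 'visit' (may be NULL) stands in for per-SUSI cleanup,
 * e.g. freeing CSI assignments; it runs while the SUSI is still alive.
 * Returns the number of SUSIs removed. */
static int drain_susis(su_t *su, void (*visit)(susi_t *))
{
    int n = 0;
    for (susi_t *susi = su->list_of_susi; susi != NULL;
         susi = susi_delete_and_next(su)) {
        if (visit != NULL)
            visit(susi);
        n++;
    }
    return n;
}
```

Behaviour matches the while loop; only the place where the list advances moves into the loop header. Whether a particular static analyzer stops warning on this form is not guaranteed.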