Hi Hans,

Please see inline with [Praveen].

Thanks,
Praveen

On 16-Dec-16 4:45 PM, Hans Nordeback wrote:
> Hi Praveen, please see inline with [HansN].
>
> /Thanks HansN
>
>
> On 12/16/2016 11:13 AM, praveen malviya wrote:
>> Hi Hans,
>>
>> Please see inline with [Praveen].
>>
>> Thanks,
>> Praveen
>>
>> On 16-Dec-16 3:14 PM, Hans Nordeback wrote:
>>> Hi Praveen,
>>>
>>> please see some comments/questions inline below.
>>>
>>> /Thanks HansN
>>>
>>>
>>> On 12/16/2016 07:42 AM, praveen malviya wrote:
>>>> Hi Hans,
>>>>
>>>> Currently, AMFND responds to NID when all MW components are in AMF
>>>> control. So there is window.
>>> [HansN] you mean 'no window'?
>> [Praveen] Sorry I missed 'no'. It is indeed 'no window'.
>>>> I have explicitly mentioned in the patch below (see Note in main() in
>>>> amfnd.cc below), that this may be usable only in future.
>>>> However there is one very weak case, when AMFND is just starting NoRed
>>>> components like CPD etc and immnd has still not become AMF component
>>>> and also AMFND has not responded to NID. Now at this stage if immnd
>>>> process crashes then node will reboot only after CPD instantiation
>>>> timer expires which is about 10 seconds.
>>> [HansN] with #2204 v2 nid will detect that immnd has crashed and it is
>>> configurable in the opensafd script to reboot the node or not at
>>> nid failure. If amfnd, with this patch, also detects that immnd has
>>> crashed, there may be a raise between amfnd and nid,
>>> where amfnd performs a reboot while nid may be configured to not reboot?
>> [Praveen] I agree with this. But this puts a restriction on NID
>> recovery that its recovery can be reboot only because if NID wants to
>> restart IMMND process again, then all other MW processes needs to be
>> examined to take this into account and be ready for re-initialization
>> with IMM in case the restarted IMM does not resurrect old handles. Not
>> only handles, there can be other resources also.
>
> [HansN] the configurable restart of e.g. IMMND will not be done after
> nid notify. There will be no restart attempts of e.g IMMND after
>
> nid notify. The FIFOmonitoring will not restart the service, but exit.
[Praveen] So the concern that I raised does not hold now as NID will not 
restart IMMND after it has done NID notify. So we have following 
sequences of handlings:
1) If IMMND crashes before replying to NID then NID will restart it.
2) If IMMND crashes after NID reply then NID will exit with error. Now 
opensafd script will reboot the node based on the user configured value 
of REBOOT_ON_FAIL_TIMEOUT.

Now case2 above gives choice to the user to configure for node reboot in 
such cases.
Coming back to this patch of #2158: AMFND is started by NID as last 
service. By this time, IMMND has already notified to the NID. As I had 
already said #2158 patch will reboot the node. Here NID's action can be 
to exit and based on NID status opensaf script will take action. So 
rebooting the node because of #2158 patch will interfere with user's 
choice of REBOOT_ON_FAIL_TIMEOUT. Because of this fact #2158 becomes a 
duplicate of #2204.


>
>> Since #2204 is a defect so, I think, enhance capabilities can be
>> ignored. In that case #2158 becomes almost a duplicate of #2204.
>>
>>>> But this patch will do an immediate reboot. In this case it can be
>>>> argued that NID can also take action but I think since this is the
>>>> last service NID will also act very lately. Here other NoRed
>>>> components instantiation will fail as, I think, they want to use IMM
>>>> which is not available.
>>>> I will see if something can be done to make IMM as first Nored comp to
>>>> get instantiated using compinstantiationlevel. In that case this patch
>>>> may not be needed.
>>>>
>>>>
>>>> Thanks,
>>>> Praveen
>>>>
>>>> On 15-Dec-16 7:45 PM, Hans Nordeback wrote:
>>>>> Hi Praveen,
>>>>>
>>>>> I'll review this ticket tomorrow, but one question in advance, what is
>>>>> the use case for this ticket?
>>>>>
>>>>> Looking at the ticket description, the use case seems to be immnd
>>>>> crashes during the nid phase,
>>>>>
>>>>> but that use case is handled by ticket #2204. Is there a "window"
>>>>> after
>>>>> amfnd has called "nid_notify" where
>>>>>
>>>>> immnd is not monitored by amf?  /Thanks HansN
>>>>>
>>>>>
>>>>> On 12/13/2016 11:48 AM, praveen.malv...@oracle.com wrote:
>>>>>> osaf/services/saf/amf/amfnd/clc.cc              |  11 +++
>>>>>>   osaf/services/saf/amf/amfnd/comp.cc             |  13 ++++
>>>>>>   osaf/services/saf/amf/amfnd/evt.cc              |   4 +
>>>>>>   osaf/services/saf/amf/amfnd/include/avnd_cb.h   |   6 ++
>>>>>>   osaf/services/saf/amf/amfnd/include/avnd_comp.h |   1 +
>>>>>>   osaf/services/saf/amf/amfnd/include/avnd_evt.h  |   1 +
>>>>>>   osaf/services/saf/amf/amfnd/include/avnd_proc.h |   1 +
>>>>>>   osaf/services/saf/amf/amfnd/main.cc             |  72
>>>>>> +++++++++++++++++++++++-
>>>>>>   8 files changed, 105 insertions(+), 4 deletions(-)
>>>>>>
>>>>>>
>>>>>> If IMMND dies before it becomes AMF component there is no entity to
>>>>>> restart it.
>>>>>> With ticket #2204, NID will monitor using existing FIFO each of the
>>>>>> started process until it itself exits.
>>>>>> This patch will also monitor IMMND using FIFO.
>>>>>> If IMMND dies before becoming AMF component and after NID exit then
>>>>>> with this patch AMFND will
>>>>>> reboot the node because AMFND does not have configuration to restart
>>>>>> IMMND.
>>>>>>
>>>>>> diff --git a/osaf/services/saf/amf/amfnd/clc.cc
>>>>>> b/osaf/services/saf/amf/amfnd/clc.cc
>>>>>> --- a/osaf/services/saf/amf/amfnd/clc.cc
>>>>>> +++ b/osaf/services/saf/amf/amfnd/clc.cc
>>>>>> @@ -909,6 +909,7 @@ uint32_t avnd_comp_clc_fsm_run(AVND_CB *
>>>>>>       TRACE_LEAVE2("%u", rc);
>>>>>>       return rc;
>>>>>>   }
>>>>>> +
>>>>>>
>>>>>> /****************************************************************************
>>>>>>
>>>>>>
>>>>>>
>>>>>>     Name          : avnd_comp_clc_st_chng_prc
>>>>>>    @@ -1486,6 +1487,12 @@ uint32_t avnd_comp_clc_st_chng_prc(AVND_
>>>>>>               (comp->su->pres == SA_AMF_PRESENCE_INSTANTIATED))
>>>>>>           rc = avnd_compdb_rec_del(cb, comp->name);
>>>>>>   +    if ((final_st == SA_AMF_PRESENCE_INSTANTIATED) &&
>>>>>> +            (comp->is_comp_immnd() == true)) {
>>>>>> +        AVND_EVT *evt = avnd_evt_create(cb, AVND_EVT_IMMND_COMP_UP,
>>>>>> 0, 0, 0, 0, 0);
>>>>>> +        if (evt)
>>>>>> +            rc = avnd_evt_send(cb, evt);
>>>>>> +    }
>>>>>>    done:
>>>>>>       TRACE_LEAVE2("%u", rc);
>>>>>>       return rc;
>>>>>> @@ -1553,6 +1560,10 @@ uint32_t avnd_comp_clc_uninst_inst_hdler
>>>>>>             /* transition to 'instantiating' state */
>>>>>>           avnd_comp_pres_state_set(cb, comp,
>>>>>> SA_AMF_PRESENCE_INSTANTIATING);
>>>>>> +        if (comp->is_comp_immnd() == true) {
>>>>>> +          TRACE("immnd is instantiating");
>>>>>> +          avnd_cb->imm_comp_state = AMFND_IMM_COMP_INSTANTIATING;
>>>>>> +        }
>>>>>>       }
>>>>>>     done:
>>>>>> diff --git a/osaf/services/saf/amf/amfnd/comp.cc
>>>>>> b/osaf/services/saf/amf/amfnd/comp.cc
>>>>>> --- a/osaf/services/saf/amf/amfnd/comp.cc
>>>>>> +++ b/osaf/services/saf/amf/amfnd/comp.cc
>>>>>> @@ -3016,3 +3016,16 @@ uint32_t avnd_amfa_mds_info_evh(AVND_CB
>>>>>>     return NCSCC_RC_SUCCESS;
>>>>>>   }
>>>>>>   +bool AVND_COMP::is_comp_immnd() const {
>>>>>> +  if ((su->is_ncs == true) &&
>>>>>> +    (name.find("safSg=NoRed,safApp=OpenSAF") !=
>>>>>> std::string::npos) &&
>>>>>> +    (name.find("safComp=IMMND") != std::string::npos))
>>>>>> +                return true;
>>>>>> +  return false;
>>>>>> +}
>>>>>> +
>>>>>> +uint32_t avnd_immnd_comp_evh(AVND_CB *cb, AVND_EVT *evt) {
>>>>>> +    avnd_cb->imm_comp_state = AMFND_IMM_COMP_UP;
>>>>>> +    TRACE("IMMND is now AMF component");
>>>>>> +    return NCSCC_RC_SUCCESS;
>>>>>> +}
>>>>>> diff --git a/osaf/services/saf/amf/amfnd/evt.cc
>>>>>> b/osaf/services/saf/amf/amfnd/evt.cc
>>>>>> --- a/osaf/services/saf/amf/amfnd/evt.cc
>>>>>> +++ b/osaf/services/saf/amf/amfnd/evt.cc
>>>>>> @@ -185,6 +185,8 @@ AVND_EVT *avnd_evt_create(AVND_CB *cb,
>>>>>>         case AVND_EVT_AMFA_MDS_VER_INFO:
>>>>>>           break;
>>>>>> +    case AVND_EVT_IMMND_COMP_UP:
>>>>>> +        break;
>>>>>>       default:
>>>>>>           delete evt;
>>>>>>           evt = nullptr;
>>>>>> @@ -316,6 +318,8 @@ void avnd_evt_destroy(AVND_EVT *evt)
>>>>>>           break;
>>>>>>       case AVND_EVT_AMFA_MDS_VER_INFO:
>>>>>>           break;
>>>>>> +    case AVND_EVT_IMMND_COMP_UP:
>>>>>> +        break;
>>>>>>         default:
>>>>>>           LOG_NO("%s: unknown event type %u", __FUNCTION__, type);
>>>>>> diff --git a/osaf/services/saf/amf/amfnd/include/avnd_cb.h
>>>>>> b/osaf/services/saf/amf/amfnd/include/avnd_cb.h
>>>>>> --- a/osaf/services/saf/amf/amfnd/include/avnd_cb.h
>>>>>> +++ b/osaf/services/saf/amf/amfnd/include/avnd_cb.h
>>>>>> @@ -34,6 +34,11 @@
>>>>>>   #define AVND_CB_H
>>>>>>   #include <map>
>>>>>>   +typedef enum {
>>>>>> +        AMFND_IMM_COMP_BASE = 1,
>>>>>> +        AMFND_IMM_COMP_INSTANTIATING = 2,
>>>>>> +        AMFND_IMM_COMP_UP = 3
>>>>>> +} AMFND_IMM_COMP_STATUS;
>>>>>>     typedef struct avnd_cb_tag {
>>>>>>       SYSF_MBX mbx;        /* mailbox on which AvND waits */
>>>>>> @@ -121,6 +126,7 @@ typedef struct avnd_cb_tag {
>>>>>>       SaTimeT scs_absence_max_duration;
>>>>>>       /* the timer for supervision of the absence of SC */
>>>>>>       AVND_TMR sc_absence_tmr;
>>>>>> +    AMFND_IMM_COMP_STATUS imm_comp_state;
>>>>>>   } AVND_CB;
>>>>>>     #define AVND_CB_NULL ((AVND_CB *)0)
>>>>>> diff --git a/osaf/services/saf/amf/amfnd/include/avnd_comp.h
>>>>>> b/osaf/services/saf/amf/amfnd/include/avnd_comp.h
>>>>>> --- a/osaf/services/saf/amf/amfnd/include/avnd_comp.h
>>>>>> +++ b/osaf/services/saf/amf/amfnd/include/avnd_comp.h
>>>>>> @@ -401,6 +401,7 @@ typedef struct avnd_comp_tag {
>>>>>>       std::bitset<NumAttrs> *use_comptype_attr;
>>>>>>       SaInvocationT term_cbq_inv_value; /* invocation value for
>>>>>> termination callback. */
>>>>>>       SaVersionT version; //SAF version of comp.
>>>>>> +    bool is_comp_immnd() const;
>>>>>>   } AVND_COMP;
>>>>>>     #define AVND_COMP_NULL ((AVND_COMP *)0)
>>>>>> diff --git a/osaf/services/saf/amf/amfnd/include/avnd_evt.h
>>>>>> b/osaf/services/saf/amf/amfnd/include/avnd_evt.h
>>>>>> --- a/osaf/services/saf/amf/amfnd/include/avnd_evt.h
>>>>>> +++ b/osaf/services/saf/amf/amfnd/include/avnd_evt.h
>>>>>> @@ -113,6 +113,7 @@ typedef enum avnd_evt_type {
>>>>>>       AVND_EVT_TMR_QSCING_CMPL,
>>>>>>       AVND_EVT_IR,
>>>>>>       AVND_EVT_AMFA_MDS_VER_INFO,
>>>>>> +    AVND_EVT_IMMND_COMP_UP,
>>>>>>         AVND_EVT_MAX
>>>>>>   } AVND_EVT_TYPE;
>>>>>> diff --git a/osaf/services/saf/amf/amfnd/include/avnd_proc.h
>>>>>> b/osaf/services/saf/amf/amfnd/include/avnd_proc.h
>>>>>> --- a/osaf/services/saf/amf/amfnd/include/avnd_proc.h
>>>>>> +++ b/osaf/services/saf/amf/amfnd/include/avnd_proc.h
>>>>>> @@ -118,5 +118,6 @@ uint32_t avnd_evt_tmr_avd_hb_duration_ev
>>>>>>   uint32_t avnd_evt_tmr_sc_absence_evh(struct avnd_cb_tag *, struct
>>>>>> avnd_evt_tag *);
>>>>>>   uint32_t avnd_evt_avd_reboot_evh(struct avnd_cb_tag *, struct
>>>>>> avnd_evt_tag *);
>>>>>>   uint32_t avnd_amfa_mds_info_evh(struct avnd_cb_tag *, struct
>>>>>> avnd_evt_tag *);
>>>>>> +uint32_t avnd_immnd_comp_evh(struct avnd_cb_tag *, struct
>>>>>> avnd_evt_tag *);
>>>>>>     #endif
>>>>>> diff --git a/osaf/services/saf/amf/amfnd/main.cc
>>>>>> b/osaf/services/saf/amf/amfnd/main.cc
>>>>>> --- a/osaf/services/saf/amf/amfnd/main.cc
>>>>>> +++ b/osaf/services/saf/amf/amfnd/main.cc
>>>>>> @@ -31,11 +31,15 @@
>>>>>>   #include "immutil.h"
>>>>>>   #include "logtrace.h"
>>>>>>   #include "nid_api.h"
>>>>>> +#include "osaf_time.h"
>>>>>>   #include <imm.h>
>>>>>>     #define FD_MBX   0
>>>>>>   #define FD_TERM  1
>>>>>>   #define FD_CLM   2
>>>>>> +#define FD_IMMND_FIFO 3
>>>>>> +#define FD_MAX 4
>>>>>> +
>>>>>>     static const char* internal_version_id_  __attribute__ ((used)) =
>>>>>> "@(#) $Id: " INTERNAL_VERSION_ID " $";
>>>>>>   @@ -121,7 +125,8 @@ extern const AVND_EVT_HDLR g_avnd_func_l
>>>>>>       avnd_evt_pid_exit_evh,    /* AVND_EVT_PID_EXIT */
>>>>>>       avnd_evt_tmr_qscing_cmpl_evh,    /* AVND_EVT_TMR_QSCING_CMPL */
>>>>>>       avnd_evt_ir_evh,    /* AVND_EVT_IR */
>>>>>> -    avnd_amfa_mds_info_evh /* AVND_EVT_AMFA_MDS_VER_INFO*/
>>>>>> +    avnd_amfa_mds_info_evh, /* AVND_EVT_AMFA_MDS_VER_INFO*/
>>>>>> +    avnd_immnd_comp_evh /* AVND_EVT_IMMND_COMP_UP*/
>>>>>>   };
>>>>>>     extern struct ImmutilWrapperProfile immutilWrapperProfile;
>>>>>> @@ -301,6 +306,7 @@ AVND_CB *avnd_cb_create()
>>>>>>       cb->hb_duration_tmr.is_active = false;
>>>>>>       cb->hb_duration_tmr.type = AVND_TMR_HB_DURATION;
>>>>>>       cb->hb_duration = AVSV_DEF_HB_DURATION;
>>>>>> +    cb->imm_comp_state = AMFND_IMM_COMP_BASE;
>>>>>>         if ((val = getenv("AVSV_HB_DURATION")) != nullptr) {
>>>>>>           cb->hb_duration = strtoll(val, nullptr, 0);
>>>>>> @@ -495,6 +501,30 @@ static void sigterm_handler(int sig)
>>>>>>       ncs_sel_obj_ind(&term_sel_obj);
>>>>>>       signal(SIGTERM, SIG_IGN);
>>>>>>   }
>>>>>> +//Thanks to Hans for above function in #1857, I just copied and
>>>>>> renamed it for using with IMM.
>>>>>> +static int open_immnd_fifo() {
>>>>>> +  const std::string fifo_dir = PKGLOCALSTATEDIR;
>>>>>> +  std::string fifo_file = fifo_dir + "/" + "osafimmnd.fifo";
>>>>>> +  int fifo_fd = -1;
>>>>>> +  int retry_cnt = 0;
>>>>>> +
>>>>>> +  if (access(fifo_file.c_str(), F_OK ) != -1 ) {
>>>>>> +    do {
>>>>>> +      if (retry_cnt > 0) {
>>>>>> +        osaf_nanosleep(&kHundredMilliseconds);
>>>>>> +      }
>>>>>> +      fifo_fd = open(fifo_file.c_str(), O_WRONLY|O_NONBLOCK);
>>>>>> +    } while ((fifo_fd == -1) &&
>>>>>> +            (retry_cnt++ < 5 && (errno == EINTR || errno ==
>>>>>> ENXIO)));
>>>>>> +    if (fifo_fd == -1) {
>>>>>> +      LOG_ER("Failed to open %s, error: %s", fifo_file.c_str(),
>>>>>> +             strerror(errno));
>>>>>> +    } else {
>>>>>> +      LOG_NO("Start monitoring IMMND using %s", fifo_file.c_str());
>>>>>> +    }
>>>>>> +  }
>>>>>> +  return fifo_fd;
>>>>>> +}
>>>>>>
>>>>>> /****************************************************************************
>>>>>>
>>>>>>
>>>>>>
>>>>>>     Name          : avnd_main_process
>>>>>> @@ -510,8 +540,9 @@ static void sigterm_handler(int sig)
>>>>>>   void avnd_main_process(void)
>>>>>>   {
>>>>>>       NCS_SEL_OBJ mbx_fd;
>>>>>> -    struct pollfd fds[4];
>>>>>> -    nfds_t nfds = 3;
>>>>>> +    int immnd_fifo_fd = -1;
>>>>>> +    struct pollfd fds[FD_MAX];
>>>>>> +    nfds_t nfds = FD_MAX;
>>>>>>       AVND_EVT *evt;
>>>>>>       SaAisErrorT result = SA_AIS_OK;
>>>>>>       SaAisErrorT rc = SA_AIS_OK;
>>>>>> @@ -542,6 +573,9 @@ void avnd_main_process(void)
>>>>>>       fds[FD_CLM].fd = avnd_cb->clm_sel_obj;
>>>>>>       fds[FD_CLM].events = POLLIN;
>>>>>>   +    immnd_fifo_fd = open_immnd_fifo();
>>>>>> +    fds[FD_IMMND_FIFO].fd = immnd_fifo_fd;
>>>>>> +    fds[FD_IMMND_FIFO].events = POLLIN;
>>>>>>       /* now wait forever */
>>>>>>       while (1) {
>>>>>>           int ret = poll(fds, nfds, -1);
>>>>>> @@ -554,6 +588,29 @@ void avnd_main_process(void)
>>>>>>               LOG_ER("%s: poll failed - %s", __FUNCTION__,
>>>>>> strerror(errno));
>>>>>>               break;
>>>>>>           }
>>>>>> +         if (fds[FD_IMMND_FIFO].revents & POLLERR) {
>>>>>> +            TRACE_1("IMMND crashed");
>>>>>> +            if (!(m_AVND_IS_SHUTTING_DOWN(avnd_cb)) &&
>>>>>> +                (avnd_cb->imm_comp_state == AMFND_IMM_COMP_BASE) &&
>>>>>> +                (avnd_cb->led_state == AVND_LED_STATE_GREEN)) {
>>>>>> +                /* AMFND is last NID started service. If AMFND has
>>>>>> responded to NID
>>>>>> +                   and IMMND has not become AMF component then
>>>>>> reboot
>>>>>> the node because
>>>>>> +                   AMFND does not have configuration to
>>>>>> re-instantiate immnd.
>>>>>> +                   If IMM has become AMF component then AMF will
>>>>>> restart it.
>>>>>> +                   If AMFND has not responded to NID, then it will
>>>>>> take care of IMMND process.
>>>>>> +                   Note:This is overlapping region as AMFND responds
>>>>>> to NID only when all
>>>>>> +                        NoRed components comes in AMF control.
>>>>>> May be
>>>>>> used in future only.
>>>>>> +                */
>>>>>> +                LOG_ER("IMMND crashed and has not become AMF
>>>>>> component. Rebooting node");
>>>>>> +                opensaf_reboot(avnd_cb->node_info.nodeId,
>>>>>> +
>>>>>> osaf_extended_name_borrow(&avnd_cb->node_info.executionEnvironment),
>>>>>> +                        "IMMND crashed and has not become AMF
>>>>>> component. Rebooting node");
>>>>>> +                exit(0);
>>>>>> +            }
>>>>>> +            //It means IMM has become AMF component, poll for IMMND
>>>>>> when it gets started.
>>>>>> +            nfds = FD_MAX - 1;
>>>>>> +            fds[FD_IMMND_FIFO].fd =  -1;
>>>>>> +                }
>>>>>>             if (avnd_cb->clmHandle && (fds[FD_CLM].revents &
>>>>>> POLLIN)) {
>>>>>>               //LOG_NO("DEBUG-> CLM event fd: %d sel_obj: %llu, clm
>>>>>> handle: %llu", fds[FD_CLM].fd, avnd_cb->clm_sel_obj,
>>>>>> avnd_cb->clmHandle);
>>>>>> @@ -573,8 +630,15 @@ void avnd_main_process(void)
>>>>>>           }
>>>>>>             if (fds[FD_MBX].revents & POLLIN) {
>>>>>> -            while (nullptr != (evt = (AVND_EVT
>>>>>> *)ncs_ipc_non_blk_recv(&avnd_cb->mbx)))
>>>>>> +            while (nullptr != (evt = (AVND_EVT
>>>>>> *)ncs_ipc_non_blk_recv(&avnd_cb->mbx))) {
>>>>>> +                if (evt->type == AVND_EVT_IMMND_COMP_UP) {
>>>>>> +                    immnd_fifo_fd = open_immnd_fifo();
>>>>>> +                    fds[FD_IMMND_FIFO].fd = immnd_fifo_fd;
>>>>>> +                    fds[FD_IMMND_FIFO].events = POLLIN;
>>>>>> +                    nfds = FD_MAX;
>>>>>> +                }
>>>>>>                   avnd_evt_process(evt);
>>>>>> +            }
>>>>>>           }
>>>>>>             if (fds[FD_TERM].revents & POLLIN) {
>>>>>
>>>
>

------------------------------------------------------------------------------
Developer Access Program for Intel Xeon Phi Processors
Access to Intel Xeon Phi processor-based developer platforms.
With one year of Intel Parallel Studio XE.
Training and support from Colfax.
Order your platform today.http://sdm.link/intel
_______________________________________________
Opensaf-devel mailing list
Opensaf-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/opensaf-devel

Reply via email to