Hi Praveen, please see inline with [HansN].

/Thanks HansN


On 12/16/2016 11:13 AM, praveen malviya wrote:
> Hi Hans,
>
> Please see inline with [Praveen].
>
> Thanks,
> Praveen
>
> On 16-Dec-16 3:14 PM, Hans Nordeback wrote:
>> Hi Praveen,
>>
>> please see some comments/questions inline below.
>>
>> /Thanks HansN
>>
>>
>> On 12/16/2016 07:42 AM, praveen malviya wrote:
>>> Hi Hans,
>>>
>>> Currently, AMFND responds to NID when all MW components are in AMF
>>> control. So there is window.
>> [HansN] you mean 'no window'?
> [Praveen] Sorry I missed 'no'. It is indeed 'no window'.
>>> I have explicitly mentioned in the patch below (see Note in main() in
>>> amfnd.cc below), that this may be usable only in future.
>>> However there is one very weak case, when AMFND is just starting NoRed
>>> components like CPD etc and immnd has still not become AMF component
>>> and also AMFND has not responded to NID. Now at this stage if immnd
>>> process crashes then node will reboot only after CPD instantiation
>>> timer expires which is about 10 seconds.
>> [HansN] with #2204 v2 nid will detect that immnd has crashed and it is
>> configurable in the opensafd script to reboot the node or not at
>> nid failure. If amfnd, with this patch, also detects that immnd has
>> crashed, there may be a raise between amfnd and nid,
>> where amfnd performs a reboot while nid may be configured to not reboot?
> [Praveen] I agree with this. But this puts a restriction on NID 
> recovery that its recovery can be reboot only because if NID wants to 
> restart IMMND process again, then all other MW processes needs to be 
> examined to take this into account and be ready for re-initialization 
> with IMM in case the restarted IMM does not resurrect old handles. Not 
> only handles, there can be other resources also.

[HansN] the configurable restart of e.g. IMMND will not be done after 
nid notify. There will be no restart attempts of e.g IMMND after

nid notify. The FIFOmonitoring will not restart the service, but exit.

> Since #2204 is a defect so, I think, enhance capabilities can be 
> ignored. In that case #2158 becomes almost a duplicate of #2204.
>
>>> But this patch will do an immediate reboot. In this case it can be
>>> argued that NID can also take action but I think since this is the
>>> last service NID will also act very lately. Here other NoRed
>>> components instantiation will fail as, I think, they want to use IMM
>>> which is not available.
>>> I will see if something can be done to make IMM as first Nored comp to
>>> get instantiated using compinstantiationlevel. In that case this patch
>>> may not be needed.
>>>
>>>
>>> Thanks,
>>> Praveen
>>>
>>> On 15-Dec-16 7:45 PM, Hans Nordeback wrote:
>>>> Hi Praveen,
>>>>
>>>> I'll review this ticket tomorrow, but one question in advance, what is
>>>> the use case for this ticket?
>>>>
>>>> Looking at the ticket description, the use case seems to be immnd
>>>> crashes during the nid phase,
>>>>
>>>> but that use case is handled by ticket #2204. Is there a "window" 
>>>> after
>>>> amfnd has called "nid_notify" where
>>>>
>>>> immnd is not monitored by amf?  /Thanks HansN
>>>>
>>>>
>>>> On 12/13/2016 11:48 AM, praveen.malv...@oracle.com wrote:
>>>>> osaf/services/saf/amf/amfnd/clc.cc |  11 +++
>>>>>   osaf/services/saf/amf/amfnd/comp.cc             |  13 ++++
>>>>>   osaf/services/saf/amf/amfnd/evt.cc              |   4 +
>>>>>   osaf/services/saf/amf/amfnd/include/avnd_cb.h   |   6 ++
>>>>>   osaf/services/saf/amf/amfnd/include/avnd_comp.h |   1 +
>>>>>   osaf/services/saf/amf/amfnd/include/avnd_evt.h  |   1 +
>>>>>   osaf/services/saf/amf/amfnd/include/avnd_proc.h |   1 +
>>>>>   osaf/services/saf/amf/amfnd/main.cc             |  72
>>>>> +++++++++++++++++++++++-
>>>>>   8 files changed, 105 insertions(+), 4 deletions(-)
>>>>>
>>>>>
>>>>> If IMMND dies before it becomes AMF component there is no entity to
>>>>> restart it.
>>>>> With ticket #2204, NID will monitor using existing FIFO each of the
>>>>> started process until it itself exits.
>>>>> This patch will also monitor IMMND using FIFO.
>>>>> If IMMND dies before becoming AMF component and after NID exit then
>>>>> with this patch AMFND will
>>>>> reboot the node because AMFND does not have configuration to restart
>>>>> IMMND.
>>>>>
>>>>> diff --git a/osaf/services/saf/amf/amfnd/clc.cc
>>>>> b/osaf/services/saf/amf/amfnd/clc.cc
>>>>> --- a/osaf/services/saf/amf/amfnd/clc.cc
>>>>> +++ b/osaf/services/saf/amf/amfnd/clc.cc
>>>>> @@ -909,6 +909,7 @@ uint32_t avnd_comp_clc_fsm_run(AVND_CB *
>>>>>       TRACE_LEAVE2("%u", rc);
>>>>>       return rc;
>>>>>   }
>>>>> +
>>>>>
>>>>> /****************************************************************************
>>>>>  
>>>>>
>>>>>
>>>>>
>>>>>     Name          : avnd_comp_clc_st_chng_prc
>>>>>    @@ -1486,6 +1487,12 @@ uint32_t avnd_comp_clc_st_chng_prc(AVND_
>>>>>               (comp->su->pres == SA_AMF_PRESENCE_INSTANTIATED))
>>>>>           rc = avnd_compdb_rec_del(cb, comp->name);
>>>>>   +    if ((final_st == SA_AMF_PRESENCE_INSTANTIATED) &&
>>>>> +            (comp->is_comp_immnd() == true)) {
>>>>> +        AVND_EVT *evt = avnd_evt_create(cb, AVND_EVT_IMMND_COMP_UP,
>>>>> 0, 0, 0, 0, 0);
>>>>> +        if (evt)
>>>>> +            rc = avnd_evt_send(cb, evt);
>>>>> +    }
>>>>>    done:
>>>>>       TRACE_LEAVE2("%u", rc);
>>>>>       return rc;
>>>>> @@ -1553,6 +1560,10 @@ uint32_t avnd_comp_clc_uninst_inst_hdler
>>>>>             /* transition to 'instantiating' state */
>>>>>           avnd_comp_pres_state_set(cb, comp,
>>>>> SA_AMF_PRESENCE_INSTANTIATING);
>>>>> +        if (comp->is_comp_immnd() == true) {
>>>>> +          TRACE("immnd is instantiating");
>>>>> +          avnd_cb->imm_comp_state = AMFND_IMM_COMP_INSTANTIATING;
>>>>> +        }
>>>>>       }
>>>>>     done:
>>>>> diff --git a/osaf/services/saf/amf/amfnd/comp.cc
>>>>> b/osaf/services/saf/amf/amfnd/comp.cc
>>>>> --- a/osaf/services/saf/amf/amfnd/comp.cc
>>>>> +++ b/osaf/services/saf/amf/amfnd/comp.cc
>>>>> @@ -3016,3 +3016,16 @@ uint32_t avnd_amfa_mds_info_evh(AVND_CB
>>>>>     return NCSCC_RC_SUCCESS;
>>>>>   }
>>>>>   +bool AVND_COMP::is_comp_immnd() const {
>>>>> +  if ((su->is_ncs == true) &&
>>>>> +    (name.find("safSg=NoRed,safApp=OpenSAF") != 
>>>>> std::string::npos) &&
>>>>> +    (name.find("safComp=IMMND") != std::string::npos))
>>>>> +                return true;
>>>>> +  return false;
>>>>> +}
>>>>> +
>>>>> +uint32_t avnd_immnd_comp_evh(AVND_CB *cb, AVND_EVT *evt) {
>>>>> +    avnd_cb->imm_comp_state = AMFND_IMM_COMP_UP;
>>>>> +    TRACE("IMMND is now AMF component");
>>>>> +    return NCSCC_RC_SUCCESS;
>>>>> +}
>>>>> diff --git a/osaf/services/saf/amf/amfnd/evt.cc
>>>>> b/osaf/services/saf/amf/amfnd/evt.cc
>>>>> --- a/osaf/services/saf/amf/amfnd/evt.cc
>>>>> +++ b/osaf/services/saf/amf/amfnd/evt.cc
>>>>> @@ -185,6 +185,8 @@ AVND_EVT *avnd_evt_create(AVND_CB *cb,
>>>>>         case AVND_EVT_AMFA_MDS_VER_INFO:
>>>>>           break;
>>>>> +    case AVND_EVT_IMMND_COMP_UP:
>>>>> +        break;
>>>>>       default:
>>>>>           delete evt;
>>>>>           evt = nullptr;
>>>>> @@ -316,6 +318,8 @@ void avnd_evt_destroy(AVND_EVT *evt)
>>>>>           break;
>>>>>       case AVND_EVT_AMFA_MDS_VER_INFO:
>>>>>           break;
>>>>> +    case AVND_EVT_IMMND_COMP_UP:
>>>>> +        break;
>>>>>         default:
>>>>>           LOG_NO("%s: unknown event type %u", __FUNCTION__, type);
>>>>> diff --git a/osaf/services/saf/amf/amfnd/include/avnd_cb.h
>>>>> b/osaf/services/saf/amf/amfnd/include/avnd_cb.h
>>>>> --- a/osaf/services/saf/amf/amfnd/include/avnd_cb.h
>>>>> +++ b/osaf/services/saf/amf/amfnd/include/avnd_cb.h
>>>>> @@ -34,6 +34,11 @@
>>>>>   #define AVND_CB_H
>>>>>   #include <map>
>>>>>   +typedef enum {
>>>>> +        AMFND_IMM_COMP_BASE = 1,
>>>>> +        AMFND_IMM_COMP_INSTANTIATING = 2,
>>>>> +        AMFND_IMM_COMP_UP = 3
>>>>> +} AMFND_IMM_COMP_STATUS;
>>>>>     typedef struct avnd_cb_tag {
>>>>>       SYSF_MBX mbx;        /* mailbox on which AvND waits */
>>>>> @@ -121,6 +126,7 @@ typedef struct avnd_cb_tag {
>>>>>       SaTimeT scs_absence_max_duration;
>>>>>       /* the timer for supervision of the absence of SC */
>>>>>       AVND_TMR sc_absence_tmr;
>>>>> +    AMFND_IMM_COMP_STATUS imm_comp_state;
>>>>>   } AVND_CB;
>>>>>     #define AVND_CB_NULL ((AVND_CB *)0)
>>>>> diff --git a/osaf/services/saf/amf/amfnd/include/avnd_comp.h
>>>>> b/osaf/services/saf/amf/amfnd/include/avnd_comp.h
>>>>> --- a/osaf/services/saf/amf/amfnd/include/avnd_comp.h
>>>>> +++ b/osaf/services/saf/amf/amfnd/include/avnd_comp.h
>>>>> @@ -401,6 +401,7 @@ typedef struct avnd_comp_tag {
>>>>>       std::bitset<NumAttrs> *use_comptype_attr;
>>>>>       SaInvocationT term_cbq_inv_value; /* invocation value for
>>>>> termination callback. */
>>>>>       SaVersionT version; //SAF version of comp.
>>>>> +    bool is_comp_immnd() const;
>>>>>   } AVND_COMP;
>>>>>     #define AVND_COMP_NULL ((AVND_COMP *)0)
>>>>> diff --git a/osaf/services/saf/amf/amfnd/include/avnd_evt.h
>>>>> b/osaf/services/saf/amf/amfnd/include/avnd_evt.h
>>>>> --- a/osaf/services/saf/amf/amfnd/include/avnd_evt.h
>>>>> +++ b/osaf/services/saf/amf/amfnd/include/avnd_evt.h
>>>>> @@ -113,6 +113,7 @@ typedef enum avnd_evt_type {
>>>>>       AVND_EVT_TMR_QSCING_CMPL,
>>>>>       AVND_EVT_IR,
>>>>>       AVND_EVT_AMFA_MDS_VER_INFO,
>>>>> +    AVND_EVT_IMMND_COMP_UP,
>>>>>         AVND_EVT_MAX
>>>>>   } AVND_EVT_TYPE;
>>>>> diff --git a/osaf/services/saf/amf/amfnd/include/avnd_proc.h
>>>>> b/osaf/services/saf/amf/amfnd/include/avnd_proc.h
>>>>> --- a/osaf/services/saf/amf/amfnd/include/avnd_proc.h
>>>>> +++ b/osaf/services/saf/amf/amfnd/include/avnd_proc.h
>>>>> @@ -118,5 +118,6 @@ uint32_t avnd_evt_tmr_avd_hb_duration_ev
>>>>>   uint32_t avnd_evt_tmr_sc_absence_evh(struct avnd_cb_tag *, struct
>>>>> avnd_evt_tag *);
>>>>>   uint32_t avnd_evt_avd_reboot_evh(struct avnd_cb_tag *, struct
>>>>> avnd_evt_tag *);
>>>>>   uint32_t avnd_amfa_mds_info_evh(struct avnd_cb_tag *, struct
>>>>> avnd_evt_tag *);
>>>>> +uint32_t avnd_immnd_comp_evh(struct avnd_cb_tag *, struct
>>>>> avnd_evt_tag *);
>>>>>     #endif
>>>>> diff --git a/osaf/services/saf/amf/amfnd/main.cc
>>>>> b/osaf/services/saf/amf/amfnd/main.cc
>>>>> --- a/osaf/services/saf/amf/amfnd/main.cc
>>>>> +++ b/osaf/services/saf/amf/amfnd/main.cc
>>>>> @@ -31,11 +31,15 @@
>>>>>   #include "immutil.h"
>>>>>   #include "logtrace.h"
>>>>>   #include "nid_api.h"
>>>>> +#include "osaf_time.h"
>>>>>   #include <imm.h>
>>>>>     #define FD_MBX   0
>>>>>   #define FD_TERM  1
>>>>>   #define FD_CLM   2
>>>>> +#define FD_IMMND_FIFO 3
>>>>> +#define FD_MAX 4
>>>>> +
>>>>>     static const char* internal_version_id_  __attribute__ ((used)) =
>>>>> "@(#) $Id: " INTERNAL_VERSION_ID " $";
>>>>>   @@ -121,7 +125,8 @@ extern const AVND_EVT_HDLR g_avnd_func_l
>>>>>       avnd_evt_pid_exit_evh,    /* AVND_EVT_PID_EXIT */
>>>>>       avnd_evt_tmr_qscing_cmpl_evh,    /* AVND_EVT_TMR_QSCING_CMPL */
>>>>>       avnd_evt_ir_evh,    /* AVND_EVT_IR */
>>>>> -    avnd_amfa_mds_info_evh /* AVND_EVT_AMFA_MDS_VER_INFO*/
>>>>> +    avnd_amfa_mds_info_evh, /* AVND_EVT_AMFA_MDS_VER_INFO*/
>>>>> +    avnd_immnd_comp_evh /* AVND_EVT_IMMND_COMP_UP*/
>>>>>   };
>>>>>     extern struct ImmutilWrapperProfile immutilWrapperProfile;
>>>>> @@ -301,6 +306,7 @@ AVND_CB *avnd_cb_create()
>>>>>       cb->hb_duration_tmr.is_active = false;
>>>>>       cb->hb_duration_tmr.type = AVND_TMR_HB_DURATION;
>>>>>       cb->hb_duration = AVSV_DEF_HB_DURATION;
>>>>> +    cb->imm_comp_state = AMFND_IMM_COMP_BASE;
>>>>>         if ((val = getenv("AVSV_HB_DURATION")) != nullptr) {
>>>>>           cb->hb_duration = strtoll(val, nullptr, 0);
>>>>> @@ -495,6 +501,30 @@ static void sigterm_handler(int sig)
>>>>>       ncs_sel_obj_ind(&term_sel_obj);
>>>>>       signal(SIGTERM, SIG_IGN);
>>>>>   }
>>>>> +//Thanks to Hans for above function in #1857, I just copied and
>>>>> renamed it for using with IMM.
>>>>> +static int open_immnd_fifo() {
>>>>> +  const std::string fifo_dir = PKGLOCALSTATEDIR;
>>>>> +  std::string fifo_file = fifo_dir + "/" + "osafimmnd.fifo";
>>>>> +  int fifo_fd = -1;
>>>>> +  int retry_cnt = 0;
>>>>> +
>>>>> +  if (access(fifo_file.c_str(), F_OK ) != -1 ) {
>>>>> +    do {
>>>>> +      if (retry_cnt > 0) {
>>>>> +        osaf_nanosleep(&kHundredMilliseconds);
>>>>> +      }
>>>>> +      fifo_fd = open(fifo_file.c_str(), O_WRONLY|O_NONBLOCK);
>>>>> +    } while ((fifo_fd == -1) &&
>>>>> +            (retry_cnt++ < 5 && (errno == EINTR || errno == 
>>>>> ENXIO)));
>>>>> +    if (fifo_fd == -1) {
>>>>> +      LOG_ER("Failed to open %s, error: %s", fifo_file.c_str(),
>>>>> +             strerror(errno));
>>>>> +    } else {
>>>>> +      LOG_NO("Start monitoring IMMND using %s", fifo_file.c_str());
>>>>> +    }
>>>>> +  }
>>>>> +  return fifo_fd;
>>>>> +}
>>>>>
>>>>> /****************************************************************************
>>>>>  
>>>>>
>>>>>
>>>>>
>>>>>     Name          : avnd_main_process
>>>>> @@ -510,8 +540,9 @@ static void sigterm_handler(int sig)
>>>>>   void avnd_main_process(void)
>>>>>   {
>>>>>       NCS_SEL_OBJ mbx_fd;
>>>>> -    struct pollfd fds[4];
>>>>> -    nfds_t nfds = 3;
>>>>> +    int immnd_fifo_fd = -1;
>>>>> +    struct pollfd fds[FD_MAX];
>>>>> +    nfds_t nfds = FD_MAX;
>>>>>       AVND_EVT *evt;
>>>>>       SaAisErrorT result = SA_AIS_OK;
>>>>>       SaAisErrorT rc = SA_AIS_OK;
>>>>> @@ -542,6 +573,9 @@ void avnd_main_process(void)
>>>>>       fds[FD_CLM].fd = avnd_cb->clm_sel_obj;
>>>>>       fds[FD_CLM].events = POLLIN;
>>>>>   +    immnd_fifo_fd = open_immnd_fifo();
>>>>> +    fds[FD_IMMND_FIFO].fd = immnd_fifo_fd;
>>>>> +    fds[FD_IMMND_FIFO].events = POLLIN;
>>>>>       /* now wait forever */
>>>>>       while (1) {
>>>>>           int ret = poll(fds, nfds, -1);
>>>>> @@ -554,6 +588,29 @@ void avnd_main_process(void)
>>>>>               LOG_ER("%s: poll failed - %s", __FUNCTION__,
>>>>> strerror(errno));
>>>>>               break;
>>>>>           }
>>>>> +         if (fds[FD_IMMND_FIFO].revents & POLLERR) {
>>>>> +            TRACE_1("IMMND crashed");
>>>>> +            if (!(m_AVND_IS_SHUTTING_DOWN(avnd_cb)) &&
>>>>> +                (avnd_cb->imm_comp_state == AMFND_IMM_COMP_BASE) &&
>>>>> +                (avnd_cb->led_state == AVND_LED_STATE_GREEN)) {
>>>>> +                /* AMFND is last NID started service. If AMFND has
>>>>> responded to NID
>>>>> +                   and IMMND has not become AMF component then 
>>>>> reboot
>>>>> the node because
>>>>> +                   AMFND does not have configuration to
>>>>> re-instantiate immnd.
>>>>> +                   If IMM has become AMF component then AMF will
>>>>> restart it.
>>>>> +                   If AMFND has not responded to NID, then it will
>>>>> take care of IMMND process.
>>>>> +                   Note:This is overlapping region as AMFND responds
>>>>> to NID only when all
>>>>> +                        NoRed components comes in AMF control. 
>>>>> May be
>>>>> used in future only.
>>>>> +                */
>>>>> +                LOG_ER("IMMND crashed and has not become AMF
>>>>> component. Rebooting node");
>>>>> + opensaf_reboot(avnd_cb->node_info.nodeId,
>>>>> +
>>>>> osaf_extended_name_borrow(&avnd_cb->node_info.executionEnvironment),
>>>>> +                        "IMMND crashed and has not become AMF
>>>>> component. Rebooting node");
>>>>> +                exit(0);
>>>>> +            }
>>>>> +            //It means IMM has become AMF component, poll for IMMND
>>>>> when it gets started.
>>>>> +            nfds = FD_MAX - 1;
>>>>> +            fds[FD_IMMND_FIFO].fd =  -1;
>>>>> +                }
>>>>>             if (avnd_cb->clmHandle && (fds[FD_CLM].revents & 
>>>>> POLLIN)) {
>>>>>               //LOG_NO("DEBUG-> CLM event fd: %d sel_obj: %llu, clm
>>>>> handle: %llu", fds[FD_CLM].fd, avnd_cb->clm_sel_obj,
>>>>> avnd_cb->clmHandle);
>>>>> @@ -573,8 +630,15 @@ void avnd_main_process(void)
>>>>>           }
>>>>>             if (fds[FD_MBX].revents & POLLIN) {
>>>>> -            while (nullptr != (evt = (AVND_EVT
>>>>> *)ncs_ipc_non_blk_recv(&avnd_cb->mbx)))
>>>>> +            while (nullptr != (evt = (AVND_EVT
>>>>> *)ncs_ipc_non_blk_recv(&avnd_cb->mbx))) {
>>>>> +                if (evt->type == AVND_EVT_IMMND_COMP_UP) {
>>>>> +                    immnd_fifo_fd = open_immnd_fifo();
>>>>> +                    fds[FD_IMMND_FIFO].fd = immnd_fifo_fd;
>>>>> +                    fds[FD_IMMND_FIFO].events = POLLIN;
>>>>> +                    nfds = FD_MAX;
>>>>> +                }
>>>>>                   avnd_evt_process(evt);
>>>>> +            }
>>>>>           }
>>>>>             if (fds[FD_TERM].revents & POLLIN) {
>>>>
>>

------------------------------------------------------------------------------
Check out the vibrant tech community on one of the world's most 
engaging tech sites, SlashDot.org! http://sdm.link/slashdot
_______________________________________________
Opensaf-devel mailing list
Opensaf-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/opensaf-devel

Reply via email to