Comments inline:

> -----Original Message-----
> From: Hans Feldt [mailto:hans.fe...@ericsson.com]
> Sent: Monday, November 18, 2013 3:09 PM
> To: SuryaNarayana Garlapati
> Cc: opensaf-devel@lists.sourceforge.net
> Subject: Re: [devel] [PATCH 1 of 1] amfnd: Reboot payload when link
> between Controller and Payload flickers [#600]
> 
> 
> > -----Original Message-----
> > From: SuryaNarayana Garlapati
> > [mailto:suryanarayana.garlap...@oracle.com]
> > Sent: den 18 november 2013 10:19
> > To: Hans Feldt
> > Cc: opensaf-devel@lists.sourceforge.net
> > Subject: Re: [devel] [PATCH 1 of 1] amfnd: Reboot payload when link
> > between Controller and Payload flickers [#600]
> >
> >
> > On Monday 18 November 2013 02:18 PM, Hans Feldt wrote:
> > > I don't think you have even read what I wrote below...
> > [Surya] Well, your guess is wrong, but that does not hurt. My
> > question(response) is still valid.
> 
> Sorry for that. The fact is that when this happened in a real system, only the
> TIPC link to the active controller was reset. The link to the standby
> controller stayed up, so the MDS NODE DOWN event never occurred.
> 
> I think if possible the payload could handle this somehow by "self fencing" as
> proposed by this patch.

[Mathi] 

Well, I think the purpose of this patch is exactly this "self fencing". I don't
have a problem with the patch.
The bigger problem is whether this patch can be tested at all!
If we could reproduce the scenario and test this patch, then I ACK this patch.

Cheers,
Mathi.
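
On the testing question: one option might be to exercise the flag logic
off-target, without reproducing the TIPC reset on a real cluster. What follows
is only a minimal sketch of the behaviour the patch appears to implement, not
the amfnd code. NodeDirectorModel, Action and all method names are hypothetical,
and the model compares plain node ids on both sides, which is a simplification
of the comparison against cb->active_avd_adest in the patch.

```cpp
#include <cstdint>

// Hypothetical outcome of handling one MDS event (not an OpenSAF type).
enum class Action { None, CreateAvdUpEvent, RebootDirectorDown, RebootLinkReset };

// Hypothetical, simplified model of the amfnd state touched by the patch.
struct NodeDirectorModel {
  explicit NodeDirectorModel(uint32_t local) : local_node_id(local) {}

  uint32_t local_node_id;
  uint32_t active_avd_node_id = 0;       // assumption: node id instead of MDS_DEST
  bool reboot_in_progress = false;       // mirrors cb->reboot_in_progress
  bool cont_reboot_in_progress = false;  // mirrors the flag added by the patch

  // NODE_UP or DATA_VERIFY message received from the active director:
  // remember where it lives and clear the "link was lost" flag.
  void on_msg_from_active_avd(uint32_t sender_node_id) {
    active_avd_node_id = sender_node_id;
    cont_reboot_in_progress = false;
  }

  // MDS DOWN for an AVD Adest.
  Action on_avd_adest_down(uint32_t node_id) {
    if (node_id != local_node_id) {
      // A remote director going down is ignored, but note it if it was
      // the active one: the link to the active controller was lost.
      if (node_id == active_avd_node_id) cont_reboot_in_progress = true;
      return Action::None;
    }
    return handle_avd_down();
  }

  // MDS UP for an AVD Adest.
  Action on_avd_adest_up(uint32_t node_id) {
    // UP from the active controller after its DOWN, with no node reboot
    // in between: treat it as a link flicker and take the DOWN path.
    if (cont_reboot_in_progress && node_id == active_avd_node_id)
      return handle_avd_down();
    return Action::CreateAvdUpEvent;
  }

 private:
  // Mirrors the branch in avnd_evt_mds_avd_dn_evh: reboot once, and pick
  // the reason depending on whether a link reset was seen.
  Action handle_avd_down() {
    if (reboot_in_progress) return Action::None;
    reboot_in_progress = true;
    return cont_reboot_in_progress ? Action::RebootLinkReset
                                   : Action::RebootDirectorDown;
  }
};
```

Feeding the model a DOWN followed by an UP from the active controller's node
yields RebootLinkReset, while the same sequence from the standby yields an
ordinary AVD UP event. This only checks the decision logic and does not replace
a test on a real cluster.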


> But "real" fencing should be done by CLM, which it does not seem to do if you
> read my other reply. I think we have some architectural issues here that need
> to be corrected.
> 
> /Hans
> 
> >
> > >> As I said this problem originates from a low memory condition on the
> > >> payload that results in TIPC on that payload resetting the link(s).
> >
> > If memory is low, it will affect all the links, and the payload will
> > reset all of them. MDS will definitely deliver the DOWN for AMFD, and
> > AMFND will then reboot the node.
> >
> > -Surya
> >
> > >
> > >> -----Original Message-----
> > >> From: SuryaNarayana Garlapati
> > >> [mailto:suryanarayana.garlap...@oracle.com]
> > >> Sent: den 18 november 2013 09:41
> > >> To: Nagendra Kumar; Hans Feldt; Hans Nordebäck; Praveen Malviya;
> > >> Mathivanan Naickan Palanivelu
> > >> Cc: opensaf-devel@lists.sourceforge.net
> > >> Subject: Re: [devel] [PATCH 1 of 1] amfnd: Reboot payload when link
> > >> between Controller and Payload flickers [#600]
> > >>
> > >> One basic question:
> > >> If the bearer is dual, TIPC flickering on one link should be
> > >> transparent, and there will be no effect on applications as the
> > >> other bearer is still alive and healthy.
> > >> If the TIPC DOWNs are still delivered to the applications, then
> > >> this is purely a protocol problem in TIPC.
> > >>
> > >> If the bearer is single, AMFND will automatically reboot itself
> > >> on a link flap.
> > >>
> > >> So I don't see a practical problem.
> > >>
> > >>
> > >> On Monday 18 November 2013 02:04 PM, Nagendra Kumar wrote:
> > >>>>> Does amfd try to fence the payload when this happens?
> > >>> Amfd resets the Amfnd information, but Amfnd lives on like an orphan, as
> > >>> Amfd will not entertain any requests from it.
> > >>>
> > >>> Thanks
> > >>> -Nagu
> > >>>
> > >>> -----Original Message-----
> > >>> From: Hans Feldt [mailto:hans.fe...@ericsson.com]
> > >>> Sent: 18 November 2013 12:56
> > >>> To: Nagendra Kumar; Suryanarayana Garlapati; Hans Nordebäck;
> > >>> Praveen Malviya; Mathivanan Naickan Palanivelu
> > >>> Cc: opensaf-devel@lists.sourceforge.net
> > >>> Subject: RE: [devel] [PATCH 1 of 1] amfnd: Reboot payload when
> > >>> link between Controller and Payload flickers [#600]
> > >>>
> > >>> As I said this problem originates from a low memory condition on
> > >>> the payload that results in TIPC on that payload resetting the
> > >>> link(s). Once TIPC links are re-established, the AMF cluster cannot be
> > >>> re-established. I think TIPC has now been changed to handle this
> > >>> better, so the likelihood of this happening will be dramatically reduced.
> > >>> Other triggers for this could be long network latencies, as can happen
> > >>> when running virtualized without quality of service guarantees.
> > >>>
> > >>> This problem is of course part of the bigger problem of too simplistic
> > >>> cluster management in OpenSAF.
> > >>>
> > >>> Does amfd try to fence the payload when this happens?
> > >>>
> > >>> /Hans
> > >>>
> > >>>> -----Original Message-----
> > >>>> From: Nagendra Kumar [mailto:nagendr...@oracle.com]
> > >>>> Sent: den 18 november 2013 07:57
> > >>>> To: Suryanarayana Garlapati; Hans Feldt; Hans Nordebäck; Praveen
> > >>>> Malviya; Mathivanan Naickan Palanivelu
> > >>>> Cc: opensaf-devel@lists.sourceforge.net
> > >>>> Subject: RE: [devel] [PATCH 1 of 1] amfnd: Reboot payload when
> > >>>> link between Controller and Payload flickers [#600]
> > >>>>
> > >>>> Hi Hans,
> > >>>>        Any response ?
> > >>>>
> > >>>> -Nagu
> > >>>> -----Original Message-----
> > >>>> From: Nagendra Kumar
> > >>>> Sent: 15 November 2013 14:39
> > >>>> To: Suryanarayana Garlapati; hans.fe...@ericsson.com;
> > >>>> hans.nordeb...@ericsson.com; Praveen Malviya; Mathivanan Naickan
> > >>>> Palanivelu
> > >>>> Cc: opensaf-devel@lists.sourceforge.net
> > >>>> Subject: Re: [devel] [PATCH 1 of 1] amfnd: Reboot payload when
> > >>>> link between Controller and Payload flickers [#600]
> > >>>>
> > >>>> Hi,
> > >>>>
> > >>>> I checked, and there seems to be no easy way to detect link loss at
> > >>>> the controller AvD, as it gets an AvND DOWN followed by an AvND UP
> > >>>> during a TIPC flicker. This is the same as when a payload goes down
> > >>>> and rejoins. The only differentiator between these two scenarios is
> > >>>> the time between the two events.
> > >>>>
> > >>>> Hans: Can you please check whether Amfd has any other way to detect
> > >>>> a TIPC flicker?
> > >>>>
> > >>>> Thanks
> > >>>> -Nagu
> > >>>> -----Original Message-----
> > >>>> From: Suryanarayana Garlapati
> > >>>> Sent: 22 October 2013 14:41
> > >>>> To: Nagendra Kumar; hans.fe...@ericsson.com;
> > >>>> hans.nordeb...@ericsson.com; Praveen Malviya; Mathivanan Naickan
> > >>>> Palanivelu
> > >>>> Cc: opensaf-devel@lists.sourceforge.net
> > >>>> Subject: Re: [PATCH 1 of 1] amfnd: Reboot payload when link
> > >>>> between Controller and Payload flickers [#600]
> > >>>>
> > >>>> Hi Nagu,
> > >>>> How do we tell that the link flap has occurred with only one PLD?
> > >>>> A link flap with a single payload is less likely than a link flap
> > >>>> with all the payloads. With my suggestion, there will be a failover,
> > >>>> but with the present patch, if a link flap happens with the payload
> > >>>> nodes, all the payload nodes will reboot. So considering everything,
> > >>>> I guess we should reboot only the active controller.
> > >>>> Regards
> > >>>> Surya
> > >>>>
> > >>>>
> > >>>> On Monday 21 October 2013 06:40 PM, Nagendra Kumar wrote:
> > >>>>> Hi Surya,
> > >>>>>
> > >>>>> The problem I see with that approach is that, because of a problem in
> > >>>>> one payload, the other payloads are impacted by the active controller
> > >>>>> failover.
> > >>>>> Thanks
> > >>>>> -Nagu
> > >>>>> -----Original Message-----
> > >>>>> From: Suryanarayana Garlapati
> > >>>>> Sent: 21 October 2013 18:22
> > >>>>> To: Nagendra Kumar; hans.fe...@ericsson.com;
> > >>>>> hans.nordeb...@ericsson.com; Praveen Malviya; Mathivanan Naickan
> > >>>>> Palanivelu
> > >>>>> Cc: opensaf-devel@lists.sourceforge.net
> > >>>>> Subject: Re: [PATCH 1 of 1] amfnd: Reboot payload when link
> > >>>>> between Controller and Payload flickers [#600]
> > >>>>>
> > >>>>> Hi Nagu,
> > >>>>> I am not comfortable with this approach.
> > >>>>> I think it's better to reboot the active controller if the link flaps,
> > >>>>> not the payload node. If the link flaps between the active
> > >>>>> controller and the payload nodes, there will be a total payload
> > >>>>> cluster reset, which we can avoid by just rebooting the active
> > >>>>> controller.
> > >>>>> Thoughts?
> > >>>>>
> > >>>>> Regards
> > >>>>> Surya
> > >>>>>
> > >>>>> On Monday 21 October 2013 05:03 PM, nagendr...@oracle.com wrote:
> > >>>>>>      osaf/services/saf/amf/amfnd/di.cc             |  13 +++++++++----
> > >>>>>>      osaf/services/saf/amf/amfnd/include/avnd_cb.h |   1 +
> > >>>>>>      osaf/services/saf/amf/amfnd/mds.cc            |  11 +++++++++++
> > >>>>>>      3 files changed, 21 insertions(+), 4 deletions(-)
> > >>>>>>
> > >>>>>>
> > >>>>>> diff --git a/osaf/services/saf/amf/amfnd/di.cc b/osaf/services/saf/amf/amfnd/di.cc
> > >>>>>> --- a/osaf/services/saf/amf/amfnd/di.cc
> > >>>>>> +++ b/osaf/services/saf/amf/amfnd/di.cc
> > >>>>>> @@ -437,13 +437,18 @@ uint32_t avnd_evt_mds_avd_dn_evh(AVND_CB
> > >>>>>>
> > >>>>>>              TRACE_ENTER();
> > >>>>>>
> > >>>>>> -    LOG_ER("AMF director unexpectedly crashed");
> > >>>>>> -
> > >>>>>>              /* Don't issue reboot if it has been already issued.*/
> > >>>>>>              if (false == cb->reboot_in_progress) {
> > >>>>>>                      cb->reboot_in_progress = true;
> > >>>>>> -            opensaf_reboot(avnd_cb->node_info.nodeId, (char *)avnd_cb->node_info.executionEnvironment.value,
> > >>>>>> -                            "local AVD down(Adest) or both AVD down(Vdest) received");
> > >>>>>> +            if(cb->cont_reboot_in_progress == false) {
> > >>>>>> +                    LOG_ER("AMF director unexpectedly crashed");
> > >>>>>> +                    opensaf_reboot(avnd_cb->node_info.nodeId, (char *)avnd_cb->node_info.executionEnvironment.value,
> > >>>>>> +                                    "local AVD down(Adest) or both AVD down(Vdest) received");
> > >>>>>> +            } else {
> > >>>>>> +                    opensaf_reboot(avnd_cb->node_info.nodeId, (char *)avnd_cb->node_info.executionEnvironment.value,
> > >>>>>> +                                    "Link reset with Act controller");
> > >>>>>> +            }
> > >>>>>> +
> > >>>>>>              }
> > >>>>>>
> > >>>>>>              TRACE_LEAVE();
> > >>>>>> diff --git a/osaf/services/saf/amf/amfnd/include/avnd_cb.h b/osaf/services/saf/amf/amfnd/include/avnd_cb.h
> > >>>>>> --- a/osaf/services/saf/amf/amfnd/include/avnd_cb.h
> > >>>>>> +++ b/osaf/services/saf/amf/amfnd/include/avnd_cb.h
> > >>>>>> @@ -130,6 +130,7 @@ typedef struct avnd_cb_tag {
> > >>>>>>              SaBoolT first_time_up;
> > >>>>>>              bool reboot_in_progress;
> > >>>>>>              AVND_SU *failed_su;
> > >>>>>> +    bool cont_reboot_in_progress;
> > >>>>>>      } AVND_CB;
> > >>>>>>
> > >>>>>>      #define AVND_CB_NULL ((AVND_CB *)0)
> > >>>>>> diff --git a/osaf/services/saf/amf/amfnd/mds.cc b/osaf/services/saf/amf/amfnd/mds.cc
> > >>>>>> --- a/osaf/services/saf/amf/amfnd/mds.cc
> > >>>>>> +++ b/osaf/services/saf/amf/amfnd/mds.cc
> > >>>>>> @@ -386,6 +386,7 @@ uint32_t avnd_mds_rcv(AVND_CB *cb, MDS_C
> > >>>>>>                      if ((AVSV_D2N_NODE_UP_MSG == ((AVSV_DND_MSG *)(rcv_info->i_msg))->msg_type) ||
> > >>>>>>                          (AVSV_D2N_DATA_VERIFY_MSG == ((AVSV_DND_MSG *)(rcv_info->i_msg))->msg_type)) {
> > >>>>>>                              cb->active_avd_adest = rcv_info->i_fr_dest;
> > >>>>>> +                    avnd_cb->cont_reboot_in_progress = false;
> > >>>>>>                              TRACE_1("Active AVD Adest = %" PRIu64, cb->active_avd_adest);
> > >>>>>>                      }
> > >>>>>>
> > >>>>>> @@ -560,6 +561,14 @@ uint32_t avnd_mds_svc_evt(AVND_CB *cb, M
> > >>>>>>              case NCSMDS_UP:
> > >>>>>>                      switch (evt_info->i_svc_id) {
> > >>>>>>                      case NCSMDS_SVC_ID_AVD:
> > >>>>>> +
> > >>>>>> +                    if ((m_MDS_DEST_IS_AN_ADEST(evt_info->i_dest) && avnd_cb->cont_reboot_in_progress) &&
> > >>>>>> +                        (m_NCS_NODE_ID_FROM_MDS_DEST(evt_info->i_dest) == cb->active_avd_adest)) {
> > >>>>>> +                            memset(&cb->avd_dest, 0, sizeof(MDS_DEST));
> > >>>>>> +                            evt = avnd_evt_create(cb, AVND_EVT_MDS_AVD_DN, 0, &evt_info->i_dest, 0, 0, 0);
> > >>>>>> +                            break;
> > >>>>>> +                    }
> > >>>>>> +
> > >>>>>>                              /* create the mds event */
> > >>>>>>                              evt = avnd_evt_create(cb, AVND_EVT_MDS_AVD_UP, 0, &evt_info->i_dest, 0, 0, 0);
> > >>>>>>                              break;
> > >>>>>> @@ -606,6 +615,8 @@ uint32_t avnd_mds_svc_evt(AVND_CB *cb, M
> > >>>>>>                                      /* Supervise our node local director */
> > >>>>>>                                      if (evt_info->i_node_id != ncs_get_node_id()) {
> > >>>>>>                                              /* Ignore the other AVD Adest Down.*/
> > >>>>>> +                                    if(m_NCS_NODE_ID_FROM_MDS_DEST(evt_info->i_dest) == cb->active_avd_adest)
> > >>>>>> +                                            avnd_cb->cont_reboot_in_progress = true;
> > >>>>>>                                              return rc;
> > >>>>>>                                      }
> > >>>>>>                              }
> _______________________________________________
> > >>>> Opensaf-devel mailing list
> > >>>> Opensaf-devel@lists.sourceforge.net
> > >>>> https://lists.sourceforge.net/lists/listinfo/opensaf-devel

