Re: [users] Avoid rebooting payload modules after losing system controller

Mathivanan Naickan Palanivelu Tue, 13 Oct 2015 03:24:06 -0700

I don't think this scenario is (yet) directly relevant to the multiple-standby
concepts (that apart, "headless" sounds like a misnomer to me)!
Could you please explain the following:
- What is the redundancy model of the distributed applications running on those
3 nodes (i.e. Server, linecard1, linecard2)?
- Are your applications running as SA-AWARE or non-SA-AWARE?
- What is the typical heartbeat interval?


You would require customized handling (such as delaying or avoiding the reboot
for a while).
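A minimal sketch of what such customized handling could look like, written as a hypothetical payload-side policy in Python. It uses no OpenSAF API: the names Action, DeferredRebootPolicy, on_sc_lost, on_sc_returned and on_tick are assumptions for illustration only, and how SC loss and return are actually detected is left out. On SC loss the payload keeps running; if an SC reappears while it is still waiting, it either reboots to resync (the "reboot only when the controller reappears" idea from the original post) or rejoins without rebooting; if no SC returns within the grace period, it falls back to rebooting. The grace period is loosely analogous to the IMMSV_SC_ABSENCE_ALLOWED timeout mentioned further down in this thread.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Action(Enum):
    NONE = "none"      # keep running, nothing to do yet
    REBOOT = "reboot"  # restart the payload so it resyncs with an SC
    REJOIN = "rejoin"  # resume normal operation without rebooting


@dataclass
class DeferredRebootPolicy:
    """Hypothetical payload-side policy that defers the reboot on SC loss.

    grace_s: seconds the payload keeps running with no system controller.
    reboot_on_return: True -> reboot when an SC reappears (to resync);
                      False -> rejoin without rebooting.
    """
    grace_s: float
    reboot_on_return: bool = True
    lost_at: Optional[float] = None  # time the SC was lost; None while an SC is up

    def on_sc_lost(self, now: float) -> Action:
        # Do not reboot here; just remember when the loss started.
        if self.lost_at is None:
            self.lost_at = now
        return Action.NONE

    def on_sc_returned(self, now: float) -> Action:
        # An SC is back while we are still waiting: resync (reboot) or rejoin.
        if self.lost_at is None:
            return Action.NONE
        self.lost_at = None
        return Action.REBOOT if self.reboot_on_return else Action.REJOIN

    def on_tick(self, now: float) -> Action:
        # Called periodically; fall back to a reboot once the grace period expires.
        if self.lost_at is not None and now - self.lost_at >= self.grace_s:
            self.lost_at = None
            return Action.REBOOT
        return Action.NONE
```

For example, with grace_s=60.0, a loss at t=0 followed by an SC returning at t=45 yields Action.REBOOT; with reboot_on_return=False it yields Action.REJOIN instead. Keeping the decision as a pure function of timestamps makes it easy to test without real controllers.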

Thanks,
Mathi.

> -----Original Message-----
> From: Tony Hart [mailto:[email protected]]
> Sent: Tuesday, October 13, 2015 3:37 PM
> To: Anders Björnerstedt
> Cc: [email protected]
> Subject: Re: [users] Avoid rebooting payload modules after losing system
> controller
> 
> 
> Understood. The assumption is that this is temporary, but we allow the
> payloads to continue to run (with reduced OpenSAF functionality) until a
> replacement controller is found. At that point they can reboot to get the
> system back into sync.
> 
> Or allow more than 2 controllers in the system, so that one or more cards
> that are usually payloads can be controllers, reducing the probability of
> having no controller to an acceptable level.
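A rough way to put a number on "an acceptable level": if each of n controllers is unavailable with the same probability p and failures are independent, the probability of having no controller at all is p to the power n. Independence is a strong assumption (shared power, a shared switch or a common software fault correlate failures), so treat the result as an optimistic bound. The helper names below are hypothetical, and the value p = 0.01 used in the note is illustrative only.

```python
def p_no_controller(p_down: float, n_controllers: int) -> float:
    """Probability that all n controllers are down at once.

    Assumes every controller is unavailable with the same probability
    p_down and that failures are independent: P(none) = p_down ** n.
    """
    if not 0.0 <= p_down <= 1.0:
        raise ValueError("p_down must be in [0, 1]")
    if n_controllers < 1:
        raise ValueError("n_controllers must be at least 1")
    return p_down ** n_controllers


def controllers_needed(p_down: float, target: float) -> int:
    """Smallest n such that p_no_controller(p_down, n) <= target."""
    if not 0.0 < p_down < 1.0 or not 0.0 < target < 1.0:
        raise ValueError("p_down and target must be in (0, 1)")
    n = 1
    # p_down < 1, so p_down ** n shrinks toward 0 and the loop terminates.
    while p_no_controller(p_down, n) > target:
        n += 1
    return n
```

With the illustrative p = 0.01, one controller gives 1e-2, two give about 1e-4 and four about 1e-8, and controllers_needed(0.01, 1e-7) returns 4.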
> 
> 
> > On Oct 12, 2015, at 11:05 AM, Anders Björnerstedt <[email protected]> wrote:
> >
> > The headless state is also vulnerable to split-brain scenarios.
> > That is, network partitions and joins can occur without being detected as
> > such, and thus will not be handled properly (isolated) when they occur.
> > Basically, you cannot be sure you have a continuously coherent cluster
> > while in the headless state.
> >
> > On paper you may get a very resilient system, in the sense that it "stays up"
> > and replies to ping, etc.
> > But a customer typically wants not just availability but also reliable
> > behavior.
> >
> > /AndersBj
> >
> >
> > -----Original Message-----
> > From: Anders Björnerstedt [mailto:[email protected]]
> > Sent: 12 October 2015 16:42
> > To: Anders Widell; Tony Hart; [email protected]
> > Subject: Re: [users] Avoid rebooting payload modules after losing
> > system controller
> >
> > Note that this headless variant is a very questionable feature, for the
> > reasons explained earlier, i.e. you *will* get a reduction in service
> > availability.
> > It was never accepted into OpenSAF for that reason.
> >
> > On top of that, the unreliability will typically not be explicit/handled.
> > That is, the operator will probably not even know what is working and what
> > is not during the SC absence, since the alarm/notification function is
> > gone: no OpenSAF director services are executing.
> >
> > It is truly a headless system, i.e. a zombie system, and thus not working
> > with full monitoring and availability functionality.
> > It raises the question of what OpenSAF and SAF are there for in the first
> > place.
> >
> > The SCs don’t have to run any special software and don’t have to have any
> > special hardware.
> > They do need file system access, at least for a cluster restart, but not
> > necessarily to handle a single SC failure.
> > The headless variant, while headless, is likewise unable to do a cluster
> > restart, and it has even less functionality.
> >
> > An SC can of course run other (non-OpenSAF-specific) software, and the
> > two SCs don’t necessarily have to be symmetric in terms of software.
> >
> > Providing file system access via NFS is typically a non-issue. They have
> > three nodes, ergo they should be able to assign two of them the role of SC
> > in the OpenSAF domain.
> >
> > /AndersBj
> >
> > -----Original Message-----
> > From: Anders Widell [mailto:[email protected]]
> > Sent: 12 October 2015 16:08
> > To: Tony Hart; [email protected]
> > Subject: Re: [users] Avoid rebooting payload modules after losing
> > system controller
> >
> > We have actually implemented something very similar to what you are
> > talking about. With this feature, the payloads can survive without a
> > cluster restart even if both system controllers restart (or the single
> > system controller, in your case). If you want to try it out, you can clone
> > this Mercurial repository:
> >
> > https://sourceforge.net/u/anders-w/opensaf-headless/
> >
> > To enable the feature, set the variable IMMSV_SC_ABSENCE_ALLOWED in
> > immd.conf to the number of seconds you wish the payloads to wait for the
> > system controllers to come back. Note: we have only implemented this
> > feature for the "core" OpenSAF services (plus CKPT), so you need to
> > disable the optional services.
> >
> > / Anders Widell
> >
> > On 10/11/2015 02:30 PM, Tony Hart wrote:
> >> We have been using OpenSAF in our product for a couple of years now.
> >> One of the issues we have is that payload cards reboot when the system
> >> controllers are lost. Although our payload card hardware continues to
> >> perform its functions whilst the software is down (which is desirable),
> >> the functions that the software performs are obviously not performed
> >> (which is not desirable).
> >>
> >> Why would we lose both controllers? Surely that is a rare circumstance?
> >> Not if you only have one controller to begin with. Removing the second
> >> controller is a significant cost saving for us, so we want to support a
> >> product that only has one controller. The most significant impediment to
> >> that is the loss of payload software functions when the system controller
> >> fails.
> >>
> >> I’m looking for suggestions from this email list as to what could be done
> >> about this issue.
> >>
> >> One suggestion that would work for us is if we could convince the payload
> >> card to reboot only when the controller reappears after a loss, rather
> >> than when the loss initially occurs. Is that possible?
> >>
> >> Another possibility is if we could support more than 2 controllers. For
> >> example, if we could support 4 (one active and 3 standbys), that would
> >> also provide a solution for us (our current payloads would instead become
> >> controllers). I know that this is not currently possible with OpenSAF.
> >>
> >> thanks for any suggestions,
> >> —
> >> tony
> >> ----------------------------------------------------------------------
> >> _______________________________________________
> >> Opensaf-users mailing list
> >> [email protected]
> >> https://lists.sourceforge.net/lists/listinfo/opensaf-users
