I don't think this scenario is directly relevant to the multiple-standby concept yet (that aside, "headless" sounds like a misnomer to me). Could you please explain the following:
- What is the redundancy model of the distributed applications running on those 3 nodes, i.e. (Server, linecard1, linecard2)?
- Are your applications running as SA-AWARE or non-SA-AWARE?
- What would the typical heartbeat interval be?
You would require customized handling (such as delaying or avoiding the reboot for a while).

Thanks,
Mathi.

> -----Original Message-----
> From: Tony Hart [mailto:[email protected]]
> Sent: Tuesday, October 13, 2015 3:37 PM
> To: Anders Björnerstedt
> Cc: [email protected]
> Subject: Re: [users] Avoid rebooting payload modules after losing system controller
>
> Understood. The assumption is that this is temporary but we allow the payloads to continue to run (with reduced osaf functionality) until a replacement controller is found. At that point they can reboot to get the system back into sync.
>
> Or allow more than 2 controllers in the system so we can have one or more usually-payload cards be controllers to reduce the probability of no-controllers to an acceptable level.
>
> On Oct 12, 2015, at 11:05 AM, Anders Björnerstedt <[email protected]> wrote:
> >
> > The headless state is also vulnerable to split-brain scenarios.
> > That is, network partitions and joins can occur and will not be detected as such, and thus not handled properly (isolated) when they occur.
> > Basically you cannot be sure you have a continuously coherent cluster while in the headless state.
> >
> > On paper you may get a very resilient system in the sense that it "stays up" and replies to ping etc.
> > But typically a customer wants not just availability but reliable behavior also.
> >
> > /AndersBj
> >
> > -----Original Message-----
> > From: Anders Björnerstedt [mailto:[email protected]]
> > Sent: 12 October 2015 16:42
> > To: Anders Widell; Tony Hart; [email protected]
> > Subject: Re: [users] Avoid rebooting payload modules after losing system controller
> >
> > Note that this headless variant is a very questionable feature. This for the reasons explained earlier, i.e. you *will* get a reduction in service availability.
> > It was never accepted into OpenSAF for that reason.
> >
> > On top of that the unreliability will typically not be explicit/handled. That is, the operator will probably not even know what is working and what is not during the SC absence since the alarm/notification function is gone. No OpenSAF director services are executing.
> >
> > It is truly a headless system, i.e. a zombie system and thus not working at full monitoring and availability functionality.
> > It begs the question of what OpenSAF and SAF are there for in the first place.
> >
> > The SCs don’t have to run any special software and don’t have to have any special hardware.
> > They do need file system access, at least for a cluster restart, but not necessarily to handle single SC failure.
> > The headless variant, when headless, is also in that not-able-to-cluster-restart state, but with even less functionality.
> >
> > An SC can of course run other (non OpenSAF specific) software. And the two SCs don’t necessarily have to be symmetric in terms of software.
> >
> > Providing file system access via NFS is typically a non-issue. They have three nodes. Ergo they should be able to assign two of them the role of SC in the OpenSAF domain.
> >
> > /AndersBj
> >
> > -----Original Message-----
> > From: Anders Widell [mailto:[email protected]]
> > Sent: 12 October 2015 16:08
> > To: Tony Hart; [email protected]
> > Subject: Re: [users] Avoid rebooting payload modules after losing system controller
> >
> > We have actually implemented something very similar to what you are talking about. With this feature, the payloads can survive without a cluster restart even if both system controllers restart (or the single system controller, in your case).
> > If you want to try it out, you can clone this Mercurial repository:
> >
> > https://sourceforge.net/u/anders-w/opensaf-headless/
> >
> > To enable the feature, set the variable IMMSV_SC_ABSENCE_ALLOWED in immd.conf to the number of seconds you wish the payloads to wait for the system controllers to come back. Note: we have only implemented this feature for the "core" OpenSAF services (plus CKPT), so you need to disable the optional services.
> >
> > / Anders Widell
> >
> > On 10/11/2015 02:30 PM, Tony Hart wrote:
> >> We have been using opensaf in our product for a couple of years now. One of the issues we have is the fact that payload cards reboot when the system controllers are lost. Although our payload card hardware will continue to perform its functions whilst the software is down (which is desirable) the functions that the software performs are obviously not performed (which is not desirable).
> >>
> >> Why would we lose both controllers, surely that is a rare circumstance? Not if you only have one controller to begin with. Removing the second controller is a significant cost saving for us so we want to support a product that only has one controller. The most significant impediment to that is the loss of payload software functions when the system controller fails.
> >>
> >> I’m looking for suggestions from this email list as to what could be done for this issue.
> >>
> >> One suggestion, that would work for us, is if we could convince the payload card to only reboot when the controller reappears after a loss rather than when the loss initially occurs. Is that possible?
> >>
> >> Another possibility is if we could support more than 2 controllers, for example if we could support 4 (one active and 3 standbys) that would also provide a solution for us (our current payloads would instead become controllers). I know that this is not currently possible with opensaf.
> >>
> >> thanks for any suggestions,
> >> --
> >> tony
> >> ------------------------------------------------------------------------------
> >> _______________________________________________
> >> Opensaf-users mailing list
> >> [email protected]
> >> https://lists.sourceforge.net/lists/listinfo/opensaf-users
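
The "wait for the controllers to come back" behaviour discussed in this thread can be pictured as a small timer-driven policy on each payload. What follows is only a minimal sketch under stated assumptions, not OpenSAF code: the names `AbsencePolicy`, `Action` and `allowed_absence_s` are hypothetical, `allowed_absence_s` merely stands in for the number of seconds configured via IMMSV_SC_ABSENCE_ALLOWED, and the fallback to a reboot once the window expires is an assumption about what happens after the wait.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Action(Enum):
    CONTINUE = "continue"  # keep running; nothing to do yet
    RESYNC = "resync"      # controller returned in time; rejoin without a reboot
    REBOOT = "reboot"      # waited longer than allowed; fall back to rebooting


@dataclass
class AbsencePolicy:
    """Hypothetical payload-side policy for tolerating controller absence."""

    allowed_absence_s: float         # stand-in for the configured wait, in seconds
    lost_at: Optional[float] = None  # time the controller was lost; None if present

    def _expired(self, now: float) -> bool:
        # True once the controller has been gone for the whole allowed window.
        return self.lost_at is not None and now - self.lost_at >= self.allowed_absence_s

    def controller_lost(self, now: float) -> Action:
        # Record the first loss only; repeated loss events do not restart the timer.
        if self.lost_at is None:
            self.lost_at = now
        return Action.CONTINUE

    def controller_returned(self, now: float) -> Action:
        if self.lost_at is None:
            return Action.CONTINUE
        expired = self._expired(now)
        self.lost_at = None
        return Action.REBOOT if expired else Action.RESYNC

    def tick(self, now: float) -> Action:
        # Called periodically while the payload keeps running headless.
        return Action.REBOOT if self._expired(now) else Action.CONTINUE
```

A caller would feed the policy timestamps from a monotonic clock and act on the returned `Action`. The variant Tony suggests (reboot only when the controller reappears) would correspond to returning `REBOOT` from `controller_returned` unconditionally and never from `tick`.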
