For us, four system controllers would be a sweet spot. Agreed, you would also want headless mode if you had dedicated system controllers. We, however, don’t need dedicated controllers.
> On Oct 13, 2015, at 6:27 AM, Anders Widell <[email protected]> wrote:
>
> The possibility to have more than two system controllers (one active +
> several standby and/or spare controller nodes) is also something that has
> been investigated. For scalability reasons, we probably can't turn all nodes
> into standby controllers in a large cluster - but it may be feasible to have
> a system with one or several standby controllers, where the rest of the nodes
> are spares that are ready to take an active or standby assignment when needed.
>
> However, the "headless" feature will still be needed in some systems where
> you need dedicated controller node(s).
>
> / Anders Widell
>
> On 10/13/2015 12:07 PM, Tony Hart wrote:
>> Understood. The assumption is that this is temporary but we allow the
>> payloads to continue to run (with reduced osaf functionality) until a
>> replacement controller is found. At that point they can reboot to get the
>> system back into sync.
>>
>> Or allow more than 2 controllers in the system so we can have one or more
>> usually-payload cards be controllers to reduce the probability of
>> no-controllers to an acceptable level.
>>
>>
>>> On Oct 12, 2015, at 11:05 AM, Anders Björnerstedt
>>> <[email protected]> wrote:
>>>
>>> The headless state is also vulnerable to split-brain scenarios.
>>> That is, network partitions and joins can occur and will not be detected as
>>> such and thus not handled properly (isolated) when they occur.
>>> Basically you cannot be sure you have a continuously coherent cluster
>>> while in the headless state.
>>>
>>> On paper you may get a very resilient system in the sense that it "stays
>>> up" and replies on ping etc.
>>> But typically a customer wants not just availability but reliable behavior
>>> also.
>>>
>>> /AndersBj
>>>
>>>
>>> -----Original Message-----
>>> From: Anders Björnerstedt [mailto:[email protected]]
>>> Sent: 12 October 2015 16:42
>>> To: Anders Widell; Tony Hart; [email protected]
>>> Subject: Re: [users] Avoid rebooting payload modules after losing system
>>> controller
>>>
>>> Note that this headless variant is a very questionable feature. This is for
>>> the reasons explained earlier, i.e. you *will* get a reduction in service
>>> availability.
>>> It was never accepted into OpenSAF for that reason.
>>>
>>> On top of that the unreliability will typically not be explicit/handled.
>>> That is, the operator will probably not even know what is working and what
>>> is not during the SC absence since the alarm/notification function is
>>> gone. No OpenSAF director services are executing.
>>>
>>> It is truly a headless system, i.e. a zombie system and thus not working at
>>> full monitoring and availability functionality.
>>> It begs the question of what OpenSAF and SAF is there for in the first
>>> place.
>>>
>>> The SCs don’t have to run any special software and don’t have to have any
>>> special hardware.
>>> They do need file system access, at least for a cluster restart, but not
>>> necessarily to handle single SC failure.
>>> The headless variant, when headless, is also in that
>>> not-able-to-cluster-restart state, but with even less functionality.
>>>
>>> An SC can of course run other (non OpenSAF specific) software. And the two
>>> SCs don’t necessarily have to be symmetric in terms of software.
>>>
>>> Providing file system access via NFS is typically a non issue. They have
>>> three nodes. Ergo they should be able to assign two of them the role of SC
>>> in the OpenSAF domain.
>>>
>>> /AndersBj
>>>
>>> -----Original Message-----
>>> From: Anders Widell [mailto:[email protected]]
>>> Sent: 12 October 2015 16:08
>>> To: Tony Hart; [email protected]
>>> Subject: Re: [users] Avoid rebooting payload modules after losing system
>>> controller
>>>
>>> We have actually implemented something very similar to what you are talking
>>> about. With this feature, the payloads can survive without a cluster
>>> restart even if both system controllers restart (or the single system
>>> controller, in your case). If you want to try it out, you can clone this
>>> Mercurial repository:
>>>
>>> https://sourceforge.net/u/anders-w/opensaf-headless/
>>>
>>> To enable the feature, set the variable IMMSV_SC_ABSENCE_ALLOWED in
>>> immd.conf to the number of seconds you wish the payloads to wait for the
>>> system controllers to come back. Note: we have only implemented this
>>> feature for the "core" OpenSAF services (plus CKPT), so you need to disable
>>> the optional services.
>>>
>>> / Anders Widell
>>>
>>> On 10/11/2015 02:30 PM, Tony Hart wrote:
>>>> We have been using opensaf in our product for a couple of years now. One
>>>> of the issues we have is the fact that payload cards reboot when the
>>>> system controllers are lost. Although our payload card hardware will
>>>> continue to perform its functions whilst the software is down (which is
>>>> desirable) the functions that the software performs are obviously not
>>>> performed (which is not desirable).
>>>>
>>>> Why would we lose both controllers, surely that is a rare circumstance?
>>>> Not if you only have one controller to begin with. Removing the second
>>>> controller is a significant cost saving for us so we want to support a
>>>> product that only has one controller. The most significant impediment to
>>>> that is the loss of payload software functions when the system controller
>>>> fails.
>>>>
>>>> I’m looking for suggestions from this email list as to what could be done
>>>> for this issue.
>>>>
>>>> One suggestion, that would work for us, is if we could convince the
>>>> payload card to only reboot when the controller reappears after a loss
>>>> rather than when the loss initially occurs. Is that possible?
>>>>
>>>> Another possibility is if we could support more than 2 controllers, for
>>>> example if we could support 4 (one active and 3 standbys) that would also
>>>> provide a solution for us (our current payloads would instead become
>>>> controllers). I know that this is not currently possible with opensaf.
>>>>
>>>> thanks for any suggestions,
>>>> --
>>>> tony
>>>> ------------------------------------------------------------------------------
>>>> _______________________________________________
>>>> Opensaf-users mailing list
>>>> [email protected]
>>>> https://lists.sourceforge.net/lists/listinfo/opensaf-users
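
For reference, the IMMSV_SC_ABSENCE_ALLOWED setting that Anders Widell describes in the thread might look roughly like the fragment below. This is only a sketch: the file location, the `export` form, and the 900-second value are assumptions for illustration and are not taken from the thread, so check the immd.conf shipped with your build before relying on it.

```shell
# immd.conf fragment (assumed location: /etc/opensaf/immd.conf)
# Assumption: immd.conf is sourced as shell, so the variable is exported.
# The value is the number of seconds the payloads wait for the system
# controllers to come back; 900 is an illustrative value only.
export IMMSV_SC_ABSENCE_ALLOWED=900
```

As noted in the thread, the feature was implemented only for the "core" OpenSAF services (plus CKPT), so the optional services would need to be disabled for this setting to be usable.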
