Hi Anders,

Thanks for the reply. You are right that osaf will not be able to provide its full functionality when running in headless mode. The understanding is that this would be a temporary state, rectified once another controller can be brought online. In the meantime we want to keep the payload cards up so that they can continue to perform their (non-osaf) functions; once a new controller appears, they can be rebooted.
This allows us to continue to provide critical software functions whilst waiting for a replacement.

To be honest I don’t know why this, or an equivalent, feature has not been requested before. It seems like many applications would run into this situation: unless your application is running at the very high end, there is always pressure to reduce cost. Either allow more than 2 controllers or, as a minimum, allow headless-mode-until-replacement.

thanks,
-- tony

> On Oct 12, 2015, at 10:42 AM, Anders Björnerstedt <[email protected]> wrote:
>
> Note that this headless variant is a very questionable feature. This is for
> the reasons explained earlier, i.e. you *will* get a reduction in service
> availability.
> It was never accepted into OpenSAF for that reason.
>
> On top of that the unreliability will typically not be explicit/handled. That
> is, the operator will probably not even know what is working and what is not
> during the SC absence since the alarm/notification function is gone. No
> OpenSAF director services are executing.
>
> It is truly a headless system, i.e. a zombie system and thus not working at
> full monitoring and availability functionality.
> It begs the question of what OpenSAF and SAF are there for in the first place.
>
> The SCs don’t have to run any special software and don’t have to have any
> special hardware.
> They do need file system access, at least for a cluster restart, but not
> necessarily to handle single SC failure.
> The headless variant, when headless, is also in that
> not-able-to-cluster-restart state, but with even less functionality.
>
> An SC can of course run other (non OpenSAF specific) software. And the two
> SCs don’t necessarily have to be symmetric in terms of software.
>
> Providing file system access via NFS is typically a non issue. They have
> three nodes. Ergo they should be able to assign two of them the role of SC in
> the OpenSAF domain.
>
> /AndersBj
>
> -----Original Message-----
> From: Anders Widell [mailto:[email protected]]
> Sent: 12 October 2015 16:08
> To: Tony Hart; [email protected]
> Subject: Re: [users] Avoid rebooting payload modules after losing system controller
>
> We have actually implemented something very similar to what you are talking
> about. With this feature, the payloads can survive without a cluster restart
> even if both system controllers restart (or the single system controller, in
> your case). If you want to try it out, you can clone this Mercurial
> repository:
>
> https://sourceforge.net/u/anders-w/opensaf-headless/
>
> To enable the feature, set the variable IMMSV_SC_ABSENCE_ALLOWED in immd.conf
> to the number of seconds you wish the payloads to wait for the system
> controllers to come back. Note: we have only implemented this feature for the
> "core" OpenSAF services (plus CKPT), so you need to disable the optional
> services.
>
> / Anders Widell
>
> On 10/11/2015 02:30 PM, Tony Hart wrote:
>> We have been using opensaf in our product for a couple of years now. One of
>> the issues we have is the fact that payload cards reboot when the system
>> controllers are lost. Although our payload card hardware will continue to
>> perform its functions whilst the software is down (which is desirable) the
>> functions that the software performs are obviously not performed (which is
>> not desirable).
>>
>> Why would we lose both controllers? Surely that is a rare circumstance?
>> Not if you only have one controller to begin with. Removing the second
>> controller is a significant cost saving for us so we want to support a
>> product that only has one controller. The most significant impediment to
>> that is the loss of payload software functions when the system controller
>> fails.
>>
>> I’m looking for suggestions from this email list as to what could be done
>> for this issue.
>>
>> One suggestion, that would work for us, is if we could convince the payload
>> card to only reboot when the controller reappears after a loss rather than
>> when the loss initially occurs. Is that possible?
>>
>> Another possibility is if we could support more than 2 controllers, for
>> example if we could support 4 (one active and 3 standbys) that would also
>> provide a solution for us (our current payloads would instead become
>> controllers). I know that this is not currently possible with opensaf.
>>
>> thanks for any suggestions,
>> --
>> tony
>> ------------------------------------------------------------------------------
>> _______________________________________________
>> Opensaf-users mailing list
>> [email protected]
>> https://lists.sourceforge.net/lists/listinfo/opensaf-users
