On Thu, Aug 4, 2016 at 8:17 PM, Andy Zhou <az...@ovn.org> wrote:
> On Wed, Jul 27, 2016 at 1:04 PM, Andy Zhou <az...@ovn.org> wrote:
>> On Tue, Jul 26, 2016 at 6:20 PM, Russell Bryant <russ...@ovn.org> wrote:
>>> On Tue, Jul 26, 2016 at 3:48 PM, Andy Zhou <az...@ovn.org> wrote:
>>>> On Tue, Jul 26, 2016 at 11:59 AM, Russell Bryant <russ...@ovn.org> wrote:
>>>>> On Tue, Jul 26, 2016 at 2:41 PM, Andy Zhou <az...@ovn.org> wrote:
>>>>>> On Tue, Jul 26, 2016 at 5:37 AM, Russell Bryant <russ...@ovn.org> wrote:
>>>>>>> On Mon, Jul 25, 2016 at 8:15 PM, Andy Zhou <az...@ovn.org> wrote:
>>>>>>>> Hi, Rayn and Russell,
>>>>>>>
>>>>>>> Can we move this discussion to the ovs dev mailing list? Feel free
>>>>>>> to just add it in a reply if you'd like.
>>>>>>
>>>>>> Done.
>>>>>>
>>>>>>>> I am wondering how we can actually use the active/backup feature
>>>>>>>> that is now part of OVSDB to increase OVN availability.
>>>>>>>
>>>>>>> To be clear, I haven't actually tried this yet. I'm only speaking
>>>>>>> about how I think it should work.
>>>>>>>
>>>>>>>> Specifically:
>>>>>>>>
>>>>>>>> 1. When the active OVSDB server fails, should the backup server
>>>>>>>> take over and allow write transactions? One simpler possibility
>>>>>>>> is to allow read-only access to the backup server.
>>>>>>>
>>>>>>> The backup server needs to take over. It's OK if that requires
>>>>>>> intervention by an HA manager like Pacemaker. If we can't make the
>>>>>>> passive server take over, I'd say the solution is incomplete.
>>>>>>
>>>>>> OK, makes sense.
>>>>>>
>>>>>> One possible issue with the backup server taking over is "split brain".
>>>>>> If a network error disconnects the backup server from the active
>>>>>> server, both servers may think they are the active server now.
>>>>>> Does Pacemaker help with solving this issue?
>>>>>
>>>>> It can, yes. I would expect Pacemaker to explicitly configure a node
>>>>> to be either the active or passive node.
>>>>
>>>> Manual switching is more straightforward. I agree.
>>>>
>>>>>>>> 2. When a crashed active OVSDB server recovers, should it become
>>>>>>>> the new backup, or should it switch back?
>>>>>>>
>>>>>>> Becoming the new backup is fine. Again, this can be orchestrated by
>>>>>>> an HA manager (Pacemaker).
>>>>>>
>>>>>> I am not familiar with Pacemaker. Can I assume it can provide a
>>>>>> correct --sync-from argument (pointing to the backup server) when
>>>>>> relaunching the OVSDB server?
>>>>>
>>>>> Yes. I'd have to consult with some Pacemaker experts on exactly what
>>>>> the implementation would look like, but roughly:
>>>>>
>>>>> Pacemaker manages services using "OCF Resource Agents", which are just
>>>>> scripts with a defined set of inputs and outputs for service management.
>>>>> I would imagine a Pacemaker cluster being told it must have exactly 1
>>>>> active and 1 passive OVSDB service. When the passive OVSDB service is
>>>>> started, it would include the "sync-from" argument based on where the
>>>>> active OVSDB service is currently running.
>>>>>
>>>>> We really need to prototype this and document it. I'm guessing too
>>>>> much. Pacemaker is frequently used to manage active/passive HA, though.
>>>>
>>>> Sounds reasonable. I will work on ovsdb internal changes to support
>>>> manual switching, using appctl commands, then look into prototyping
>>>> with HA systems. I have not used Pacemaker in the past, so it may take
>>>> some time to ramp up.
>>>
>>> I should be able to help. We need to do this work anyway for
>>> integration into OpenStack deployment tools. Let me see if I can get
>>> some helpful examples to follow.
>>
>> Thanks for helping out.
>>
>> Given that, I now plan to work from the bottom up, initially focusing on
>> ovsdb-server changes.
>>
>> 1. Add a state in ovsdb-server so that it knows whether it is the active
>> server. A backup server will not accept any connections. A server started
>> with the --sync-from argument will be put in the backup state by default.
>>
>> 2. Add appctl commands to allow switching the state manually.
>>
>> 3. Add a new table for the backup server to register its address and ports,
>> so that OVSDB clients can learn about them at run time. The backup server
>> should issue a transaction to register its address before issuing the
>> monitoring request. This feature is not strictly necessary and can be
>> pushed to the HA manager, but having it built into ovsdb-server may make
>> integration simpler.
>>
>> What do you think?
>
> Russell, would the HA manager also manage ovn-controller switchover?
>
Yes, indirectly. The way this is typically handled is by using a virtual IP
that moves to whatever host is currently the master.

-- 
Russell Bryant
_______________________________________________
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev
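
The active/backup arrangement discussed in this thread can be sketched roughly as below. This is only an illustration under stated assumptions, not ovsdb-server or Pacemaker code: the names OvsdbNode, HaManager, fail_over and virtual_ip_host are hypothetical, the addresses are documentation placeholders, and the sync_from field merely mirrors the --sync-from argument mentioned above. It shows the HA manager alone deciding which node is active, the old active rejoining as the backup, and the virtual IP following the active node.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class OvsdbNode:
    """Hypothetical model of one ovsdb-server instance (not a real OVS API)."""
    address: str
    # Mirrors the --sync-from argument: set on a backup, None on the active.
    sync_from: Optional[str] = None

    @property
    def is_active(self) -> bool:
        return self.sync_from is None


class HaManager:
    """Hypothetical stand-in for an HA manager such as Pacemaker."""

    def __init__(self, active: OvsdbNode, backup: OvsdbNode) -> None:
        self.active = active
        self.backup = backup
        self.active.sync_from = None
        self.backup.sync_from = active.address
        # Clients connect to the virtual IP, which follows the active node.
        self.virtual_ip_host = active.address

    def fail_over(self) -> None:
        """Promote the backup; the old active rejoins as the new backup."""
        old_active, new_active = self.active, self.backup
        new_active.sync_from = None
        old_active.sync_from = new_active.address
        self.active, self.backup = new_active, old_active
        self.virtual_ip_host = new_active.address


if __name__ == "__main__":
    a = OvsdbNode("192.0.2.1")
    b = OvsdbNode("192.0.2.2")
    ha = HaManager(active=a, backup=b)
    ha.fail_over()
    print(ha.virtual_ip_host, a.sync_from)
```

Because only the manager ever changes a node's role, the two nodes cannot both promote themselves after a network partition, which is the point of leaving the switch to Pacemaker rather than to the servers.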