The same is reproducible using SIGABRT (kill -6):

ps aux | grep ovn-controller
root      927884  0.0  0.0  26792   956 ?        S<s  12:03   0:00
ovn-controller: monitoring pid 927885 (healthy)
root      927885  0.0  0.0  27060  2484 ?        S<   12:03   0:00
ovn-controller unix:/var/run/openvswitch/db.sock -vconsole:emer
-vsyslog:err -vfile:info --no-chdir
--log-file=/var/log/openvswitch/ovn-controller.log
--pidfile=/var/run/openvswitch/ovn-controller.pid --detach --monitor
kill -6 927884
kill -6 927885

service ovn-host restart
2017-04-29T19:46:53Z|00001|unixctl|WARN|failed to connect to
/var/run/openvswitch/ovn-controller.927885.ctl
ovs-appctl: cannot connect to
"/var/run/openvswitch/ovn-controller.927885.ctl" (Connection refused)
 * Starting ovn-controller

We are trying to solve a real use case here. --monitor only takes care of
crashes caused by the signals listed in the manpage. However, we also want
to cover the case where someone kills the controller pid: provisioning a new
VM will then not get its ACLs in place because the controller has died.
Re-spawning at least ensures that there is no *control-plane impact*.
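
As a rough illustration of that distinction, the manpage text quoted below
can be written down as a small shell check. monitor_would_restart is a
hypothetical helper, not part of OVS, and it only describes a signal
delivered to the daemon process; if the monitor process itself is killed,
nothing is left to restart the daemon.

```shell
#!/bin/sh
# Hypothetical helper (not part of OVS): returns 0 if, per the ovs-vswitchd
# manpage quoted in this thread, the --monitor process would start a new
# copy of the daemon after it dies from the given signal; 1 otherwise.
monitor_would_restart() {
    # Accept both "SIGABRT" and "ABRT" by stripping an optional SIG prefix.
    case "${1#SIG}" in
        ABRT|ALRM|BUS|FPE|ILL|PIPE|SEGV|XCPU|XFSZ) return 0 ;;
        *) return 1 ;;
    esac
}

for sig in SIGABRT SIGSEGV SIGKILL SIGTERM; do
    if monitor_would_restart "$sig"; then
        echo "$sig: monitor starts a new copy of the daemon"
    else
        echo "$sig: monitor exits, daemon stays down"
    fi
done
```

SIGKILL and SIGTERM fall into the second branch, which is the gap described
above.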

The same applies to ovs-vswitchd: if someone kills the pid, it brings down
the host (with or without VMs on it), since there is no respawn mechanism
apart from the crash handling that --monitor provides. Also, for production
we always choose a stable release to avoid random code crashes, which are
the cases the monitor option handles by default. Here again, re-spawning
helps avoid *data-plane impact*.
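
A minimal sketch of what an upstart job with respawn could look like,
assuming ovn-controller is run in the foreground (no --detach or --monitor)
so that upstart tracks the pid directly. The file name
/etc/init/ovn-controller.conf and the limit values are assumptions for
illustration; the command-line options are copied from the ps output above.
This is not the packaged job.

```
# /etc/init/ovn-controller.conf (hypothetical file name)
description "ovn-controller (sketch, not the packaged job)"

start on runlevel [2345]
stop on runlevel [!2345]

# Restart the process whenever it exits unexpectedly, whatever the signal.
respawn
# Give up if it respawns more than 10 times within 5 seconds (example values).
respawn limit 10 5

# No --detach/--monitor: upstart supervises the foreground process itself.
script
    exec ovn-controller unix:/var/run/openvswitch/db.sock \
        -vconsole:emer -vsyslog:err -vfile:info --no-chdir \
        --log-file=/var/log/openvswitch/ovn-controller.log \
        --pidfile=/var/run/openvswitch/ovn-controller.pid
end script
```

With a job like this, stopping the service through upstart does not trigger
a respawn, but a kill -9 on the pid would.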




On Sat, Apr 29, 2017 at 12:26 PM, Ben Pfaff <b...@ovn.org> wrote:

> Please read the ovs-vswitchd manpage.  It says:
>
>        --monitor
>               Creates an additional process to monitor the ovs-vswitchd
>               daemon.  If the daemon dies due to a signal that indicates a
>               programming error (SIGABRT, SIGALRM, SIGBUS, SIGFPE, SIGILL,
>               SIGPIPE, SIGSEGV, SIGXCPU, or SIGXFSZ) then the monitor
>               process starts a new copy of it.  If the daemon dies or exits
>               for another reason, the monitor process exits.
>
>               This option is normally used with --detach, but it also
>               functions without it.
>
> SIGKILL (signal 9) does not indicate a bug, so the monitor process does
> not restart OVS.  If you want to test the monitoring feature, use one of
> the signals listed above that indicates a bug.
>
> OVS solves the PID file management problem by holding a lock on the
> pidfile.  The pidfile is only valid if it is locked.
>
> I don't think you're solving real problems.
>
> On Sat, Apr 29, 2017 at 12:10:58PM -0700, Aliasgar Mikail Ginwala wrote:
> > When you say that ovn-controller crashed, what do you mean?
> > I mean if someone kills the pid or it crashes, it never comes back up
> > until and unless I do service ovn-host restart.
> >  Do you mean that you killed it? Yes
> >  Which process, and how did you kill it?  Restating the example I posted
> > above:
> > ps aux | grep controller
> > root     3639845  0.0  0.0  26792   952 ?        S<s  17:24   0:00
> > ovn-controller: monitoring pid 3639846 (healthy)
> > root     3639846  0.0  0.0  27060  2484 ?        S<   17:24   0:00
> > ovn-controller unix:/var/run/openvswitch/db.sock -vconsole:emer
> > -vsyslog:err -vfile:info --no-chdir
> > --log-file=/var/log/openvswitch/ovn-controller.log
> > --pidfile=/var/run/openvswitch/ovn-controller.pid --detach --monitor
> >
> > Issuing kill -9 3639845 and kill -9 3639846 of course kills the whole
> > service.
> >
> > Also, we have a known issue with pid file management: the pid file goes
> > stale, which I already highlighted in the example. For reference, see
> > http://stackoverflow.com/questions/696839/how-do-i-write-a-bash-script-to-restart-a-process-if-it-dies
> >
> > My sample service with respawn is as follows; as soon as you kill the
> > pid, it just respawns:
> > ps aux | grep fakeservice
> > root      924307  2.7  0.0 782872 23844 ?        Sl   12:01   0:00
> > /fake/fakeservice --v=10 --fakeservice-resource-point=http://fakeurl
> > kill -9 924307
> > ps aux | grep fakeservice
> > root      924653 12.0  0.0 774420 23728 ?        Sl   12:01   0:00
> > /fake/fakeservice --v=10 --fakeservice-resource-point=http://fakeurl
> >
> > So why can't we get rid of it and just add ovn-host in /etc/init/ and add
> > below lines which immediately respawns?
> > respawn
> > respawn limit x x
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> > On Sat, Apr 29, 2017 at 10:04 AM, Ben Pfaff <b...@ovn.org> wrote:
> >
> > > When you say that ovn-controller crashed, what do you mean?  Do you
> > > mean that you killed it?  Which process, and how did you kill it?
> > >
> > > On Fri, Apr 28, 2017 at 10:51:04PM -0700, Aliasgar Mikail Ginwala wrote:
> > > > Yes:
> > > >
> > > > ps aux | grep controller
> > > > root     3639845  0.0  0.0  26792   952 ?        S<s  17:24   0:00
> > > > ovn-controller: monitoring pid 3639846 (healthy)
> > > > root     3639846  0.0  0.0  27060  2484 ?        S<   17:24   0:00
> > > > ovn-controller unix:/var/run/openvswitch/db.sock -vconsole:emer
> > > > -vsyslog:err -vfile:info --no-chdir
> > > > --log-file=/var/log/openvswitch/ovn-controller.log
> > > > --pidfile=/var/run/openvswitch/ovn-controller.pid --detach --monitor
> > > > root     4067233  0.0  0.0  11744   936 pts/9    S+   22:46   0:00
> > > > grep --color=auto controller
> > > >
> > > >
> > > > /etc/init.d/ovn-host, installed via the Debian package that is
> > > > compiled from source code, only adds --monitor
> > > >
> > > > On Fri, Apr 28, 2017 at 9:08 PM, Ben Pfaff <b...@ovn.org> wrote:
> > > >
> > > > > Is it running with the --monitor option?  If not, either --monitor
> > > > > should be added or the upstart features should be used.
> > > > >
> > > > > On Fri, Apr 28, 2017 at 05:16:09PM -0700, Aliasgar Mikail Ginwala wrote:
> > > > > > I did double verify:
> > > > > >
> > > > > > This is what is happening after crashing the ovn pid:
> > > > > >
> > > > > > service ovn-host status
> > > > > > Pidfile for ovn-controller
> > > > > > (/var/run/openvswitch/ovn-controller.pid) is stale
> > > > > >
> > > > > > Works only after manual restart and didn't respawn
> > > > > > service ovn-host restart
> > > > > > 2017-04-29T00:14:37Z|00001|unixctl|WARN|failed to connect to
> > > > > > /var/run/openvswitch/ovn-controller.3623709.ctl
> > > > > > ovs-appctl: cannot connect to
> > > > > > "/var/run/openvswitch/ovn-controller.3623709.ctl" (Connection refused)
> > > > > >  * Starting ovn-controller
> > > > > >
> > > > > >
> > > > > >
> > > > > > Regards,
> > > > > > Aliasgar
> > > > > >
> > > > > > On Fri, Apr 28, 2017 at 4:50 PM, Ben Pfaff <b...@ovn.org> wrote:
> > > > > >
> > > > > > > On Fri, Apr 28, 2017 at 04:02:26PM -0700, Aliasgar Mikail Ginwala wrote:
> > > > > > > > Recently when I was adding monitoring and alerting for ovs and
> > > > > > > > ovn version 2.7.0, I found both of the upstart services are
> > > > > > > > missing *respawn*. Is it on purpose? If it's not, then let's
> > > > > > > > handle it as an improvement to add it in the upstart.
> > > > > > > > Suggestions welcome.
> > > > > > >
> > > > > > > OVS and OVN already restart themselves, so probably nothing is
> > > > > > > needed.
> > > > > > >
> > > > >
> > >
>
_______________________________________________
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev
