Re: ospfd: prevent additional ospfd from starting

2018-08-29 Thread Florian Obser
OK florian@

On Tue, Aug 28, 2018 at 01:19:39PM +0200, Remi Locherer wrote:
> On Tue, Aug 28, 2018 at 07:56:43AM +0200, Claudio Jeker wrote:
> > On Mon, Aug 27, 2018 at 11:33:19PM +0200, Remi Locherer wrote:
> > > On Fri, Aug 24, 2018 at 12:21:31PM +0200, Remi Locherer wrote:
> > > > On Fri, Aug 24, 2018 at 08:58:12AM +0200, Claudio Jeker wrote:
> 
> [ snip ]
> 
> > > > > Why are we not checking the control socket in the parent?
> > > > > Also it may be better to create the control socket in the parent and 
> > > > > pass
> > > > > it to the ospfe. This is what bgpd is doing and allows to change the 
> > > > > path
> > > > > during runtime with a config reload.
> > > > 
> > > > This makes sense to me. I'll come up with a new diff once I found some
> > > > time for it.
> > > > 
> > > > But I'm not sure about changing the socket path with a reload. I plan to
> > > > pledge (stdio rpath sendfd wroute) and eventually unveil (read 
> > > > ospfd.conf)
> > > > the main process.
> > > 
> > > New diff below creates the control socket in the main process and passes 
> > > it
> > > to the ospf engine later on. The connect check on the control socket now
> > > happens very early.
> > > 
> > > The diff in action looks like this:
> > > 
> > > typhoon ..sbin/ospfd$ doas obj/ospfd -dv 
> > > startup
> > > control_init: socket in use
> > > fatal in ospfd: control socket setup failed
> > > typhoon 1 ..sbin/ospfd$
> > > 
> > > 
> > > I borrowed the fd passing code from slaacd.
> > > 
> > > > 
> > > > > 
> > > > > Could there be a case where this causes ospfd to hang on start in the
> > > > > connect? Not sure if we can sleep doing a connect() to a AF_UNIX 
> > > > > socket.
> > > 
> > > > I never observed a hang in ospfctl which also does a connect on the control
> > > > socket. But I could not find the definitive answer.
> > > 
> 
> [ snip ]
> 
> > I would prefer if the check happens before the daemon() call since then
> > the rc script notices this easily.
> 
> sure
> 
> > Also between here and sending the socket
> > we spawn off the rde and ospfe processes. So currently you are leaking
> > control_fd into those processes.
> > You could probably just add the fd as argument to rde() and ospfe() and
> > not use the fd passing at all. But the moment ospfd is using fork
> > then the fd passing will be needed again.
> 
> How about the new diff below: I moved the check from control_init into its
> own function control_check and call only this before daemon(). control_init
> happens later. With this the children do not have the control fd.
> 
> The time frame where another process can start using the socket is a little
> bit bigger this way. We can reduce this again when implementing fork
> for ospfd.
> 
> One could also argue that with control_check as separate function fd passing
> is not strictly needed. But I think this is a step towards fork
> 
> The diff should also address the other suggestions.
> 
> 
> 
> Index: control.c
> ===
> RCS file: /cvs/src/usr.sbin/ospfd/control.c,v
> retrieving revision 1.44
> diff -u -p -r1.44 control.c
> --- control.c 24 Jan 2017 04:24:25 -  1.44
> +++ control.c 28 Aug 2018 09:42:11 -
> @@ -39,6 +39,32 @@ struct ctl_conn*control_connbypid(pid_t
>  void  control_close(int);
>  
>  int
> +control_check(char *path)
> +{
> + struct sockaddr_un   sun;
> + int  fd;
> +
> + bzero(&sun, sizeof(sun));
> + sun.sun_family = AF_UNIX;
> + strlcpy(sun.sun_path, path, sizeof(sun.sun_path));
> +
> + if ((fd = socket(AF_UNIX, SOCK_STREAM, 0)) == -1) {
> + log_warn("control_check: socket check");
> + return (-1);
> + }
> +
> + if (connect(fd, (struct sockaddr *)&sun, sizeof(sun)) == 0) {
> + log_warnx("control_check: socket in use");
> + close(fd);
> + return (-1);
> + }
> +
> + close(fd);
> +
> + return (0);
> +}
> +
> +int
>  control_init(char *path)
>  {
>   struct sockaddr_un   sun;
> @@ -78,9 +104,7 @@ control_init(char *path)
>   return (-1);
>   }
>  
> - control_state.fd = fd;
> -
> - return (0);
> + return (fd);
>  }
>  
>  int
> Index: control.h
> ===
> RCS file: /cvs/src/usr.sbin/ospfd/control.h,v
> retrieving revision 1.6
> diff -u -p -r1.6 control.h
> --- control.h 10 Feb 2015 05:24:48 -  1.6
> +++ control.h 28 Aug 2018 09:43:17 -
> @@ -34,6 +34,7 @@ struct ctl_conn {
>   struct imsgev   iev;
>  };
>  
> +int  control_check(char *);
>  int  control_init(char *);
>  int  control_listen(void);
>  void control_accept(int, short, void *);
> Index: ospfd.c
> ===
> RCS file: /cvs/src/usr.sbin/ospfd/ospfd.c,v
> retrieving revision 1.99
> diff -u -p -r1.99 ospfd.c
> --- ospfd.c   11 Jul 2018 12:09:34 -  1.99
> +++ 

Re: ospfd: prevent additional ospfd from starting

2018-08-28 Thread Claudio Jeker
On Tue, Aug 28, 2018 at 01:19:39PM +0200, Remi Locherer wrote:
> On Tue, Aug 28, 2018 at 07:56:43AM +0200, Claudio Jeker wrote:
> > On Mon, Aug 27, 2018 at 11:33:19PM +0200, Remi Locherer wrote:
> > > On Fri, Aug 24, 2018 at 12:21:31PM +0200, Remi Locherer wrote:
> > > > On Fri, Aug 24, 2018 at 08:58:12AM +0200, Claudio Jeker wrote:
> 
> [ snip ]
> 
> > > > > Why are we not checking the control socket in the parent?
> > > > > Also it may be better to create the control socket in the parent and 
> > > > > pass
> > > > > it to the ospfe. This is what bgpd is doing and allows to change the 
> > > > > path
> > > > > during runtime with a config reload.
> > > > 
> > > > This makes sense to me. I'll come up with a new diff once I found some
> > > > time for it.
> > > > 
> > > > But I'm not sure about changing the socket path with a reload. I plan to
> > > > pledge (stdio rpath sendfd wroute) and eventually unveil (read 
> > > > ospfd.conf)
> > > > the main process.
> > > 
> > > New diff below creates the control socket in the main process and passes 
> > > it
> > > to the ospf engine later on. The connect check on the control socket now
> > > happens very early.
> > > 
> > > The diff in action looks like this:
> > > 
> > > typhoon ..sbin/ospfd$ doas obj/ospfd -dv 
> > > startup
> > > control_init: socket in use
> > > fatal in ospfd: control socket setup failed
> > > typhoon 1 ..sbin/ospfd$
> > > 
> > > 
> > > I borrowed the fd passing code from slaacd.
> > > 
> > > > 
> > > > > 
> > > > > Could there be a case where this causes ospfd to hang on start in the
> > > > > connect? Not sure if we can sleep doing a connect() to a AF_UNIX 
> > > > > socket.
> > > 
> > > > I never observed a hang in ospfctl which also does a connect on the control
> > > > socket. But I could not find the definitive answer.
> > > 
> 
> [ snip ]
> 
> > I would prefer if the check happens before the daemon() call since then
> > the rc script notices this easily.
> 
> sure
> 
> > Also between here and sending the socket
> > we spawn off the rde and ospfe processes. So currently you are leaking
> > control_fd into those processes.
> > You could probably just add the fd as argument to rde() and ospfe() and
> > not use the fd passing at all. But the moment ospfd is using fork
> > then the fd passing will be needed again.
> 
> How about the new diff below: I moved the check from control_init into its
> own function control_check and call only this before daemon(). control_init
> happens later. With this the children do not have the control fd.
> 
> The time frame where another process can start using the socket is a little
> bit bigger this way. We can reduce this again when implementing fork
> for ospfd.
> 
> One could also argue that with control_check as separate function fd passing
> is not strictly needed. But I think this is a step towards fork
> 
> The diff should also address the other suggestions.
> 

Looks good to me.
 
> 
> Index: control.c
> ===
> RCS file: /cvs/src/usr.sbin/ospfd/control.c,v
> retrieving revision 1.44
> diff -u -p -r1.44 control.c
> --- control.c 24 Jan 2017 04:24:25 -  1.44
> +++ control.c 28 Aug 2018 09:42:11 -
> @@ -39,6 +39,32 @@ struct ctl_conn*control_connbypid(pid_t
>  void  control_close(int);
>  
>  int
> +control_check(char *path)
> +{
> + struct sockaddr_un   sun;
> + int  fd;
> +
> + bzero(&sun, sizeof(sun));
> + sun.sun_family = AF_UNIX;
> + strlcpy(sun.sun_path, path, sizeof(sun.sun_path));
> +
> + if ((fd = socket(AF_UNIX, SOCK_STREAM, 0)) == -1) {
> + log_warn("control_check: socket check");
> + return (-1);
> + }
> +
> + if (connect(fd, (struct sockaddr *)&sun, sizeof(sun)) == 0) {
> + log_warnx("control_check: socket in use");
> + close(fd);
> + return (-1);
> + }
> +
> + close(fd);
> +
> + return (0);
> +}
> +
> +int
>  control_init(char *path)
>  {
>   struct sockaddr_un   sun;
> @@ -78,9 +104,7 @@ control_init(char *path)
>   return (-1);
>   }
>  
> - control_state.fd = fd;
> -
> - return (0);
> + return (fd);
>  }
>  
>  int
> Index: control.h
> ===
> RCS file: /cvs/src/usr.sbin/ospfd/control.h,v
> retrieving revision 1.6
> diff -u -p -r1.6 control.h
> --- control.h 10 Feb 2015 05:24:48 -  1.6
> +++ control.h 28 Aug 2018 09:43:17 -
> @@ -34,6 +34,7 @@ struct ctl_conn {
>   struct imsgev   iev;
>  };
>  
> +int  control_check(char *);
>  int  control_init(char *);
>  int  control_listen(void);
>  void control_accept(int, short, void *);
> Index: ospfd.c
> ===
> RCS file: /cvs/src/usr.sbin/ospfd/ospfd.c,v
> retrieving revision 1.99
> diff -u -p -r1.99 ospfd.c
> --- ospfd.c   11 Jul 2018 12:09:34 -  1.99
> +++ 

Re: ospfd: prevent additional ospfd from starting

2018-08-28 Thread Remi Locherer
On Tue, Aug 28, 2018 at 07:56:43AM +0200, Claudio Jeker wrote:
> On Mon, Aug 27, 2018 at 11:33:19PM +0200, Remi Locherer wrote:
> > On Fri, Aug 24, 2018 at 12:21:31PM +0200, Remi Locherer wrote:
> > > On Fri, Aug 24, 2018 at 08:58:12AM +0200, Claudio Jeker wrote:

[ snip ]

> > > > Why are we not checking the control socket in the parent?
> > > > Also it may be better to create the control socket in the parent and 
> > > > pass
> > > > it to the ospfe. This is what bgpd is doing and allows to change the 
> > > > path
> > > > during runtime with a config reload.
> > > 
> > > This makes sense to me. I'll come up with a new diff once I found some
> > > time for it.
> > > 
> > > But I'm not sure about changing the socket path with a reload. I plan to
> > > pledge (stdio rpath sendfd wroute) and eventually unveil (read ospfd.conf)
> > > the main process.
> > 
> > New diff below creates the control socket in the main process and passes it
> > to the ospf engine later on. The connect check on the control socket now
> > happens very early.
> > 
> > The diff in action looks like this:
> > 
> > typhoon ..sbin/ospfd$ doas obj/ospfd -dv 
> > startup
> > control_init: socket in use
> > fatal in ospfd: control socket setup failed
> > typhoon 1 ..sbin/ospfd$
> > 
> > 
> > I borrowed the fd passing code from slaacd.
> > 
> > > 
> > > > 
> > > > Could there be a case where this causes ospfd to hang on start in the
> > > > connect? Not sure if we can sleep doing a connect() to a AF_UNIX socket.
> > 
> > I never observed a hang in ospfctl which also does a connect on the control
> > socket. But I could not find the definitive answer.
> > 

[ snip ]

> I would prefer if the check happens before the daemon() call since then
> the rc script notices this easily.

sure

> Also between here and sending the socket
> we spawn off the rde and ospfe processes. So currently you are leaking
> control_fd into those processes.
> You could probably just add the fd as argument to rde() and ospfe() and
> not use the fd passing at all. But the moment ospfd is using fork
> then the fd passing will be needed again.

How about the new diff below: I moved the check from control_init into its
own function control_check and call only this before daemon(). control_init
happens later. With this the children do not have the control fd.
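
For anyone who wants to try the connect check outside of ospfd, here is a
minimal, self-contained sketch of the same idea. It is not the code from the
diff: probe_socket() and listen_on() are hypothetical names, and snprintf()
stands in for strlcpy() so that it also builds on systems without strlcpy().

```c
#include <sys/socket.h>
#include <sys/un.h>
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical stand-in for control_check(): returns -1 if something
 * accepts connections on path (or on error), 0 otherwise.
 */
int
probe_socket(const char *path)
{
	struct sockaddr_un	 sun;
	int			 fd, in_use;

	assert(path != NULL);
	if (strlen(path) >= sizeof(sun.sun_path))
		return (-1);		/* path does not fit */

	memset(&sun, 0, sizeof(sun));
	sun.sun_family = AF_UNIX;
	snprintf(sun.sun_path, sizeof(sun.sun_path), "%s", path);

	if ((fd = socket(AF_UNIX, SOCK_STREAM, 0)) == -1)
		return (-1);

	/* A successful connect means a live listener owns the socket. */
	in_use = (connect(fd, (struct sockaddr *)&sun, sizeof(sun)) == 0);
	close(fd);

	return (in_use ? -1 : 0);
}

/* Hypothetical demo helper: bind and listen on path, return the fd or -1. */
int
listen_on(const char *path)
{
	struct sockaddr_un	 sun;
	int			 fd;

	memset(&sun, 0, sizeof(sun));
	sun.sun_family = AF_UNIX;
	snprintf(sun.sun_path, sizeof(sun.sun_path), "%s", path);

	unlink(path);			/* remove a stale socket file */
	if ((fd = socket(AF_UNIX, SOCK_STREAM, 0)) == -1)
		return (-1);
	if (bind(fd, (struct sockaddr *)&sun, sizeof(sun)) == -1 ||
	    listen(fd, 5) == -1) {
		close(fd);
		return (-1);
	}
	return (fd);
}
```

A stale socket file without a listener makes connect() fail, so
probe_socket() reports it as free. That is why the check can tell a crashed
daemon from a running one.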

The time frame where another process can start using the socket is a little
bit bigger this way. We can reduce this again when implementing fork
for ospfd.

One could also argue that with control_check as a separate function, fd passing
is not strictly needed. But I think this is a step towards fork

The diff should also address the other suggestions.



Index: control.c
===
RCS file: /cvs/src/usr.sbin/ospfd/control.c,v
retrieving revision 1.44
diff -u -p -r1.44 control.c
--- control.c   24 Jan 2017 04:24:25 -  1.44
+++ control.c   28 Aug 2018 09:42:11 -
@@ -39,6 +39,32 @@ struct ctl_conn  *control_connbypid(pid_t
 void	control_close(int);
 
 int
+control_check(char *path)
+{
+	struct sockaddr_un	 sun;
+	int			 fd;
+
+	bzero(&sun, sizeof(sun));
+	sun.sun_family = AF_UNIX;
+	strlcpy(sun.sun_path, path, sizeof(sun.sun_path));
+
+	if ((fd = socket(AF_UNIX, SOCK_STREAM, 0)) == -1) {
+		log_warn("control_check: socket check");
+		return (-1);
+	}
+
+	if (connect(fd, (struct sockaddr *)&sun, sizeof(sun)) == 0) {
+		log_warnx("control_check: socket in use");
+		close(fd);
+		return (-1);
+	}
+
+	close(fd);
+
+	return (0);
+}
+
+int
 control_init(char *path)
 {
 	struct sockaddr_un	 sun;
@@ -78,9 +104,7 @@ control_init(char *path)
 		return (-1);
 	}
 
-	control_state.fd = fd;
-
-	return (0);
+	return (fd);
 }
 
 int
Index: control.h
===
RCS file: /cvs/src/usr.sbin/ospfd/control.h,v
retrieving revision 1.6
diff -u -p -r1.6 control.h
--- control.h   10 Feb 2015 05:24:48 -  1.6
+++ control.h   28 Aug 2018 09:43:17 -
@@ -34,6 +34,7 @@ struct ctl_conn {
 	struct imsgev		iev;
 };
 
+int	control_check(char *);
 int	control_init(char *);
 int	control_listen(void);
 void	control_accept(int, short, void *);
Index: ospfd.c
===
RCS file: /cvs/src/usr.sbin/ospfd/ospfd.c,v
retrieving revision 1.99
diff -u -p -r1.99 ospfd.c
--- ospfd.c 11 Jul 2018 12:09:34 -  1.99
+++ ospfd.c 28 Aug 2018 10:34:48 -
@@ -116,6 +116,7 @@ main(int argc, char *argv[])
 	int			 mib[4];
 	size_t			 len;
 	char			*sockname = NULL;
+	int			 control_fd;
 
 	conffile = CONF_FILE;
 	ospfd_process = PROC_MAIN;
@@ 

Re: ospfd: prevent additional ospfd from starting

2018-08-27 Thread Claudio Jeker
On Mon, Aug 27, 2018 at 11:33:19PM +0200, Remi Locherer wrote:
> On Fri, Aug 24, 2018 at 12:21:31PM +0200, Remi Locherer wrote:
> > On Fri, Aug 24, 2018 at 08:58:12AM +0200, Claudio Jeker wrote:
> > > On Wed, Aug 22, 2018 at 12:12:10AM +0200, Remi Locherer wrote:
> > > > On Tue, Aug 21, 2018 at 05:54:18PM +0100, Stuart Henderson wrote:
> > > > > On 2018/08/21 17:16, Remi Locherer wrote:
> > > > > > Hi tech,
> > > > > > 
> > > > > > recently we had a short outage in our network. A script started an 
> > > > > > additional
> > > > > > ospfd instance because the -n flag for config test was missing.
> > > > > 
> > > > > This is a problem with bgpd as well, last time I did this it killed 
> > > > > one of the
> > > > > *other* routers on the network (i.e. not just the one where I 
> > > > > accidentally ran
> > > > > 2x bgpd...).
> > > > > 
> > > > > > > What then happened was not nice:
> > > > > > - The new ospfd unlinked the control socket of the first ospfd
> > > > > > - The new ospfd removed all routes from the first ospfd
> > > > > > - The new ospfd was not able to build up an adjacency and therefore 
> > > > > > could
> > > > > >   not install the routes needed for a recovery.
> > > > > > - Both ospfd instances were running but non-functional.
> > > > > > 
> > > > > > Of course the faulty script is fixed by now. ;-)
> > > > > > 
> > > > > > It would be nice if ospfd could prevent such a situation.
> > > > > > 
> > > > > > Below diff does these things:
> > > > > > - Detect a running ospfd by first doing a connect on the control 
> > > > > > socket.
> > > > > > - Do not delete the control socket on exit.
> > > > > >   - This could delete the socket of another instance.
> > > > > >   - Unlinking the socket on shutdown will be in the way once we add 
> > > > > > pledge
> > > > > > to the main process. It was removed recently from various 
> > > > > > daemons.
> > > > > 
> > > > > This all sounds very sensible.
> > > > > 
> > > > > > - Do not delete routes added by another process even if they have
> > > > > >   prio RTP_OSPF. Without this the new ospfd will remove all the 
> > > > > > routes
> > > > > >   of the first one.
> > > > > 
> > > > > I'm unsure about this, the above changes stop the new ospfd from 
> > > > > running
> > > > > don't they, so that shouldn't be a problem?
> > > > 
> > > > It stops too late. kr_init happens before and kills all existing routes 
> > > > with
> > > > priority 32. And again in the shutdown function of ospfd.
> > > > > 
> > > > > If an ospfd blows up for whatever reason, it would be quite 
> > > > > inconvenient
> > > > > if it needs manual route tweaks rather than just 'rcctl start ospfd' 
> > > > > to fix it ..
> > > > 
> > > > Yes, this is not optimal.
> > > > 
> > > > The new diff below defers kr_init until the ospf engine notifies the 
> > > > parent
> > > > that the control socket is ready. In case the ospf engine exits because 
> > > > the
> > > > control socket is already in use no routes are known that could be 
> > > > removed.
> > > > 
> > > > With this ospfd keeps the behaviour of removing foreign routes with
> > > > priority 32.
> > > > 
> > > > Better?
> > > > 
> > > 
> > > Why are we not checking the control socket in the parent?
> > > Also it may be better to create the control socket in the parent and pass
> > > it to the ospfe. This is what bgpd is doing and allows to change the path
> > > during runtime with a config reload.
> > 
> > This makes sense to me. I'll come up with a new diff once I found some
> > time for it.
> > 
> > But I'm not sure about changing the socket path with a reload. I plan to
> > pledge (stdio rpath sendfd wroute) and eventually unveil (read ospfd.conf)
> > the main process.
> 
> New diff below creates the control socket in the main process and passes it
> to the ospf engine later on. The connect check on the control socket now
> happens very early.
> 
> The diff in action looks like this:
> 
> typhoon ..sbin/ospfd$ doas obj/ospfd -dv 
> startup
> control_init: socket in use
> fatal in ospfd: control socket setup failed
> typhoon 1 ..sbin/ospfd$
> 
> 
> I borrowed the fd passing code from slaacd.
> 
> > 
> > > 
> > > Could there be a case where this causes ospfd to hang on start in the
> > > connect? Not sure if we can sleep doing a connect() to a AF_UNIX socket.
> 
> I never observed a hang in ospfctl which also does a connect on the control
> socket. But I could not find the definitive answer.
> 
> Remi
> 
> 
> Index: control.c
> ===
> RCS file: /cvs/src/usr.sbin/ospfd/control.c,v
> retrieving revision 1.44
> diff -u -p -r1.44 control.c
> --- control.c 24 Jan 2017 04:24:25 -  1.44
> +++ control.c 27 Aug 2018 21:17:42 -
> @@ -42,19 +42,29 @@ int
>  control_init(char *path)
>  {
>   struct sockaddr_un   sun;
> - int  fd;
> + int  fd, fd_check;
>   mode_t   old_umask;
>  
> + 

Re: ospfd: prevent additional ospfd from starting

2018-08-27 Thread Remi Locherer
On Fri, Aug 24, 2018 at 12:21:31PM +0200, Remi Locherer wrote:
> On Fri, Aug 24, 2018 at 08:58:12AM +0200, Claudio Jeker wrote:
> > On Wed, Aug 22, 2018 at 12:12:10AM +0200, Remi Locherer wrote:
> > > On Tue, Aug 21, 2018 at 05:54:18PM +0100, Stuart Henderson wrote:
> > > > On 2018/08/21 17:16, Remi Locherer wrote:
> > > > > Hi tech,
> > > > > 
> > > > > recently we had a short outage in our network. A script started an 
> > > > > additional
> > > > > ospfd instance because the -n flag for config test was missing.
> > > > 
> > > > This is a problem with bgpd as well, last time I did this it killed one 
> > > > of the
> > > > *other* routers on the network (i.e. not just the one where I 
> > > > accidentally ran
> > > > 2x bgpd...).
> > > > 
> > > > > > What then happened was not nice:
> > > > > - The new ospfd unlinked the control socket of the first ospfd
> > > > > - The new ospfd removed all routes from the first ospfd
> > > > > - The new ospfd was not able to build up an adjacency and therefore 
> > > > > could
> > > > >   not install the routes needed for a recovery.
> > > > > - Both ospfd instances were running but non-functional.
> > > > > 
> > > > > Of course the faulty script is fixed by now. ;-)
> > > > > 
> > > > > It would be nice if ospfd could prevent such a situation.
> > > > > 
> > > > > Below diff does these things:
> > > > > - Detect a running ospfd by first doing a connect on the control 
> > > > > socket.
> > > > > - Do not delete the control socket on exit.
> > > > >   - This could delete the socket of another instance.
> > > > >   - Unlinking the socket on shutdown will be in the way once we add 
> > > > > pledge
> > > > > to the main process. It was removed recently from various daemons.
> > > > 
> > > > This all sounds very sensible.
> > > > 
> > > > > - Do not delete routes added by another process even if they have
> > > > >   prio RTP_OSPF. Without this the new ospfd will remove all the routes
> > > > >   of the first one.
> > > > 
> > > > I'm unsure about this, the above changes stop the new ospfd from running
> > > > don't they, so that shouldn't be a problem?
> > > 
> > > It stops too late. kr_init happens before and kills all existing routes with
> > > priority 32. And again in the shutdown function of ospfd.
> > > > 
> > > > If an ospfd blows up for whatever reason, it would be quite inconvenient
> > > > if it needs manual route tweaks rather than just 'rcctl start ospfd' to 
> > > > fix it ..
> > > 
> > > Yes, this is not optimal.
> > > 
> > > The new diff below defers kr_init until the ospf engine notifies the 
> > > parent
> > > that the control socket is ready. In case the ospf engine exits because 
> > > the
> > > control socket is already in use no routes are known that could be 
> > > removed.
> > > 
> > > With this ospfd keeps the behaviour of removing foreign routes with
> > > priority 32.
> > > 
> > > Better?
> > > 
> > 
> > Why are we not checking the control socket in the parent?
> > Also it may be better to create the control socket in the parent and pass
> > it to the ospfe. This is what bgpd is doing and allows to change the path
> > during runtime with a config reload.
> 
> This makes sense to me. I'll come up with a new diff once I found some
> time for it.
> 
> But I'm not sure about changing the socket path with a reload. I plan to
> pledge (stdio rpath sendfd wroute) and eventually unveil (read ospfd.conf)
> the main process.

New diff below creates the control socket in the main process and passes it
to the ospf engine later on. The connect check on the control socket now
happens very early.

The diff in action looks like this:

typhoon ..sbin/ospfd$ doas obj/ospfd -dv 
startup
control_init: socket in use
fatal in ospfd: control socket setup failed
typhoon 1 ..sbin/ospfd$


I borrowed the fd passing code from slaacd.
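
As an illustration of the mechanism underneath fd passing, the sketch below
hands a descriptor over an AF_UNIX socket with SCM_RIGHTS directly. It is only
a sketch: ospfd presumably does this through its imsg layer, and send_fd() and
recv_fd() are hypothetical helpers, not functions from ospfd, slaacd or the
imsg API.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <assert.h>
#include <string.h>
#include <unistd.h>

/* Send fd over the connected AF_UNIX socket chan; 0 on success, -1 on error. */
int
send_fd(int chan, int fd)
{
	struct msghdr	 msg;
	struct cmsghdr	*cmsg;
	struct iovec	 iov;
	union {
		struct cmsghdr	hdr;	/* forces correct alignment */
		char		buf[CMSG_SPACE(sizeof(int))];
	} cmsgbuf;
	char		 dummy = 0;

	assert(fd >= 0);
	memset(&msg, 0, sizeof(msg));
	memset(&cmsgbuf, 0, sizeof(cmsgbuf));

	/* At least one byte of regular data has to travel with the fd. */
	iov.iov_base = &dummy;
	iov.iov_len = 1;
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_control = cmsgbuf.buf;
	msg.msg_controllen = sizeof(cmsgbuf.buf);

	cmsg = CMSG_FIRSTHDR(&msg);
	cmsg->cmsg_len = CMSG_LEN(sizeof(int));
	cmsg->cmsg_level = SOL_SOCKET;
	cmsg->cmsg_type = SCM_RIGHTS;
	memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

	return (sendmsg(chan, &msg, 0) == 1 ? 0 : -1);
}

/* Receive one descriptor from chan; returns the new fd or -1 on error. */
int
recv_fd(int chan)
{
	struct msghdr	 msg;
	struct cmsghdr	*cmsg;
	struct iovec	 iov;
	union {
		struct cmsghdr	hdr;
		char		buf[CMSG_SPACE(sizeof(int))];
	} cmsgbuf;
	char		 dummy;
	int		 fd;

	memset(&msg, 0, sizeof(msg));
	iov.iov_base = &dummy;
	iov.iov_len = 1;
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_control = cmsgbuf.buf;
	msg.msg_controllen = sizeof(cmsgbuf.buf);

	if (recvmsg(chan, &msg, 0) != 1)
		return (-1);

	cmsg = CMSG_FIRSTHDR(&msg);
	if (cmsg == NULL || cmsg->cmsg_level != SOL_SOCKET ||
	    cmsg->cmsg_type != SCM_RIGHTS ||
	    cmsg->cmsg_len != CMSG_LEN(sizeof(int)))
		return (-1);

	memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
	return (fd);
}
```

The receiver gets its own descriptor number for the same open socket, so the
sender can close its copy once the message has been sent.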

> 
> > 
> > Could there be a case where this causes ospfd to hang on start in the
> > connect? Not sure if we can sleep doing a connect() to a AF_UNIX socket.

I never observed a hang in ospfctl which also does a connect on the control
socket. But I could not find the definitive answer.

Remi


Index: control.c
===
RCS file: /cvs/src/usr.sbin/ospfd/control.c,v
retrieving revision 1.44
diff -u -p -r1.44 control.c
--- control.c   24 Jan 2017 04:24:25 -  1.44
+++ control.c   27 Aug 2018 21:17:42 -
@@ -42,19 +42,29 @@ int
 control_init(char *path)
 {
struct sockaddr_un   sun;
-   int  fd;
+   int  fd, fd_check;
mode_t   old_umask;
 
+   bzero(&sun, sizeof(sun));
+   sun.sun_family = AF_UNIX;
+   strlcpy(sun.sun_path, path, sizeof(sun.sun_path));
+
+   if ((fd_check = socket(AF_UNIX, SOCK_STREAM, 0)) == -1) {
+   log_warn("control_init: socket check");
+   return (-1);
+   }
+   if (connect(fd_check, (struct sockaddr *)&sun, sizeof(sun)) == 0) {
+  

Re: ospfd: prevent additional ospfd from starting

2018-08-24 Thread Remi Locherer
On Fri, Aug 24, 2018 at 08:58:12AM +0200, Claudio Jeker wrote:
> On Wed, Aug 22, 2018 at 12:12:10AM +0200, Remi Locherer wrote:
> > On Tue, Aug 21, 2018 at 05:54:18PM +0100, Stuart Henderson wrote:
> > > On 2018/08/21 17:16, Remi Locherer wrote:
> > > > Hi tech,
> > > > 
> > > > recently we had a short outage in our network. A script started an 
> > > > additional
> > > > ospfd instance because the -n flag for config test was missing.
> > > 
> > > This is a problem with bgpd as well, last time I did this it killed one 
> > > of the
> > > *other* routers on the network (i.e. not just the one where I 
> > > accidentally ran
> > > 2x bgpd...).
> > > 
> > > > > What then happened was not nice:
> > > > - The new ospfd unlinked the control socket of the first ospfd
> > > > - The new ospfd removed all routes from the first ospfd
> > > > - The new ospfd was not able to build up an adjacency and therefore 
> > > > could
> > > >   not install the routes needed for a recovery.
> > > > - Both ospfd instances were running but non-functional.
> > > > 
> > > > Of course the faulty script is fixed by now. ;-)
> > > > 
> > > > It would be nice if ospfd could prevent such a situation.
> > > > 
> > > > Below diff does these things:
> > > > - Detect a running ospfd by first doing a connect on the control socket.
> > > > - Do not delete the control socket on exit.
> > > >   - This could delete the socket of another instance.
> > > >   - Unlinking the socket on shutdown will be in the way once we add 
> > > > pledge
> > > > to the main process. It was removed recently from various daemons.
> > > 
> > > This all sounds very sensible.
> > > 
> > > > - Do not delete routes added by another process even if they have
> > > >   prio RTP_OSPF. Without this the new ospfd will remove all the routes
> > > >   of the first one.
> > > 
> > > I'm unsure about this, the above changes stop the new ospfd from running
> > > don't they, so that shouldn't be a problem?
> > 
> > It stops too late. kr_init happens before and kills all existing routes with
> > priority 32. And again in the shutdown function of ospfd.
> > > 
> > > If an ospfd blows up for whatever reason, it would be quite inconvenient
> > > if it needs manual route tweaks rather than just 'rcctl start ospfd' to 
> > > fix it ..
> > 
> > Yes, this is not optimal.
> > 
> > The new diff below defers kr_init until the ospf engine notifies the parent
> > that the control socket is ready. In case the ospf engine exits because the
> > control socket is already in use no routes are known that could be removed.
> > 
> > With this ospfd keeps the behaviour of removing foreign routes with
> > priority 32.
> > 
> > Better?
> > 
> 
> Why are we not checking the control socket in the parent?
> Also it may be better to create the control socket in the parent and pass
> it to the ospfe. This is what bgpd is doing and allows to change the path
> during runtime with a config reload.

This makes sense to me. I'll come up with a new diff once I have found some
time for it.

But I'm not sure about changing the socket path with a reload. I plan to
pledge (stdio rpath sendfd wroute) and eventually unveil (read ospfd.conf)
the main process.

> 
> Could there be a case where this causes ospfd to hang on start in the
> connect? Not sure if we can sleep doing a connect() to a AF_UNIX socket.



Re: ospfd: prevent additional ospfd from starting

2018-08-24 Thread Claudio Jeker
On Wed, Aug 22, 2018 at 12:12:10AM +0200, Remi Locherer wrote:
> On Tue, Aug 21, 2018 at 05:54:18PM +0100, Stuart Henderson wrote:
> > On 2018/08/21 17:16, Remi Locherer wrote:
> > > Hi tech,
> > > 
> > > recently we had a short outage in our network. A script started an 
> > > additional
> > > ospfd instance because the -n flag for config test was missing.
> > 
> > This is a problem with bgpd as well, last time I did this it killed one of 
> > the
> > *other* routers on the network (i.e. not just the one where I accidentally 
> > ran
> > 2x bgpd...).
> > 
> > > > What then happened was not nice:
> > > - The new ospfd unlinked the control socket of the first ospfd
> > > - The new ospfd removed all routes from the first ospfd
> > > - The new ospfd was not able to build up an adjacency and therefore could
> > >   not install the routes needed for a recovery.
> > > - Both ospfd instances were running but non-functional.
> > > 
> > > Of course the faulty script is fixed by now. ;-)
> > > 
> > > It would be nice if ospfd could prevent such a situation.
> > > 
> > > Below diff does these things:
> > > - Detect a running ospfd by first doing a connect on the control socket.
> > > - Do not delete the control socket on exit.
> > >   - This could delete the socket of another instance.
> > >   - Unlinking the socket on shutdown will be in the way once we add pledge
> > > to the main process. It was removed recently from various daemons.
> > 
> > This all sounds very sensible.
> > 
> > > - Do not delete routes added by another process even if they have
> > >   prio RTP_OSPF. Without this the new ospfd will remove all the routes
> > >   of the first one.
> > 
> > I'm unsure about this, the above changes stop the new ospfd from running
> > don't they, so that shouldn't be a problem?
> 
> It stops too late. kr_init happens before and kills all existing routes with
> priority 32. And again in the shutdown function of ospfd.
> > 
> > If an ospfd blows up for whatever reason, it would be quite inconvenient
> > if it needs manual route tweaks rather than just 'rcctl start ospfd' to fix 
> > it ..
> 
> Yes, this is not optimal.
> 
> The new diff below defers kr_init until the ospf engine notifies the parent
> that the control socket is ready. In case the ospf engine exits because the
> control socket is already in use no routes are known that could be removed.
> 
> With this ospfd keeps the behaviour of removing foreign routes with
> priority 32.
> 
> Better?
> 

Why are we not checking the control socket in the parent?
Also it may be better to create the control socket in the parent and pass
it to the ospfe. This is what bgpd is doing and allows changing the path
during runtime with a config reload.

Could there be a case where this causes ospfd to hang on start in the
connect? Not sure if we can sleep doing a connect() to an AF_UNIX socket.

> Index: control.c
> ===
> RCS file: /cvs/src/usr.sbin/ospfd/control.c,v
> retrieving revision 1.44
> diff -u -p -r1.44 control.c
> --- control.c 24 Jan 2017 04:24:25 -  1.44
> +++ control.c 17 Aug 2018 22:41:43 -
> @@ -42,19 +42,29 @@ int
>  control_init(char *path)
>  {
>   struct sockaddr_un   sun;
> - int  fd;
> + int  fd, fd_check;
>   mode_t   old_umask;
>  
> + bzero(&sun, sizeof(sun));
> + sun.sun_family = AF_UNIX;
> + strlcpy(sun.sun_path, path, sizeof(sun.sun_path));
> +
> + if ((fd_check = socket(AF_UNIX, SOCK_STREAM, 0)) == -1) {
> + log_warn("control_init: socket check");
> + return (-1);
> + }
> + if (connect(fd_check, (struct sockaddr *)&sun, sizeof(sun)) == 0) {
> + log_warnx("control_init: socket in use");
> + return (-1);
> + }
> + close(fd_check);
> +
>   if ((fd = socket(AF_UNIX, SOCK_STREAM | SOCK_CLOEXEC | SOCK_NONBLOCK,
>   0)) == -1) {
>   log_warn("control_init: socket");
>   return (-1);
>   }
>  
> - bzero(&sun, sizeof(sun));
> - sun.sun_family = AF_UNIX;
> - strlcpy(sun.sun_path, path, sizeof(sun.sun_path));
> -
>   if (unlink(path) == -1)
>   if (errno != ENOENT) {
>   log_warn("control_init: unlink %s", path);
> @@ -98,16 +108,6 @@ control_listen(void)
>   evtimer_set(&control_state.evt, control_accept, NULL);
>  
>   return (0);
> -}
> -
> -void
> -control_cleanup(char *path)
> -{
> - if (path == NULL)
> - return;
> - event_del(&control_state.ev);
> - event_del(&control_state.evt);
> - unlink(path);
>  }
>  
>  /* ARGSUSED */
> Index: control.h
> ===
> RCS file: /cvs/src/usr.sbin/ospfd/control.h,v
> retrieving revision 1.6
> diff -u -p -r1.6 control.h
> --- control.h 10 Feb 2015 05:24:48 -  1.6
> +++ control.h 17 Aug 2018 21:02:36 -
> @@ -39,6 +39,5 @@ int 

Re: ospfd: prevent additional ospfd from starting

2018-08-21 Thread Remi Locherer
On Tue, Aug 21, 2018 at 05:54:18PM +0100, Stuart Henderson wrote:
> On 2018/08/21 17:16, Remi Locherer wrote:
> > Hi tech,
> > 
> > recently we had a short outage in our network. A script started an 
> > additional
> > ospfd instance because the -n flag for config test was missing.
> 
> This is a problem with bgpd as well, last time I did this it killed one of the
> *other* routers on the network (i.e. not just the one where I accidentally ran
> 2x bgpd...).
> 
> > What then happened was not nice:
> > - The new ospfd unlinked the control socket of the first ospfd
> > - The new ospfd removed all routes from the first ospfd
> > - The new ospfd was not able to build up an adjacency and therefore could
> >   not install the routes needed for a recovery.
> > - Both ospfd instances were running but non-functional.
> > 
> > Of course the faulty script is fixed by now. ;-)
> > 
> > It would be nice if ospfd could prevent such a situation.
> > 
> > Below diff does these things:
> > - Detect a running ospfd by first doing a connect on the control socket.
> > - Do not delete the control socket on exit.
> >   - This could delete the socket of another instance.
> >   - Unlinking the socket on shutdown will be in the way once we add pledge
> >     to the main process. It was removed recently from various daemons.
> 
> This all sounds very sensible.
> 
> > - Do not delete routes added by another process even if they have
> >   prio RTP_OSPF. Without this the new ospfd will remove all the routes
> >   of the first one.
> 
> I'm unsure about this, the above changes stop the new ospfd from running
> don't they, so that shouldn't be a problem?

It stops too late. kr_init happens before that and kills all existing routes
with priority 32. The same happens again in the shutdown function of ospfd.
> 
> If an ospfd blows up for whatever reason, it would be quite inconvenient
> if it needs manual route tweaks rather than just 'rcctl start ospfd' to fix it ..

Yes, this is not optimal.

The new diff below defers kr_init until the ospf engine notifies the parent
that the control socket is ready. In case the ospf engine exits because the
control socket is already in use, no routes are known that could be removed.

With this, ospfd keeps the behaviour of removing foreign routes with
priority 32.
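
For illustration only, the ordering described above can be sketched with a
plain pipe instead of ospfd's imsg channel. The helper name wait_for_ready()
and the one-byte "ready" protocol are assumptions made for this sketch; they
are not taken from the diff.

```c
#include <assert.h>
#include <errno.h>
#include <unistd.h>

/*
 * Hypothetical helper, not part of ospfd: block until the child reports
 * on fd. Returns 1 if the child wrote a ready byte, 0 if it exited
 * without reporting (EOF), -1 on a read error.
 */
int
wait_for_ready(int fd)
{
	char	c;
	ssize_t	n;

	assert(fd >= 0);

	/* Retry if a signal interrupts the read. */
	do {
		n = read(fd, &c, 1);
	} while (n == -1 && errno == EINTR);

	if (n == -1)
		return (-1);
	return (n == 1);
}
```

The parent would run the destructive route setup (kr_init in ospfd) only
when wait_for_ready() returns 1, so a child that exits because the control
socket is in use never causes any routes to be touched.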

Better?


Index: control.c
===
RCS file: /cvs/src/usr.sbin/ospfd/control.c,v
retrieving revision 1.44
diff -u -p -r1.44 control.c
--- control.c   24 Jan 2017 04:24:25 -  1.44
+++ control.c   17 Aug 2018 22:41:43 -
@@ -42,19 +42,29 @@ int
 control_init(char *path)
 {
struct sockaddr_un   sun;
-   int  fd;
+   int  fd, fd_check;
mode_t   old_umask;
 
+   bzero(&sun, sizeof(sun));
+   sun.sun_family = AF_UNIX;
+   strlcpy(sun.sun_path, path, sizeof(sun.sun_path));
+
+   if ((fd_check = socket(AF_UNIX, SOCK_STREAM, 0)) == -1) {
+   log_warn("control_init: socket check");
+   return (-1);
+   }
+   if (connect(fd_check, (struct sockaddr *)&sun, sizeof(sun)) == 0) {
+   log_warnx("control_init: socket in use");
+   return (-1);
+   }
+   close(fd_check);
+
if ((fd = socket(AF_UNIX, SOCK_STREAM | SOCK_CLOEXEC | SOCK_NONBLOCK,
0)) == -1) {
log_warn("control_init: socket");
return (-1);
}
 
-   bzero(&sun, sizeof(sun));
-   sun.sun_family = AF_UNIX;
-   strlcpy(sun.sun_path, path, sizeof(sun.sun_path));
-
if (unlink(path) == -1)
if (errno != ENOENT) {
log_warn("control_init: unlink %s", path);
@@ -98,16 +108,6 @@ control_listen(void)
evtimer_set(&control_state.evt, control_accept, NULL);
 
return (0);
-}
-
-void
-control_cleanup(char *path)
-{
-   if (path == NULL)
-   return;
-   event_del(&control_state.ev);
-   event_del(&control_state.evt);
-   unlink(path);
 }
 
 /* ARGSUSED */
Index: control.h
===
RCS file: /cvs/src/usr.sbin/ospfd/control.h,v
retrieving revision 1.6
diff -u -p -r1.6 control.h
--- control.h   10 Feb 2015 05:24:48 -  1.6
+++ control.h   17 Aug 2018 21:02:36 -
@@ -39,6 +39,5 @@ int   control_listen(void);
 void   control_accept(int, short, void *);
 void   control_dispatch_imsg(int, short, void *);
 int    control_imsg_relay(struct imsg *);
-void   control_cleanup(char *);
 
 #endif /* _CONTROL_H_ */
Index: ospfd.c
===
RCS file: /cvs/src/usr.sbin/ospfd/ospfd.c,v
retrieving revision 1.99
diff -u -p -r1.99 ospfd.c
--- ospfd.c 11 Jul 2018 12:09:34 -  1.99
+++ ospfd.c 21 Aug 2018 21:39:23 -
@@ -270,10 +270,6 @@ main(int argc, char *argv[])
iev_rde->handler, iev_rde);
event_add(&iev_rde->ev, NULL);
 
-   if 

Re: ospfd: prevent additional ospfd from starting

2018-08-21 Thread Denis Fondras
On Tue, Aug 21, 2018 at 05:16:47PM +0200, Remi Locherer wrote:
> Hi tech,
> 
> recently we had a short outage in our network. A script started an additional
> ospfd instance because the -n flag for config test was missing.
> 
> What then happened was not nice:
> - The new ospfd unlinked the control socket of the first ospfd
> - The new ospfd removed all routes from the first ospfd
> - The new ospfd was not able to build up an adjacency and therefore could
>   not install the routes needed for a recovery.
> - Both ospfd instances were running but non-functional.
> 
> Of course the faulty script is fixed by now. ;-)
> 
> It would be nice if ospfd could prevent such a situation.
> 
> Below diff does these things:
> - Detect a running ospfd by first doing a connect on the control socket.
> - Do not delete the control socket on exit.
>   - This could delete the socket of another instance.
>   - Unlinking the socket on shutdown will be in the way once we add pledge
>     to the main process. It was removed recently from various daemons.
> - Do not delete routes added by another process even if they have
>   prio RTP_OSPF. Without this the new ospfd will remove all the routes
>   of the first one.
> 
> A side effect of this is that alien OSPF routes are now only logged but
> not removed anymore. Should a crashed ospfd leave some routes behind, the
> next ospfd does not clean them up anymore. The admin would need to check
> the logs and remove them manually with the route command.
> 
> Does this make sense?
> 

Manually removing routes does not :)



Re: ospfd: prevent additional ospfd from starting

2018-08-21 Thread Stuart Henderson
On 2018/08/21 17:16, Remi Locherer wrote:
> Hi tech,
> 
> recently we had a short outage in our network. A script started an additional
> ospfd instance because the -n flag for config test was missing.

This is a problem with bgpd as well, last time I did this it killed one of the
*other* routers on the network (i.e. not just the one where I accidentally ran
2x bgpd...).

> What then happened was not nice:
> - The new ospfd unlinked the control socket of the first ospfd
> - The new ospfd removed all routes from the first ospfd
> - The new ospfd was not able to build up an adjacency and therefore could
>   not install the routes needed for a recovery.
> - Both ospfd instances were running but non-functional.
> 
> Of course the faulty script is fixed by now. ;-)
> 
> It would be nice if ospfd could prevent such a situation.
> 
> Below diff does these things:
> - Detect a running ospfd by first doing a connect on the control socket.
> - Do not delete the control socket on exit.
>   - This could delete the socket of another instance.
>   - Unlinking the socket on shutdown will be in the way once we add pledge
>     to the main process. It was removed recently from various daemons.

This all sounds very sensible.
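
As a rough, standalone sketch of the connect-based detection quoted above:
the function name socket_in_use() is made up for this example and does not
appear in the diff. Unlike the check in the diff, it closes the probe
descriptor on both paths.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/un.h>

#include <assert.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper, not part of ospfd: returns 1 if something accepts
 * connections on the UNIX socket at path, 0 if not, and -1 if the path
 * is too long or the probe socket could not be created.
 */
int
socket_in_use(const char *path)
{
	struct sockaddr_un	 sun;
	int			 fd, in_use;

	assert(path != NULL);

	if (strlen(path) >= sizeof(sun.sun_path))
		return (-1);
	memset(&sun, 0, sizeof(sun));
	sun.sun_family = AF_UNIX;
	memcpy(sun.sun_path, path, strlen(path) + 1);

	if ((fd = socket(AF_UNIX, SOCK_STREAM, 0)) == -1)
		return (-1);

	/* connect() only succeeds while a listener is still alive. */
	in_use = connect(fd, (struct sockaddr *)&sun, sizeof(sun)) == 0;
	close(fd);	/* closed on both paths so the probe fd never leaks */

	return (in_use);
}
```

A stale socket file left behind by a crashed daemon makes connect() fail,
so socket_in_use() returns 0 and a fresh instance can still unlink the file
and bind to the path.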

> - Do not delete routes added by another process even if they have
>   prio RTP_OSPF. Without this the new ospfd will remove all the routes
>   of the first one.

I'm unsure about this, the above changes stop the new ospfd from running
don't they, so that shouldn't be a problem?

If an ospfd blows up for whatever reason, it would be quite inconvenient
if it needs manual route tweaks rather than just 'rcctl start ospfd' to fix it ..

> A side effect of this is that alien OSPF routes are now only logged but
> not removed anymore. Should a crashed ospfd leave some routes behind, the
> next ospfd does not clean them up anymore. The admin would need to check
> the logs and remove them manually with the route command.
> 
> Does this make sense?
> 
> Comments? OK?
> 
> Remi
> 
> 
> Index: control.c
> ===
> RCS file: /cvs/src/usr.sbin/ospfd/control.c,v
> retrieving revision 1.44
> diff -u -p -r1.44 control.c
> --- control.c 24 Jan 2017 04:24:25 -  1.44
> +++ control.c 17 Aug 2018 22:41:43 -
> @@ -42,19 +42,29 @@ int
>  control_init(char *path)
>  {
>   struct sockaddr_un   sun;
> - int  fd;
> + int  fd, fd_check;
>   mode_t   old_umask;
>  
> + bzero(&sun, sizeof(sun));
> + sun.sun_family = AF_UNIX;
> + strlcpy(sun.sun_path, path, sizeof(sun.sun_path));
> +
> + if ((fd_check = socket(AF_UNIX, SOCK_STREAM, 0)) == -1) {
> + log_warn("control_init: socket check");
> + return (-1);
> + }
> + if (connect(fd_check, (struct sockaddr *)&sun, sizeof(sun)) == 0) {
> + log_warnx("control_init: socket in use");
> + return (-1);
> + }
> + close(fd_check);
> +
>   if ((fd = socket(AF_UNIX, SOCK_STREAM | SOCK_CLOEXEC | SOCK_NONBLOCK,
>   0)) == -1) {
>   log_warn("control_init: socket");
>   return (-1);
>   }
>  
> - bzero(&sun, sizeof(sun));
> - sun.sun_family = AF_UNIX;
> - strlcpy(sun.sun_path, path, sizeof(sun.sun_path));
> -
>   if (unlink(path) == -1)
>   if (errno != ENOENT) {
>   log_warn("control_init: unlink %s", path);
> @@ -98,16 +108,6 @@ control_listen(void)
>   evtimer_set(&control_state.evt, control_accept, NULL);
>  
>   return (0);
> -}
> -
> -void
> -control_cleanup(char *path)
> -{
> - if (path == NULL)
> - return;
> - event_del(&control_state.ev);
> - event_del(&control_state.evt);
> - unlink(path);
>  }
>  
>  /* ARGSUSED */
> Index: control.h
> ===
> RCS file: /cvs/src/usr.sbin/ospfd/control.h,v
> retrieving revision 1.6
> diff -u -p -r1.6 control.h
> --- control.h 10 Feb 2015 05:24:48 -  1.6
> +++ control.h 17 Aug 2018 21:02:36 -
> @@ -39,6 +39,5 @@ int control_listen(void);
>  void control_accept(int, short, void *);
>  void control_dispatch_imsg(int, short, void *);
>  int  control_imsg_relay(struct imsg *);
> -void control_cleanup(char *);
>  
>  #endif   /* _CONTROL_H_ */
> Index: kroute.c
> ===
> RCS file: /cvs/src/usr.sbin/ospfd/kroute.c,v
> retrieving revision 1.111
> diff -u -p -r1.111 kroute.c
> --- kroute.c  10 Jul 2018 11:49:04 -  1.111
> +++ kroute.c  21 Aug 2018 14:13:27 -
> @@ -263,6 +263,7 @@ kr_change_fib(struct kroute_node *kr, st
>   kn->r.nexthop.s_addr = kroute[i].nexthop.s_addr;
>   kn->r.flags = kroute[i].flags | F_OSPFD_INSERTED;
>   kn->r.priority = RTP_OSPF;
> + kn->r.pid = kr_state.pid;
>   kn->r.ext_tag = kroute[i].ext_tag;
>   

ospfd: prevent additional ospfd from starting

2018-08-21 Thread Remi Locherer
Hi tech,

recently we had a short outage in our network. A script started an additional
ospfd instance because the -n flag for config test was missing.

What then happened was not nice:
- The new ospfd unlinked the control socket of the first ospfd
- The new ospfd removed all routes from the first ospfd
- The new ospfd was not able to build up an adjacency and therefore could
  not install the routes needed for a recovery.
- Both ospfd instances were running but non-functional.

Of course the faulty script is fixed by now. ;-)

It would be nice if ospfd could prevent such a situation.

Below diff does these things:
- Detect a running ospfd by first doing a connect on the control socket.
- Do not delete the control socket on exit.
  - This could delete the socket of another instance.
  - Unlinking the socket on shutdown will be in the way once we add pledge
    to the main process. It was removed recently from various daemons.
- Do not delete routes added by another process even if they have
  prio RTP_OSPF. Without this the new ospfd will remove all the routes
  of the first one.

A side effect of this is that alien OSPF routes are now only logged but
not removed anymore. Should a crashed ospfd leave some routes behind, the
next ospfd does not clean them up anymore. The admin would need to check
the logs and remove them manually with the route command.
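
To make the ownership rule concrete, here is a simplified sketch of it.
struct route, route_is_ours() and count_deletable() are stand-ins invented
for this example; the real code works on ospfd's kroute tree, as the
kroute.c part of the diff shows.

```c
#include <sys/types.h>

#include <assert.h>
#include <stddef.h>

#define RTP_OSPF	32	/* OpenBSD gets this from <net/route.h> */

/* Simplified stand-in for ospfd's struct kroute; illustration only. */
struct route {
	unsigned char	priority;
	pid_t		pid;	/* process that installed the route */
};

/* A route may be removed only if it is an OSPF route and we installed it. */
int
route_is_ours(const struct route *r, pid_t self)
{
	return (r->priority == RTP_OSPF && r->pid == self);
}

/* Count how many routes a decouple pass would delete for process self. */
size_t
count_deletable(const struct route *routes, size_t n, pid_t self)
{
	size_t	i, hits = 0;

	assert(routes != NULL || n == 0);

	for (i = 0; i < n; i++)
		if (route_is_ours(&routes[i], self))
			hits++;
	return (hits);
}
```

With this rule, a second instance with a different pid finds nothing to
delete, which is also why routes left behind by a crashed ospfd would stay
in place until removed by hand.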

Does this make sense?

Comments? OK?

Remi


Index: control.c
===
RCS file: /cvs/src/usr.sbin/ospfd/control.c,v
retrieving revision 1.44
diff -u -p -r1.44 control.c
--- control.c   24 Jan 2017 04:24:25 -  1.44
+++ control.c   17 Aug 2018 22:41:43 -
@@ -42,19 +42,29 @@ int
 control_init(char *path)
 {
struct sockaddr_un   sun;
-   int  fd;
+   int  fd, fd_check;
mode_t   old_umask;
 
+   bzero(&sun, sizeof(sun));
+   sun.sun_family = AF_UNIX;
+   strlcpy(sun.sun_path, path, sizeof(sun.sun_path));
+
+   if ((fd_check = socket(AF_UNIX, SOCK_STREAM, 0)) == -1) {
+   log_warn("control_init: socket check");
+   return (-1);
+   }
+   if (connect(fd_check, (struct sockaddr *)&sun, sizeof(sun)) == 0) {
+   log_warnx("control_init: socket in use");
+   return (-1);
+   }
+   close(fd_check);
+
if ((fd = socket(AF_UNIX, SOCK_STREAM | SOCK_CLOEXEC | SOCK_NONBLOCK,
0)) == -1) {
log_warn("control_init: socket");
return (-1);
}
 
-   bzero(&sun, sizeof(sun));
-   sun.sun_family = AF_UNIX;
-   strlcpy(sun.sun_path, path, sizeof(sun.sun_path));
-
if (unlink(path) == -1)
if (errno != ENOENT) {
log_warn("control_init: unlink %s", path);
@@ -98,16 +108,6 @@ control_listen(void)
evtimer_set(&control_state.evt, control_accept, NULL);
 
return (0);
-}
-
-void
-control_cleanup(char *path)
-{
-   if (path == NULL)
-   return;
-   event_del(&control_state.ev);
-   event_del(&control_state.evt);
-   unlink(path);
 }
 
 /* ARGSUSED */
Index: control.h
===
RCS file: /cvs/src/usr.sbin/ospfd/control.h,v
retrieving revision 1.6
diff -u -p -r1.6 control.h
--- control.h   10 Feb 2015 05:24:48 -  1.6
+++ control.h   17 Aug 2018 21:02:36 -
@@ -39,6 +39,5 @@ int   control_listen(void);
 void   control_accept(int, short, void *);
 void   control_dispatch_imsg(int, short, void *);
 int    control_imsg_relay(struct imsg *);
-void   control_cleanup(char *);
 
 #endif /* _CONTROL_H_ */
Index: kroute.c
===
RCS file: /cvs/src/usr.sbin/ospfd/kroute.c,v
retrieving revision 1.111
diff -u -p -r1.111 kroute.c
--- kroute.c    10 Jul 2018 11:49:04 -  1.111
+++ kroute.c    21 Aug 2018 14:13:27 -
@@ -263,6 +263,7 @@ kr_change_fib(struct kroute_node *kr, st
kn->r.nexthop.s_addr = kroute[i].nexthop.s_addr;
kn->r.flags = kroute[i].flags | F_OSPFD_INSERTED;
kn->r.priority = RTP_OSPF;
+   kn->r.pid = kr_state.pid;
kn->r.ext_tag = kroute[i].ext_tag;
rtlabel_unref(kn->r.rtlabel);   /* for RTM_CHANGE */
kn->r.rtlabel = kroute[i].rtlabel;
@@ -365,7 +366,7 @@ kr_fib_decouple(void)
return;
 
RB_FOREACH(kr, kroute_tree, &krt)
-   if (kr->r.priority == RTP_OSPF)
+   if (kr->r.priority == RTP_OSPF && kr->r.pid == kr_state.pid)
for (kn = kr; kn != NULL; kn = kn->next)
send_rtmsg(kr_state.fd, RTM_DELETE, &kn->r);
 
@@ -1365,7 +1366,7 @@ rtmsg_process(char *buf, size_t len)
u_int8_t prefixlen, prio;
int  flags, mpath;
u_short  ifindex = 0;
-